Release | June 6, 2026 | 1 min read

Revolutionary Voice Model Listens and Responds in Real Time, Outpacing Rivals

A new open-source voice model, Audio Interaction, processes a continuous audio stream and decides every 0.4 seconds whether to speak or stay silent. It has already surpassed rival models on key benchmarks, and it could change how people interact with voice assistants and other AI-powered devices.
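To make the 0.4-second decision cadence concrete, here is a minimal Python sketch of a loop that cuts an incoming sample stream into fixed windows and makes one speak-or-stay-silent decision per window. Only the 0.4-second interval comes from the article. The 16 kHz sample rate, the function names, and the energy-threshold policy are assumptions made for illustration; the toy policy merely stands in for the model's decision and is not how the model itself decides.

```python
from typing import Callable, Iterable, Iterator, List

DECISION_INTERVAL_S = 0.4   # decision cadence reported in the article
SAMPLE_RATE_HZ = 16_000     # assumption: the article does not state a sample rate


def chunk_stream(samples: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size chunks; a trailing partial chunk is dropped."""
    buf: List[float] = []
    for sample in samples:
        buf.append(sample)
        if len(buf) == chunk_size:
            yield buf
            buf = []


def toy_policy(chunk: List[float], threshold: float = 0.01) -> str:
    """Hypothetical stand-in for the model: 'speak' if mean energy exceeds a threshold."""
    energy = sum(x * x for x in chunk) / len(chunk)
    return "speak" if energy > threshold else "silent"


def run_decisions(
    samples: Iterable[float],
    policy: Callable[[List[float]], str] = toy_policy,
    sample_rate: int = SAMPLE_RATE_HZ,
) -> List[str]:
    """Make one decision per 0.4 s window of the incoming stream."""
    chunk_size = round(DECISION_INTERVAL_S * sample_rate)
    return [policy(chunk) for chunk in chunk_stream(samples, chunk_size)]
```

At the assumed 16 kHz rate, each decision covers 6,400 samples. A real system would replace `toy_policy` with a call into the model and would consume audio as it arrives rather than from an in-memory list.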

Unlike GPT-4o or Qwen3.5-Omni, Audio Interaction doesn't wait for a recording to end: it translates, transcribes, chats, and picks up everyday noises like coughing in a single stream. Code, model weights, and download instructions are available on GitHub under the Apache 2.0 open-source license, with the training data to follow. The article "New open-source voice model listens nonstop and decides every 0.4 seconds whether to speak or stay silent" appeared first on The Decoder.
