Mitsubishi Electric Research Laboratories (MERL), the North American research arm of Mitsubishi Electric Corporation, announced a breakthrough in speech separation technology at an annual R&D open house in Tokyo.
The lab said it has created the world's first technology that separates, in real time, the simultaneous speech of multiple unknown speakers recorded with a single microphone. “Solving the so-called ‘cocktail party problem’ has been the holy grail of speech processing for more than 50 years: we finally cracked it,” said Jonathan Le Roux, Ph.D., principal research scientist at MERL and one of the leading researchers on the project.
In tests, the simultaneous speech of two speakers was separated with up to 90 percent accuracy, and that of three speakers with up to 80 percent accuracy. The technology, built on Mitsubishi Electric’s proprietary “deep clustering” method based on artificial intelligence (AI), is expected to contribute to more intelligible voice communications and more accurate automatic speech recognition. The approach is also versatile: voices can be separated regardless of the language spoken or the gender of the speakers.
Mitsubishi Electric will explore opportunities to apply this new speech separation technology to improve the quality of voice communications and the accuracy of automatic speech recognition in real environments, such as cars, homes and elevators.
“Until now, there has been no effective method to accurately reconstruct the speech of multiple unknown speakers recorded with just one microphone,” said Richard Waters, Ph.D., president, CEO and founding member of Mitsubishi Electric Research Laboratories.