Speech recognition platforms today suffer from poor recognition rates in noisy environments, such as a car with kids in the back seat, a radio playing, an ambulance passing by, or heavy rain. Video conferencing likewise becomes ineffective when one of the participants is in a noisy place like a coffee shop.
Hi Auto developed a software-only speech enhancement and speaker separation system that eliminates the most challenging noises and focuses only on the speaker.
Their deep learning algorithm leverages a speaker-facing camera and a single microphone for more accurate speech recognition and clearer speech enhancement.
The system runs on-device or through a cloud API.
They use a speaker-facing camera that tracks lip movement to separate the speaker from any noise. While in many scenarios their deep learning algorithm works well with audio input only, the camera and proprietary audio-visual algorithms enable the removal of noise that audio-only methods cannot remove.