Vocal Recognition Speech Signal Processing Extracts Single Person's Voice for Voice Over Internet Protocol (VoIP) and Medical Transcription
This speech signal processing system performs vocal recognition for applications such as Voice over Internet Protocol (VoIP), medical transcription, and hearing aids.
The system recognizes vocal patterns and tracks each voice in a group so that an individual's voice can be selected and extracted from background noises and other competing voices.
Human speech changes rapidly in pace and pitch over a short period of time. This speech technology tracks each individual voice through time so that the unique sounds and properties of that voice can be reconstructed and presented to the listener.
Using a recursive neural network, this speech signal processing system determines the fundamental frequency of individual voices and then predicts how each individual's pitch will change in the future, based on past behavior.
The result is that each of the voices present in a recording can be tracked, selected, extracted, and either stored for future analysis or used directly in real time. For example, the system may be used in a digital speech signal processing chip in a hearing aid to select and amplify one person’s voice.
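The pitch-tracking idea described above can be illustrated with a minimal Python sketch. It is not the system's actual method: the neural network is replaced here by two simple stand-ins, a classical autocorrelation estimate of the fundamental frequency and a naive linear extrapolation of the next pitch value from past estimates. The function names, the 60-400 Hz search range, and the synthetic test tone are all assumptions made for illustration.

```python
import math


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one audio frame.

    Stand-in technique: pick the lag with the largest (unnormalized)
    autocorrelation inside an assumed pitch range of f_min..f_max.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    if len(frame) <= max_lag:
        raise ValueError("frame is too short for the requested pitch range")

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def predict_next_pitch(history):
    """Predict the next pitch value from past estimates.

    Stand-in for the learned predictor: straight-line extrapolation
    from the two most recent estimates.
    """
    if not history:
        raise ValueError("history must contain at least one estimate")
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


if __name__ == "__main__":
    sample_rate = 8000
    # Synthetic 200 Hz tone standing in for a voiced speech frame.
    frame = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(800)]
    print(estimate_f0(frame, sample_rate))
    print(predict_next_pitch([190.0, 200.0]))
```

A real tracker would run the estimate frame by frame and compare each new estimate with the prediction to decide which voice a sound belongs to. The sketch only shows the two building blocks in isolation.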
Speech Technology for Voice Recognition Software, VoIP and Hearing Aids
It is often very difficult for listeners, whether human or machine, to pick out a single speaker's voice from a mixture of sounds, because the target voice and the other voices or noises in the recording environment have very similar properties. Simple filtering techniques cannot remove the unwanted noise without also removing part of the targeted voice.
Separating voices is particularly difficult when operating voice recognition software or using a hearing aid in a noisy environment.
The only way to improve speech understanding for these listeners is to separate a single talker’s voice from the background mixture of competing voices and sounds.
Speech Signal Processing for VoIP and Medical Transcription Services
While several mechanisms are currently available for speech extraction, none specifically attempts to piece together the speech sounds of each individual talker as they occur over time.
This technology has vast potential software and hardware applications. For example, more than 1.2 billion clinical records are produced in the U.S. every year, and over 60% of them are documented via traditional dictation and transcription. Global revenue for transcription services exceeds $2 billion per year. In addition, mandates have been passed that will require hospitals and physicians to implement electronic health records by 2015. Current speech-to-text software has very high error rates because dictation is often poorly recorded. This technology will reduce background noise and preserve the voice of the dictating physician, improving the speed and accuracy of transcriptions.
This technology can also be built into many types of communication devices, networks, and systems, including VoIP programs like Skype. The voice tracking system follows the talker's voice and preserves it while reducing background noise and even other voices in the environment. Market research firm Datamonitor projected that the market for speech technology embedded into devices would reach $500 million per year by 2010.
Applications
- Medical transcription
- Hearing aids
- VoIP systems like Skype
- Voice recognition software
- Law enforcement
Advantages
- Removes background noise from live and recorded conversations without compromising the quality of the voice being recorded
- Can simultaneously track multiple voices and identify and filter specific voices in real time
- Can be used in communication devices, networks, and systems, including VoIP