Speech Recognition

Jun 10, 2019

This article explains how to add speech recognition to a Python project.
Basic speech recognition in Python takes only a few lines of code.

How does Speech Recognition work?

1) The first element of speech recognition is speech itself.
2) A microphone converts the speech from physical sound to an electrical signal, and an analog-to-digital converter then turns that signal into digital data.
3) Once the audio is digitized, models are used to transcribe it to text.
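To make step 2 concrete, here is a minimal sketch of what an analog-to-digital converter does: it samples a signal at a fixed rate and rounds each sample to an integer. The function name digitize and the chosen values (a 440 Hz tone, 16 kHz sampling, 16-bit samples) are assumptions made for illustration only.

```python
import math


def digitize(frequency_hz, sample_rate_hz, duration_s, bits=16):
    """Sample a sine wave and quantize each sample to a signed integer."""
    max_amplitude = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    n_samples = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                             # time of sample n, in seconds
        analog = math.sin(2 * math.pi * frequency_hz * t)  # stand-in for the analog signal, in [-1, 1]
        samples.append(round(analog * max_amplitude))      # quantize to an integer level
    return samples


# 10 ms of a 440 Hz tone sampled 16,000 times per second
samples = digitize(frequency_hz=440, sample_rate_hz=16000, duration_s=0.01)
print(len(samples), samples[:5])
```

A real microphone signal is far more complex than a single tone, but the digital data handed to a recognizer is the same kind of thing: a long list of integers.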

Python Speech Recognition Packages -

A few popular packages are -

google-cloud-speech
SpeechRecognition
watson-developer-cloud
apiai
wit

i) apiai and wit identify a speaker’s intent, which goes beyond basic speech recognition.

ii) google-cloud-speech and SpeechRecognition focus entirely on speech-to-text conversion.

Let's use SpeechRecognition to recognize speech from audio input.

The SpeechRecognition library can use the Google Web Speech API, which comes with a default API key hard-coded into the library, so no sign-up is required for the service.

Installing SpeechRecognition -

$ pip3 install SpeechRecognition

SpeechRecognition works well with existing audio files.
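As a rough sketch of the audio-file case, the library's AudioFile class can be used as a source and read with the Recognizer's record() method. The helper name transcribe_file and the extension check are assumptions added for illustration; the extension list reflects the WAV, AIFF and FLAC formats that AudioFile reads.

```python
from pathlib import Path

# File types that sr.AudioFile can read (WAV, AIFF, FLAC)
SUPPORTED_EXTENSIONS = {".wav", ".aiff", ".aif", ".flac"}


def transcribe_file(path, language="en-US"):
    """Transcribe an existing audio file with the Google Web Speech API."""
    path = Path(path)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {path.suffix!r}")

    # Imported here so the format check above works even without the package
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(path)) as source:
        audio = recognizer.record(source)  # read the whole file into an AudioData object
    return recognizer.recognize_google(audio, language=language)
```

Calling transcribe_file('speech.wav') (a hypothetical file name) should return the recognized text as a string, provided the machine is online.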

When audio is captured from a microphone, PyAudio is also needed.

On Debian-based systems, PyAudio can be installed with:

$ sudo apt-get install python-pyaudio python3-pyaudio

If we are working in a virtual environment, we then also need to run:

$ pip3 install pyaudio

The Microphone Class -

Create an instance of the Recognizer class:

import speech_recognition as sr

r = sr.Recognizer()

To use the microphone, create an instance of the Microphone class:

mic = sr.Microphone()

To use a microphone other than the default, we can supply a device index.

We can list the microphone names by calling the list_microphone_names() method of the Microphone class:

>>> sr.Microphone.list_microphone_names()

['HDA Intel PCH: ALC272 Analog (hw:0,0)',
 'HDA Intel PCH: HDMI 0 (hw:0,3)',
 'sysdefault',
 'front',
 'surround40',
 'surround51',
 'surround71',
 'hdmi',
 'pulse',
 'dmix',
 'default']

For example, if we are using the microphone called 'front', which has index 3 in the list, we can create a microphone instance as:

mic = sr.Microphone(device_index=3)
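Device indexes differ from machine to machine, so it can be safer to look the index up by name than to hard-code it. The sketch below uses a hypothetical helper, find_device_index, and the example list from above in place of a live call to list_microphone_names().

```python
def find_device_index(names, wanted):
    """Return the index of the first device named `wanted`, or None if absent."""
    for index, name in enumerate(names):
        if name == wanted:
            return index
    return None


# Example list copied from the output above; on a real machine this would be
# the result of sr.Microphone.list_microphone_names()
names = [
    'HDA Intel PCH: ALC272 Analog (hw:0,0)',
    'HDA Intel PCH: HDMI 0 (hw:0,3)',
    'sysdefault',
    'front',
    'surround40',
    'surround51',
    'surround71',
    'hdmi',
    'pulse',
    'dmix',
    'default',
]

index = find_device_index(names, 'front')
print(index)
```

The result can then be passed on as sr.Microphone(device_index=index), after checking that it is not None.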

Capturing Microphone Input with listen()

To capture input from the microphone, use the listen() method of the Recognizer class.

This method takes an audio source as its first argument and records input from the source until silence is detected.

>>> with mic as source:
...     audio = r.listen(source)

Speak 'Hello' into the microphone.

Now the speech ('Hello') is ready to be recognized.

>>> r.recognize_google(audio)

recognize_google() is called to transcribe any speech in the recording.

Set the language keyword argument, as in recognize_google(audio, language='en-US'), to choose the language to transcribe, e.g. 'en-US' for American English.
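One way to keep language tags readable is a small lookup table in front of recognize_google(). This is only a sketch: the LANGUAGE_TAGS table and the transcribe_in helper are hypothetical, and the tags listed are ordinary language tags assumed to be accepted by the service.

```python
# Hypothetical lookup table from a readable name to a language tag
LANGUAGE_TAGS = {
    "american english": "en-US",
    "british english": "en-GB",
    "french": "fr-FR",
    "hindi": "hi-IN",
}


def transcribe_in(recognizer, audio, language_name):
    """Transcribe `audio` in the named language using recognizer.recognize_google()."""
    tag = LANGUAGE_TAGS[language_name.lower()]  # raises KeyError for unknown names
    return recognizer.recognize_google(audio, language=tag)
```

With the objects created earlier, transcribe_in(r, audio, 'French') would ask for a French transcription of the same recording.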

When using microphone input, handle ambient noise with the adjust_for_ambient_noise() method of the Recognizer class.

>>> with mic as source:
...     r.adjust_for_ambient_noise(source)
...     audio = r.listen(source)
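Conceptually, this calibration measures how loud the background is and sets a threshold above it, so that only louder sound counts as speech (the Recognizer exposes the value as energy_threshold). The sketch below is a simplified illustration of that idea, not the library's actual algorithm; the function names and the ratio of 1.5 are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square energy of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(ambient_samples, ratio=1.5):
    """Set the speech threshold a fixed ratio above the ambient energy."""
    return rms(ambient_samples) * ratio


def is_speech(chunk, threshold):
    """Treat a chunk as speech only if it is louder than the threshold."""
    return rms(chunk) > threshold


# Quiet background samples, then one loud chunk and one quiet chunk
threshold = calibrate_threshold([100, -100, 100, -100])
print(threshold, is_speech([1000, -1000], threshold), is_speech([50, -50], threshold))
```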

Code for Speech Recognition (audio to text conversion)

import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    # Calibrate against background noise before listening
    r.adjust_for_ambient_noise(source)
    print('Hi tell me something')
    audio = r.listen(source)
    print('Good to hear you')

try:
    print('Speech to Text: ' + r.recognize_google(audio, language='en-US'))
except (sr.UnknownValueError, sr.RequestError):
    # Speech was unintelligible or the API could not be reached
    pass

Output -
Hi tell me something
Good to hear you
Speech to Text: how are you

** The pass statement is a null operation; nothing happens when it executes.

** The ambient noise level (sometimes called background noise level, reference sound level, or room noise level) is the level of background sound at a given location.
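Because pass silently swallows failures, it can help during development to report what went wrong instead. The library raises UnknownValueError when the speech cannot be understood and RequestError when the API cannot be reached. The helper name describe_result and the message texts below are assumptions for illustration.

```python
def describe_result(recognizer, audio, language="en-US"):
    """Return the transcript, or a short message explaining why there is none."""
    # Imported lazily; only the exception classes are needed here
    import speech_recognition as sr

    try:
        text = recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The API responded but could not make sense of the audio
        return "Could not understand the audio"
    except sr.RequestError as error:
        # No connection, or the API rejected the request
        return f"API request failed: {error}"
    return "Speech to Text: " + text
```

In the program above, the try block could then be replaced by print(describe_result(r, audio)).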

Now we are ready to explore speech recognition further.


Written by Mona