Speech Recognition in Python


This guide walks through speech recognition in Python step by step, from installing the libraries to building a simple voice assistant.



Speech recognition, also known as automatic speech recognition (ASR), is the process of converting spoken language into text. In Python, we can achieve this with the SpeechRecognition library, which wraps engines such as CMU Sphinx and the Google Web Speech API, or with deep learning-based models.


1. Installing Required Libraries

Before we start, we need to install some essential libraries:

pip install SpeechRecognition
pip install pyaudio
pip install numpy
pip install pocketsphinx
  • SpeechRecognition: A popular library for handling speech recognition.
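  • PyAudio: Provides microphone access; SpeechRecognition needs it for sr.Microphone.
  • NumPy: Array library used when working with raw audio samples; the basic SpeechRecognition examples below do not call it directly.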
  • PocketSphinx: An offline speech recognition engine.
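
Before moving on, it can help to confirm that the packages are actually importable. The following is a minimal sketch, not part of any of the libraries above: the helper name missing_modules is made up for illustration, and the list uses the import names (speech_recognition, pyaudio), which differ from the pip package names.

```python
from importlib.util import find_spec

# Import names (not pip names) of the packages installed above.
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "numpy", "pocketsphinx"]


def missing_modules(module_names):
    """Return the module names that cannot be found on the import path."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If a module is reported missing, re-run the matching pip install command in the same Python environment you use to run the examples.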

2. Understanding Speech Recognition Libraries in Python

There are multiple engines available for speech recognition:

  1. Google Web Speech API – High accuracy but requires an internet connection.
  2. CMU Sphinx (Pocketsphinx) – Works offline but has lower accuracy.
  3. IBM Speech to Text – Requires IBM Watson API credentials.
  4. Microsoft Azure Speech – Cloud-based API with good accuracy.
  5. Amazon Transcribe – AWS-based speech recognition service.
  6. Deep Learning models – Custom models trained using TensorFlow, PyTorch.

3. Implementing Speech Recognition Using SpeechRecognition Library

The SpeechRecognition library provides an easy interface to recognize speech from a microphone or an audio file.

Step 1: Importing Required Libraries

import speech_recognition as sr

Step 2: Initializing the Recognizer

recognizer = sr.Recognizer()

Step 3: Capturing Audio from Microphone

with sr.Microphone() as source:
    print("Listening...")
    recognizer.adjust_for_ambient_noise(source)  # Adjusting for background noise
    audio = recognizer.listen(source)  # Capturing audio

Step 4: Recognizing Speech

try:
    text = recognizer.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError:
    print("API unavailable")

Explanation:

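  • recognizer.adjust_for_ambient_noise(source): Listens briefly (one second by default) and calibrates the energy threshold to the ambient noise level. It does not filter noise out of the recording.
  • recognizer.listen(source): Captures audio from the microphone until a pause in speech is detected.
  • recognizer.recognize_google(audio): Sends the audio to the Google Web Speech API and returns the transcribed text.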
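
To make the calibration step less of a black box, here is a rough sketch of the idea behind it: the threshold is nudged toward a multiple of the measured ambient energy, one audio chunk at a time. This is a simplified illustration, not the library's code. The function name, the damping and ratio values, and the chunk length are all assumptions chosen for the example.

```python
def calibrate_threshold(threshold, ambient_energies, seconds_per_chunk,
                        damping=0.15, ratio=1.5):
    """Move `threshold` toward `ratio` times the ambient energy, chunk by chunk.

    `damping` and `ratio` are illustrative values, not guaranteed library defaults.
    """
    # Scale the damping factor to the chunk length so that shorter chunks
    # change the threshold less than longer ones.
    chunk_damping = damping ** seconds_per_chunk
    for energy in ambient_energies:
        target = energy * ratio
        threshold = threshold * chunk_damping + target * (1 - chunk_damping)
    return threshold


if __name__ == "__main__":
    # Ten chunks of 0.1 s each in a quiet room (energy values are made up).
    print(calibrate_threshold(300, [100] * 10, seconds_per_chunk=0.1))
```

The longer the calibration runs in steady noise, the closer the threshold gets to the ratio times the ambient energy, which is why a calibration of about a second is usually enough.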

4. Speech Recognition from an Audio File

We can also recognize speech from a pre-recorded WAV file.

Step 1: Load the Audio File

audio_file = "sample.wav"

with sr.AudioFile(audio_file) as source:
    # adjust_for_ambient_noise() is skipped here: it would consume the first second of the file
    audio = recognizer.record(source)  # Read the entire file into memory

Step 2: Recognize Speech from the Audio File

try:
    text = recognizer.recognize_google(audio)
    print("Transcription:", text)
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError:
    print("Could not request results from the service")
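
If recognition fails on a file, it is worth checking the file's format first. The sketch below uses only the standard library's wave module; the helper names write_test_tone and describe_wav are made up for this example. The test tone contains no speech, so it is only useful for exercising the file-loading path, not for checking transcription quality.

```python
import math
import struct
import wave


def write_test_tone(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono 16-bit PCM sine tone (no speech, just a valid WAV file)."""
    n_frames = int(seconds * rate)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))


def describe_wav(path):
    """Return the basic format properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": wav.getnframes() / rate,
        }


if __name__ == "__main__":
    write_test_tone("tone.wav")
    print(describe_wav("tone.wav"))
```

Running describe_wav("sample.wav") on your own recording shows at a glance whether it is mono or stereo and which sample rate it uses.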

5. Using Different Speech Recognition Engines

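The recognize_google method can be swapped for the method of another engine. Argument names have changed between SpeechRecognition releases, so check them against the version you have installed:

recognizer.recognize_sphinx(audio)  # Offline recognition using CMU Sphinx
recognizer.recognize_ibm(audio, key="IBM_API_KEY")  # Older releases took username= and password= instead
recognizer.recognize_azure(audio, key="AZURE_KEY", location="REGION")
recognizer.recognize_amazon(audio, access_key_id="AWS_KEY", secret_access_key="AWS_SECRET", region="REGION")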
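
Because every engine is just a method that takes the audio and returns text, we can try several in order and fall back when one fails. The following is a minimal sketch under that assumption; transcribe_with_fallback is a hypothetical helper, not part of SpeechRecognition. It catches Exception broadly to stay self-contained; in real code you would likely narrow this to sr.UnknownValueError and sr.RequestError.

```python
def transcribe_with_fallback(audio, engines):
    """Try each (name, recognize) pair in order.

    Returns (name, text) from the first engine that succeeds and raises
    RuntimeError if every engine fails.
    """
    errors = {}
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except Exception as exc:  # sketch only: narrow this in real code
            errors[name] = exc
    raise RuntimeError(f"All engines failed: {errors}")
```

A typical call might look like transcribe_with_fallback(audio, [("google", recognizer.recognize_google), ("sphinx", recognizer.recognize_sphinx)]), which tries the online service first and the offline engine second.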

6. Handling Noisy Environments

In a noisy environment, background noise can interfere with recognition. To handle this:

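recognizer.energy_threshold = 4000  # Minimum audio energy treated as speech (higher = less sensitive)
recognizer.dynamic_energy_threshold = True  # Let the threshold adapt to ambient noise while listening
recognizer.pause_threshold = 0.8  # Seconds of silence that mark the end of a phrase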
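
To see how these two thresholds interact, consider audio split into short chunks, each with a measured energy. A phrase starts when a chunk exceeds the energy threshold and ends once enough consecutive quiet chunks add up to the pause threshold. The sketch below illustrates that logic only; find_phrase_end and its inputs are hypothetical and much simpler than what the library does internally.

```python
import math


def find_phrase_end(chunk_energies, seconds_per_chunk,
                    energy_threshold=4000, pause_threshold=0.8):
    """Return the index of the chunk where the phrase is considered finished.

    Returns None if speech never starts or never pauses for long enough.
    """
    # Number of consecutive quiet chunks needed to cover the pause threshold.
    pause_chunks = math.ceil(pause_threshold / seconds_per_chunk)
    started = False
    quiet_chunks = 0
    for index, energy in enumerate(chunk_energies):
        if energy > energy_threshold:
            started = True      # speech detected
            quiet_chunks = 0    # any speech resets the pause counter
        elif started:
            quiet_chunks += 1
            if quiet_chunks >= pause_chunks:
                return index
    return None
```

Raising energy_threshold makes it harder for background noise to count as speech, while raising pause_threshold lets the speaker hesitate longer before the phrase is cut off.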

7. Using Deep Learning Models for Speech Recognition

For advanced applications, deep learning-based models such as DeepSpeech, Wav2Vec2, and Whisper can be used.

Using DeepSpeech (Mozilla)

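pip install deepspeech

import wave
import numpy as np
import deepspeech

model_file = "deepspeech-0.9.3-models.pbmm"
model = deepspeech.Model(model_file)

# model.stt() expects 16-bit mono PCM samples at the model's sample rate (16 kHz), not a file path
audio_file = "sample.wav"
with wave.open(audio_file, "rb") as wav:
    frames = wav.readframes(wav.getnframes())
audio = np.frombuffer(frames, dtype=np.int16)

# Convert audio to text
transcription = model.stt(audio)
print("Transcription:", transcription)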

Using OpenAI Whisper

pip install openai-whisper

# Whisper relies on the ffmpeg command-line tool to decode audio files
import whisper

model = whisper.load_model("base")
result = model.transcribe("sample.wav")
print(result["text"])
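
Besides the full text, the result dictionary also contains a list of segments with start and end times. The sketch below formats them as timestamped lines. It works on any dictionary with that shape, so it can be tried without loading a model; format_segments is a helper name made up for this example.

```python
def format_segments(result):
    """Turn a Whisper-style result dict into timestamped text lines."""
    lines = []
    for segment in result.get("segments", []):
        start = segment["start"]  # segment start time in seconds
        end = segment["end"]      # segment end time in seconds
        text = segment["text"].strip()
        lines.append(f"[{start:.2f}s - {end:.2f}s] {text}")
    return lines


if __name__ == "__main__":
    # Hand-written stand-in for the dict returned by model.transcribe().
    fake_result = {
        "text": " Hello there. How are you?",
        "segments": [
            {"start": 0.0, "end": 1.2, "text": " Hello there."},
            {"start": 1.2, "end": 2.5, "text": " How are you?"},
        ],
    }
    for line in format_segments(fake_result):
        print(line)
```

With a real model, passing the result of model.transcribe("sample.wav") to format_segments gives a simple timestamped transcript.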

8. Building a Simple Voice Assistant

We can integrate speech recognition into a voice assistant. This example also needs the pyttsx3 text-to-speech library (pip install pyttsx3).

import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
engine = pyttsx3.init()

with sr.Microphone() as source:
    print("Speak something...")
    audio = recognizer.listen(source)

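try:
    text = recognizer.recognize_google(audio)
    print("You said:", text)

    # Speak the recognized text back to the user
    engine.say(f"You said: {text}")
    engine.runAndWait()
except sr.UnknownValueError:
    print("Sorry, I couldn't understand.")
except sr.RequestError:
    print("Sorry, the speech service is unavailable.")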
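
An assistant becomes more useful once it reacts to what was said instead of only repeating it. Here is a minimal sketch using plain keyword matching. The function handle_command and the commands it knows are invented for illustration; a real assistant would need something more robust.

```python
from datetime import datetime


def handle_command(text, now=None):
    """Map recognized text to a reply string using simple keyword matching."""
    now = now or datetime.now()
    words = text.lower()
    if "time" in words:
        return "The time is " + now.strftime("%H:%M")
    if "hello" in words:
        return "Hello! How can I help?"
    if "stop" in words or "exit" in words:
        return "Goodbye."
    return "Sorry, I don't know that command."
```

In the assistant above, the reply could be spoken with engine.say(handle_command(text)) followed by engine.runAndWait().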

9. Converting Text to Speech

We can use the pyttsx3 library for text-to-speech conversion.

import pyttsx3

engine = pyttsx3.init()
engine.say("Hello! How are you?")
engine.runAndWait()

10. Applications of Speech Recognition

Speech recognition is widely used in:

  • Voice Assistants (Google Assistant, Siri, Alexa)
  • Transcription Services (Google Docs Voice Typing)
  • Automated Customer Support (Chatbots, IVR systems)
  • Accessibility for Disabled Users (Speech-to-text for hearing-impaired individuals)
  • Command Control Applications (Smart Home, IoT)
