SpeechAlgo

A comprehensive Python library providing clean, educational implementations of fundamental speech processing algorithms.

Overview

SpeechAlgo provides reference implementations of 20 core speech processing algorithms, organized into five categories: preprocessing, voice activity detection (VAD), pitch detection, speech enhancement, and feature extraction. The library is designed for both education and production use, pairing readable code with documentation that covers each algorithm's mathematical foundations.

Features

  • Clean, readable implementations suitable for learning and research
  • Comprehensive documentation with mathematical foundations and references
  • Type-annotated code with extensive docstrings
  • NumPy-based implementations for efficiency
  • Modular design with consistent APIs
  • Well-tested algorithms with unit tests
  • Many algorithms are suitable for real-time use

Installation

From PyPI

pip install speechalgo

From source

git clone https://github.com/tarun7r/SpeechAlgo.git
cd SpeechAlgo
pip install -e .

Development installation

pip install -e ".[dev]"

Quick Start

Preprocessing

import numpy as np
from speechalgo.preprocessing import hamming_window, FrameExtractor, MFCC
from speechalgo.utils import load_audio

# Load audio
audio, sr = load_audio('speech.wav', sample_rate=16000)

# Apply window function
window = hamming_window(512)
windowed = audio[:512] * window

# Extract overlapping frames
frame_extractor = FrameExtractor(frame_length=512, hop_length=256)
frames = frame_extractor.extract_frames(audio)
print(f"Extracted {len(frames)} frames")

# Extract MFCC features
mfcc = MFCC(sample_rate=16000, n_mfcc=13)
mfcc_features = mfcc.process(audio)
print(f"MFCC shape: {mfcc_features.shape}")  # (13, n_frames)
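The `MFCC` call above wraps a chain of standard steps. The following NumPy-only sketch computes MFCCs for a single frame so that chain is visible: Hamming window, power spectrum, triangular mel filterbank, log, and an orthonormal DCT-II. The function names, the HTK-style mel formula, and the choice of 26 filters are assumptions for illustration; SpeechAlgo's own implementation may differ in these details.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed); other variants such as Slaney's differ slightly
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def mfcc_frame(frame, sample_rate=16000, n_mels=26, n_mfcc=13):
    """MFCCs of one frame (hypothetical helper, not the SpeechAlgo API)."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2 / n_fft
    log_mel = np.log(mel_filterbank(n_mels, n_fft, sample_rate) @ power + 1e-10)
    # Orthonormal DCT-II matrix; keep only the first n_mfcc rows
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    dct[0] /= np.sqrt(2.0)
    return dct @ log_mel
```

Applied frame by frame (for example to the output of `FrameExtractor`), this yields one 13-dimensional vector per frame, matching the `(13, n_frames)` layout shown above.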

Voice Activity Detection

from speechalgo.vad import EnergyBasedVAD, SpectralEntropyVAD, ZeroCrossingVAD

# Energy-based VAD
energy_vad = EnergyBasedVAD(sample_rate=16000)
is_speech = energy_vad.process(audio)
print(f"Speech detected in {is_speech.sum()} / {len(is_speech)} frames")

# Spectral entropy VAD (more robust in noise)
entropy_vad = SpectralEntropyVAD(sample_rate=16000)
is_speech = entropy_vad.process(audio)

# Zero-crossing rate VAD
zcr_vad = ZeroCrossingVAD(sample_rate=16000)
is_speech = zcr_vad.process(audio)

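# Extract the speech segment spanned by the detected frames
# (assumes frame_length=512 and hop_length=256; match these to the VAD's actual settings)
frame_length, hop_length = 512, 256
speech_frames = np.where(is_speech)[0]
if len(speech_frames) > 0:
    start = speech_frames[0] * hop_length
    end = speech_frames[-1] * hop_length + frame_length  # include the whole last frame
    speech_segment = audio[start:end]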
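Spectral entropy VAD rests on a simple observation: voiced speech concentrates its energy in a few harmonics (a peaky, low-entropy spectrum), while broadband noise spreads it evenly (a flat, high-entropy spectrum). A minimal NumPy sketch of the per-frame measure and a fixed-threshold decision follows. The function names and the 0.7 threshold are illustrative assumptions; practical detectors usually adapt the threshold to the estimated noise floor.

```python
import numpy as np


def spectral_entropy(frame, eps=1e-12):
    """Normalized Shannon entropy of a frame's power spectrum, in [0, 1]."""
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    p = power / (power.sum() + eps)        # treat the spectrum as a probability distribution
    entropy = -np.sum(p * np.log(p + eps))
    return entropy / np.log(len(p))        # divide by the entropy of a perfectly flat spectrum


def entropy_vad(frames, threshold=0.7):
    """Mark a frame as speech when its spectral entropy falls below `threshold` (assumed value)."""
    return np.array([spectral_entropy(f) < threshold for f in frames])
```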

Pitch Detection

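from speechalgo.pitch import YIN, Autocorrelation, HPS

# Pitch estimators operate on a single frame; in practice, pick one from a voiced region
audio_frame = audio[8000:8000 + 1024]  # 1024 samples (64 ms) starting at 0.5 s
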
# YIN algorithm (recommended for speech)
yin = YIN(sample_rate=16000)
pitch = yin.estimate(audio_frame)
print(f"Estimated pitch: {pitch:.1f} Hz")

# Autocorrelation method
autocorr = Autocorrelation(sample_rate=16000)
pitch = autocorr.estimate(audio_frame)

# Harmonic Product Spectrum
hps = HPS(sample_rate=16000)
pitch = hps.estimate(audio_frame)
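The classic autocorrelation method picks the lag at which a frame best matches a shifted copy of itself; that lag is the pitch period. A minimal sketch, assuming a 60 to 400 Hz search range (typical for speech, but an assumption here) and no voiced/unvoiced decision; the function name is hypothetical:

```python
import numpy as np


def autocorrelation_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """F0 estimate (Hz) from the highest autocorrelation peak with period in [1/fmax, 1/fmin]."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    # Full autocorrelation; keep non-negative lags only (index = lag in samples)
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    lag = min_lag + np.argmax(acf[min_lag:max_lag + 1])
    return sample_rate / lag
```

Octave errors (locking onto a multiple of the true period) are the main weakness of this approach, which is what YIN's normalized difference function is designed to reduce.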

Speech Enhancement

from speechalgo.enhancement import SpectralSubtraction, WienerFilter, NoiseGate

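# Spectral subtraction
# Assumption: the first 0.5 s of `noisy_audio` contain only noise; check the
# SpectralSubtraction docstring for the exact noise_profile format it expects
noisy_audio = audio
noise_profile = noisy_audio[:8000]
enhancer = SpectralSubtraction(sample_rate=16000)
clean_audio = enhancer.process(noisy_audio, noise_profile)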

# Wiener filtering
wiener = WienerFilter(sample_rate=16000)
clean_audio = wiener.process(noisy_audio)

# Noise gate
gate = NoiseGate(threshold=-40.0, sample_rate=16000)  # threshold in dB
gated_audio = gate.process(audio)
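A noise gate mutes the signal whenever its short-term level drops below a threshold. The sketch below assumes the `threshold=-40.0` above is a level in dB relative to full scale (samples in [-1, 1]) and applies a hard, frame-wise gate. The helper name and the 512-sample frame size are hypothetical, and production gates add attack/release smoothing (often with hysteresis) to avoid audible clicks.

```python
import numpy as np


def noise_gate(signal, threshold_db=-40.0, frame_length=512):
    """Zero out non-overlapping frames whose RMS level (dBFS) is below threshold_db."""
    out = np.array(signal, dtype=float, copy=True)
    for start in range(0, len(out), frame_length):
        frame = out[start:start + frame_length]
        rms = np.sqrt(np.mean(frame ** 2))
        level_db = 20.0 * np.log10(rms + 1e-12)   # epsilon avoids log10(0) on digital silence
        if level_db < threshold_db:
            out[start:start + frame_length] = 0.0
    return out
```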

Feature Extraction

from speechalgo.features import SpectralFeatures, TemporalFeatures, DeltaFeatures

# Spectral features
spectral = SpectralFeatures(sample_rate=16000)
centroid = spectral.spectral_centroid(audio)
rolloff = spectral.spectral_rolloff(audio)
flux = spectral.spectral_flux(audio)

# Temporal features
temporal = TemporalFeatures(sample_rate=16000)
zcr = temporal.zero_crossing_rate(audio)
energy = temporal.short_time_energy(audio)

# Delta features (velocity and acceleration)
delta = DeltaFeatures()
mfcc_delta = delta.compute_delta(mfcc_features)
mfcc_delta2 = delta.compute_delta(mfcc_delta)
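The spectral features above have short textbook definitions. This per-frame sketch uses the magnitude-weighted mean frequency for the centroid, the 85 % cumulative-energy point for the rolloff, and the Euclidean distance between consecutive normalized magnitude spectra for the flux. These are common conventions, but they are assumptions with respect to SpeechAlgo, whose exact definitions (for example, magnitude versus power weighting) may differ; the function names are hypothetical.

```python
import numpy as np


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))


def spectral_rolloff(frame, sample_rate, roll_percent=0.85):
    """Lowest frequency below which `roll_percent` of the spectral energy lies, in Hz."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    cumulative = np.cumsum(power)
    idx = np.searchsorted(cumulative, roll_percent * cumulative[-1])
    return float(freqs[idx])


def spectral_flux(prev_frame, frame):
    """Euclidean distance between consecutive normalized magnitude spectra."""
    prev = np.abs(np.fft.rfft(prev_frame))
    cur = np.abs(np.fft.rfft(frame))
    prev /= np.sum(prev) + 1e-12
    cur /= np.sum(cur) + 1e-12
    return float(np.sqrt(np.sum((cur - prev) ** 2)))
```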

Algorithm Categories

Preprocessing (5 algorithms)

  1. Windowing - Hamming, Hanning, Blackman window functions for spectral analysis
  2. Framing - Overlapping frame extraction with configurable hop length
  3. Pre-emphasis - High-frequency boosting filter (α=0.97)
  4. MFCC - Mel-Frequency Cepstral Coefficients extraction
  5. Mel-Spectrogram - Mel-scale frequency representation
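Pre-emphasis, framing, and windowing are usually applied in that order before any spectral analysis. A minimal NumPy sketch of the three steps, assuming the α = 0.97 coefficient listed above, 512-sample frames with a 256-sample hop, and that trailing samples which do not fill a whole frame are dropped (some libraries zero-pad instead). The helper names are hypothetical.

```python
import numpy as np


def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts high frequencies."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])


def frame_signal(signal, frame_length=512, hop_length=256):
    """Split a signal into overlapping frames, shape (n_frames, frame_length)."""
    signal = np.asarray(signal)
    n_frames = 1 + (len(signal) - frame_length) // hop_length  # incomplete tail is dropped
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return signal[idx]


def hamming(n):
    """Symmetric Hamming window: 0.54 - 0.46 cos(2 pi k / (n - 1))."""
    k = np.arange(n)
    return 0.54 - 0.46 * np.cos(2 * np.pi * k / (n - 1))


# One second of audio at 16 kHz -> 61 windowed frames of 512 samples
x = np.random.default_rng(0).standard_normal(16000)
frames = frame_signal(pre_emphasis(x)) * hamming(512)
```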

Voice Activity Detection (3 algorithms)

  1. Energy-based VAD - Simple threshold-based detection using short-time energy
  2. Spectral Entropy VAD - Entropy-based voice detection (robust in noise)
  3. Zero-Crossing VAD - Combined energy and zero-crossing rate approach
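These detectors differ mainly in the per-frame statistic they threshold. The sketch below computes short-time energy and zero-crossing rate on pre-framed audio (one frame per row) and combines them with one simple heuristic: a frame counts as speech if it is loud, or if it is moderately quiet but has a high zero-crossing rate, as unvoiced fricatives such as /s/ do. All thresholds and function names are illustrative assumptions, not SpeechAlgo's defaults.

```python
import numpy as np


def short_time_energy_db(frames):
    """Mean-square energy of each frame (one frame per row), in dB."""
    return 10.0 * np.log10(np.mean(np.asarray(frames, dtype=float) ** 2, axis=1) + 1e-12)


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.signbit(np.asarray(frames, dtype=float))
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def energy_zcr_vad(frames, energy_db=-40.0, low_energy_db=-55.0, zcr_threshold=0.3):
    """Speech if loud, or quieter but noise-like (high ZCR), as in unvoiced consonants."""
    energy = short_time_energy_db(frames)
    zcr = zero_crossing_rate(frames)
    return (energy > energy_db) | ((energy > low_energy_db) & (zcr > zcr_threshold))
```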

Pitch Detection (4 algorithms)

  1. Autocorrelation - Classic time-domain pitch estimation
  2. YIN Algorithm - Autocorrelation-derived method using a cumulative mean normalized difference function
  3. Cepstral Method - Pitch detection using cepstrum
  4. Harmonic Product Spectrum (HPS) - Frequency-domain approach
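YIN replaces the autocorrelation with a squared-difference function and normalizes it by its running mean, which reduces the octave and subharmonic errors common with plain autocorrelation. The sketch below covers the first three steps described by de Cheveigné & Kawahara (2002): difference function, cumulative mean normalization, and absolute threshold. It omits parabolic interpolation and the best-local-estimate step. The 0.1 threshold, the 60 to 400 Hz range, and the function name are assumptions.

```python
import numpy as np


def yin_pitch(frame, sample_rate, fmin=60.0, fmax=400.0, threshold=0.1):
    """F0 estimate in Hz via YIN steps 1-3; returns 0.0 when no lag passes the threshold."""
    frame = np.asarray(frame, dtype=float)
    max_lag = min(int(sample_rate / fmin), len(frame) // 2)
    min_lag = max(int(sample_rate / fmax), 1)
    w = len(frame) - max_lag  # integration window length
    # Step 1: difference function d(tau) = sum_j (x_j - x_{j+tau})^2
    d = np.array([np.sum((frame[:w] - frame[tau:tau + w]) ** 2) for tau in range(max_lag + 1)])
    # Step 2: cumulative mean normalized difference d'(tau), with d'(0) = 1
    taus = np.arange(1, max_lag + 1)
    cumulative = np.cumsum(d[1:])
    cmnd = np.ones(max_lag + 1)
    cmnd[1:] = np.divide(d[1:] * taus, cumulative, out=np.ones(max_lag), where=cumulative > 0)
    # Step 3: first lag below the threshold, then descend to the bottom of that dip
    for tau in range(min_lag, max_lag):
        if cmnd[tau] < threshold:
            while tau + 1 <= max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
    return 0.0
```

Returning 0.0 for frames with no qualifying dip (for example, silence) doubles as a crude unvoiced decision.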

Speech Enhancement (3 algorithms)

  1. Spectral Subtraction - Classic noise reduction technique
  2. Wiener Filtering - Statistically optimal (minimum mean-square error) linear filtering
  3. Noise Gate - Threshold-based noise suppression
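Spectral subtraction and Wiener filtering both come down to a per-frequency gain applied to the noisy spectrum, given an estimate of the noise power spectrum (for example, averaged over frames a VAD marks as non-speech). The sketch below shows the two gain rules and their application to one frame, reusing the noisy phase. The names, the spectral floor of 0.01, and the maximum-likelihood SNR estimate inside the Wiener gain are assumptions. A full enhancer would run this over windowed STFT frames and resynthesize with overlap-add.

```python
import numpy as np


def spectral_subtraction_gain(noisy_power, noise_power, over_subtraction=1.0, floor=0.01):
    """Power subtraction |S|^2 = max(|Y|^2 - a|N|^2, floor |Y|^2), returned as a gain on |Y|."""
    clean_power = np.maximum(noisy_power - over_subtraction * noise_power, floor * noisy_power)
    return np.sqrt(clean_power / np.maximum(noisy_power, 1e-12))


def wiener_gain(noisy_power, noise_power, floor=0.01):
    """Wiener gain H = SNR / (1 + SNR), with SNR estimated as max(|Y|^2 / |N|^2 - 1, 0)."""
    snr = np.maximum(noisy_power / np.maximum(noise_power, 1e-12) - 1.0, 0.0)
    return np.maximum(snr / (1.0 + snr), floor)


def enhance_frame(noisy_frame, noise_power, gain_fn=wiener_gain):
    """Scale one frame's spectrum by the chosen gain and keep the noisy phase."""
    spectrum = np.fft.rfft(noisy_frame)
    gain = gain_fn(np.abs(spectrum) ** 2, noise_power)
    return np.fft.irfft(gain * spectrum, n=len(noisy_frame))
```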

Feature Extraction (4 algorithms)

  1. Spectral Features - Centroid, rolloff, flux, bandwidth
  2. Zero Crossing Rate - Time-domain feature extraction
  3. Short-Time Energy - Energy computation in frames
  4. Delta Features - First and second-order temporal derivatives
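Delta features are usually computed with the regression formula d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), where N is the half-width of the regression window; applying it a second time gives delta-delta (acceleration) features. The sketch assumes N = 2 and edge padding at the boundaries (common choices, but assumptions about SpeechAlgo) and shows how 13 MFCCs become 39-dimensional vectors. The function name and the random stand-in data are hypothetical.

```python
import numpy as np


def compute_delta(features, width=2):
    """Regression deltas along time for features shaped (n_coeffs, n_frames)."""
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[1]
    # Repeat the first/last frame so every frame has `width` neighbours on each side
    padded = np.pad(features, ((0, 0), (width, width)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    delta = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[:, width + n:width + n + n_frames]
        behind = padded[:, width - n:width - n + n_frames]
        delta += n * (ahead - behind)
    return delta / denom


# 13 MFCCs + deltas + delta-deltas -> 39 features per frame
mfcc = np.random.default_rng(0).standard_normal((13, 50))  # stand-in for real MFCCs
delta = compute_delta(mfcc)
delta2 = compute_delta(delta)
stacked = np.vstack([mfcc, delta, delta2])
print(stacked.shape)  # (39, 50)
```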

Documentation

Each algorithm is documented with its mathematical foundations and implementation notes.

Examples

See the Getting Started Tutorial for complete examples, including:

  • Basic Examples: Window functions, framing, pre-emphasis, MFCC, VAD, pitch detection, enhancement
  • Complete Workflows:
    • Feature extraction pipeline (MFCC + delta + delta-delta = 39 dimensions)
    • VAD + pitch tracking for voiced speech analysis
    • Multi-stage noise reduction (noise gate → Wiener filter)
    • Multi-algorithm comparison and benchmarking

Requirements

  • Python 3.8+
  • NumPy >= 1.20.0
  • SciPy >= 1.7.0
  • SoundFile >= 0.11.0

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please ensure:

  • Code passes all tests
  • New features include tests
  • Documentation is updated
  • Code follows the PEP 8 style guide
  • Type hints are included

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This library implements algorithms from foundational research in speech processing:

  • Davis & Mermelstein (1980) - MFCC
  • de Cheveigné & Kawahara (2002) - YIN algorithm
  • Rabiner & Schafer (1978) - Digital speech processing fundamentals
  • Ephraim & Malah (1984) - MMSE short-time spectral amplitude estimation for speech enhancement
  • And many others cited in individual algorithm documentation

Project Status

Current Version: 0.1.0 (MVP Complete)

All 20 core algorithms implemented and tested!
