SpeechAlgo
A comprehensive Python library providing clean, educational implementations of fundamental speech processing algorithms.
Overview
SpeechAlgo provides reference implementations of 20 core speech processing algorithms, organized into five categories: preprocessing, voice activity detection (VAD), pitch detection, speech enhancement, and feature extraction. The library is designed for both educational and production use, pairing readable code with comprehensive documentation of each algorithm's mathematical foundations.
Features
- Clean, readable implementations suitable for learning and research
- Comprehensive documentation with mathematical foundations and references
- Type-annotated code with extensive docstrings
- NumPy-based implementations for efficiency
- Modular design with consistent APIs
- Well-tested algorithms with unit tests
- Real-time capable for many algorithms
Installation
From PyPI
pip install speechalgo
From source
git clone https://github.com/tarun7r/SpeechAlgo.git
cd SpeechAlgo
pip install -e .
Development installation
pip install -e ".[dev]"
Quick Start
Preprocessing
import numpy as np
from speechalgo.preprocessing import hamming_window, FrameExtractor, MFCC
from speechalgo.utils import load_audio
# Load audio
audio, sr = load_audio('speech.wav', sample_rate=16000)
# Apply window function
window = hamming_window(512)
windowed = audio[:512] * window
# Extract overlapping frames
frame_extractor = FrameExtractor(frame_length=512, hop_length=256)
frames = frame_extractor.extract_frames(audio)
print(f"Extracted {len(frames)} frames")
# Extract MFCC features
mfcc = MFCC(sample_rate=16000, n_mfcc=13)
mfcc_features = mfcc.process(audio)
print(f"MFCC shape: {mfcc_features.shape}") # (13, n_frames)
Voice Activity Detection
from speechalgo.vad import EnergyBasedVAD, SpectralEntropyVAD, ZeroCrossingVAD
# Energy-based VAD
energy_vad = EnergyBasedVAD(sample_rate=16000)
is_speech = energy_vad.process(audio)
print(f"Speech detected in {is_speech.sum()} / {len(is_speech)} frames")
# Spectral entropy VAD (more robust in noise)
entropy_vad = SpectralEntropyVAD(sample_rate=16000)
is_speech = entropy_vad.process(audio)
# Zero-crossing rate VAD
zcr_vad = ZeroCrossingVAD(sample_rate=16000)
is_speech = zcr_vad.process(audio)
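# Extract speech segments (assumes the VAD uses a hop length of 256 samples)
hop_length = 256
speech_frames = np.where(is_speech)[0]
if len(speech_frames) > 0:
    start = speech_frames[0] * hop_length
    end = (speech_frames[-1] + 1) * hop_length  # include the last speech frame
    speech_segment = audio[start:end]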
Pitch Detection
from speechalgo.pitch import YIN, Autocorrelation, HPS
# YIN algorithm (recommended for speech)
audio_frame = audio[:1024]  # a single analysis frame (64 ms at 16 kHz)
yin = YIN(sample_rate=16000)
pitch = yin.estimate(audio_frame)
print(f"Estimated pitch: {pitch:.1f} Hz")
# Autocorrelation method
autocorr = Autocorrelation(sample_rate=16000)
pitch = autocorr.estimate(audio_frame)
# Harmonic Product Spectrum
hps = HPS(sample_rate=16000)
pitch = hps.estimate(audio_frame)
Speech Enhancement
from speechalgo.enhancement import SpectralSubtraction, WienerFilter, NoiseGate
# Spectral subtraction
enhancer = SpectralSubtraction(sample_rate=16000)
clean_audio = enhancer.process(noisy_audio, noise_profile)
# Wiener filtering
wiener = WienerFilter(sample_rate=16000)
clean_audio = wiener.process(noisy_audio)
# Noise gate
gate = NoiseGate(threshold=-40.0, sample_rate=16000)
gated_audio = gate.process(audio)
Feature Extraction
from speechalgo.features import SpectralFeatures, TemporalFeatures, DeltaFeatures
# Spectral features
spectral = SpectralFeatures(sample_rate=16000)
centroid = spectral.spectral_centroid(audio)
rolloff = spectral.spectral_rolloff(audio)
flux = spectral.spectral_flux(audio)
# Temporal features
temporal = TemporalFeatures(sample_rate=16000)
zcr = temporal.zero_crossing_rate(audio)
energy = temporal.short_time_energy(audio)
# Delta features (velocity and acceleration)
delta = DeltaFeatures()
mfcc_delta = delta.compute_delta(mfcc_features)
mfcc_delta2 = delta.compute_delta(mfcc_delta)
Algorithm Categories
Preprocessing (5 algorithms)
- Windowing - Hamming, Hanning, Blackman window functions for spectral analysis
- Framing - Overlapping frame extraction with configurable hop length
- Pre-emphasis - High-frequency boosting filter (α=0.97)
- MFCC - Mel-Frequency Cepstral Coefficient extraction
- Mel-Spectrogram - Mel-scale frequency representation
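The five preprocessing steps chain into the standard MFCC front end. The sketch below is an independent NumPy/SciPy illustration of that chain, not SpeechAlgo's API: the function names are hypothetical, and the parameter choices (α = 0.97, 512-sample Hamming frames with a 256-sample hop, 40 triangular filters on the HTK-style mel scale m = 2595 log10(1 + f/700), and an orthonormal DCT-II keeping 13 coefficients) are common textbook defaults that SpeechAlgo may not share.

```python
import numpy as np
from scipy.fft import dct


def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])


def frame_signal(x, frame_length, hop_length):
    """Split x into overlapping frames, shape (n_frames, frame_length)."""
    if len(x) < frame_length:
        x = np.pad(x, (0, frame_length - len(x)))
    n_frames = 1 + (len(x) - frame_length) // hop_length
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return x[idx]


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def mfcc_sketch(x, sr=16000, n_fft=512, hop_length=256, n_mels=40, n_mfcc=13):
    """Pre-emphasis -> framing -> Hamming window -> power spectrum -> mel -> log -> DCT."""
    frames = frame_signal(pre_emphasis(x), n_fft, hop_length) * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    mel_spec = power @ mel_filterbank(sr, n_fft, n_mels).T      # (n_frames, n_mels)
    log_mel = np.log(mel_spec + 1e-10)
    coeffs = dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]
    return coeffs.T                                             # (n_mfcc, n_frames)


sr = 16000
t = np.arange(sr) / sr
feats = mfcc_sketch(0.5 * np.sin(2 * np.pi * 220 * t), sr)
print(feats.shape)  # (13, 61)
```

Returning log_mel before the DCT yields a log mel-spectrogram instead of MFCCs.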
Voice Activity Detection (3 algorithms)
- Energy-based VAD - Simple threshold-based detection using short-time energy
- Spectral Entropy VAD - Entropy-based voice detection (robust in noise)
- Zero-Crossing VAD - Combined energy and zero-crossing rate approach
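The energy and zero-crossing rules above can be combined in a few lines. This is a minimal sketch under stated assumptions, not the EnergyBasedVAD or ZeroCrossingVAD implementation: energy_zcr_vad is a hypothetical name, the noise floor is taken as the 10th percentile of frame energies (which assumes at least that share of frames is non-speech), and the 10 dB margin and 0.3 ZCR threshold are illustrative values.

```python
import numpy as np


def frame_signal(x, frame_length, hop_length):
    """Split x into overlapping frames, shape (n_frames, frame_length)."""
    n_frames = 1 + (len(x) - frame_length) // hop_length
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return x[idx]


def energy_zcr_vad(x, frame_length=512, hop_length=256,
                   energy_margin_db=10.0, zcr_unvoiced=0.3):
    """Return one boolean per frame, True where speech is likely present."""
    frames = frame_signal(x, frame_length, hop_length)
    # Short-time energy of each frame, in dB.
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    # Assumption: the quietest 10% of frames are non-speech and define the noise floor.
    noise_floor_db = np.percentile(energy_db, 10)
    # Zero-crossing rate: fraction of adjacent sample pairs whose signs differ.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    # Frames well above the noise floor count as speech.
    loud = energy_db > noise_floor_db + energy_margin_db
    # Moderately loud frames with many zero crossings: likely unvoiced speech (e.g. fricatives).
    unvoiced = (energy_db > noise_floor_db + energy_margin_db / 2) & (zcr > zcr_unvoiced)
    return loud | unvoiced


sr = 16000
rng = np.random.default_rng(0)
t = np.arange(sr) / sr
# 1 s of low-level noise, 1 s of a 200 Hz tone, then 1 s of noise again
x = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 200 * t), np.zeros(sr)])
x += 0.001 * rng.standard_normal(x.size)
is_speech = energy_zcr_vad(x)
print(f"{is_speech.sum()} of {is_speech.size} frames flagged as speech")
```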
Pitch Detection (4 algorithms)
- Autocorrelation - Classic time-domain pitch estimation
- YIN Algorithm - Improved autocorrelation with difference function
- Cepstral Method - Pitch detection using cepstrum
- Harmonic Product Spectrum (HPS) - Frequency-domain approach
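The YIN entry above (autocorrelation replaced by a difference function) can be made concrete by following the steps in de Cheveigné & Kawahara (2002): difference function, cumulative mean normalization, absolute threshold, and parabolic interpolation. This is a simplified sketch, not SpeechAlgo's YIN class: yin_pitch is a hypothetical name, it omits the paper's final "best local estimate" step, and the 60 to 500 Hz search range, the 0.1 threshold, and returning 0.0 for unvoiced frames are assumptions.

```python
import numpy as np


def yin_pitch(frame, sr, fmin=60.0, fmax=500.0, threshold=0.1):
    """Estimate F0 in Hz for one frame using the core YIN steps; 0.0 means unvoiced."""
    frame = np.asarray(frame, dtype=float)
    if not np.any(frame):
        return 0.0                                   # silent frame
    tau_min, tau_max = int(sr / fmax), int(sr / fmin)
    w = len(frame) - tau_max                         # integration window length
    if w <= 0:
        raise ValueError("frame must be longer than sr / fmin samples")
    # Step 2, difference function: d(tau) = sum_j (x[j] - x[j + tau])**2
    d = np.array([np.sum((frame[:w] - frame[tau:tau + w]) ** 2)
                  for tau in range(tau_max + 1)])
    # Step 3, cumulative mean normalized difference: d'(0) = 1,
    # d'(tau) = d(tau) / ((1 / tau) * sum_{j=1..tau} d(j))
    cmnd = np.ones_like(d)
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(np.cumsum(d[1:]), 1e-12)
    # Step 4, absolute threshold: first lag with d' below the threshold,
    # then follow the dip down to its local minimum.
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            break
    else:
        return 0.0                                   # no dip below threshold: unvoiced
    # Step 5, parabolic interpolation around the chosen lag for sub-sample precision.
    a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
    denom = a - 2.0 * b + c
    if denom != 0.0:
        tau = tau + 0.5 * (a - c) / denom
    return sr / tau


sr = 16000
t = np.arange(1024) / sr
f0 = yin_pitch(np.sin(2 * np.pi * 200 * t), sr)
print(f"Estimated F0: {f0:.1f} Hz")
```

On this synthetic 200 Hz tone the estimate lands within a fraction of a hertz of 200 Hz; real speech frames usually need the voicing decision tuned through the threshold.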
Speech Enhancement (3 algorithms)
- Spectral Subtraction - Classic noise reduction technique
- Wiener Filtering - Statistically optimal (minimum mean-square error) filtering
- Noise Gate - Threshold-based noise suppression
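Spectral subtraction removes an estimate of the noise power spectrum from each short-time frame and resynthesizes the signal with the noisy phase. The sketch below uses scipy.signal.stft and scipy.signal.istft and is not SpeechAlgo's SpectralSubtraction class: spectral_subtraction is a hypothetical name, the noise spectrum is estimated from an assumed noise-only leading segment (250 ms), and the over-subtraction factor of 2 and spectral floor of 0.02 (in magnitude) are illustrative choices.

```python
import numpy as np
from scipy.signal import istft, stft


def spectral_subtraction(noisy, sr, noise_seconds=0.25, nperseg=512,
                         over_subtraction=2.0, floor=0.02):
    """Power spectral subtraction with the noise estimated from a leading segment.

    Assumes the first `noise_seconds` of `noisy` contain noise only.
    """
    noverlap = nperseg // 2
    _, _, spec = stft(noisy, fs=sr, nperseg=nperseg, noverlap=noverlap)
    power = np.abs(spec) ** 2
    # Average the power of the frames that fall in the assumed noise-only segment.
    n_noise_frames = max(1, int(noise_seconds * sr) // (nperseg - noverlap))
    noise_power = power[:, :n_noise_frames].mean(axis=1, keepdims=True)
    # Subtract the (scaled) noise power, keeping a small floor instead of zeroing bins.
    clean_power = np.maximum(power - over_subtraction * noise_power, floor ** 2 * power)
    # Only the magnitude is modified; the noisy phase is reused.
    clean_spec = np.sqrt(clean_power) * np.exp(1j * np.angle(spec))
    _, clean = istft(clean_spec, fs=sr, nperseg=nperseg, noverlap=noverlap)
    return clean[: len(noisy)]


sr = 16000
rng = np.random.default_rng(0)
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t) * (t >= 0.25)   # first 250 ms: noise only
noisy = tone + 0.05 * rng.standard_normal(sr)
enhanced = spectral_subtraction(noisy, sr)
```

The floor keeps a small fraction of each bin rather than zeroing it, which limits the isolated spectral peaks ("musical noise") that plain subtraction tends to leave behind.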
Feature Extraction (4 algorithms)
- Spectral Features - Centroid, rolloff, flux, bandwidth
- Zero Crossing Rate - Time-domain feature extraction
- Short-Time Energy - Energy computation in frames
- Delta Features - First and second-order temporal derivatives
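The spectral descriptors listed above are weighted statistics of each frame's magnitude spectrum. A minimal NumPy sketch follows; spectral_features is a hypothetical function rather than SpeechAlgo's SpectralFeatures API, and its definitions (magnitude rather than power weighting, an 85% rolloff point, flux as the L2 distance between sum-normalized spectra, a periodic Hann window) are common conventions that other implementations vary.

```python
import numpy as np


def spectral_features(frames, sr, rolloff_pct=0.85):
    """Per-frame centroid, bandwidth and rolloff (all in Hz) plus spectral flux.

    frames: array of shape (n_frames, frame_length).
    """
    n = frames.shape[1]
    window = np.hanning(n + 1)[:-1]                  # periodic Hann window
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    total = mag.sum(axis=1) + 1e-12
    # Centroid: magnitude-weighted mean frequency.
    centroid = (mag * freqs).sum(axis=1) / total
    # Bandwidth: magnitude-weighted spread around the centroid.
    bandwidth = np.sqrt((mag * (freqs - centroid[:, None]) ** 2).sum(axis=1) / total)
    # Rolloff: lowest frequency below which rolloff_pct of the total magnitude lies.
    reached = np.cumsum(mag, axis=1) >= rolloff_pct * total[:, None]
    rolloff = freqs[np.argmax(reached, axis=1)]
    # Flux: L2 distance between consecutive sum-normalized spectra (0 for the first frame).
    norm = mag / total[:, None]
    flux = np.concatenate([[0.0], np.sqrt((np.diff(norm, axis=0) ** 2).sum(axis=1))])
    return centroid, bandwidth, rolloff, flux


sr = 16000
n = np.arange(512)
tone_1k = np.sin(2 * np.pi * 1000 * n / sr)
tone_3k = np.sin(2 * np.pi * 3000 * n / sr)
centroid, bandwidth, rolloff, flux = spectral_features(np.stack([tone_1k, tone_1k, tone_3k]), sr)
print(centroid.round(1), rolloff, flux.round(3))
```

Because both test tones fall exactly on FFT bins, the centroids come out at 1000 Hz and 3000 Hz, and the flux is zero between the two identical frames and large at the frequency change.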
Documentation
Comprehensive documentation with mathematical foundations and implementation notes:
- Getting Started Tutorial - Installation and first steps
- Preprocessing Theory - Windowing, framing, MFCC, mel-spectrogram
- VAD Theory - Voice activity detection methods and comparisons
- Pitch Detection Theory - Pitch estimation algorithms and best practices
- Speech Enhancement Theory - Noise reduction techniques
Examples
See the Getting Started Tutorial for comprehensive examples, including:
- Basic Examples: Window functions, framing, pre-emphasis, MFCC, VAD, pitch detection, enhancement
- Complete Workflows:
  - Feature extraction pipeline (MFCC + delta + delta-delta = 39 dimensions)
  - VAD + pitch tracking for voiced speech analysis
  - Multi-stage noise reduction (noise gate → Wiener filter)
  - Multi-algorithm comparison and benchmarking
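For the 39-dimensional pipeline mentioned above, deltas are usually computed with the regression formula d_t = Σ_{n=1}^{N} n(c_{t+n} − c_{t−n}) / (2 Σ_{n=1}^{N} n²), applied once for the delta and again on the result for the delta-delta. The sketch below assumes N = 2 and edge replication at the boundaries; compute_delta is a hypothetical helper rather than DeltaFeatures.compute_delta, and a linear ramp stands in for a real 13 × T MFCC matrix.

```python
import numpy as np


def compute_delta(features, width=2):
    """Regression deltas along time for an array of shape (n_coeffs, n_frames).

    d_t = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n**2),
    with the first and last frames replicated so the output keeps the input shape.
    """
    n_frames = features.shape[1]
    padded = np.pad(features, ((0, 0), (width, width)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    delta = np.zeros(features.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[:, width + n: width + n + n_frames]     # c[t+n]
        behind = padded[:, width - n: width - n + n_frames]    # c[t-n]
        delta += n * (ahead - behind)
    return delta / denom


# Hypothetical stand-in for a 13 x T MFCC matrix: every coefficient rises by 1 per frame.
mfcc = np.tile(np.arange(10, dtype=float), (13, 1))
delta = compute_delta(mfcc)            # first-order ("velocity")
delta2 = compute_delta(delta)          # second-order ("acceleration")
features_39 = np.vstack([mfcc, delta, delta2])
print(features_39.shape)  # (39, 10)
```

For a ramp with slope 1, the interior deltas equal 1 and the interior delta-deltas equal 0; only the frames near the edges are affected by the replication padding.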
Requirements
- Python 3.8+
- NumPy >= 1.20.0
- SciPy >= 1.7.0
- SoundFile >= 0.11.0
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Please ensure:
- Code passes all tests
- New features include tests
- Documentation is updated
- Code follows the PEP 8 style guide
- Type hints are included
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
This library implements algorithms from foundational research in speech processing:
- Davis & Mermelstein (1980) - MFCC
- de Cheveigné & Kawahara (2002) - YIN algorithm
- Rabiner & Schafer (1978) - Digital speech processing fundamentals
- Ephraim & Malah (1984) - MMSE short-time spectral amplitude estimation for speech enhancement
- And many others cited in individual algorithm documentation
Project Status
Current Version: 0.1.0 (MVP Complete)
✅ All 20 core algorithms implemented and tested!
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Download files
File details
Details for the file speechalgo-0.1.0.tar.gz.
File metadata
- Download URL: speechalgo-0.1.0.tar.gz
- Size: 51.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5fa493dad939612faf09f9093e6b0731a50815dbe31fd5994ee3f0e146d83466 |
| MD5 | c961e16c3da31ba87a1e1f47845fb0dd |
| BLAKE2b-256 | aa2944c5fd1d8de2a80a271935f20babf59d6eb1af4b2023cb13d3d22bcd30e8 |
Provenance
The following attestation bundles were made for speechalgo-0.1.0.tar.gz:
- Publisher: publish.yml on tarun7r/SpeechAlgo
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: speechalgo-0.1.0.tar.gz
- Subject digest: 5fa493dad939612faf09f9093e6b0731a50815dbe31fd5994ee3f0e146d83466
- Sigstore transparency entry: 640430451
- Permalink: tarun7r/SpeechAlgo@2f42fd9cfa52fefec9ef31130e290c49faa7fd00
- Branch / Tag: refs/heads/main
- Owner: https://github.com/tarun7r
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2f42fd9cfa52fefec9ef31130e290c49faa7fd00
- Trigger Event: workflow_dispatch
File details
Details for the file speechalgo-0.1.0-py3-none-any.whl.
File metadata
- Download URL: speechalgo-0.1.0-py3-none-any.whl
- Size: 69.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dfa50d145f3316b41b3f053f5d6b25264bc4e2fce6c599584a119ad869b769ce |
| MD5 | 69ffecea915d1f6c324f9c2a9e4845b4 |
| BLAKE2b-256 | e44ecaf47d34e0b57c956ee750dbbe1b47fa15419b6f3a3068ce2abf0280c77d |
Provenance
The following attestation bundles were made for speechalgo-0.1.0-py3-none-any.whl:
- Publisher: publish.yml on tarun7r/SpeechAlgo
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: speechalgo-0.1.0-py3-none-any.whl
- Subject digest: dfa50d145f3316b41b3f053f5d6b25264bc4e2fce6c599584a119ad869b769ce
- Sigstore transparency entry: 640430461
- Permalink: tarun7r/SpeechAlgo@2f42fd9cfa52fefec9ef31130e290c49faa7fd00
- Branch / Tag: refs/heads/main
- Owner: https://github.com/tarun7r
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2f42fd9cfa52fefec9ef31130e290c49faa7fd00
- Trigger Event: workflow_dispatch