SpeechAlgo

A comprehensive Python library providing clean, educational implementations of fundamental speech processing algorithms.

Overview

SpeechAlgo provides reference implementations of 20 core speech processing algorithms, organized into five categories: preprocessing, voice activity detection (VAD), pitch detection, speech enhancement, and feature extraction. The library targets both teaching and production use: the code favors readability, and the documentation covers the underlying mathematics.

Features

Clean, readable implementations suitable for learning and research
Comprehensive documentation with mathematical foundations and references
Type-annotated code with extensive docstrings
NumPy-based implementations for efficiency
Modular design with consistent APIs
Well-tested algorithms with unit tests
Many algorithms are fast enough for real-time use

Installation

From PyPI

pip install speechalgo

From source

git clone https://github.com/tarun7r/SpeechAlgo.git
cd SpeechAlgo
pip install -e .

Development installation (from the cloned repository root)

pip install -e ".[dev]"

Quick Start

Preprocessing

import numpy as np
from speechalgo.preprocessing import hamming_window, FrameExtractor, MFCC
from speechalgo.utils import load_audio

# Load audio
audio, sr = load_audio('speech.wav', sample_rate=16000)

# Apply window function
window = hamming_window(512)
windowed = audio[:512] * window

# Extract overlapping frames
frame_extractor = FrameExtractor(frame_length=512, hop_length=256)
frames = frame_extractor.extract_frames(audio)
print(f"Extracted {len(frames)} frames")

# Extract MFCC features
mfcc = MFCC(sample_rate=16000, n_mfcc=13)
mfcc_features = mfcc.process(audio)
print(f"MFCC shape: {mfcc_features.shape}")  # (13, n_frames)
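
To make the MFCC step concrete, here is a minimal NumPy-only sketch of the usual pipeline: window each frame, take the power spectrum, apply a mel filterbank, take the log, then a DCT-II. It is not SpeechAlgo's implementation. The HTK-style mel formula, the 26 triangular filters, the Hamming window, the 512-sample FFT and the helper names (`mfcc`, `mel_filterbank`, `dct_ii`) are all assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fbank = np.zeros((n_mels, len(bin_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def dct_ii(x, n_out):
    """Orthonormal DCT-II along axis 0, keeping the first n_out coefficients."""
    n = x.shape[0]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis @ x


def mfcc(signal, sample_rate=16000, n_fft=512, hop_length=256, n_mels=26, n_mfcc=13):
    """Return an (n_mfcc, n_frames) MFCC matrix; the signal must be at least n_fft samples long."""
    x = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(x) - n_fft) // hop_length
    window = np.hamming(n_fft)
    # Columns are Hamming-windowed frames: shape (n_fft, n_frames)
    frames = np.stack([x[t * hop_length:t * hop_length + n_fft] * window
                       for t in range(n_frames)], axis=1)
    power = np.abs(np.fft.rfft(frames, axis=0)) ** 2 / n_fft  # (n_fft // 2 + 1, n_frames)
    mel_energy = mel_filterbank(sample_rate, n_fft, n_mels) @ power  # (n_mels, n_frames)
    log_mel = np.log(mel_energy + 1e-10)  # small offset avoids log(0) on silent frames
    return dct_ii(log_mel, n_mfcc)


sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s test tone
features = mfcc(tone, sr)
print(features.shape)  # (13, 61)
```

The output keeps the (n_mfcc, n_frames) layout used in the Quick Start; the exact frame count of the library's `MFCC` may differ if it pads the signal.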

Voice Activity Detection

from speechalgo.vad import EnergyBasedVAD, SpectralEntropyVAD, ZeroCrossingVAD

# Energy-based VAD
energy_vad = EnergyBasedVAD(sample_rate=16000)
is_speech = energy_vad.process(audio)
print(f"Speech detected in {is_speech.sum()} / {len(is_speech)} frames")

# Spectral entropy VAD (more robust in noise)
entropy_vad = SpectralEntropyVAD(sample_rate=16000)
is_speech = entropy_vad.process(audio)

# Zero-crossing rate VAD
zcr_vad = ZeroCrossingVAD(sample_rate=16000)
is_speech = zcr_vad.process(audio)

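# Extract the span from the first to the last speech frame
hop_length = 256    # assumed to match the VAD's hop length
frame_length = 512  # assumed to match the VAD's frame length
speech_frames = np.where(is_speech)[0]
if len(speech_frames) > 0:
    start = speech_frames[0] * hop_length
    end = speech_frames[-1] * hop_length + frame_length  # include the whole last frame
    speech_segment = audio[start:end]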
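
The snippet above keeps everything between the first and the last speech frame, including any pauses in between. If separate utterances are needed, consecutive speech frames can be grouped into runs instead. The helper below is hypothetical (not part of SpeechAlgo) and assumes a 256-sample hop and a 512-sample frame length.

```python
import numpy as np


def frames_to_segments(is_speech, hop_length=256, frame_length=512, n_samples=None):
    """Convert a per-frame boolean mask into a list of (start, end) sample ranges.

    Consecutive speech frames are merged into one segment; each segment ends
    where its last frame ends (frame start + frame_length).
    """
    mask = np.asarray(is_speech, dtype=bool)
    # Pad with False on both sides so every run has a rising and a falling edge
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, stops = edges[0::2], edges[1::2]  # stops are exclusive frame indices
    segments = []
    for first, last_exclusive in zip(starts, stops):
        start = first * hop_length
        end = (last_exclusive - 1) * hop_length + frame_length
        if n_samples is not None:
            end = min(end, n_samples)  # do not run past the end of the signal
        segments.append((int(start), int(end)))
    return segments
```

For example, `frames_to_segments(is_speech, n_samples=len(audio))` returns one `(start, end)` pair per run of speech frames, which can be sliced out of `audio` directly.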

Pitch Detection

from speechalgo.pitch import YIN, Autocorrelation, HPS

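# Take one analysis frame (1024 samples = 64 ms at 16 kHz, an illustrative size)
audio_frame = audio[:1024]

# YIN algorithm (recommended for speech)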
yin = YIN(sample_rate=16000)
pitch = yin.estimate(audio_frame)
print(f"Estimated pitch: {pitch:.1f} Hz")

# Autocorrelation method
autocorr = Autocorrelation(sample_rate=16000)
pitch = autocorr.estimate(audio_frame)

# Harmonic Product Spectrum
hps = HPS(sample_rate=16000)
pitch = hps.estimate(audio_frame)
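
The autocorrelation method picks the lag at which a frame best matches a delayed copy of itself; the pitch estimate is the sample rate divided by that lag. A minimal NumPy sketch follows. The 60-400 Hz search range and the convention of returning 0.0 for silent or too-short frames are assumptions of this sketch, not necessarily the behavior of SpeechAlgo's `Autocorrelation` class.

```python
import numpy as np


def autocorrelation_pitch(frame, sample_rate=16000, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of one frame from the highest autocorrelation peak.

    Returns 0.0 when the frame is silent or too short for the fmin search range.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove DC so it does not dominate the correlation
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    if len(frame) <= max_lag or not np.any(frame):
        return 0.0
    # Full autocorrelation; keep non-negative lags only (index 0 is lag 0)
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag = min_lag + int(np.argmax(r[min_lag:max_lag + 1]))
    return sample_rate / lag
```

Plain autocorrelation is prone to octave errors on strongly harmonic or noisy frames, which is the weakness YIN's normalized difference function is designed to reduce.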

Speech Enhancement

from speechalgo.enhancement import SpectralSubtraction, WienerFilter, NoiseGate

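# Spectral subtraction
# noisy_audio is the noisy input and noise_profile a noise-only estimate;
# see the SpectralSubtraction docstring for the expected noise_profile format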
enhancer = SpectralSubtraction(sample_rate=16000)
clean_audio = enhancer.process(noisy_audio, noise_profile)

# Wiener filtering
wiener = WienerFilter(sample_rate=16000)
clean_audio = wiener.process(noisy_audio)

# Noise gate
gate = NoiseGate(threshold=-40.0, sample_rate=16000)
gated_audio = gate.process(audio)
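
A noise gate mutes passages whose level stays below a threshold. The sketch below is a hard gate over non-overlapping blocks that measures RMS level in dBFS, assuming a float signal scaled to [-1, 1]. The block size, the function name and the reading of the Quick Start's -40.0 threshold as dBFS are assumptions; practical gates usually add attack, hold and release smoothing to avoid audible clicks.

```python
import numpy as np


def noise_gate(signal, threshold_db=-40.0, frame_length=256):
    """Zero out blocks whose RMS level (dBFS) falls below threshold_db.

    Assumes a float signal in [-1, 1], so an RMS of 1.0 corresponds to 0 dBFS.
    """
    x = np.asarray(signal, dtype=float)
    out = x.copy()
    for start in range(0, len(x), frame_length):
        block = x[start:start + frame_length]
        rms = np.sqrt(np.mean(block ** 2))
        level_db = 20.0 * np.log10(rms + 1e-12)  # small offset avoids log10(0)
        if level_db < threshold_db:
            out[start:start + frame_length] = 0.0
    return out
```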

Feature Extraction

from speechalgo.features import SpectralFeatures, TemporalFeatures, DeltaFeatures

# Spectral features
spectral = SpectralFeatures(sample_rate=16000)
centroid = spectral.spectral_centroid(audio)
rolloff = spectral.spectral_rolloff(audio)
flux = spectral.spectral_flux(audio)

# Temporal features
temporal = TemporalFeatures(sample_rate=16000)
zcr = temporal.zero_crossing_rate(audio)
energy = temporal.short_time_energy(audio)

# Delta features (velocity and acceleration)
delta = DeltaFeatures()
mfcc_delta = delta.compute_delta(mfcc_features)
mfcc_delta2 = delta.compute_delta(mfcc_delta)
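
Delta features are commonly computed with a regression over N neighboring frames on each side: d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2). The sketch below implements that formula in NumPy with N = 2 and edge-frame repetition; both choices are assumptions, and this standalone `compute_delta` function is not `DeltaFeatures.compute_delta`.

```python
import numpy as np


def compute_delta(features, width=2):
    """Regression deltas along the time axis of a (n_coeffs, n_frames) array.

    Edges are handled by repeating the first and last frames.
    """
    feats = np.asarray(features, dtype=float)
    n_frames = feats.shape[1]
    padded = np.pad(feats, ((0, 0), (width, width)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    delta = np.zeros_like(feats)
    for n in range(1, width + 1):
        # Column width + t + n of padded is frame t + n; width + t - n is frame t - n
        delta += n * (padded[:, width + n:width + n + n_frames]
                      - padded[:, width - n:width - n + n_frames])
    return delta / denom


mfcc_like = np.random.default_rng(0).standard_normal((13, 100))
d1 = compute_delta(mfcc_like)   # velocity
d2 = compute_delta(d1)          # acceleration (delta-delta)
stacked = np.vstack([mfcc_like, d1, d2])  # shape (39, 100)
```

Stacking 13 coefficients with their deltas and delta-deltas gives the 39-dimensional vectors mentioned in the Examples section.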

Algorithm Categories

Preprocessing (5 algorithms)

Windowing - Hamming, Hanning, Blackman window functions for spectral analysis
Framing - Overlapping frame extraction with configurable hop length
Pre-emphasis - High-frequency boosting filter (α=0.97)
MFCC - Mel-Frequency Cepstral Coefficients extraction
Mel-Spectrogram - Mel-scale frequency representation
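
The first three preprocessing steps compose naturally: pre-emphasis y[n] = x[n] - alpha * x[n-1], splitting into overlapping frames, then multiplying each frame by a window. A minimal NumPy sketch follows. The function names are hypothetical, the window formulas are the symmetric definitions NumPy also uses for `np.hamming`, `np.hanning` and `np.blackman`, and dropping a trailing partial frame is an assumption (SpeechAlgo's `FrameExtractor` may pad instead).

```python
import numpy as np


def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1], with y[0] = x[0]."""
    x = np.asarray(signal, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])


def frame_signal(signal, frame_length=512, hop_length=256):
    """Overlapping frames as rows, shape (n_frames, frame_length); a trailing partial frame is dropped."""
    x = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(x) - frame_length) // hop_length
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return x[idx]


def cosine_window(n, kind="hamming"):
    """Symmetric cosine-sum windows of length n (n > 1)."""
    phase = 2.0 * np.pi * np.arange(n) / (n - 1)
    if kind == "hamming":
        return 0.54 - 0.46 * np.cos(phase)
    if kind == "hanning":
        return 0.5 - 0.5 * np.cos(phase)
    if kind == "blackman":
        return 0.42 - 0.5 * np.cos(phase) + 0.08 * np.cos(2.0 * phase)
    raise ValueError(f"unknown window kind: {kind}")


x = np.random.default_rng(0).standard_normal(16000)
windowed = frame_signal(pre_emphasis(x)) * cosine_window(512, "hamming")  # (61, 512)
```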

Voice Activity Detection (3 algorithms)

Energy-based VAD - Simple threshold-based detection using short-time energy
Spectral Entropy VAD - Entropy-based voice detection (robust in noise)
Zero-Crossing VAD - Combined energy and zero-crossing rate approach
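
Two of these VAD criteria reduce to a few lines of NumPy: short-time energy compared against a threshold, and the normalized entropy of each frame's power spectrum (voiced speech concentrates energy in a few harmonics and scores low, while broadband noise scores close to 1). This is a sketch only; the relative -30 dB energy threshold, the 512/256 framing and the Hann window are assumptions, not SpeechAlgo's defaults.

```python
import numpy as np


def frame_signal(x, frame_length=512, hop_length=256):
    """Overlapping frames as rows, shape (n_frames, frame_length)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_length) // hop_length
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return x[idx]


def energy_vad(x, frame_length=512, hop_length=256, threshold_db=-30.0):
    """Mark frames whose energy is within |threshold_db| dB of the loudest frame."""
    frames = frame_signal(x, frame_length, hop_length)
    energy_db = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    return energy_db > energy_db.max() + threshold_db


def spectral_entropy(x, frame_length=512, hop_length=256):
    """Per-frame spectral entropy normalized to [0, 1] (1 = flat, noise-like spectrum)."""
    frames = frame_signal(x, frame_length, hop_length) * np.hanning(frame_length)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Treat each frame's power spectrum as a probability distribution over bins
    p = power / (power.sum(axis=1, keepdims=True) + 1e-12)
    entropy = -np.sum(p * np.log(p + 1e-12), axis=1)
    return entropy / np.log(power.shape[1])
```

A combined decision could look like `energy_vad(x) & (spectral_entropy(x) < threshold)`, with the threshold tuned on the target data. An all-zero frame gets an entropy of 0 in this sketch, which is why the entropy test is paired with an energy check.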

Pitch Detection (4 algorithms)

Autocorrelation - Classic time-domain pitch estimation
YIN Algorithm - Refinement of autocorrelation using a cumulative mean normalized difference function
Cepstral Method - Pitch detection using cepstrum
Harmonic Product Spectrum (HPS) - Frequency-domain approach
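
YIN replaces the autocorrelation peak search with a difference function d(tau) = sum_j (x[j] - x[j + tau])^2, normalizes it by its cumulative mean, and takes the first dip below an absolute threshold. The NumPy sketch below follows those steps from de Cheveigné & Kawahara (2002) plus parabolic interpolation, but omits the paper's final best-local-estimate refinement. The 0.1 threshold, the 60-400 Hz range and the 0.0 return value for unvoiced frames are assumptions rather than SpeechAlgo's settings.

```python
import numpy as np


def yin_pitch(frame, sample_rate=16000, fmin=60.0, fmax=400.0, threshold=0.1):
    """Estimate F0 in Hz for one frame; returns 0.0 if no lag dips below threshold."""
    x = np.asarray(frame, dtype=float)
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    w = len(x) - max_lag  # integration window length
    if w <= 0:
        raise ValueError("frame must be longer than sample_rate / fmin samples")

    # Difference function: d(tau) = sum_j (x[j] - x[j + tau])^2
    d = np.array([np.sum((x[:w] - x[tau:tau + w]) ** 2) for tau in range(max_lag + 1)])

    # Cumulative mean normalized difference: d'(0) = 1, d'(tau) = d(tau) * tau / sum_{j<=tau} d(j)
    cmnd = np.ones(max_lag + 1)
    cmnd[1:] = d[1:] * np.arange(1, max_lag + 1) / np.maximum(np.cumsum(d[1:]), 1e-12)

    # Absolute threshold: first lag below it, then follow the dip to its local minimum
    tau = min_lag
    while tau <= max_lag and cmnd[tau] >= threshold:
        tau += 1
    if tau > max_lag:
        return 0.0  # treated as unvoiced
    while tau < max_lag and cmnd[tau + 1] < cmnd[tau]:
        tau += 1

    # Parabolic interpolation around the chosen lag for sub-sample precision
    shift = 0.0
    if tau < max_lag:
        a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
        denom = a - 2.0 * b + c
        if denom != 0.0:
            shift = 0.5 * (a - c) / denom
    return sample_rate / (tau + shift)
```

The frame must be longer than the longest period searched (here 266 samples at 16 kHz), so a 1024-sample frame leaves a comfortable integration window.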

Speech Enhancement (3 algorithms)

Spectral Subtraction - Classic noise reduction technique
Wiener Filtering - Linear filtering that is optimal in the minimum mean-square error (MMSE) sense
Noise Gate - Threshold-based noise suppression
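
Magnitude spectral subtraction estimates the noise magnitude spectrum, subtracts it from each frame's magnitude, and resynthesizes with the noisy phase. In the minimal NumPy sketch below, the noise estimate is taken from the first `noise_frames` frames (assumed to be noise-only) rather than from a separate noise profile, and the spectral floor `beta` and the 50% overlap are illustrative assumptions; the function is hypothetical, not SpeechAlgo's `SpectralSubtraction`.

```python
import numpy as np


def spectral_subtraction(noisy, noise_frames=10, frame_length=512, beta=0.02):
    """Magnitude spectral subtraction with 50% overlap-add resynthesis."""
    x = np.asarray(noisy, dtype=float)
    hop = frame_length // 2
    # Periodic Hann window: shifted copies at 50% overlap sum to 1,
    # so unmodified frames overlap-add back to the input
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(frame_length) / frame_length)
    n_frames = 1 + (len(x) - frame_length) // hop
    spectra = np.array([np.fft.rfft(x[t * hop:t * hop + frame_length] * window)
                        for t in range(n_frames)])
    magnitude, phase = np.abs(spectra), np.angle(spectra)

    # Average magnitude of the leading frames serves as the noise estimate
    noise_mag = magnitude[:noise_frames].mean(axis=0)
    # Subtract, but never go below beta times the noisy magnitude (spectral floor)
    clean_mag = np.maximum(magnitude - noise_mag, beta * magnitude)

    out = np.zeros(len(x))
    for t in range(n_frames):
        frame = np.fft.irfft(clean_mag[t] * np.exp(1j * phase[t]), n=frame_length)
        out[t * hop:t * hop + frame_length] += frame
    return out
```

The floor keeps a small fraction of the noisy magnitude in every bin, trading some residual noise for less of the "musical noise" that hard zeroing produces.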

Feature Extraction (4 algorithms)

Spectral Features - Centroid, rolloff, flux, bandwidth
Zero Crossing Rate - Time-domain feature extraction
Short-Time Energy - Energy computation in frames
Delta Features - First and second-order temporal derivatives
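
The per-frame features in this category can be written directly from their definitions. The sketch below computes magnitude-weighted spectral centroid, bandwidth and rolloff, spectral flux between consecutive frames, zero-crossing rate and short-time energy. The Hann window, the 85% rolloff point, the use of magnitude (rather than power) weighting and the Euclidean form of flux are assumptions; SpeechAlgo's `SpectralFeatures` and `TemporalFeatures` may use different conventions.

```python
import numpy as np


def magnitude_spectrum(frame):
    """Magnitude spectrum of a Hann-windowed frame."""
    frame = np.asarray(frame, dtype=float)
    return np.abs(np.fft.rfft(frame * np.hanning(len(frame))))


def spectral_shape(frame, sample_rate=16000, rolloff_pct=0.85):
    """Return (centroid, bandwidth, rolloff) in Hz, weighting bins by magnitude."""
    mag = magnitude_spectrum(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = mag.sum() + 1e-12
    centroid = np.sum(freqs * mag) / total
    bandwidth = np.sqrt(np.sum((freqs - centroid) ** 2 * mag) / total)
    cumulative = np.cumsum(mag)
    # Lowest frequency below which rolloff_pct of the total magnitude lies
    rolloff = freqs[np.searchsorted(cumulative, rolloff_pct * cumulative[-1])]
    return centroid, bandwidth, rolloff


def spectral_flux(prev_frame, frame):
    """Euclidean distance between the magnitude spectra of consecutive frames."""
    return np.sqrt(np.sum((magnitude_spectrum(frame) - magnitude_spectrum(prev_frame)) ** 2))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    s = np.signbit(np.asarray(frame, dtype=float))
    return np.mean(s[1:] != s[:-1])


def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return np.sum(np.asarray(frame, dtype=float) ** 2)
```

For a 1000 Hz tone sampled at 16 kHz, the centroid sits near 1000 Hz and the zero-crossing rate near 2 / 16 = 0.125, since each 16-sample period crosses zero twice.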

Documentation

Comprehensive documentation with mathematical foundations and implementation notes:

Getting Started Tutorial - Installation and first steps
Preprocessing Theory - Windowing, framing, MFCC, mel-spectrogram
VAD Theory - Voice activity detection methods and comparisons
Pitch Detection Theory - Pitch estimation algorithms and best practices
Speech Enhancement Theory - Noise reduction techniques

Examples

See the Getting Started Tutorial for worked examples, including:

Basic Examples: Window functions, framing, pre-emphasis, MFCC, VAD, pitch detection, enhancement
Complete Workflows:
- Feature extraction pipeline (13 MFCCs + 13 deltas + 13 delta-deltas = 39 dimensions)
- VAD + pitch tracking for voiced speech analysis
- Multi-stage noise reduction (noise gate → Wiener filter)
- Multi-algorithm comparison and benchmarking

Requirements

Python 3.8+
NumPy >= 1.20.0
SciPy >= 1.7.0
SoundFile >= 0.11.0

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Please ensure:

Code passes all tests
New features include tests
Documentation is updated
Code follows PEP 8 style guide
Type hints are included

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This library implements algorithms from foundational research in speech processing:

Davis & Mermelstein (1980) - MFCC
de Cheveigné & Kawahara (2002) - YIN algorithm
Rabiner & Schafer (1978) - Digital speech processing fundamentals
Boll (1979) - Spectral subtraction
Ephraim & Malah (1984) - MMSE short-time spectral amplitude estimation
And many others cited in individual algorithm documentation

Project Status

Current Version: 0.1.0 (MVP Complete)

✅ All 20 core algorithms implemented and tested!

Support

Issues: GitHub Issues
Discussions: GitHub Discussions

Project details

Maintainers

tarun7r

Release history

0.1.0 - Oct 25, 2025

Download files

Source Distribution

speechalgo-0.1.0.tar.gz (51.9 kB)

Uploaded Oct 25, 2025 Source

Built Distribution

speechalgo-0.1.0-py3-none-any.whl (69.9 kB)

Uploaded Oct 25, 2025 Python 3

File details

Details for the file speechalgo-0.1.0.tar.gz.

File metadata

Download URL: speechalgo-0.1.0.tar.gz
Upload date: Oct 25, 2025
Size: 51.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for speechalgo-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`5fa493dad939612faf09f9093e6b0731a50815dbe31fd5994ee3f0e146d83466`
MD5	`c961e16c3da31ba87a1e1f47845fb0dd`
BLAKE2b-256	`aa2944c5fd1d8de2a80a271935f20babf59d6eb1af4b2023cb13d3d22bcd30e8`

Provenance

The following attestation bundles were made for speechalgo-0.1.0.tar.gz:

Publisher: publish.yml on tarun7r/SpeechAlgo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: speechalgo-0.1.0.tar.gz
- Subject digest: 5fa493dad939612faf09f9093e6b0731a50815dbe31fd5994ee3f0e146d83466
- Sigstore transparency entry: 640430451
- Sigstore integration time: Oct 25, 2025
Source repository:
- Permalink: tarun7r/SpeechAlgo@2f42fd9cfa52fefec9ef31130e290c49faa7fd00
- Branch / Tag: refs/heads/main
- Owner: https://github.com/tarun7r
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2f42fd9cfa52fefec9ef31130e290c49faa7fd00
- Trigger Event: workflow_dispatch

File details

Details for the file speechalgo-0.1.0-py3-none-any.whl.

File metadata

Download URL: speechalgo-0.1.0-py3-none-any.whl
Upload date: Oct 25, 2025
Size: 69.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for speechalgo-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dfa50d145f3316b41b3f053f5d6b25264bc4e2fce6c599584a119ad869b769ce`
MD5	`69ffecea915d1f6c324f9c2a9e4845b4`
BLAKE2b-256	`e44ecaf47d34e0b57c956ee750dbbe1b47fa15419b6f3a3068ce2abf0280c77d`

Provenance

The following attestation bundles were made for speechalgo-0.1.0-py3-none-any.whl:

Publisher: publish.yml on tarun7r/SpeechAlgo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: speechalgo-0.1.0-py3-none-any.whl
- Subject digest: dfa50d145f3316b41b3f053f5d6b25264bc4e2fce6c599584a119ad869b769ce
- Sigstore transparency entry: 640430461
- Sigstore integration time: Oct 25, 2025
Source repository:
- Permalink: tarun7r/SpeechAlgo@2f42fd9cfa52fefec9ef31130e290c49faa7fd00
- Branch / Tag: refs/heads/main
- Owner: https://github.com/tarun7r
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2f42fd9cfa52fefec9ef31130e290c49faa7fd00
- Trigger Event: workflow_dispatch
