[DEPRECATED - Use 'cargo install voxtus' instead] Transcribe Internet videos and media files to text
This project has been archived.
The maintainers of this project have marked this project as archived. No new releases are expected.
🗣️ Voxtus (Python - Deprecated)
⚠️ DEPRECATED: This Python version is no longer maintained. Please use the new Rust implementation: github.com/johanthoren/voxtus
Install the new version with:
cargo install voxtus
Voxtus is a command-line tool for transcribing Internet videos and media files to text using faster-whisper.
It supports multiple output formats and can download, transcribe, and optionally retain the original audio. It's built in Python and installable as a proper CLI via PyPI or from source.
✨ Features
- 🎥 Download & transcribe videos from YouTube, Vimeo, and 1000+ sites
- 📁 Local file support for audio/video files
- 📝 Multiple output formats: TXT, JSON, SRT, VTT
- 🎛️ Model selection - Choose from tiny to large models for speed/accuracy trade-offs
- 🔄 Batch processing multiple formats in one run
- 📊 Rich metadata in JSON format (title, source, duration, language)
- 🚀 Stdout mode for pipeline integration
- 🎯 LLM-friendly default text format
- ⚡ Fast transcription via faster-whisper
⚙️ Installation
1. Install system dependency: ffmpeg
Voxtus uses ffmpeg under the hood to extract audio from video files.
macOS:
brew install ffmpeg
Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg
2. Recommended for end users (via pipx)
pipx install voxtus
After that, simply run:
voxtus --help
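Since Voxtus depends on ffmpeg being available, a wrapper script can check for it before starting a long job. The following is a minimal sketch, not part of Voxtus itself; the function name `find_ffmpeg` is hypothetical.

```python
import shutil


def find_ffmpeg(executable: str = "ffmpeg") -> str:
    """Return the full path to ffmpeg, or raise if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        # Hypothetical error message; adjust the install hint per platform.
        raise RuntimeError(
            f"{executable} not found on PATH; install it first "
            "(e.g. 'brew install ffmpeg' or 'sudo apt install ffmpeg')"
        )
    return path
```

Calling `find_ffmpeg()` at the top of a script fails fast with a readable message instead of an error halfway through a download.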
🧪 Development Setup
Quick Start for Contributors
git clone https://github.com/johanthoren/voxtus.git
cd voxtus
# Install uv (fast Python package manager)
brew install uv # macOS
# or: pip install uv # any platform
# Setup development environment
make dev-install
# Run tests
make test
Development Workflow
The project uses a simple Makefile for development tasks. All targets automatically verify dependencies and provide helpful installation instructions if tools are missing.
make help # Show all available commands with dynamic version examples
make install # Install package and dependencies
make dev-install # Install with development dependencies
make run # Run development version (e.g., make run -- -f json file.mp4)
make test # Run tests (fast)
make test-coverage # Run tests with coverage report
make test-ci # Run GitHub Actions workflow locally (requires act)
# Dependency verification
make verify-uv # Check if uv is installed
make verify-act # Check if act is installed
# Release (bumps version, commits, tags, and pushes)
make release # Patch release (e.g., 0.1.9 -> 0.1.10)
make release patch # Patch release (same as above)
make release minor # Minor release (e.g., 0.1.9 -> 0.2.0)
make release major # Major release (e.g., 0.1.9 -> 1.0.0)
Dependencies
The Makefile automatically checks for required tools:
- uv - Fast Python package manager (required for most targets)
  - Install: curl -LsSf https://astral.sh/uv/install.sh | sh or brew install uv
- act - Run GitHub Actions locally (optional, only for test-ci)
  - Install: brew install act or see installation guide
Enhanced Release Process
The release process includes comprehensive safety checks:
- Git Status Check - Offers to stage and commit pending changes
- Test Suite - Runs tests with coverage reporting
- Coverage Validation - Prompts if coverage is below 80%
- Version Bump - Updates pyproject.toml and commits the change
- Git Operations - Creates tag and pushes to trigger CI/CD
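The version bump step follows the usual major.minor.patch arithmetic shown in the release examples above. A minimal sketch of that arithmetic, assuming a plain three-part version string (the helper `bump_version` is hypothetical and is not the Makefile's actual implementation):

```python
def bump_version(version: str, part: str = "patch") -> str:
    """Return the next version after bumping the given part.

    Assumes a plain 'MAJOR.MINOR.PATCH' string with integer components.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release part: {part!r}")
```

Note that lower components reset to zero on a minor or major bump, which is why a minor release after 0.1.9 is 0.2.0 rather than 0.2.9.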
Local CI Testing
Use make test-ci to run the exact same GitHub Actions workflow locally:
make test-ci # Runs .github/workflows/test.yml with act
This ensures your changes work in the CI environment before pushing.
🧪 For contributors / running from source
git clone https://github.com/johanthoren/voxtus.git
cd voxtus
brew install uv # or: pip install uv
uv venv
source .venv/bin/activate
uv pip install .
Then run:
voxtus --help
📋 Output Formats
| Format | Description | Use Case |
|---|---|---|
| TXT | Plain text with timestamps | Default, LLM processing, reading |
| JSON | Structured data with metadata | APIs, data analysis, archival |
| SRT | SubRip subtitle format | Video subtitles, media players |
| VTT | WebVTT subtitle format | Web browsers, HTML5 video |
Additional formats (CSV) are planned for future releases.
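SRT and VTT differ mainly in how cue timestamps are written: SRT uses a comma before the milliseconds (`00:00:05,200`), while WebVTT uses a period (`00:00:05.200`). The sketch below illustrates that difference; the function names are hypothetical and this is not Voxtus's own writer code.

```python
def format_timestamp(seconds: float, decimal_marker: str = ",") -> str:
    """Format seconds as HH:MM:SS<marker>mmm (',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue: index line, time range line, then the text."""
    time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return f"{index}\n{time_range}\n{text}\n"
```

For a VTT cue, the same helper can be called with `decimal_marker="."`.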
🔧 Extensible Format System
Voxtus uses a modular format system that makes adding new output formats straightforward. Each format is implemented as a separate module with its own writer class, making the codebase maintainable and extensible.
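To illustrate the general idea of a pluggable format system, here is a simplified sketch that uses a registry of writer functions. It is an assumption for illustration only: the real project is described as using one module and writer class per format, and the names `FORMATS`, `register`, and `render`, as well as the TXT line layout, are hypothetical.

```python
import json
from typing import Callable, Dict, List

# Hypothetical registry mapping a format name to a writer function.
Writer = Callable[[List[dict]], str]
FORMATS: Dict[str, Writer] = {}


def register(name: str) -> Callable[[Writer], Writer]:
    """Decorator that adds a writer to the registry under `name`."""
    def decorator(fn: Writer) -> Writer:
        FORMATS[name] = fn
        return fn
    return decorator


@register("txt")
def write_txt(segments: List[dict]) -> str:
    # Assumed line layout; the real TXT output may differ.
    return "\n".join(
        f"[{s['start']:.2f} - {s['end']:.2f}]: {s['text']}" for s in segments
    )


@register("json")
def write_json(segments: List[dict]) -> str:
    return json.dumps({"transcript": segments}, indent=2)


def render(fmt: str, segments: List[dict]) -> str:
    """Look up the writer for `fmt` and apply it to the segments."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return FORMATS[fmt](segments)
```

With this shape, adding a new format means adding one decorated function; nothing else in the dispatch code changes.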
🎛️ Model Selection
Voxtus supports multiple Whisper models with different trade-offs between speed, accuracy, and resource usage:
Available Models
| Model | Parameters | VRAM | Languages | Best For |
|---|---|---|---|---|
| tiny | 39M | ~1GB | Multilingual | Fastest inference, low resources |
| tiny.en | 39M | ~1GB | English only | Fastest English-only transcription |
| base | 74M | ~1GB | Multilingual | Good balance for minimal resources |
| base.en | 74M | ~1GB | English only | Balanced English-only |
| small | 244M | ~2GB | Multilingual | Default balance |
| small.en | 244M | ~2GB | English only | Higher accuracy English |
| medium | 769M | ~5GB | Multilingual | Good accuracy, slower |
| medium.en | 769M | ~5GB | English only | Good accuracy English |
| distil-large-v3 | 756M | ~6GB | Multilingual | Faster with good accuracy |
| large | 1550M | ~10GB | Multilingual | Highest accuracy |
| large-v2 | 1550M | ~10GB | Multilingual | Improved large model |
| large-v3 | 1550M | ~10GB | Multilingual | Latest large model |
| turbo | 809M | ~6GB | Multilingual | Optimized for speed |
VRAM requirements are from OpenAI's official specifications. Actual performance varies by hardware and audio content.
Model Selection Guide
# List all available models with characteristics
voxtus --list-models
# Speed-optimized (fastest)
voxtus --model tiny video.mp4
# Balanced (default)
voxtus --model small video.mp4
# Better accuracy with speed
voxtus --model distil-large-v3 video.mp4
# Quality-optimized (most accurate)
voxtus --model large-v3 video.mp4
# English-only (faster for English content)
voxtus --model small.en video.mp4
💡 Tip: English-only models (.en) tend to be more accurate for English content, most noticeably at the tiny and base sizes, while multilingual models work with 99+ languages.
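The table above can also drive an automated choice. The sketch below picks the largest multilingual model that fits a given VRAM budget, using the approximate figures from the table. Treating parameter count as a stand-in for accuracy is a simplifying assumption, and the helper name `largest_model_within` is hypothetical.

```python
# (name, parameters in millions, approximate VRAM in GB), copied from the table.
MULTILINGUAL_MODELS = [
    ("tiny", 39, 1),
    ("base", 74, 1),
    ("small", 244, 2),
    ("medium", 769, 5),
    ("distil-large-v3", 756, 6),
    ("turbo", 809, 6),
    ("large-v3", 1550, 10),
]


def largest_model_within(vram_gb: float) -> str:
    """Return the model with the most parameters that fits in `vram_gb`.

    Parameter count is used as a crude proxy for accuracy (an assumption).
    """
    candidates = [m for m in MULTILINGUAL_MODELS if m[2] <= vram_gb]
    if not candidates:
        raise ValueError(f"no model fits in {vram_gb} GB of VRAM")
    name, _params, _vram = max(candidates, key=lambda m: m[1])
    return name
```

The returned name can be passed straight to `voxtus --model`.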
🧪 Examples
Basic Usage
# Transcribe to default TXT format (quote the URL so the shell does not treat ? as a glob)
voxtus "https://www.youtube.com/watch?v=abc123"
# Transcribe local file
voxtus recording.mp3
Format Selection
# Single format
voxtus -f json video.mp4
# Multiple formats at once
voxtus -f txt,json,srt,vtt video.mp4
# SRT format for video subtitles
voxtus -f srt video.mp4
# VTT format for web video
voxtus -f vtt video.mp4
Advanced Usage
# Custom name and output directory
voxtus -f json -n "meeting_notes" -o ~/transcripts video.mp4
# Verbose output with audio retention
voxtus -v -k -f txt,json https://youtu.be/example
# Pipeline integration
voxtus -f json --stdout video.mp4 | jq '.metadata.duration'
# Overwrite existing files
voxtus -f json --overwrite video.mp4
# Model selection for different use cases
voxtus --model tiny -f txt video.mp4 # Fast transcription
voxtus --model large-v3 video.mp4 # Best quality
voxtus --model small.en podcast.mp3 # English podcast
Real-world Examples
# Generate data for analysis
voxtus -f json -o ~/podcast_analysis podcast.mp3
# LLM processing pipeline
voxtus -f txt --stdout lecture.mp4 | llm "summarize this lecture"
# Both formats for different uses
voxtus -f txt,json -n "interview_2024" interview.mp4
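The same pipeline pattern works from a script. Below is a minimal sketch that calls the CLI with the `-f json`, `--model`, and `--stdout` options shown above and parses the result. It assumes that stdout contains only the JSON document in this mode; the function names are hypothetical.

```python
import json
import subprocess
from typing import List


def build_command(source: str, model: str = "small") -> List[str]:
    """Assemble the voxtus invocation for JSON output on stdout."""
    return ["voxtus", "-f", "json", "--model", model, "--stdout", source]


def transcribe(source: str, model: str = "small") -> dict:
    """Run voxtus and return the parsed JSON transcript.

    Raises subprocess.CalledProcessError if voxtus exits with an error.
    """
    result = subprocess.run(
        build_command(source, model),
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: stdout holds nothing but the JSON document.
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as `?` or `&`.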
🔧 Options
| Option | Description |
|---|---|
| -f, --format FORMAT | Output format(s): txt, json, srt, vtt (comma-separated) |
| -n, --name NAME | Base name for output files (no extension) |
| -o, --output DIR | Output directory (default: current directory) |
| -v, --verbose | Increase verbosity (-v, -vv for debug) |
| -k, --keep | Keep the downloaded/converted audio file |
| --model MODEL | Whisper model to use (default: small) |
| --list-models | List available models and their characteristics |
| --overwrite | Overwrite existing files without confirmation |
| --stdout | Output to stdout (single format only) |
| --version | Show version and exit |
📊 JSON Format Structure
The JSON format includes rich metadata for advanced use cases:
{
"transcript": [
{
"id": 1,
"start": 0.0,
"end": 5.2,
"text": "Welcome to our podcast."
}
],
"metadata": {
"title": "Podcast Episode 42",
"source": "https://youtube.com/watch?v=...",
"duration": 1523.5,
"model": "base",
"language": "en"
}
}
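A document with this structure is straightforward to consume. The sketch below works on the sample above, using only the keys it shows (`transcript`, `text`, `metadata`, `duration`); the helper names are hypothetical.

```python
import json

# The sample document from above, embedded as a string for illustration.
SAMPLE = """
{
  "transcript": [
    {"id": 1, "start": 0.0, "end": 5.2, "text": "Welcome to our podcast."}
  ],
  "metadata": {
    "title": "Podcast Episode 42",
    "source": "https://youtube.com/watch?v=...",
    "duration": 1523.5,
    "model": "base",
    "language": "en"
  }
}
"""


def transcript_text(doc: dict) -> str:
    """Join all segment texts into one plain string."""
    return " ".join(seg["text"].strip() for seg in doc["transcript"])


def duration_mmss(doc: dict) -> str:
    """Render metadata.duration (seconds) as MM:SS, dropping fractions."""
    minutes, seconds = divmod(int(doc["metadata"]["duration"]), 60)
    return f"{minutes:02d}:{seconds:02d}"


doc = json.loads(SAMPLE)
```

Here `transcript_text(doc)` gives the joined text and `duration_mmss(doc)` gives a human-readable running time.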
📦 Packaging
Voxtus is structured as a proper Python CLI package using pyproject.toml with a voxtus entry point.
After installation (via pip or pipx), the voxtus command is available directly from your shell.
🔐 License
Licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later).
See LICENSE or visit AGPL-3.0 for more.