Skip to main content

[DEPRECATED - Use 'cargo install voxtus' instead] Transcribe Internet videos and media files to text

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

🗣️ Voxtus (Python - Deprecated)

⚠️ DEPRECATED: This Python version is no longer maintained. Please use the new Rust implementation: github.com/johanthoren/voxtus

Install the new version with: cargo install voxtus


Voxtus is a command-line tool for transcribing Internet videos and media files to text using faster-whisper.

It supports multiple output formats and can download, transcribe, and optionally retain the original audio. It's built in Python and installable as a proper CLI via PyPI or from source.

✨ Features

  • 🎥 Download & transcribe videos from YouTube, Vimeo, and 1000+ sites
  • 📁 Local file support for audio/video files
  • 📝 Multiple output formats: TXT, JSON, SRT, VTT
  • 🎛️ Model selection - Choose from tiny to large models for speed/accuracy trade-offs
  • 🔄 Batch processing multiple formats in one run
  • 📊 Rich metadata in JSON format (title, source, duration, language)
  • 🚀 Stdout mode for pipeline integration
  • 🎯 LLM-friendly default text format
  • Fast transcription via faster-whisper

⚙️ Installation

1. Install system dependency: ffmpeg

Voxtus uses ffmpeg under the hood to extract audio from video files.

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update && sudo apt install ffmpeg

2. Recommended for end users (via pipx)

pipx install voxtus

After that, simply run:

voxtus --help

🧪 Development Setup

Quick Start for Contributors

git clone https://github.com/johanthoren/voxtus.git
cd voxtus

# Install uv (fast Python package manager)
brew install uv         # macOS
# or: pip install uv    # any platform

# Setup development environment
make dev-install

# Run tests
make test

Development Workflow

The project uses a simple Makefile for development tasks. All targets automatically verify dependencies and provide helpful installation instructions if tools are missing.

make help              # Show all available commands with dynamic version examples
make install           # Install package and dependencies
make dev-install       # Install with development dependencies
make run               # Run development version (e.g., make run -- -f json file.mp4)
make test              # Run tests (fast)
make test-coverage     # Run tests with coverage report
make test-ci           # Run GitHub Actions workflow locally (requires act)

# Dependency verification
make verify-uv         # Check if uv is installed
make verify-act        # Check if act is installed

# Release (bumps version, commits, tags, and pushes)
make release           # Patch release (e.g., 0.1.9 -> 0.1.10)
make release patch     # Patch release (same as above)
make release minor     # Minor release (e.g., 0.1.9 -> 0.2.0)
make release major     # Major release (e.g., 0.1.9 -> 1.0.0)

Dependencies

The Makefile automatically checks for required tools:

  • uv - Fast Python package manager (required for most targets)
    • Install: curl -LsSf https://astral.sh/uv/install.sh | sh or brew install uv
  • act - Run GitHub Actions locally (optional, only for test-ci)

Enhanced Release Process

The release process includes comprehensive safety checks:

  1. Git Status Check - Offers to stage and commit pending changes
  2. Test Suite - Runs tests with coverage reporting
  3. Coverage Validation - Prompts if coverage is below 80%
  4. Version Bump - Updates pyproject.toml and commits the change
  5. Git Operations - Creates tag and pushes to trigger CI/CD

Local CI Testing

Use make test-ci to run the exact same GitHub Actions workflow locally:

make test-ci    # Runs .github/workflows/test.yml with act

This ensures your changes work in the CI environment before pushing.


🧪 For contributors / running from source

git clone https://github.com/johanthoren/voxtus.git
cd voxtus
brew install uv         # or: pip install uv
uv venv
source .venv/bin/activate
uv pip install .

Then run:

voxtus --help

📋 Output Formats

Format Description Use Case
TXT Plain text with timestamps Default, LLM processing, reading
JSON Structured data with metadata APIs, data analysis, archival
SRT SubRip subtitle format Video subtitles, media players
VTT WebVTT subtitle format Web browsers, HTML5 video

Additional formats (CSV) are planned for future releases.

🔧 Extensible Format System

Voxtus uses a modular format system that makes adding new output formats straightforward. Each format is implemented as a separate module with its own writer class, making the codebase maintainable and extensible.


🎛️ Model Selection

Voxtus supports multiple Whisper models with different trade-offs between speed, accuracy, and resource usage:

Available Models

Model Parameters VRAM Languages Best For
tiny 39M ~1GB Multilingual Fastest inference, low resources
tiny.en 39M ~1GB English only Fastest English-only transcription
base 74M ~1GB Multilingual Good balance for minimal resources
base.en 74M ~1GB English only Balanced English-only
small 244M ~2GB Multilingual Default balance
small.en 244M ~2GB English only Higher accuracy English
medium 769M ~5GB Multilingual Good accuracy, slower
medium.en 769M ~5GB English only Good accuracy English
distil-large-v3 756M ~6GB Multilingual Faster with good accuracy
large 1550M ~10GB Multilingual Highest accuracy
large-v2 1550M ~10GB Multilingual Improved large model
large-v3 1550M ~10GB Multilingual Latest large model
turbo 809M ~6GB Multilingual Optimized for speed

VRAM requirements are from OpenAI's official specifications. Actual performance varies by hardware and audio content.

Model Selection Guide

# List all available models with characteristics
voxtus --list-models

# Speed-optimized (fastest)
voxtus --model tiny video.mp4

# Balanced (default)
voxtus --model small video.mp4

# Better accuracy with speed
voxtus --model distil-large-v3 video.mp4

# Quality-optimized (most accurate)
voxtus --model large-v3 video.mp4

# English-only (faster for English content)
voxtus --model small.en video.mp4

💡 Tip: English-only models (.en) are faster and more accurate for English content, while multilingual models work with 99+ languages.


🧪 Examples

Basic Usage

# Transcribe to default TXT format
voxtus https://www.youtube.com/watch?v=abc123

# Transcribe local file
voxtus recording.mp3

Format Selection

# Single format
voxtus -f json video.mp4

# Multiple formats at once
voxtus -f txt,json,srt,vtt video.mp4

# SRT format for video subtitles
voxtus -f srt video.mp4

# VTT format for web video
voxtus -f vtt video.mp4

Advanced Usage

# Custom name and output directory
voxtus -f json -n "meeting_notes" -o ~/transcripts video.mp4

# Verbose output with audio retention
voxtus -v -k -f txt,json https://youtu.be/example

# Pipeline integration
voxtus -f json --stdout video.mp4 | jq '.metadata.duration'

# Overwrite existing files
voxtus -f json --overwrite video.mp4

# Model selection for different use cases
voxtus --model tiny -f txt video.mp4     # Fast transcription
voxtus --model large-v3 video.mp4        # Best quality
voxtus --model small.en podcast.mp3      # English podcast

Real-world Examples

# Generate data for analysis
voxtus -f json -o ~/podcast_analysis podcast.mp3

# LLM processing pipeline
voxtus -f txt --stdout lecture.mp4 | llm "summarize this lecture"

# Both formats for different uses
voxtus -f txt,json -n "interview_2024" interview.mp4

🔧 Options

Option Description
-f, --format FORMAT Output format(s): txt, json, srt, vtt (comma-separated)
-n, --name NAME Base name for output files (no extension)
-o, --output DIR Output directory (default: current directory)
-v, --verbose Increase verbosity (-v, -vv for debug)
-k, --keep Keep the downloaded/converted audio file
--model MODEL Whisper model to use (default: small)
--list-models List available models and their characteristics
--overwrite Overwrite existing files without confirmation
--stdout Output to stdout (single format only)
--version Show version and exit

📊 JSON Format Structure

The JSON format includes rich metadata for advanced use cases:

{
  "transcript": [
    {
      "id": 1,
      "start": 0.0,
      "end": 5.2,
      "text": "Welcome to our podcast."
    }
  ],
  "metadata": {
    "title": "Podcast Episode 42",
    "source": "https://youtube.com/watch?v=...",
    "duration": 1523.5,
    "model": "base",
    "language": "en"
  }
}

📦 Packaging

Voxtus is structured as a proper Python CLI package using pyproject.toml with a voxtus entry point.

After installation (via pip or pipx), the voxtus command is available directly from your shell.


🔐 License

Licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later).

See LICENSE or visit AGPL-3.0 for more.


🔗 Project Links

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

voxtus-0.3.2.tar.gz (32.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

voxtus-0.3.2-py3-none-any.whl (32.2 kB view details)

Uploaded Python 3

File details

Details for the file voxtus-0.3.2.tar.gz.

File metadata

  • Download URL: voxtus-0.3.2.tar.gz
  • Upload date:
  • Size: 32.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voxtus-0.3.2.tar.gz
Algorithm Hash digest
SHA256 98db7b0ae6dfc0fcb58e35187a82b276d645e203568a12f63b3ba3f0fe6d1225
MD5 c912535abeafc4783ae38799dec414e3
BLAKE2b-256 b829272034d8e3fe6d03255d9aa32b32cb23de3edfe273d7d7d9418ed5125c32

See more details on using hashes here.

Provenance

The following attestation bundles were made for voxtus-0.3.2.tar.gz:

Publisher: publish.yml on johanthoren/voxtus

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file voxtus-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: voxtus-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 32.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voxtus-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 48fba771f0c649fabec147559c68a3259f14834fdc8b24b3aff9f6858af8e908
MD5 c4751a8a38348e0fae7be19afe52cb47
BLAKE2b-256 e09682d9cb8768218b108441d691607fd112aeedb84259965b5492ebf3b63517

See more details on using hashes here.

Provenance

The following attestation bundles were made for voxtus-0.3.2-py3-none-any.whl:

Publisher: publish.yml on johanthoren/voxtus

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page