[DEPRECATED - Use 'cargo install voxtus' instead] Transcribe Internet videos and media files to text

This project has been archived by its maintainers. No new releases are expected.

🗣️ Voxtus (Python - Deprecated)

⚠️ DEPRECATED: This Python version is no longer maintained. Please use the new Rust implementation: github.com/johanthoren/voxtus

Install the new version with: cargo install voxtus

Voxtus is a command-line tool for transcribing Internet videos and media files to text using faster-whisper.

It supports multiple output formats and can download, transcribe, and optionally retain the original audio. It's built in Python and installable as a proper CLI via PyPI or from source.

✨ Features

🎥 Download & transcribe videos from YouTube, Vimeo, and 1000+ sites
📁 Local file support for audio/video files
📝 Multiple output formats: TXT, JSON, SRT, VTT
🎛️ Model selection - Choose from tiny to large models for speed/accuracy trade-offs
🔄 Batch processing multiple formats in one run
📊 Rich metadata in JSON format (title, source, duration, language)
🚀 Stdout mode for pipeline integration
🎯 LLM-friendly default text format
⚡ Fast transcription via faster-whisper

⚙️ Installation

1. Install system dependency: ffmpeg

Voxtus uses ffmpeg under the hood to extract audio from video files.

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update && sudo apt install ffmpeg
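
Because the tool depends on ffmpeg being installed, it can help to confirm that the binary is on your PATH before running a transcription. The following is a minimal sketch using only the Python standard library; the function name `find_ffmpeg` is hypothetical and not part of Voxtus.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found; install it with brew or apt first")
    else:
        print(f"ffmpeg found at {path}")
```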

2. Recommended for end users (via pipx)

pipx install voxtus

After that, simply run:

voxtus --help

🧪 Development Setup

Quick Start for Contributors

git clone https://github.com/johanthoren/voxtus.git
cd voxtus

# Install uv (fast Python package manager)
brew install uv         # macOS
# or: pip install uv    # any platform

# Setup development environment
make dev-install

# Run tests
make test

Development Workflow

The project uses a simple Makefile for development tasks. All targets automatically verify dependencies and provide helpful installation instructions if tools are missing.

make help              # Show all available commands with dynamic version examples
make install           # Install package and dependencies
make dev-install       # Install with development dependencies
make run               # Run development version (e.g., make run -- -f json file.mp4)
make test              # Run tests (fast)
make test-coverage     # Run tests with coverage report
make test-ci           # Run GitHub Actions workflow locally (requires act)

# Dependency verification
make verify-uv         # Check if uv is installed
make verify-act        # Check if act is installed

# Release (bumps version, commits, tags, and pushes)
make release           # Patch release (e.g., 0.1.9 -> 0.1.10)
make release patch     # Patch release (same as above)
make release minor     # Minor release (e.g., 0.1.9 -> 0.2.0)
make release major     # Major release (e.g., 0.1.9 -> 1.0.0)

Dependencies

The Makefile automatically checks for required tools:

uv - Fast Python package manager (required for most targets)
- Install: curl -LsSf https://astral.sh/uv/install.sh | sh or brew install uv
act - Run GitHub Actions locally (optional, only for test-ci)
- Install: brew install act or see installation guide

Enhanced Release Process

The release process includes comprehensive safety checks:

Git Status Check - Offers to stage and commit pending changes
Test Suite - Runs tests with coverage reporting
Coverage Validation - Prompts if coverage is below 80%
Version Bump - Updates pyproject.toml and commits the change
Git Operations - Creates tag and pushes to trigger CI/CD
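
The version bump step can be illustrated with a small sketch that reproduces the patch, minor, and major examples from the Makefile comments. This is an assumption about the semantics, not the project's actual release script; `bump_version` is a hypothetical name.

```python
def bump_version(version: str, part: str = "patch") -> str:
    """Bump a MAJOR.MINOR.PATCH version string by the given part."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # A major bump resets minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # A minor bump resets patch only.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release part: {part!r}")


if __name__ == "__main__":
    for part in ("patch", "minor", "major"):
        print(part, bump_version("0.1.9", part))
```

Comparing the parts as integers matters here: a plain string increment would not turn 0.1.9 into 0.1.10.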

Local CI Testing

Use make test-ci to run the exact same GitHub Actions workflow locally:

make test-ci    # Runs .github/workflows/test.yml with act

This ensures your changes work in the CI environment before pushing.

🧪 For contributors / running from source

git clone https://github.com/johanthoren/voxtus.git
cd voxtus
brew install uv         # or: pip install uv
uv venv
source .venv/bin/activate
uv pip install .

Then run:

voxtus --help

📋 Output Formats

Format	Description	Use Case
TXT	Plain text with timestamps	Default, LLM processing, reading
JSON	Structured data with metadata	APIs, data analysis, archival
SRT	SubRip subtitle format	Video subtitles, media players
VTT	WebVTT subtitle format	Web browsers, HTML5 video

Additional formats (CSV) are planned for future releases.
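
One practical difference between the two subtitle formats is the timestamp notation: SRT separates milliseconds with a comma (`00:00:05,200`), while WebVTT uses a period (`00:00:05.200`). The sketch below shows the conversion from seconds; the function names are illustrative and do not come from the Voxtus codebase.

```python
def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Format seconds as HH:MM:SS<marker>mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def srt_timestamp(seconds: float) -> str:
    """SubRip uses a comma before the milliseconds."""
    return format_timestamp(seconds, ",")


def vtt_timestamp(seconds: float) -> str:
    """WebVTT uses a period before the milliseconds."""
    return format_timestamp(seconds, ".")
```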

🔧 Extensible Format System

Voxtus uses a modular format system that makes adding new output formats straightforward. Each format is implemented as a separate module with its own writer class, making the codebase maintainable and extensible.
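
To make the idea of one writer class per format concrete, here is a minimal sketch of a registry-based design. Everything in it (`FormatWriter`, `REGISTRY`, `register`, `render`, and the exact TXT line layout) is hypothetical and only illustrates the pattern; the real modules may be organized differently.

```python
import json
from typing import Dict, List, Type

Segment = Dict[str, object]


class FormatWriter:
    """Base class: one subclass per output format."""

    name = ""

    def render(self, segments: List[Segment]) -> str:
        raise NotImplementedError


REGISTRY: Dict[str, Type[FormatWriter]] = {}


def register(cls: Type[FormatWriter]) -> Type[FormatWriter]:
    """Class decorator that makes a writer discoverable by its format name."""
    REGISTRY[cls.name] = cls
    return cls


@register
class TxtWriter(FormatWriter):
    name = "txt"

    def render(self, segments: List[Segment]) -> str:
        # Assumed layout: "[start - end]: text", one segment per line.
        return "\n".join(
            f"[{s['start']:.2f} - {s['end']:.2f}]: {s['text']}" for s in segments
        )


@register
class JsonWriter(FormatWriter):
    name = "json"

    def render(self, segments: List[Segment]) -> str:
        return json.dumps({"transcript": segments}, indent=2)


def render(fmt: str, segments: List[Segment]) -> str:
    """Look up the writer for a format name and render the segments."""
    if fmt not in REGISTRY:
        raise ValueError(f"unsupported format: {fmt}")
    return REGISTRY[fmt]().render(segments)
```

With this layout, adding a new format means adding one decorated class; the lookup code does not change.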

🎛️ Model Selection

Voxtus supports multiple Whisper models with different trade-offs between speed, accuracy, and resource usage:

Available Models

Model	Parameters	VRAM	Languages	Best For
tiny	39M	~1GB	Multilingual	Fastest inference, low resources
tiny.en	39M	~1GB	English only	Fastest English-only transcription
base	74M	~1GB	Multilingual	Good balance for minimal resources
base.en	74M	~1GB	English only	Balanced English-only
small	244M	~2GB	Multilingual	Default balance
small.en	244M	~2GB	English only	Higher accuracy English
medium	769M	~5GB	Multilingual	Good accuracy, slower
medium.en	769M	~5GB	English only	Good accuracy English
distil-large-v3	756M	~6GB	Multilingual	Faster with good accuracy
large	1550M	~10GB	Multilingual	Highest accuracy
large-v2	1550M	~10GB	Multilingual	Improved large model
large-v3	1550M	~10GB	Multilingual	Latest large model
turbo	809M	~6GB	Multilingual	Optimized for speed

VRAM requirements are from OpenAI's official specifications. Actual performance varies by hardware and audio content.
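
As a rough way to apply the table, the sketch below picks the largest multilingual model that fits a given VRAM budget. It uses a subset of the rows above and treats parameter count as a proxy for accuracy, which is a simplifying assumption; `pick_model` is a hypothetical helper, not a Voxtus function.

```python
from typing import Optional

# (model name, parameters in millions, approximate VRAM in GB), from the table above
MULTILINGUAL_MODELS = [
    ("tiny", 39, 1),
    ("base", 74, 1),
    ("small", 244, 2),
    ("medium", 769, 5),
    ("distil-large-v3", 756, 6),
    ("turbo", 809, 6),
    ("large-v3", 1550, 10),
]


def pick_model(vram_gb: float) -> Optional[str]:
    """Return the model with the most parameters that fits the VRAM budget."""
    candidates = [m for m in MULTILINGUAL_MODELS if m[2] <= vram_gb]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m[1])[0]


if __name__ == "__main__":
    for budget in (1, 2, 6, 10):
        print(f"{budget} GB -> {pick_model(budget)}")
```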

Model Selection Guide

# List all available models with characteristics
voxtus --list-models

# Speed-optimized (fastest)
voxtus --model tiny video.mp4

# Balanced (default)
voxtus --model small video.mp4

# Better accuracy with speed
voxtus --model distil-large-v3 video.mp4

# Quality-optimized (most accurate)
voxtus --model large-v3 video.mp4

# English-only (faster for English content)
voxtus --model small.en video.mp4

💡 Tip: English-only models (.en) are faster and more accurate for English content, while multilingual models work with 99+ languages.

🧪 Examples

Basic Usage

# Transcribe to default TXT format
voxtus https://www.youtube.com/watch?v=abc123

# Transcribe local file
voxtus recording.mp3

Format Selection

# Single format
voxtus -f json video.mp4

# Multiple formats at once
voxtus -f txt,json,srt,vtt video.mp4

# SRT format for video subtitles
voxtus -f srt video.mp4

# VTT format for web video
voxtus -f vtt video.mp4

Advanced Usage

# Custom name and output directory
voxtus -f json -n "meeting_notes" -o ~/transcripts video.mp4

# Verbose output with audio retention
voxtus -v -k -f txt,json https://youtu.be/example

# Pipeline integration
voxtus -f json --stdout video.mp4 | jq '.metadata.duration'

# Overwrite existing files
voxtus -f json --overwrite video.mp4

# Model selection for different use cases
voxtus --model tiny -f txt video.mp4     # Fast transcription
voxtus --model large-v3 video.mp4        # Best quality
voxtus --model small.en podcast.mp3      # English podcast

Real-world Examples

# Generate data for analysis
voxtus -f json -o ~/podcast_analysis podcast.mp3

# LLM processing pipeline
voxtus -f txt --stdout lecture.mp4 | llm "summarize this lecture"

# Both formats for different uses
voxtus -f txt,json -n "interview_2024" interview.mp4
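
The same pipeline idea works from a script. The sketch below calls the CLI with the `-f json --stdout` flags shown above and parses the result. It assumes that stdout contains only the JSON document and that `voxtus` is on your PATH; `build_command` and `transcribe` are hypothetical helper names.

```python
import json
import subprocess
from typing import List


def build_command(source: str, model: str = "small") -> List[str]:
    """Build the argument list for a JSON-to-stdout transcription."""
    return ["voxtus", "-f", "json", "--stdout", "--model", model, source]


def transcribe(source: str, model: str = "small") -> dict:
    """Run voxtus and return the parsed JSON (assumes stdout is pure JSON)."""
    result = subprocess.run(
        build_command(source, model),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with URLs that contain `?` or `&`.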

🔧 Options

Option	Description
`-f, --format FORMAT`	Output format(s): txt, json, srt, vtt (comma-separated)
`-n, --name NAME`	Base name for output files (no extension)
`-o, --output DIR`	Output directory (default: current directory)
`-v, --verbose`	Increase verbosity (-v, -vv for debug)
`-k, --keep`	Keep the downloaded/converted audio file
`--model MODEL`	Whisper model to use (default: small)
`--list-models`	List available models and their characteristics
`--overwrite`	Overwrite existing files without confirmation
`--stdout`	Output to stdout (single format only)
`--version`	Show version and exit
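
Two rules in this table interact: `--format` accepts a comma-separated list, but `--stdout` allows only a single format. The sketch below shows one way such a rule could be enforced with `argparse`. It covers only a few of the options and is not the actual Voxtus parser; `parse_formats`, `build_parser`, and `parse` are hypothetical names.

```python
import argparse
from typing import List, Sequence

SUPPORTED = ("txt", "json", "srt", "vtt")


def parse_formats(value: str) -> List[str]:
    """Split a comma-separated format list and reject unknown names."""
    formats = [item.strip().lower() for item in value.split(",") if item.strip()]
    unknown = [item for item in formats if item not in SUPPORTED]
    if not formats or unknown:
        raise argparse.ArgumentTypeError(f"unsupported format(s): {value}")
    return formats


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="voxtus-sketch")
    parser.add_argument("input", help="URL or local media file")
    parser.add_argument("-f", "--format", type=parse_formats, default=["txt"])
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("--stdout", action="store_true")
    return parser


def parse(argv: Sequence[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.stdout and len(args.format) > 1:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--stdout supports a single format only")
    return args
```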

📊 JSON Format Structure

The JSON format includes rich metadata for advanced use cases:

{
  "transcript": [
    {
      "id": 1,
      "start": 0.0,
      "end": 5.2,
      "text": "Welcome to our podcast."
    }
  ],
  "metadata": {
    "title": "Podcast Episode 42",
    "source": "https://youtube.com/watch?v=...",
    "duration": 1523.5,
    "model": "base",
    "language": "en"
  }
}
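
A short sketch of consuming this structure from Python follows. It relies only on the keys shown in the sample above; the `summarize` helper is hypothetical and the sample values are copied from the example.

```python
import json

SAMPLE = """
{
  "transcript": [
    {"id": 1, "start": 0.0, "end": 5.2, "text": "Welcome to our podcast."}
  ],
  "metadata": {
    "title": "Podcast Episode 42",
    "source": "https://youtube.com/watch?v=...",
    "duration": 1523.5,
    "model": "base",
    "language": "en"
  }
}
"""


def summarize(doc: dict) -> dict:
    """Collect a few headline numbers from a transcript document."""
    segments = doc["transcript"]
    spoken = sum(seg["end"] - seg["start"] for seg in segments)
    return {
        "title": doc["metadata"]["title"],
        "language": doc["metadata"]["language"],
        "segments": len(segments),
        "spoken_seconds": round(spoken, 3),
        "duration": doc["metadata"]["duration"],
    }


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE)))
```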

📦 Packaging

Voxtus is structured as a proper Python CLI package using pyproject.toml with a voxtus entry point.

After installation (via pip or pipx), the voxtus command is available directly from your shell.

🔐 License

Licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later).

See LICENSE or visit AGPL-3.0 for more.

Release history

Version	Release date
0.3.2 (this version)	Dec 18, 2025
0.3.1	Jun 2, 2025
0.2.0	May 31, 2025
0.1.10	May 31, 2025
0.1.9	May 27, 2025
0.1.8	May 27, 2025
0.1.7	May 26, 2025
0.1.6	May 26, 2025
0.1.4	May 26, 2025
0.1.3	May 26, 2025
0.1.0	May 26, 2025

Download files

Source Distribution

voxtus-0.3.2.tar.gz (32.3 kB), uploaded Dec 18, 2025, Source

Built Distribution

voxtus-0.3.2-py3-none-any.whl (32.2 kB), uploaded Dec 18, 2025, Python 3

File details

Details for the file voxtus-0.3.2.tar.gz.

File metadata

Download URL: voxtus-0.3.2.tar.gz
Upload date: Dec 18, 2025
Size: 32.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voxtus-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`98db7b0ae6dfc0fcb58e35187a82b276d645e203568a12f63b3ba3f0fe6d1225`
MD5	`c912535abeafc4783ae38799dec414e3`
BLAKE2b-256	`b829272034d8e3fe6d03255d9aa32b32cb23de3edfe273d7d7d9418ed5125c32`

Provenance

The following attestation bundles were made for voxtus-0.3.2.tar.gz:

Publisher: publish.yml on johanthoren/voxtus

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voxtus-0.3.2.tar.gz
- Subject digest: 98db7b0ae6dfc0fcb58e35187a82b276d645e203568a12f63b3ba3f0fe6d1225
- Sigstore transparency entry: 769721085
- Sigstore integration time: Dec 18, 2025
Source repository:
- Permalink: johanthoren/voxtus@591a69549c0115d27e2ca349a885b1d2ddc2c801
- Branch / Tag: refs/tags/0.3.2
- Owner: https://github.com/johanthoren
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@591a69549c0115d27e2ca349a885b1d2ddc2c801
- Trigger Event: push

File details

Details for the file voxtus-0.3.2-py3-none-any.whl.

File metadata

Download URL: voxtus-0.3.2-py3-none-any.whl
Upload date: Dec 18, 2025
Size: 32.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for voxtus-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`48fba771f0c649fabec147559c68a3259f14834fdc8b24b3aff9f6858af8e908`
MD5	`c4751a8a38348e0fae7be19afe52cb47`
BLAKE2b-256	`e09682d9cb8768218b108441d691607fd112aeedb84259965b5492ebf3b63517`

Provenance

The following attestation bundles were made for voxtus-0.3.2-py3-none-any.whl:

Publisher: publish.yml on johanthoren/voxtus

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voxtus-0.3.2-py3-none-any.whl
- Subject digest: 48fba771f0c649fabec147559c68a3259f14834fdc8b24b3aff9f6858af8e908
- Sigstore transparency entry: 769721109
- Sigstore integration time: Dec 18, 2025
Source repository:
- Permalink: johanthoren/voxtus@591a69549c0115d27e2ca349a885b1d2ddc2c801
- Branch / Tag: refs/tags/0.3.2
- Owner: https://github.com/johanthoren
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@591a69549c0115d27e2ca349a885b1d2ddc2c801
- Trigger Event: push
