Comprehensive document conversion library with batch processing, caching, and template rendering


Document Converter


A comprehensive Python library for document conversion with batch processing, intelligent caching, and template rendering.


✨ Features

🔄 Multi-Format Conversion

Convert between popular document formats:

PDF ↔ TXT, DOCX (with OCR support for scanned documents)
DOCX ↔ PDF, HTML, Markdown, TXT
HTML ↔ PDF, DOCX
Markdown ↔ HTML, PDF
ODT ↔ Multiple formats
TXT ↔ HTML, PDF

⚡ High Performance

Two-tier caching: In-memory LRU + persistent disk cache
Up to 138x speedup on repeated conversions
Parallel batch processing: 50-200 files/second
Streaming template rendering for memory efficiency
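The two-tier design listed above puts a small, fast in-memory LRU in front of a larger persistent disk store: lookups try memory first, fall back to disk, and promote disk hits back into memory. The following is a minimal standard-library sketch of that pattern. The class name `TwoTierCache`, its methods, and the hash-named cache files are illustrative assumptions, not the project's `CacheManager` API.

```python
import hashlib
import tempfile
from collections import OrderedDict
from pathlib import Path
from typing import Optional


class TwoTierCache:
    """Hypothetical sketch: an in-memory LRU tier in front of a persistent disk tier."""

    def __init__(self, cache_dir: str, max_memory_items: int = 128) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.max_memory_items = max_memory_items
        self.memory: "OrderedDict[str, bytes]" = OrderedDict()

    def _disk_path(self, key: str) -> Path:
        # Hash the key so any string maps to a safe, fixed-length file name.
        return self.cache_dir / hashlib.sha256(key.encode("utf-8")).hexdigest()

    def _remember(self, key: str, value: bytes) -> None:
        self.memory[key] = value
        self.memory.move_to_end(key)
        # Evict the least-recently-used entry once the memory tier is full.
        if len(self.memory) > self.max_memory_items:
            self.memory.popitem(last=False)

    def put(self, key: str, value: bytes) -> None:
        self._disk_path(key).write_bytes(value)  # persistent tier
        self._remember(key, value)               # fast tier

    def get(self, key: str) -> Optional[bytes]:
        if key in self.memory:                   # tier 1: memory hit
            self.memory.move_to_end(key)
            return self.memory[key]
        path = self._disk_path(key)
        if path.exists():                        # tier 2: disk hit, promoted to memory
            value = path.read_bytes()
            self._remember(key, value)
            return value
        return None                              # miss: the caller must convert


# Usage: with room for one entry in memory, "a" is evicted from memory
# by "b" but can still be served from disk.
with tempfile.TemporaryDirectory() as tmp:
    cache = TwoTierCache(tmp, max_memory_items=1)
    cache.put("a", b"converted A")
    cache.put("b", b"converted B")
    a_in_memory_after_eviction = "a" in cache.memory
    a_value = cache.get("a")
    missing = cache.get("missing")
```

Because every entry is also written to disk, an eviction from memory only costs a slower lookup rather than a re-conversion, and the disk tier survives process restarts.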

🛠️ Developer Friendly

Clean, extensible API
Comprehensive error handling with actionable suggestions
Transaction safety with automatic rollback
Full CLI with progress bars
79% test coverage with 274+ tests
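The README does not say how "transaction safety with automatic rollback" is implemented. For file output, one common technique is to write into a temporary file next to the destination and rename it into place only after the conversion succeeds; on failure the temporary file is deleted and any existing output is left untouched. A minimal sketch under that assumption (`atomic_output` is a hypothetical helper, not the API of `core/transaction.py`):

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import IO, Iterator


@contextmanager
def atomic_output(target: str) -> Iterator[IO[str]]:
    """Hypothetical sketch: write to a temp file, commit with a rename on success."""
    target_path = Path(target)
    # Create the temp file in the target's directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target_path.parent, suffix=".partial")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            yield handle
        os.replace(tmp_name, target_path)  # commit: the rename is atomic on POSIX
    except BaseException:
        # Rollback: discard the partial output; an existing target is left untouched.
        os.unlink(tmp_name)
        raise


# Usage: a failed second write leaves the first version in place.
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "report.txt")
    with atomic_output(out) as f:
        f.write("first version")
    try:
        with atomic_output(out) as f:
            f.write("half-written")
            raise RuntimeError("conversion failed")
    except RuntimeError:
        pass
    with open(out, encoding="utf-8") as f:
        final_text = f.read()
    leftovers = sorted(os.listdir(tmp))
```

Creating the temporary file in the destination directory matters because `os.replace` raises an error when source and destination are on different filesystems.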

📦 Standalone Executable

Interactive mode: Double-click and use menu-driven interface
CLI mode: Full command-line support
Drag & Drop: Drop multiple files onto the .exe to convert them all at once
No Python installation required for end users

📋 Requirements

Python 3.9+
See requirements.txt for dependencies

🚀 Installation

From Source

# Clone the repository
git clone https://github.com/MikeAMSDev/document-converter
cd document-converter

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/macOS
venv\Scripts\activate     # Windows (run this instead of the line above)

# Install dependencies
pip install -r requirements.txt

Verify Installation

python -c "from converter.engine import ConversionEngine; print('✓ Installation successful!')"

🎯 Quick Start

Basic Conversion

from converter.engine import ConversionEngine
from converter.formats.pdf_converter import PDFConverter

# Setup
engine = ConversionEngine()
engine.register_converter('pdf', PDFConverter)

# Convert
engine.convert('document.pdf', 'document.txt')
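Calling `register_converter` and then `convert` suggests a registry that maps a format name to a converter class and dispatches on the input file's extension. The sketch below illustrates that pattern with a toy converter. `MiniEngine`, `EchoConverter`, and the extension-based lookup are assumptions for illustration; the real `ConversionEngine` may resolve formats differently.

```python
from pathlib import Path
from typing import Dict, Type


class MiniEngine:
    """Hypothetical stand-in showing a registry-and-dispatch pattern."""

    def __init__(self) -> None:
        self._converters: Dict[str, Type] = {}

    def register_converter(self, fmt: str, converter_cls: Type) -> None:
        self._converters[fmt.lower()] = converter_cls

    def convert(self, input_path: str, output_path: str) -> str:
        # Formats are inferred from extensions, e.g. "document.pdf" -> "pdf".
        source_fmt = Path(input_path).suffix.lstrip(".").lower()
        target_fmt = Path(output_path).suffix.lstrip(".").lower()
        try:
            converter_cls = self._converters[source_fmt]
        except KeyError:
            raise ValueError(f"No converter registered for '.{source_fmt}' files") from None
        return converter_cls().convert(input_path, output_path, target_fmt)


class EchoConverter:
    """Toy converter that only reports what it would do."""

    def convert(self, input_path: str, output_path: str, target_fmt: str) -> str:
        return f"{input_path} -> {output_path} ({target_fmt})"


engine = MiniEngine()
engine.register_converter("pdf", EchoConverter)
message = engine.convert("document.pdf", "document.txt")

try:
    engine.convert("notes.xyz", "notes.txt")
except ValueError as exc:
    unsupported_error = str(exc)
```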

Batch Processing

from converter.batch_processor import BatchProcessor

processor = BatchProcessor(max_workers=8)
processor.scan_directory('./documents', './output', from_format='docx', to_format='pdf')
report = processor.process_queue()

print(f"Converted {report.success} files")
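Parallel batch processing of this kind can be sketched with `concurrent.futures`: each (input, output) pair is submitted to a pool, and results are tallied as they finish, so one failing file is recorded instead of aborting the whole run. `process_batch`, `BatchReport`, and `fake_convert` are hypothetical names; only the `success` counter mirrors the `report.success` attribute used above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple


@dataclass
class BatchReport:
    success: int = 0
    failed: int = 0
    errors: List[Tuple[str, str]] = field(default_factory=list)


def process_batch(
    jobs: Iterable[Tuple[str, str]],
    convert: Callable[[str, str], None],
    max_workers: int = 8,
) -> BatchReport:
    """Hypothetical sketch: run (input, output) jobs on a thread pool and tally results."""
    report = BatchReport()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(convert, src, dst): src for src, dst in jobs}
        # Results are counted in this (main) thread, so no locking is needed.
        for future in as_completed(futures):
            src = futures[future]
            try:
                future.result()          # re-raises any exception from the worker
                report.success += 1
            except Exception as exc:     # one bad file should not stop the batch
                report.failed += 1
                report.errors.append((src, str(exc)))
    return report


def fake_convert(src: str, dst: str) -> None:
    # Hypothetical converter that fails on one input.
    if "corrupt" in src:
        raise ValueError("unreadable file")


jobs = [("a.docx", "a.pdf"), ("corrupt.docx", "corrupt.pdf"), ("b.docx", "b.pdf")]
report = process_batch(jobs, fake_convert, max_workers=4)
```

Threads suit conversions that spend most of their time in I/O or in C extensions that release the GIL; for pure-Python, CPU-bound converters, `ProcessPoolExecutor` is the usual alternative (the callable and its arguments must then be picklable).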

With Caching (Up to 138x Faster on Repeat Conversions)

from converter.engine import ConversionEngine
from core.cache_manager import CacheManager

cache = CacheManager(cache_dir=".cache")
engine = ConversionEngine(cache_manager=cache)

# First conversion: normal speed
engine.convert('large.pdf', 'large.txt')

# Second conversion of the same input: served from the cache
engine.convert('large.pdf', 'large_copy.txt')
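The second call above writes to a different output path yet is still served from the cache, which implies the cache key depends on the input rather than on the output name. One way to build such a key is to hash the input's bytes together with the target format. The sketch below assumes that scheme; `cache_key` is hypothetical, and how `CacheManager` actually derives its keys is not documented here.

```python
import hashlib
import os
import tempfile


def cache_key(input_path: str, target_format: str, chunk_size: int = 1 << 20) -> str:
    """Hypothetical cache key: SHA-256 of the input bytes plus the target format."""
    digest = hashlib.sha256()
    with open(input_path, "rb") as f:
        # Read in chunks so large inputs are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    digest.update(b"\0" + target_format.lower().encode("ascii"))
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "large.pdf")
    with open(src, "wb") as f:
        f.write(b"%PDF-1.4 example bytes")
    key_txt_1 = cache_key(src, "txt")
    key_txt_2 = cache_key(src, "TXT")   # output file name plays no part in the key
    key_html = cache_key(src, "html")   # a different target format is a different entry
    with open(src, "ab") as f:
        f.write(b" edited")
    key_after_edit = cache_key(src, "txt")  # editing the input invalidates the entry
```

Hashing content costs a full read of the input, but the key survives renames and copies and changes as soon as the file is edited; keys based on path and modification time are cheaper to compute but miss identical files stored under different names.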

Template Rendering

from converter.template_engine import TemplateEngine

engine = TemplateEngine()
template = "Hello {{ name }}! {% for item in items %}{{ item }} {% endfor %}"
result = engine.render(template, {"name": "World", "items": ["A", "B", "C"]})
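The High Performance list mentions streaming template rendering. The idea is to emit output piece by piece instead of building one large string, so memory use stays flat as the item count grows; Jinja2 exposes the same idea as `Template.generate()`. The sketch below hand-codes the template above as a Python generator. It illustrates the technique only and is not `TemplateEngine`'s API; `render_greeting_stream` and `write_stream` are hypothetical helpers.

```python
import io
from typing import Iterable, Iterator, TextIO


def render_greeting_stream(name: str, items: Iterable[str]) -> Iterator[str]:
    """Hypothetical hand-written equivalent of the template above, as a generator."""
    yield f"Hello {name}! "
    for item in items:  # mirrors {% for item in items %}{{ item }} {% endfor %}
        yield f"{item} "


def write_stream(chunks: Iterable[str], out: TextIO) -> int:
    """Write chunks as they are produced; returns the number of characters written."""
    written = 0
    for chunk in chunks:
        written += out.write(chunk)
    return written


buffer = io.StringIO()
count = write_stream(render_greeting_stream("World", ["A", "B", "C"]), buffer)
rendered = buffer.getvalue()

# A lazy input works too: the 100,000 items are never held in a list.
big = write_stream(render_greeting_stream("World", (str(i) for i in range(100_000))), io.StringIO())
```

In real use `out` would be an open file, so only one chunk is in memory at a time.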

💻 CLI Usage

Single File Conversion

# Standard conversion
python -m cli.main convert input.pdf output.txt

# With options
python -m cli.main convert input.pdf --output output.txt --ocr

Drag & Drop Multiple Files (Windows)

# Drop files onto document-converter.exe, or run:
document-converter.exe file1.docx file2.pdf file3.txt --format pdf

# Result: Converts all to PDF in the same directory
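When files are dropped onto an executable in Windows Explorer, their paths arrive as ordinary command-line arguments, so drag and drop reduces to handling a list of positional file arguments. The sketch below shows one way to plan the outputs with `argparse`. `plan_outputs` is hypothetical, and skipping inputs already in the target format (such as `file2.pdf` above) is an assumption about sensible behavior, not documented behavior of `document-converter.exe`.

```python
import argparse
from pathlib import Path
from typing import List, Optional, Tuple


def plan_outputs(argv: Optional[List[str]] = None) -> List[Tuple[Path, Path]]:
    """Hypothetical sketch: map each dropped file to an output in the same directory."""
    parser = argparse.ArgumentParser(prog="document-converter")
    parser.add_argument("files", nargs="+", type=Path, help="input files")
    parser.add_argument("--format", required=True, help="target format, e.g. pdf")
    args = parser.parse_args(argv)

    target_suffix = "." + args.format.lower().lstrip(".")
    plan = []
    for src in args.files:
        dst = src.with_suffix(target_suffix)  # same directory and stem, new extension
        if dst == src:
            continue  # assumption: inputs already in the target format are skipped
        plan.append((src, dst))
    return plan


plan = plan_outputs(["file1.docx", "file2.pdf", "file3.txt", "--format", "pdf"])
```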

Batch Processing

python -m cli.main batch ./documents ./output --from-format docx --to-format pdf --workers 8

Cache Management

# View cache stats
python -m cli.main cache-stats

# Clear cache
python -m cli.main cache-clear

Standalone Executable

Download document-converter.exe from the dist/ folder:

# Interactive mode (double-click or run without arguments)
document-converter.exe

# CLI mode
document-converter.exe convert input.pdf output.txt

📚 Documentation

Document	Description
User Guide	Step-by-step tutorials and common use cases
API Reference	Complete API documentation
Developer Guide	Contributing and extending the library
Examples	Ready-to-run example scripts
Changelog	Version history and changes

📁 Project Structure

document-converter/
├── converter/          # Core conversion logic
│   ├── engine.py       # Main conversion engine
│   ├── batch_processor.py
│   ├── template_engine.py
│   ├── formats/        # Format-specific converters
│   └── processors/     # OCR, images, styles
├── core/               # Core utilities
│   ├── cache_manager.py
│   ├── error_handler.py
│   ├── transaction.py
│   └── worker_pool.py
├── cli/                # Command-line interface
├── utils/              # Helper utilities
├── docs/               # Documentation
├── examples/           # Example scripts
├── tests/              # Test suite
└── dist/               # Standalone executable

🧪 Testing

# Run all tests
pytest

# With coverage
pytest --cov=converter --cov=core --cov-report=html

# Run specific test types
pytest -m unit
pytest -m integration

Current Coverage: 79% (274+ tests)

🤝 Contributing

Contributions are welcome! Please read our Developer Guide for:

Development setup
Code style guidelines
Testing requirements
How to add new format converters
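The Developer Guide covers adding converters, and its actual interface is not reproduced here. As a rough illustration of the usual shape, the sketch below defines a hypothetical abstract base class (`BaseConverter`, with `source_format`, `target_formats`, and `_convert` as assumed names) and a TXT to HTML converter that escapes special characters.

```python
import html
from abc import ABC, abstractmethod
from typing import ClassVar, FrozenSet


class BaseConverter(ABC):
    """Hypothetical converter interface; the project's real base class may differ."""

    source_format: ClassVar[str]
    target_formats: ClassVar[FrozenSet[str]]

    def convert_text(self, content: str, target_format: str) -> str:
        # Validate the target once here so subclasses only implement the transformation.
        if target_format not in self.target_formats:
            raise ValueError(
                f"{type(self).__name__} cannot produce '{target_format}' "
                f"(supported: {', '.join(sorted(self.target_formats))})"
            )
        return self._convert(content, target_format)

    @abstractmethod
    def _convert(self, content: str, target_format: str) -> str:
        """Subclasses implement the actual transformation."""


class TxtConverter(BaseConverter):
    source_format = "txt"
    target_formats = frozenset({"html"})

    def _convert(self, content: str, target_format: str) -> str:
        # Escape HTML special characters so plain text cannot inject markup;
        # blank-line-separated blocks become <p> paragraphs.
        paragraphs = [p for p in content.split("\n\n") if p.strip()]
        body = "\n".join(f"<p>{html.escape(p.strip())}</p>" for p in paragraphs)
        return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>"


result = TxtConverter().convert_text("Q&A\n\n1 < 2", "html")

try:
    TxtConverter().convert_text("x", "pdf")
except ValueError as exc:
    unsupported = str(exc)
```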

Quick Start for Contributors

# Fork the repository on GitHub, then clone your fork
git clone https://github.com/<your-username>/document-converter
cd document-converter

# Install dev dependencies
pip install -r requirements-dev.txt

# Create feature branch
git checkout -b feat/my-feature

# Make changes and test
pytest

# Submit pull request

📊 Performance Benchmarks

Operation	Performance
Cache Speedup	Up to 138x faster
Batch Throughput	50-200 files/sec
Memory Cache Lookup	<1ms
Disk Cache Lookup	<100ms
Template Rendering (100K items)	<5 seconds
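A speedup figure such as "up to 138x" is the ratio of the uncached (cold) time to the cached (warm) time for the same conversion. The sketch below shows how such a ratio can be measured, using `functools.lru_cache` and a `time.sleep` call as stand-ins for the project's cache and a real conversion; the numbers it produces say nothing about this library's actual performance.

```python
import time
import timeit
from functools import lru_cache


@lru_cache(maxsize=None)
def slow_convert(name: str) -> str:
    time.sleep(0.05)  # hypothetical stand-in for real conversion work
    return name.upper()


start = time.perf_counter()
slow_convert("large.pdf")  # cold: cache miss, the work is done once
cold = time.perf_counter() - start

# Warm: average over many cache hits, since a single hit is too fast to time reliably.
runs = 1000
warm = timeit.timeit(lambda: slow_convert("large.pdf"), number=runs) / runs
speedup = cold / warm
```

The ratio depends heavily on how expensive the uncached conversion is, so it varies from file to file; that is why benchmarks like the one above are usually quoted as "up to".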

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with Python 3.13
PDF processing: PyPDF2, ReportLab
DOCX handling: python-docx
OCR: Tesseract via pytesseract
CLI: Click

Made with ❤️ by MikeAMSDev

⭐ Star this repo if you find it useful!


Release history

1.2.0 (this version): Dec 26, 2025
1.1.2: Dec 16, 2025
1.1.0: Dec 12, 2025

Download files

Source distribution: document_converter-1.2.0.tar.gz (226.8 kB), uploaded Dec 26, 2025
Built distribution: document_converter-1.2.0-py3-none-any.whl (79.0 kB), uploaded Dec 26, 2025, tagged Python 3

File details

Details for the file document_converter-1.2.0.tar.gz.

File metadata

Download URL: document_converter-1.2.0.tar.gz
Upload date: Dec 26, 2025
Size: 226.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for document_converter-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`f3a27a5a847eecdee43c5a21737b4c6e16ea11064594441dc51257609f95ea11`
MD5	`f3bad24270971365162a25c1ae20a83f`
BLAKE2b-256	`ab06fb4cf3b0626e15fd16ecc6f9a69984041d6a187b4dac9189e8f2a485d89f`


File details

Details for the file document_converter-1.2.0-py3-none-any.whl.

File metadata

Download URL: document_converter-1.2.0-py3-none-any.whl
Upload date: Dec 26, 2025
Size: 79.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for document_converter-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cef345bbc9a14c876162ba17bc99b9e346b67fa02dd0c412927a5b29839cdf4e`
MD5	`90bce745e68689e5dc4bd30fa586ca68`
BLAKE2b-256	`4fcd3090a3998586cf96d48e6e9eb13d5ee63c128958fe44d7b8ad0e60cc496a`
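The digests above can be checked locally before installing a downloaded file. A minimal sketch with `hashlib` (the helper names `sha256_of` and `verify_download` are illustrative), demonstrated on a small sample file whose SHA-256 is the standard test vector for `abc`:

```python
import hashlib
import os
import tempfile

# Published SHA256 of the 1.2.0 wheel, copied from the table above.
WHEEL_SHA256 = "cef345bbc9a14c876162ba17bc99b9e346b67fa02dd0c412927a5b29839cdf4e"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Illustrative helper: hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected_sha256.strip().lower()


# Real use: verify_download("document_converter-1.2.0-py3-none-any.whl", WHEEL_SHA256)
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as f:
        f.write(b"abc")
    sample_ok = verify_download(sample, "BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD")
    sample_bad = verify_download(sample, WHEEL_SHA256)
```

pip can enforce the same check at install time through its hash-checking mode: pin `document-converter==1.2.0` in a requirements file with a `--hash=sha256:<digest>` option and install with `--require-hashes`.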

