Python Framework for OCR using Qwen3-VL Models

ocrxdoc - Python Framework for OCR

A clean, easy-to-use Python framework for OCR (Optical Character Recognition) using Qwen3-VL AI models. Supports images (JPG, PNG, JPEG), PDF, DOCX, and TXT files.

Features

🖼️ Image OCR: Support for JPG, PNG, JPEG
📄 Document OCR: Support for PDF, DOCX, TXT
🤖 Two AI Models:
- 4B model (default) - More accurate
- 2B model - Faster
🖥️ GPU/CPU Support: Automatic GPU detection and usage
🎯 ROI Selection: Select custom regions for OCR
📦 Batch Processing: Process multiple files at once
⚡ Easy to Use: Simple, clean API

Installation

Basic Installation

pip install ocrxdoc

With PDF Support

pip install "ocrxdoc[pdf]"

With DOCX Support

pip install "ocrxdoc[docx]"

With All Features

pip install "ocrxdoc[all]"

Quick Start

Basic Usage

from ocrxdoc import OCREngine

# Initialize OCR engine
engine = OCREngine(model_size="4B", device="auto")

# Load model
engine.load_model()

# Process an image
result = engine.ocr("path/to/image.jpg", prompt="Extract all text from this image")
print(result)

Process Different File Types

from ocrxdoc import OCREngine

engine = OCREngine(model_size="4B")
engine.load_model()

# Process image
result = engine.ocr("image.jpg")

# Process PDF
result = engine.ocr("document.pdf")

# Process DOCX
result = engine.ocr("document.docx")

# Process TXT
result = engine.ocr("text.txt")
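
Because `ocr()` is called the same way for every file type, it can be useful to check a path's extension before passing it to the engine. The following is a minimal sketch, not part of ocrxdoc: `SUPPORTED` and `classify_file` are hypothetical names, and the extension list is simply the set of formats named in this README (the library itself may match files differently).

```python
from pathlib import Path

# Assumption: these are the extensions named in this README; the
# library may accept others or detect file types another way.
SUPPORTED = {
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "txt",
}


def classify_file(path):
    """Return the category for a path, or None if the extension is not listed."""
    return SUPPORTED.get(Path(path).suffix.lower())
```

A caller could skip any path for which `classify_file(path)` returns `None` instead of sending it to `engine.ocr()`.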

Batch Processing

from ocrxdoc import OCREngine

engine = OCREngine(model_size="4B")
engine.load_model()

files = ["image1.jpg", "image2.png", "document.pdf"]

def progress_callback(current, total, filename):
    print(f"Processing {current}/{total}: {filename}")

results = engine.ocr_batch(files, progress_callback=progress_callback)

for file_path, result in results:
    print(f"{file_path}: {result[:100]}...")
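
The `files` list above is written by hand. As a rough sketch of building it from a folder instead, the helper below walks a directory and keeps only files with a listed extension. `collect_files` is a hypothetical helper, not part of ocrxdoc, and the default extension tuple is an assumption based on the formats this README names.

```python
from pathlib import Path


def collect_files(folder, extensions=(".jpg", ".jpeg", ".png", ".pdf", ".docx", ".txt")):
    """Return a sorted list of file paths under `folder` with a matching extension."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        str(p)
        for p in Path(folder).rglob("*")  # recurse into subfolders
        if p.is_file() and p.suffix.lower() in wanted
    )
```

The returned list can be passed directly as the first argument to `engine.ocr_batch(...)`.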

Custom Model Path

from ocrxdoc import OCREngine

# Use custom model path
engine = OCREngine(
    model_path="./custom/models/Qwen3-VL-4B-Instruct",
    device="cuda:0"
)
engine.load_model()
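
The example above hard-codes `device="cuda:0"`, which only makes sense on a machine with a CUDA GPU. A minimal sketch of choosing the device string at runtime follows. `pick_device` is a hypothetical helper; it relies on PyTorch's `torch.cuda.is_available()` and falls back to `"cpu"` when PyTorch cannot be imported. How ocrxdoc's own `device="auto"` decides is not documented here and may differ.

```python
def pick_device():
    """Return "cuda:0" when a CUDA device is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"
```

The result could then be passed as `device=pick_device()` when constructing `OCREngine`.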

ROI (Region of Interest) Selection

from ocrxdoc import OCREngine

engine = OCREngine(model_size="4B")
engine.load_model()

# OCR only a specific region: (x, y, width, height)
result = engine.ocr(
    "image.jpg",
    roi=(100, 100, 500, 300)  # Crop region before OCR
)
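
The ROI is given as `(x, y, width, height)`, whereas Pillow's `Image.crop` expects `(left, upper, right, lower)`. The sketch below shows one way that conversion could be done, clamping the region to the image bounds. `roi_to_box` is a hypothetical helper written for illustration; whether ocrxdoc clamps or validates the ROI internally is an assumption not confirmed by this README.

```python
def roi_to_box(roi, image_size):
    """Convert (x, y, width, height) to a clamped (left, upper, right, lower) box."""
    x, y, w, h = roi
    img_w, img_h = image_size
    if w <= 0 or h <= 0:
        raise ValueError("ROI width and height must be positive")
    # Clamp each edge so the box never extends past the image.
    left = max(0, x)
    upper = max(0, y)
    right = min(img_w, x + w)
    lower = min(img_h, y + h)
    if left >= right or upper >= lower:
        raise ValueError("ROI lies entirely outside the image")
    return (left, upper, right, lower)


print(roi_to_box((100, 100, 500, 300), (1920, 1080)))  # (100, 100, 600, 400)
```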

Custom Generation Parameters

from ocrxdoc import OCREngine

engine = OCREngine(
    model_size="4B",
    max_tokens=5000,
    temperature=0.1,
    top_p=0.9
)
engine.load_model()

# Or update after initialization
engine.set_generation_params(
    max_tokens=5000,
    temperature=0.1
)
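
When generation parameters come from user input or a config file, it may be worth checking them before passing them on. The sketch below is illustrative only: `validate_generation_params` is a hypothetical helper, and the accepted ranges are common sampling conventions assumed here, not limits documented by ocrxdoc.

```python
def validate_generation_params(max_tokens=None, temperature=None, top_p=None,
                               top_k=None, repetition_penalty=None):
    """Range-check the given values and return only those that were set."""
    # Assumed ranges; the library may accept a wider or narrower set.
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if temperature is not None and temperature < 0:
        raise ValueError("temperature must be non-negative")
    if top_p is not None and not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    if repetition_penalty is not None and repetition_penalty <= 0:
        raise ValueError("repetition_penalty must be positive")
    params = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
    }
    return {name: value for name, value in params.items() if value is not None}
```

The returned dictionary uses the same keyword names as `set_generation_params`, so it could be unpacked into that call with `**`.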

Model Setup

Models need to be downloaded manually due to their large size:

4B Model (Default):
- Download from: Hugging Face - Qwen3-VL-4B-Instruct
- Place in: ./models/Qwen3-VL-4B-Instruct/

2B Model:
- Download from: Hugging Face - Qwen3-VL-2B-Instruct
- Place in: ./models/Qwen3-VL-2B-Instruct/
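
Since the models are placed by hand, a quick check that the expected folder exists can catch setup mistakes before a slow model load. The following is a sketch under stated assumptions: `MODEL_DIRS`, `expected_model_dir`, and `model_is_present` are hypothetical names, the folder names are the ones listed above, and looking for `config.json` is only a heuristic (Hugging Face model folders normally contain one), not a check the library is known to perform.

```python
from pathlib import Path

# Folder names as listed in this README.
MODEL_DIRS = {
    "4B": "Qwen3-VL-4B-Instruct",
    "2B": "Qwen3-VL-2B-Instruct",
}


def expected_model_dir(model_size="4B", base_dir="./models"):
    """Return the folder where the given model size is expected to live."""
    if model_size not in MODEL_DIRS:
        raise ValueError(f"unknown model size: {model_size!r}")
    return Path(base_dir) / MODEL_DIRS[model_size]


def model_is_present(model_size="4B", base_dir="./models"):
    """Heuristic check: the model folder exists and contains config.json."""
    return (expected_model_dir(model_size, base_dir) / "config.json").is_file()
```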

Requirements

Python 3.8+
PyTorch 2.0+
Transformers 4.57+
Pillow 10.0+
For PDF: pdf2image and Poppler
For DOCX: python-docx

System Requirements

RAM: Minimum 16GB (32GB+ recommended)
GPU: Recommended (NVIDIA with CUDA support), with at least 8GB of VRAM
Paging File: Minimum 8GB for the 4B model, 4GB for the 2B model

API Reference

OCREngine

Main OCR engine class.

`__init__(model_path=None, model_size="4B", device="auto", dtype=None, poppler_path=None, max_tokens=3000, temperature=0.2, top_p=0.8, top_k=50, repetition_penalty=1.1)`

Initialize OCR engine.

`load_model()`

Load the OCR model and processor.

`ocr(file_path, prompt="...", roi=None)`

Perform OCR on a file.

file_path: Path to file
prompt: Prompt for OCR model
roi: Optional region of interest as (x, y, width, height)

Returns: Extracted text string

`ocr_batch(file_paths, prompt="...", progress_callback=None)`

Perform OCR on multiple files.

file_paths: List of file paths
prompt: Prompt for OCR model
progress_callback: Optional callback(current, total, filename)

Returns: List of tuples (file_path, ocr_result)
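
Because `ocr_batch` returns a list of `(file_path, ocr_result)` tuples, the output is easy to serialize. A minimal sketch using only the standard library follows; `results_to_json` and the `"file"`/`"text"` key names are hypothetical choices for illustration.

```python
import json


def results_to_json(results):
    """Serialize [(file_path, text), ...] to a JSON array of objects."""
    records = [{"file": path, "text": text} for path, text in results]
    # ensure_ascii=False keeps non-ASCII OCR output readable in the file.
    return json.dumps(records, ensure_ascii=False, indent=2)
```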

`set_generation_params(max_tokens=None, temperature=None, top_p=None, top_k=None, repetition_penalty=None)`

Update generation parameters.

`cleanup()`

Clean up temporary files.
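
To make sure `cleanup()` runs even when OCR raises an exception, the engine can be wrapped in a context manager. This is a sketch, not something ocrxdoc is stated to provide: `managed_engine` is a hypothetical helper that only assumes the `load_model()` and `cleanup()` methods documented above.

```python
from contextlib import contextmanager


@contextmanager
def managed_engine(engine):
    """Load the model on entry and always call cleanup() on exit."""
    engine.load_model()
    try:
        yield engine
    finally:
        # Runs on normal exit and when the body raises.
        engine.cleanup()
```

With this helper, usage would look like `with managed_engine(OCREngine(model_size="4B")) as engine:` followed by the `engine.ocr(...)` calls in the body.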

Examples

See the examples/ directory for more examples.

License

MIT License


This version

1.0.0

Nov 17, 2025

Download files

Source Distribution

ocrxdoc-1.0.0.tar.gz (13.5 kB)

Uploaded Nov 17, 2025 Source

Built Distribution

ocrxdoc-1.0.0-py3-none-any.whl (12.0 kB)

Uploaded Nov 17, 2025 Python 3

File details

Details for the file ocrxdoc-1.0.0.tar.gz.

File metadata

Download URL: ocrxdoc-1.0.0.tar.gz
Upload date: Nov 17, 2025
Size: 13.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for ocrxdoc-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`d81c7c6cc739cef119960367a0704a87af480db129948b420c3e951a0dc02c6f`
MD5	`9aed3a73350df72091a7195852989a6b`
BLAKE2b-256	`af91cdf65f30e35b8732ff8f6c6246275f38788d041a60e898a127468e293f12`

File details

Details for the file ocrxdoc-1.0.0-py3-none-any.whl.

File metadata

Download URL: ocrxdoc-1.0.0-py3-none-any.whl
Upload date: Nov 17, 2025
Size: 12.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for ocrxdoc-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3f39007f871c16ff584080738b70867ecbc5c58f71e7f77b1d943102bd9af3bf`
MD5	`9f8a935a5c1db4d0493a3f18c19f525c`
BLAKE2b-256	`b3d9d36b038821b5ffe7bf32fd1ead981f6bb6ed1b7be0def38fe1f151cc900e`
