Python Framework for OCR using Qwen3-VL Models
ocrxdoc - Python Framework for OCR
A clean, easy-to-use Python framework for OCR (Optical Character Recognition) using Qwen3-VL AI models. Supports images (JPG, PNG, JPEG), PDF, DOCX, and TXT files.
Features
- 🖼️ Image OCR: Support for JPG, PNG, JPEG
- 📄 Document OCR: Support for PDF, DOCX, TXT
- 🤖 Two AI Models:
  - 4B model (default): more accurate
  - 2B model: faster
- 🖥️ GPU/CPU Support: Automatic GPU detection and usage
- 🎯 ROI Selection: Select custom regions for OCR
- 📦 Batch Processing: Process multiple files at once
- ⚡ Easy to Use: Simple, clean API
Installation
Basic Installation
pip install ocrxdoc
With PDF Support
pip install "ocrxdoc[pdf]"
With DOCX Support
pip install "ocrxdoc[docx]"
With All Features
pip install "ocrxdoc[all]"
The quotes stop shells such as zsh from treating the square brackets as a glob pattern.
Quick Start
Basic Usage
from ocrxdoc import OCREngine
# Initialize OCR engine
engine = OCREngine(model_size="4B", device="auto")
# Load model
engine.load_model()
# Process an image
result = engine.ocr("path/to/image.jpg", prompt="Extract all text from this image")
print(result)
Process Different File Types
from ocrxdoc import OCREngine
engine = OCREngine(model_size="4B")
engine.load_model()
# Process image
result = engine.ocr("image.jpg")
# Process PDF
result = engine.ocr("document.pdf")
# Process DOCX
result = engine.ocr("document.docx")
# Process TXT
result = engine.ocr("text.txt")
Batch Processing
from ocrxdoc import OCREngine
engine = OCREngine(model_size="4B")
engine.load_model()
files = ["image1.jpg", "image2.png", "document.pdf"]
def progress_callback(current, total, filename):
    print(f"Processing {current}/{total}: {filename}")
results = engine.ocr_batch(files, progress_callback=progress_callback)
for file_path, result in results:
    print(f"{file_path}: {result[:100]}...")
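You can also build the `files` list from a folder instead of writing it by hand. One way is to filter by the extensions listed under Features. This is a minimal sketch. `collect_supported_files` is a hypothetical helper and is not part of ocrxdoc. The extension set comes from this README, not from the library's own code.

```python
from pathlib import Path

# Extensions advertised in this README's Features section (assumed; not read from ocrxdoc).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".docx", ".txt"}


def collect_supported_files(folder, recursive=False):
    """Return sorted path strings for files whose extension ocrxdoc advertises.

    Hypothetical helper for building the list passed to engine.ocr_batch().
    The extension check is case-insensitive, so "scan.JPG" is included.
    """
    root = Path(folder)
    pattern = "**/*" if recursive else "*"
    return sorted(
        str(path)
        for path in root.glob(pattern)
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Usage: `files = collect_supported_files("scans/", recursive=True)`, then pass `files` to `engine.ocr_batch()` as shown above.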
Custom Model Path
from ocrxdoc import OCREngine
# Use custom model path
engine = OCREngine(
    model_path="./custom/models/Qwen3-VL-4B-Instruct",
    device="cuda:0"
)
engine.load_model()
ROI (Region of Interest) Selection
from ocrxdoc import OCREngine
engine = OCREngine(model_size="4B")
engine.load_model()
# OCR only a specific region: (x, y, width, height)
result = engine.ocr(
    "image.jpg",
    roi=(100, 100, 500, 300)  # Crop region before OCR
)
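The `roi` tuple is (x, y, width, height), but Pillow's `Image.crop()` expects a (left, upper, right, lower) box. This page does not say how ocrxdoc crops internally. The sketch below only shows the conversion, plus a bounds check you might run before passing an ROI. `roi_to_box` is a hypothetical helper.

```python
def roi_to_box(roi, image_size):
    """Convert an (x, y, width, height) ROI into a (left, upper, right, lower) box.

    image_size is (image_width, image_height), e.g. PIL's Image.size.
    Raises ValueError if the region is empty or extends past the image edges.
    """
    x, y, width, height = roi
    image_width, image_height = image_size
    if width <= 0 or height <= 0:
        raise ValueError("ROI width and height must be positive")
    if x < 0 or y < 0 or x + width > image_width or y + height > image_height:
        raise ValueError(f"ROI {roi} lies outside the {image_width}x{image_height} image")
    return (x, y, x + width, y + height)
```

Example: for an 800x600 image, `roi_to_box((100, 100, 500, 300), (800, 600))` gives `(100, 100, 600, 400)`. That box can be passed straight to Pillow's `img.crop(...)`.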
Custom Generation Parameters
from ocrxdoc import OCREngine
engine = OCREngine(
    model_size="4B",
    max_tokens=5000,
    temperature=0.1,
    top_p=0.9
)
engine.load_model()
# Or update after initialization
engine.set_generation_params(
    max_tokens=5000,
    temperature=0.1
)
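`temperature`, `top_k` and `top_p` are standard decoding controls. ocrxdoc presumably forwards them to the Hugging Face Transformers generation step, but this page does not say so. The pure-Python sketch below shows what each one does to a single next-token distribution:

- Temperature rescales the logits.
- Top-k keeps the k most likely tokens.
- Top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability reaches p.

The defaults mirror the `__init__` signature in the API Reference. This is an illustration only, not Transformers' exact implementation.

```python
import math


def filter_distribution(logits, temperature=0.2, top_k=50, top_p=0.8):
    """Return {token: probability} after temperature, top-k, and top-p filtering.

    logits maps token -> raw score. Surviving probabilities are renormalised to sum to 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: divide logits before softmax; values below 1 sharpen the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}  # subtract max for stability
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda kv: kv[1], reverse=True)

    # Top-k: keep only the k most likely tokens.
    probs = probs[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}
```

With logits `{"a": 2.0, "b": 1.0, "c": 0.0}` and `temperature=1.0`, tokens "a" and "b" survive. At the default `temperature=0.2`, only "a" remains. Lower temperatures and smaller `top_p` values both narrow the candidate set, which tends to suit verbatim text extraction.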
Model Setup
Models need to be downloaded manually due to their large size:
- 4B Model (Default):
  - Download from: Hugging Face - Qwen3-VL-4B-Instruct
  - Place in: ./models/Qwen3-VL-4B-Instruct/
- 2B Model:
  - Download from: Hugging Face - Qwen3-VL-2B-Instruct
  - Place in: ./models/Qwen3-VL-2B-Instruct/
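One way to fetch a checkpoint into the folder layout above is `huggingface_hub.snapshot_download`. This is a sketch under stated assumptions:

- The repo IDs `Qwen/Qwen3-VL-4B-Instruct` and `Qwen/Qwen3-VL-2B-Instruct` are inferred from the model names in this README.
- `download_model` is a hypothetical helper.
- `huggingface_hub` must be installed separately.

```python
from pathlib import Path

# Repo IDs are assumed from the model names in this README; local folders match the layout above.
MODELS = {
    "4B": ("Qwen/Qwen3-VL-4B-Instruct", Path("models") / "Qwen3-VL-4B-Instruct"),
    "2B": ("Qwen/Qwen3-VL-2B-Instruct", Path("models") / "Qwen3-VL-2B-Instruct"),
}


def download_model(model_size="4B"):
    """Download one Qwen3-VL checkpoint into ./models/<name>/ and return that path.

    Requires `pip install huggingface_hub`; it is imported lazily so this module
    loads (and validates model_size) even when the package is absent.
    """
    if model_size not in MODELS:
        raise ValueError(f"model_size must be one of {sorted(MODELS)}")
    from huggingface_hub import snapshot_download

    repo_id, local_dir = MODELS[model_size]
    local_dir.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

Calling `download_model("2B")` should populate `./models/Qwen3-VL-2B-Instruct/`, which is where the README expects the 2B model.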
Requirements
- Python 3.8+
- PyTorch 2.0+
- Transformers 4.57+
- Pillow 10.0+
- For PDF: pdf2image and Poppler
- For DOCX: python-docx
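You can check which optional pieces are present before processing PDF or DOCX files with a short script. The sketch makes these assumptions:

- python-docx is importable as `docx`.
- pdf2image relies on Poppler's `pdftoppm` binary.

If you pass `poppler_path` to `OCREngine`, the PATH check below does not apply.

```python
import importlib.util
import shutil


def check_optional_dependencies():
    """Report which optional document-OCR dependencies are available."""
    return {
        "pdf2image": importlib.util.find_spec("pdf2image") is not None,
        # python-docx is installed as "python-docx" but imported as "docx".
        "python-docx": importlib.util.find_spec("docx") is not None,
        # Poppler is a system package, not a pip package; look for its pdftoppm tool.
        "poppler (pdftoppm on PATH)": shutil.which("pdftoppm") is not None,
    }


for name, available in check_optional_dependencies().items():
    print(f"{name}: {'found' if available else 'missing'}")
```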
System Requirements
- RAM: 16 GB minimum (32 GB or more recommended)
- GPU: Recommended (NVIDIA with CUDA support), with at least 8 GB of VRAM
- Paging file: At least 8 GB for the 4B model, 4 GB for the 2B model
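For a rough sense of scale, model weights alone need about (parameter count × bytes per parameter). The sketch assumes these nominal values:

- 4 and 2 billion parameters; the real checkpoints differ somewhat.
- 2 bytes per parameter for 16-bit weights.

It ignores the extra memory for activations and the KV cache during generation.

```python
def weight_memory_gb(params_billions, bytes_per_param=2):
    """Approximate memory for model weights only, in decimal gigabytes.

    billions of parameters x bytes per parameter = gigabytes (1 GB = 1e9 bytes).
    bytes_per_param: 2 for float16/bfloat16, 4 for float32.
    """
    return params_billions * bytes_per_param


for size, params in (("4B", 4), ("2B", 2)):
    print(f"{size}: ~{weight_memory_gb(params)} GB at 16-bit, "
          f"~{weight_memory_gb(params, 4)} GB at 32-bit")
```

This prints roughly 8 GB (16-bit) and 16 GB (32-bit) for the 4B model, and half of that for the 2B model. That is why the 4B model needs more memory headroom.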
API Reference
OCREngine
Main OCR engine class.
__init__(model_path=None, model_size="4B", device="auto", dtype=None, poppler_path=None, max_tokens=3000, temperature=0.2, top_p=0.8, top_k=50, repetition_penalty=1.1)
Initialize OCR engine.
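The README says only that `device="auto"` enables automatic GPU detection. The sketch below is a hypothetical illustration of what that could mean, not ocrxdoc's actual code: use the first CUDA device when PyTorch reports one, otherwise fall back to the CPU. `resolve_device` is an invented name. The `cuda_available` argument exists so the logic can be tested without a GPU.

```python
def resolve_device(device="auto", cuda_available=None):
    """Map a device argument to a concrete PyTorch device string.

    Hypothetical illustration of device="auto"; not taken from ocrxdoc.
    cuda_available defaults to torch.cuda.is_available() when PyTorch is installed.
    """
    if device != "auto":
        return device  # explicit values such as "cpu" or "cuda:0" pass through unchanged
    if cuda_available is None:
        try:
            import torch
            cuda_available = torch.cuda.is_available()
        except ImportError:
            cuda_available = False
    return "cuda:0" if cuda_available else "cpu"
```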
load_model()
Load the OCR model and processor.
ocr(file_path, prompt="...", roi=None)
Perform OCR on a file.
- file_path: Path to the file
- prompt: Prompt for the OCR model
- roi: Optional region of interest as (x, y, width, height)
Returns: Extracted text string
ocr_batch(file_paths, prompt="...", progress_callback=None)
Perform OCR on multiple files.
- file_paths: List of file paths
- prompt: Prompt for the OCR model
- progress_callback: Optional callback(current, total, filename)
Returns: List of tuples (file_path, ocr_result)
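`ocr_batch()` returns a list of (file_path, ocr_result) tuples, so saving the output takes only a short loop. This is a minimal sketch. `save_batch_results` is a hypothetical helper. Each output file is named after the full source filename (`image1.jpg` becomes `image1.jpg.txt`), so `a.jpg` and `a.pdf` do not overwrite each other.

```python
from pathlib import Path


def save_batch_results(results, output_dir):
    """Write each (file_path, ocr_result) pair to <output_dir>/<source filename>.txt.

    Returns the list of written paths, in the same order as results.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for file_path, text in results:
        target = out / f"{Path(file_path).name}.txt"  # e.g. image1.jpg -> image1.jpg.txt
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```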
set_generation_params(max_tokens=None, temperature=None, top_p=None, top_k=None, repetition_penalty=None)
Update generation parameters.
cleanup()
Clean up temporary files.
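Temporary files should be removed even when OCR raises an error partway through a run. A context manager is a natural way to guarantee that. This is a sketch built on the standard library's `contextlib.contextmanager`. `ocr_session` is a hypothetical wrapper that works with any object exposing `load_model()` and `cleanup()`, including `OCREngine`.

```python
from contextlib import contextmanager


@contextmanager
def ocr_session(engine):
    """Load the model on entry and always call cleanup() on exit, even if OCR raises."""
    engine.load_model()
    try:
        yield engine
    finally:
        engine.cleanup()  # removes temporary files, per the API reference
```

Usage: `with ocr_session(OCREngine(model_size="2B")) as engine: print(engine.ocr("image.jpg"))`.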
Examples
See the examples/ directory for more examples.
License
MIT License
Acknowledgments
- Qwen3-VL - AI Model
- Hugging Face Transformers
- PyTorch
Download files
File details
Details for the file ocrxdoc-1.0.0.tar.gz.
File metadata
- Download URL: ocrxdoc-1.0.0.tar.gz
- Upload date:
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d81c7c6cc739cef119960367a0704a87af480db129948b420c3e951a0dc02c6f |
| MD5 | 9aed3a73350df72091a7195852989a6b |
| BLAKE2b-256 | af91cdf65f30e35b8732ff8f6c6246275f38788d041a60e898a127468e293f12 |
File details
Details for the file ocrxdoc-1.0.0-py3-none-any.whl.
File metadata
- Download URL: ocrxdoc-1.0.0-py3-none-any.whl
- Upload date:
- Size: 12.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3f39007f871c16ff584080738b70867ecbc5c58f71e7f77b1d943102bd9af3bf |
| MD5 | 9f8a935a5c1db4d0493a3f18c19f525c |
| BLAKE2b-256 | b3d9d36b038821b5ffe7bf32fd1ead981f6bb6ed1b7be0def38fe1f151cc900e |
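To check a downloaded file against the SHA256 digests published on this page, the standard library's `hashlib` is enough. `sha256_of` and `verify_download` are hypothetical helpers. The expected digests are copied from the file details for version 1.0.0.

```python
import hashlib

# SHA256 digests listed on this page for ocrxdoc 1.0.0.
EXPECTED_SHA256 = {
    "ocrxdoc-1.0.0.tar.gz": "d81c7c6cc739cef119960367a0704a87af480db129948b420c3e951a0dc02c6f",
    "ocrxdoc-1.0.0-py3-none-any.whl": "3f39007f871c16ff584080738b70867ecbc5c58f71e7f77b1d943102bd9af3bf",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

Usage: `verify_download("ocrxdoc-1.0.0-py3-none-any.whl", EXPECTED_SHA256["ocrxdoc-1.0.0-py3-none-any.whl"])`.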