Python Framework for OCR using Qwen3-VL Models

ocrxdoc - Python Framework for OCR

A clean, easy-to-use Python framework for optical character recognition (OCR) built on Qwen3-VL vision-language models. It accepts images (JPG/JPEG, PNG) as well as PDF, DOCX, and TXT files.

Features

  • 🖼️ Image OCR: Support for JPG, PNG, JPEG
  • 📄 Document OCR: Support for PDF, DOCX, TXT
  • 🤖 Two AI Models:
    • 4B model (default) - More accurate
    • 2B model - Faster
  • 🖥️ GPU/CPU Support: Automatic GPU detection and usage
  • 🎯 ROI Selection: Select custom regions for OCR
  • 📦 Batch Processing: Process multiple files at once
  • Easy to Use: Simple, clean API
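
How `device="auto"` resolves internally is not documented here. The sketch below shows one common way to do automatic GPU detection, assuming PyTorch's `torch.cuda.is_available()` check. `pick_device` is a hypothetical helper for illustration, not part of ocrxdoc.

```python
def pick_device(preferred="auto"):
    """Resolve a device string; "auto" prefers CUDA when PyTorch can see a GPU."""
    if preferred != "auto":
        # An explicit choice such as "cpu" or "cuda:0" is passed through unchanged.
        return preferred
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to detect, so fall back to CPU.
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

Passing the resolved string explicitly (for example `device=pick_device()`) makes it easy to log which device a run actually used.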

Installation

Basic Installation

pip install ocrxdoc

With PDF Support

pip install "ocrxdoc[pdf]"

With DOCX Support

pip install "ocrxdoc[docx]"

With All Features

pip install "ocrxdoc[all]"

Quick Start

Basic Usage

from ocrxdoc import OCREngine

# Initialize OCR engine
engine = OCREngine(model_size="4B", device="auto")

# Load model
engine.load_model()

# Process an image
result = engine.ocr("path/to/image.jpg", prompt="Extract all text from this image")
print(result)

Process Different File Types

from ocrxdoc import OCREngine

engine = OCREngine(model_size="4B")
engine.load_model()

# Process image
result = engine.ocr("image.jpg")

# Process PDF
result = engine.ocr("document.pdf")

# Process DOCX
result = engine.ocr("document.docx")

# Process TXT
result = engine.ocr("text.txt")
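
When a folder contains mixed files, it can help to filter by extension before calling `ocr()`. The sketch below is an assumption about how one might do that on the caller's side. The extension sets mirror the formats listed in this README, and `classify` and `split_supported` are hypothetical helpers, not ocrxdoc functions.

```python
from pathlib import Path

# Formats listed as supported in this README.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
DOCUMENT_EXTS = {".pdf", ".docx", ".txt"}


def classify(path):
    """Return "image", "pdf", "docx", or "txt" based on the file extension."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in DOCUMENT_EXTS:
        return ext.lstrip(".")
    raise ValueError(f"Unsupported file type: {path!r}")


def split_supported(paths):
    """Split paths into (supported, skipped) lists without touching the disk."""
    supported, skipped = [], []
    for p in paths:
        try:
            classify(p)
        except ValueError:
            skipped.append(p)
        else:
            supported.append(p)
    return supported, skipped
```

The `supported` list can then be handed to `engine.ocr_batch(...)`, while `skipped` can be reported to the user.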

Batch Processing

from ocrxdoc import OCREngine

engine = OCREngine(model_size="4B")
engine.load_model()

files = ["image1.jpg", "image2.png", "document.pdf"]

def progress_callback(current, total, filename):
    print(f"Processing {current}/{total}: {filename}")

results = engine.ocr_batch(files, progress_callback=progress_callback)

for file_path, result in results:
    print(f"{file_path}: {result[:100]}...")
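
Since `ocr_batch` returns `(file_path, result)` tuples with the extracted text as a string, the results can be written out one file per input. This is a minimal sketch under that assumption. `save_results` and the output naming scheme are illustrative choices, not part of ocrxdoc.

```python
from pathlib import Path


def save_results(results, out_dir):
    """Write each (file_path, text) pair to <out_dir>/<original name>.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for file_path, text in results:
        # Keep the original suffix in the name so that image1.jpg and
        # image1.png do not overwrite each other.
        target = out / (Path(file_path).name + ".txt")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

With the batch example above, `save_results(results, "ocr_output")` would leave files such as `ocr_output/image1.jpg.txt`.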

Custom Model Path

from ocrxdoc import OCREngine

# Use custom model path
engine = OCREngine(
    model_path="./custom/models/Qwen3-VL-4B-Instruct",
    device="cuda:0"
)
engine.load_model()

ROI (Region of Interest) Selection

from ocrxdoc import OCREngine

engine = OCREngine(model_size="4B")
engine.load_model()

# OCR only a specific region: (x, y, width, height)
result = engine.ocr(
    "image.jpg",
    roi=(100, 100, 500, 300)  # Crop region before OCR
)
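
The ROI is given as `(x, y, width, height)`, whereas Pillow's `Image.crop` expects a `(left, upper, right, lower)` box. The sketch below shows the conversion, with optional clamping to the image bounds. `roi_to_box` is a hypothetical helper, and whether ocrxdoc clamps out-of-range regions is not stated here.

```python
def roi_to_box(roi, image_size=None):
    """Convert (x, y, width, height) to a Pillow-style (left, upper, right, lower) box."""
    x, y, width, height = roi
    if width <= 0 or height <= 0:
        raise ValueError("ROI width and height must be positive")
    left, upper, right, lower = x, y, x + width, y + height
    if image_size is not None:
        # Clamp the box to the image so the crop never reaches outside it.
        img_w, img_h = image_size
        left, upper = max(0, left), max(0, upper)
        right, lower = min(img_w, right), min(img_h, lower)
        if right <= left or lower <= upper:
            raise ValueError("ROI lies entirely outside the image")
    return (left, upper, right, lower)


print(roi_to_box((100, 100, 500, 300)))  # (100, 100, 600, 400)
```

Checking a region this way before a long OCR run catches a swapped width/height or an off-image ROI early.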

Custom Generation Parameters

from ocrxdoc import OCREngine

engine = OCREngine(
    model_size="4B",
    max_tokens=5000,
    temperature=0.1,
    top_p=0.9
)
engine.load_model()

# Or update after initialization
engine.set_generation_params(
    max_tokens=5000,
    temperature=0.1
)

Model Setup

Models need to be downloaded manually due to their large size:

  1. 4B Model (Default):

  2. 2B Model:

Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • Transformers 4.57+
  • Pillow 10.0+
  • For PDF: pdf2image and Poppler
  • For DOCX: python-docx
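
The optional dependencies can be checked before processing PDF or DOCX files. The sketch below assumes the usual import names (`pdf2image`, and `docx` for python-docx) and looks for Poppler's `pdftoppm` executable on `PATH`. `check_optional_deps` is a hypothetical helper, not part of ocrxdoc.

```python
import importlib.util
import shutil


def check_optional_deps():
    """Report which optional PDF/DOCX dependencies appear to be available."""
    return {
        # Python packages are detected by their import names.
        "pdf2image": importlib.util.find_spec("pdf2image") is not None,
        "python-docx": importlib.util.find_spec("docx") is not None,
        # Poppler ships the pdftoppm command-line tool.
        "poppler": shutil.which("pdftoppm") is not None,
    }


for name, available in check_optional_deps().items():
    print(f"{name}: {'found' if available else 'missing'}")
```

A missing `poppler` entry only means the tool is not on `PATH`. The `poppler_path` argument of `OCREngine` can point at an installation elsewhere.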

System Requirements

  • RAM: 16 GB minimum (32 GB or more recommended)
  • GPU: Recommended (NVIDIA with CUDA support), with at least 8 GB of VRAM
  • Paging File: At least 8 GB for the 4B model, 4 GB for the 2B model

API Reference

OCREngine

Main OCR engine class.

__init__(model_path=None, model_size="4B", device="auto", dtype=None, poppler_path=None, max_tokens=3000, temperature=0.2, top_p=0.8, top_k=50, repetition_penalty=1.1)

Initialize OCR engine.

load_model()

Load the OCR model and processor.

ocr(file_path, prompt="...", roi=None)

Perform OCR on a file.

  • file_path: Path to file
  • prompt: Prompt for OCR model
  • roi: Optional region of interest as (x, y, width, height)

Returns: Extracted text string

ocr_batch(file_paths, prompt="...", progress_callback=None)

Perform OCR on multiple files.

  • file_paths: List of file paths
  • prompt: Prompt for OCR model
  • progress_callback: Optional callback(current, total, filename)

Returns: List of tuples (file_path, ocr_result)

set_generation_params(max_tokens=None, temperature=None, top_p=None, top_k=None, repetition_penalty=None)

Update generation parameters.
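
Every argument of `set_generation_params` defaults to `None`, which suggests that only the values passed in are changed. The sketch below models that behaviour with the defaults from the `__init__` signature above. The `GenerationParams` class and the mapping onto Hugging Face `generate()` keyword arguments (`max_new_tokens`, `do_sample`) are assumptions for illustration, not ocrxdoc's actual implementation.

```python
from dataclasses import asdict, dataclass


@dataclass
class GenerationParams:
    # Defaults copied from the OCREngine.__init__ signature.
    max_tokens: int = 3000
    temperature: float = 0.2
    top_p: float = 0.8
    top_k: int = 50
    repetition_penalty: float = 1.1

    def update(self, **overrides):
        """Apply only the overrides that are not None."""
        for name, value in overrides.items():
            if name not in self.__dataclass_fields__:
                raise TypeError(f"Unknown generation parameter: {name}")
            if value is not None:
                setattr(self, name, value)

    def to_generate_kwargs(self):
        """Translate to keyword arguments in the style of transformers' generate()."""
        kwargs = asdict(self)
        # Assumption: max_tokens corresponds to generate()'s max_new_tokens.
        kwargs["max_new_tokens"] = kwargs.pop("max_tokens")
        # Sampling parameters only take effect when sampling is enabled.
        kwargs["do_sample"] = self.temperature > 0
        return kwargs


params = GenerationParams()
params.update(max_tokens=5000, temperature=0.1, top_p=None)
print(params.to_generate_kwargs())
```

Lower `temperature` values generally make the output more deterministic, which tends to suit transcription tasks such as OCR.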

cleanup()

Clean up temporary files.
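
To make sure `cleanup()` runs even when OCR raises, the load and cleanup calls can be wrapped in a context manager. This is a sketch on the caller's side. `loaded` is a hypothetical helper, and it only assumes the engine exposes the `load_model()` and `cleanup()` methods documented above.

```python
from contextlib import contextmanager


@contextmanager
def loaded(engine):
    """Load the model on entry and always clean up temporary files on exit."""
    engine.load_model()
    try:
        yield engine
    finally:
        # Runs on normal exit and when the body raises.
        engine.cleanup()


# Usage with ocrxdoc installed (not executed here):
#
#     from ocrxdoc import OCREngine
#     with loaded(OCREngine(model_size="2B")) as engine:
#         text = engine.ocr("document.pdf")
```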

Examples

See the examples/ directory for more examples.

License

MIT License

Acknowledgments
