A high-performance, memory-efficient inference server for diffusion models, compatible with the OpenAI client
Aquiles-Image
Self-hosted image/video generation with OpenAI-compatible APIs
🚀 FastAPI • Diffusers • Drop-in replacement for OpenAI
[ English | Español ]
🎯 What is Aquiles-Image?
Aquiles-Image is a production-ready API server that lets you run state-of-the-art image and video generation models on your own infrastructure. Because it is OpenAI-compatible by design, you can switch from external APIs to self-hosted inference in under five minutes with zero code changes.
External image APIs are expensive, slow, and send your data to third parties. Aquiles-Image runs on your hardware, costs nothing per request, and works with the OpenAI client you already use.
Why Aquiles-Image?
| Challenge | Aquiles-Image Solution |
|---|---|
| 💸 Expensive external APIs | Run models locally with unlimited usage |
| 🔒 Data privacy concerns | Your images never leave your server |
| 🐌 Slow inference | Advanced optimizations for 3x faster generation |
| 🔧 Complex setup | One command to run any supported model |
| 🚫 Vendor lock-in | OpenAI-compatible, switch without rewriting code |
Key Features
- 🔌 OpenAI Compatible - Use the official OpenAI client with zero code changes
- ⚡ Intelligent Batching - Automatic request grouping by shared parameters for maximum throughput on single or multi-GPU setups
- 🎨 30+ Optimized Models - 18 image (FLUX, SD3.5, Qwen) + 12 video models (Wan2.x, HunyuanVideo) + unlimited via AutoPipeline (Only T2I)
- 🚀 Multi-GPU Support - Distributed inference with dynamic load balancing across GPUs (image models) for horizontal scaling
- 🛠️ Superior DevX - Simple CLI, dev mode for testing, built-in monitoring
- 🎬 Advanced Video - Text-to-video with Wan2.x and HunyuanVideo series (+ Turbo variants)
- 🧩 LoRA Support - Load any LoRA from HuggingFace or a local path via a simple JSON config file, compatible with all native models and AutoPipeline
🚀 Quick Start
Installation
# From PyPI (recommended)
pip install aquiles-image
# From source
git clone https://github.com/Aquiles-ai/Aquiles-Image.git
cd Aquiles-Image
pip install .
Launch Server
Single-Device Mode (Default)
aquiles-image serve --model "stabilityai/stable-diffusion-3.5-medium"
Multi-GPU Distributed Mode (Image Models Only)
aquiles-image serve --model "stabilityai/stable-diffusion-3.5-medium" --dist-inference
Distributed Inference Note: Enable multi-GPU mode by adding the `--dist-inference` flag. Each GPU will load a copy of the model, so ensure each GPU has sufficient VRAM. The system automatically balances load across GPUs and groups requests with shared parameters for maximum throughput.
Generate Your First Image
from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:5500", api_key="not-needed")
result = client.images.generate(
model="stabilityai/stable-diffusion-3.5-medium",
prompt="a white siamese cat",
size="1024x1024"
)
print(f"Image URL: {result.data[0].url}")
That's it! You're now generating images with the same API you'd use for OpenAI.
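Once a response comes back you will usually want the image on disk. A minimal helper sketch, assuming the server mirrors the OpenAI images response shape (each item carrying either a `url` or a `b64_json` field; whether a given Aquiles-Image version returns one or the other may depend on its configuration):

```python
import base64
import urllib.request

def save_image(datum, path):
    """Persist one item of an OpenAI-style images response to disk.

    `datum` may be an object with .b64_json/.url attributes (as returned
    by the openai client) or a plain dict with the same keys.
    """
    get = datum.get if isinstance(datum, dict) else lambda k, d=None: getattr(datum, k, d)
    b64 = get("b64_json")
    if b64:
        # Inline base64 payload: decode it directly.
        data = base64.b64decode(b64)
    else:
        # Otherwise fall back to downloading the hosted image URL.
        with urllib.request.urlopen(get("url")) as resp:
            data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
```

With the quick-start snippet above, this would be `save_image(result.data[0], "cat.png")`.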
🎨 Supported Models
Text-to-Image (/images/generations)
- `stabilityai/stable-diffusion-3-medium`
- `stabilityai/stable-diffusion-3.5-medium`
- `stabilityai/stable-diffusion-3.5-large`
- `stabilityai/stable-diffusion-3.5-large-turbo`
- `black-forest-labs/FLUX.1-dev`
- `black-forest-labs/FLUX.1-schnell`
- `black-forest-labs/FLUX.1-Krea-dev`
- `black-forest-labs/FLUX.2-dev`*
- `diffusers/FLUX.2-dev-bnb-4bit`
- `Tongyi-MAI/Z-Image-Turbo`
- `Qwen/Qwen-Image`
- `Qwen/Qwen-Image-2512`
- `black-forest-labs/FLUX.2-klein-4B`
- `black-forest-labs/FLUX.2-klein-9B`
- `zai-org/GLM-Image` (this model is usually the slowest to execute in relative terms)
- `Tongyi-MAI/Z-Image`
- `black-forest-labs/FLUX.2-klein-9b-kv`
- `NucleusAI/Nucleus-Image`
- `baidu/ERNIE-Image`
- `baidu/ERNIE-Image-Turbo`
Image-to-Image (/images/edits)
- `black-forest-labs/FLUX.1-Kontext-dev`
- `diffusers/FLUX.2-dev-bnb-4bit` - Supports multi-image editing. Maximum 10 input images.
- `black-forest-labs/FLUX.2-dev`* - Supports multi-image editing. Maximum 10 input images.
- `Qwen/Qwen-Image-Edit`
- `Qwen/Qwen-Image-Edit-2509` - Supports multi-image editing. Maximum 3 input images.
- `Qwen/Qwen-Image-Edit-2511` - Supports multi-image editing. Maximum 3 input images.
- `black-forest-labs/FLUX.2-klein-4B` - Supports multi-image editing. Maximum 10 input images.
- `black-forest-labs/FLUX.2-klein-9B` - Supports multi-image editing. Maximum 10 input images.
- `black-forest-labs/FLUX.2-klein-9b-kv` - Supports multi-image editing. Maximum 10 input images.
- `zai-org/GLM-Image` - Supports multi-image editing. Maximum 5 input images. (This model is usually the slowest to execute in relative terms.)
* Note on FLUX.2-dev: Requires NVIDIA H200.
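The per-model input limits above can be enforced client-side before uploading anything. A small illustrative helper (the limit table is taken from the list above; the helper itself is not part of Aquiles-Image, and single-image models are assumed to default to 1):

```python
# Maximum input images per edit model, as documented above.
MAX_INPUT_IMAGES = {
    "diffusers/FLUX.2-dev-bnb-4bit": 10,
    "black-forest-labs/FLUX.2-dev": 10,
    "Qwen/Qwen-Image-Edit-2509": 3,
    "Qwen/Qwen-Image-Edit-2511": 3,
    "black-forest-labs/FLUX.2-klein-4B": 10,
    "black-forest-labs/FLUX.2-klein-9B": 10,
    "black-forest-labs/FLUX.2-klein-9b-kv": 10,
    "zai-org/GLM-Image": 5,
}

def check_edit_request(model: str, n_images: int) -> None:
    """Raise ValueError if a /images/edits request exceeds the model's limit."""
    limit = MAX_INPUT_IMAGES.get(model, 1)  # assume 1 for single-image models
    if n_images > limit:
        raise ValueError(
            f"{model} accepts at most {limit} input images, got {n_images}"
        )
```

Calling `check_edit_request("Qwen/Qwen-Image-Edit-2509", 4)` before the upload fails fast instead of wasting a round trip to the server.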
Text-to-Video and Image-to-Video (/videos) (only LTX-2/LTX-2.3 support both T2V and I2V; all other models are T2V-only)
Wan2.2 Series
- `Wan-AI/Wan2.2-T2V-A14B` (high quality, 40 steps - start with `--model "wan2.2"`)
- `Aquiles-ai/Wan2.2-Turbo` ⚡ 9.5x faster - same quality in 4 steps! (start with `--model "wan2.2-turbo"`)
Wan2.1 Series
- `Wan-AI/Wan2.1-T2V-14B` (high quality, 40 steps - start with `--model "wan2.1"`)
- `Aquiles-ai/Wan2.1-Turbo` ⚡ 9.5x faster - same quality in 4 steps! (start with `--model "wan2.1-turbo"`)
- `Wan-AI/Wan2.1-T2V-1.3B` (lightweight version, 40 steps - start with `--model "wan2.1-3B"`)
- `Aquiles-ai/Wan2.1-Turbo-fp8` ⚡ 9.5x faster + FP8 optimized - 4 steps (start with `--model "wan2.1-turbo-fp8"`)
HunyuanVideo-1.5 Series
Standard Resolution (480p)
- `Aquiles-ai/HunyuanVideo-1.5-480p` (50 steps - start with `--model "hunyuanVideo-1.5-480p"`)
- `Aquiles-ai/HunyuanVideo-1.5-480p-fp8` (50 steps, FP8 optimized - start with `--model "hunyuanVideo-1.5-480p-fp8"`)
- `Aquiles-ai/HunyuanVideo-1.5-480p-Turbo` ⚡ 12.5x faster - 4 steps! (start with `--model "hunyuanVideo-1.5-480p-turbo"`)
- `Aquiles-ai/HunyuanVideo-1.5-480p-Turbo-fp8` ⚡ 12.5x faster + FP8 optimized - 4 steps (start with `--model "hunyuanVideo-1.5-480p-turbo-fp8"`)
High Resolution (720p)
- `Aquiles-ai/HunyuanVideo-1.5-720p` (50 steps - start with `--model "hunyuanVideo-1.5-720p"`)
- `Aquiles-ai/HunyuanVideo-1.5-720p-fp8` (50 steps, FP8 optimized - start with `--model "hunyuanVideo-1.5-720p-fp8"`)
LTX-2/LTX-2.3 (Joint Audio-Visual Generation)
- `Lightricks/LTX-2` (40 steps - start with `--model "ltx-2"`)
- `Lightricks/LTX-2.3` (40 steps - start with `--model "ltx-2.3"`)
Special Features: LTX-2/LTX-2.3 are the first open-source models supporting synchronized audio-video generation in a single model, comparable to closed models like Sora-2 and Veo 3.1. Additionally, LTX-2 supports an image input as the first frame of the video: pass a reference image via `input_reference` to guide the visual starting point of the generation. For best results with this model, please follow the prompt guide provided by the Lightricks team.
Image-to-Video example:
curl -X POST "https://YOUR_BASE_URL_DEPLOY/videos" \
-H "Authorization: Bearer dummy-api-key" \
-H "Content-Type: multipart/form-data" \
-F prompt="She turns around and smiles, then slowly walks out of the frame." \
-F model="ltx-2" \
-F size="1280x720" \
-F seconds="8" \
-F input_reference="@sample_720p.jpeg;type=image/jpeg"
VRAM Requirements: Most models need 24GB+ VRAM. All video models require H100/A100-80GB. FP8 optimized versions offer better memory efficiency.
📖 Full models documentation and more models in 🎬 Aquiles-Studio
🔍 Can't find the model you're looking for?
If the model you need isn't in our native list, you can still run virtually any architecture based on Diffusers (SD 1.5, SDXL, etc.) using our AutoPipeline implementation.
Check out the 🧪 Advanced Features section to learn how to deploy any Hugging Face model with a single command.
💡 Examples
Generating Images
https://github.com/user-attachments/assets/00e18988-0472-4171-8716-dc81b53dcafa
https://github.com/user-attachments/assets/00d4235c-e49c-435e-a71a-72c36040a8d7
Editing Images
| Input + Prompt | Result |
|---|---|
Generating Videos
https://github.com/user-attachments/assets/7b1270c3-b77b-48df-a0fe-ac39b2320143
Note: Video generation with `wan2.2` takes ~30 minutes on an H100. With `wan2.2-turbo`, it takes only ~3 minutes! Only one video can be generated at a time.
Video and audio generation
https://github.com/user-attachments/assets/b7104dc3-5306-4e6a-97e5-93a6c1e73f54
Beyond the output examples shown above, check out the Example folder, where you'll find recipes for deploying Aquiles-Image with Modal.
🧪 Advanced Features
AutoPipeline - Run Any Diffusers Model
Run any model compatible with AutoPipelineForText2Image or AutoPipelineForImage2Image from HuggingFace:
aquiles-image serve \
--model "stabilityai/stable-diffusion-xl-base-1.0" \
--auto-pipeline \
--set-steps 30 \
--auto-pipeline-type t2i # or i2i for Image to Image
Supported models include:
- `stable-diffusion-v1-5/stable-diffusion-v1-5`
- `stabilityai/stable-diffusion-xl-base-1.0`
- Any HuggingFace model compatible with `AutoPipelineForText2Image` or `AutoPipelineForImage2Image`
Trade-offs:
- ⚠️ Slower inference than native implementations
- ⚠️ Experimental - may have stability issues
LoRA Support
Load any LoRA from HuggingFace or a local path by passing a JSON config file at startup. Compatible with all native image models and AutoPipeline.
1. Create a LoRA config file:
Manually:
{
"repo_id": "brushpenbob/Flux-retro-Disney-v2",
"weight_name": "Flux_retro_Disney_v2.safetensors",
"adapter_name": "flux-retro-disney-v2",
"scale": 1.0
}
Or programmatically using the Python helper:
from aquilesimage.utils import save_lora_config
from aquilesimage.models import LoRAConfig
save_lora_config(
LoRAConfig(
repo_id="brushpenbob/Flux-retro-Disney-v2",
weight_name="Flux_retro_Disney_v2.safetensors",
adapter_name="flux-retro-disney-v2"
),
"./lora_config.json"
)
2. Start the server with LoRA enabled:
aquiles-image serve \
--model "black-forest-labs/FLUX.1-dev" \
--load-lora \
--lora-config "./lora_config.json"
Works in both single-device and distributed mode:
aquiles-image serve \
--model "black-forest-labs/FLUX.1-dev" \
--load-lora \
--lora-config "./lora_config.json" \
--dist-inference
Dev Mode - Test Without Loading Models
Perfect for development, testing, and CI/CD:
aquiles-image serve --no-load-model
What it does:
- Starts server instantly without GPU
- Returns test images that simulate real responses
- All endpoints functional with realistic formats
- Same API structure as production
API Key Protection & Playground
Securing Your Server with an API Key
You can protect your server by requiring an API key on every request. Simply pass --api-key when starting the server:
aquiles-image serve --model "stabilityai/stable-diffusion-3.5-medium" --api-key "your-api-key"
All requests must then include the key in the Authorization header:
curl -X POST "http://localhost:5500/images/generations" \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{"model": "stabilityai/stable-diffusion-3.5-medium", "prompt": "a white siamese cat"}'
Built-in Playground
Aquiles-Image ships with a built-in interactive playground for testing image models and monitoring server stats — protected by login to prevent unauthorized access. Enable it with --username and --password:
aquiles-image serve --model "stabilityai/stable-diffusion-3.5-medium" \
--api-key "your-api-key" \
--username "root" \
--password "root"
Once running, open http://localhost:5500 in your browser. The playground lets you:
- Generate images interactively using any loaded image model
- Visualize server stats in real time
Note: The playground is only available for image models.
Login
Playground
📊 Monitoring & Stats
/health - Server Health Check
A public endpoint (no API key required) designed for orchestrators like Kubernetes, Docker, Modal, etc.
- Returns `200 OK` when the server is ready to accept requests
- Returns `503 Service Unavailable` while the model is still loading
curl http://localhost:5500/health
{
"status": "ok",
"model": "black-forest-labs/FLUX.1-dev",
"mode": "single-device",
"timestamp": 1745623410,
"devices": [
{
"id": "cuda:0",
"name": "NVIDIA H100 80GB",
"vram_total_gb": 79.2,
"vram_free_gb": 51.4
}
]
}
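Because `/health` returns 503 until the model has loaded, deployment scripts can gate on it before sending traffic. A sketch of such a readiness wait; the probe is injectable so the loop is easy to test, and by default it performs a real HTTP GET:

```python
import time
import urllib.error
import urllib.request

def default_probe(url: str) -> int:
    """Return the HTTP status code of a GET on `url`."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # e.g. 503 while the model is still loading

def wait_until_ready(url="http://localhost:5500/health",
                     timeout=600.0, interval=5.0, probe=default_probe) -> bool:
    """Poll /health until it returns 200 OK or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(url) == 200:
            return True
        time.sleep(interval)
    return False
```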
/stats - Real-Time Inference Metrics
Aquiles-Image provides a custom /stats endpoint for real-time monitoring:
import requests
# Get server statistics
stats = requests.get("http://localhost:5500/stats",
headers={"Authorization": "Bearer YOUR_API_KEY"}).json()
print(f"Total requests: {stats['total_requests']}")
print(f"Total images generated: {stats['total_images']}")
print(f"Queued: {stats['queued']}")
print(f"Completed: {stats['completed']}")
Response Formats
The response varies depending on the model type and configuration:
Image Models - Single-Device Mode
{
"mode": "single-device",
"total_requests": 150,
"total_batches": 42,
"total_images": 180,
"queued": 3,
"completed": 147,
"failed": 0,
"processing": true,
"available": false
}
Image Models - Distributed Mode (Multi-GPU)
{
"mode": "distributed",
"devices": {
"cuda:0": {
"id": "cuda:0",
"available": true,
"processing": false,
"can_accept_batch": true,
"batch_size": 4,
"max_batch_size": 8,
"images_processing": 0,
"images_completed": 45,
"total_batches_processed": 12,
"avg_batch_time": 2.5,
"estimated_load": 0.3,
"error_count": 0,
"last_error": null
},
"cuda:1": {
"id": "cuda:1",
"available": true,
"processing": true,
"can_accept_batch": false,
"batch_size": 2,
"max_batch_size": 8,
"images_processing": 2,
"images_completed": 38,
"total_batches_processed": 10,
"avg_batch_time": 2.8,
"estimated_load": 0.7,
"error_count": 0,
"last_error": null
}
},
"global": {
"total_requests": 150,
"total_batches": 42,
"total_images": 180,
"queued": 3,
"active_batches": 1,
"completed": 147,
"failed": 0,
"processing": true
}
}
Video Models
{
"total_tasks": 25,
"queued": 2,
"processing": 1,
"completed": 20,
"failed": 2,
"available": false,
"max_concurrent": 1
}
Key Metrics:
- `total_requests` / `total_tasks` - Total number of generation requests received
- `total_images` - Total images generated (image models only)
- `queued` - Requests waiting to be processed
- `processing` - Requests currently being processed
- `completed` - Successfully completed requests
- `failed` - Failed requests
- `available` - Whether the server can accept new requests
- `mode` - Operation mode for image models: `single-device` or `distributed`
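These fields make simple client-side backpressure possible: only submit a new request when the server reports itself available and the queue is short. A sketch (the threshold and the helper are illustrative, not part of Aquiles-Image):

```python
def should_submit(stats: dict, max_queued: int = 5) -> bool:
    """Decide whether to send another request based on a /stats payload."""
    # Distributed-mode responses nest the shared counters under "global".
    counters = stats.get("global", stats)
    # Distributed responses carry no top-level availability flag, so default to True.
    available = stats.get("available", True)
    return bool(available) and counters.get("queued", 0) < max_queued
```

This works against all three response shapes shown above: single-device and video payloads are read directly, while distributed payloads fall through to the `global` block.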
🎯 Use Cases
| Who | What |
|---|---|
| 🚀 AI Startups | Build image generation features without API costs |
| 👨💻 Developers | Prototype with multiple models using one interface |
| 🔬 Researchers | Experiment with cutting-edge models easily |
| 🏢 Enterprises | Need a full private AI platform beyond image generation? Check out Ishikawa, deploy chat, agents, and multimodal AI entirely on your infrastructure. |
📋 Prerequisites
- Python 3.8+
- CUDA-compatible GPU with 24GB+ VRAM (most models)
- 10GB+ free disk space
📚 Documentation
🤝 Contributing
We welcome contributions! Whether you want to:
- 🐛 Report bugs and issues
- 🎨 Add support for new image models
- 📝 Improve documentation
Please read our Contributing Guide to get started.
⭐ Star this project • 🐛 Report issues • 🤝 Contribute
Built with ❤️ for the AI community, as part of the Aquiles-ai open source ecosystem.