OpenHydra decentralised P2P inference network

OpenHydra

Turn on. Tune in. Drop in.

Your laptop is a supercomputer waiting to happen.

OpenHydra is a peer-to-peer inference network that turns idle hardware into a global AI swarm. Any Apple Silicon Mac, NVIDIA GPU, or AMD GPU can join. No central server. No API keys. No $20/month subscription. Just open the app and you're contributing compute.

The moment you open OpenHydra, you join the base swarm running Qwen 3.5 0.8B — just 2 GB of RAM. Everyone gets an instant green light. If your hardware can handle more, the app auto-promotes you to larger models with higher rewards. Zero configuration required.

Why OpenHydra?

One click to join. Open the app. Your hardware auto-joins the swarm. No terminal, no model selection, no VRAM calculations.
No VRAM ceiling. A 70B model that needs 140 GB runs across 8 peers, each contributing about 17.5 GB.
No central server. Every node is both client and server. The network is the computer.
Privacy by default. Onion routing + AES-256-GCM encryption + differential privacy. No peer sees your full query.
Earn while you idle. HYDRA tokens and barter credits for every request your node serves.

Quick Start

Desktop App (recommended)

Download for Mac | Download for Windows

Open the app. It auto-detects your hardware, joins the base swarm (Qwen 3.5 0.8B), and starts earning. If you have heavy hardware, it recommends upgrading to larger models for higher rewards.

CLI Install

pip install openhydra-network
openhydra-node --peer-id my-node

That's it. After the install, a single command starts the node. OpenHydra auto-detects your hardware (Apple Silicon → MLX, NVIDIA → CUDA, AMD → ROCm), joins the global DHT, and starts an OpenAI-compatible API at http://127.0.0.1:8080. The default model is Qwen 3.5 0.8B, which is lightweight enough for any machine.

Supported platforms:

Platform	Backend	Notes
🍎 Apple Silicon (M1–M4)	MLX (Metal)	Zero-copy unified memory. ~252 tok/s.
🟢 NVIDIA GPU (CUDA)	PyTorch	Any CUDA-capable GPU. NF4 quantization.
🔴 AMD GPU (ROCm)	PyTorch	ROCm 6.2+. Same PyTorch backend.

Chat with your node

# Chat completion
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "openhydra-qwen3.5-0.8b",
    "messages": [{"role": "user", "content": "Explain P2P inference in one sentence."}]
  }' | python3 -m json.tool

# Streaming
curl -N http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "openhydra-qwen3.5-0.8b",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
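
For scripted use, the same request can be issued from Python. The following is a minimal sketch using only the standard library. It assumes the node returns the usual OpenAI response shape (choices[0].message.content); the helper names are illustrative and are not part of the OpenHydra SDK.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # default address from the CLI Install section


def build_chat_request(prompt, model="openhydra-qwen3.5-0.8b", base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style response body (assumed shape)."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def chat(prompt, timeout=60):
    """Send the request to a running node and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        return extract_reply(resp.read())
```

With a node running locally, `print(chat("Explain P2P inference in one sentence."))` should print the model's answer.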

Ollama-compatible API

Already using Open WebUI or Continue.dev? OpenHydra speaks Ollama natively:

# Ollama-style generate
curl http://127.0.0.1:8080/api/generate \
  -d '{"model": "openhydra-qwen3.5-0.8b", "prompt": "Why is the sky blue?"}'

# Ollama-style chat
curl http://127.0.0.1:8080/api/chat \
  -d '{"model": "openhydra-qwen3.5-0.8b", "messages": [{"role": "user", "content": "Hello"}]}'
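
Ollama's own API streams newline-delimited JSON objects by default, each carrying a text fragment and a "done" flag. Assuming OpenHydra mirrors that behaviour (the text above does not spell it out), a client could stitch the fragments together as sketched here. The function name is hypothetical.

```python
import json


def collect_ollama_stream(lines):
    """Join text fragments from an Ollama-style NDJSON stream.

    Handles /api/generate chunks ("response") and /api/chat chunks
    ("message" -> "content"), stopping at the chunk marked "done".
    """
    parts = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        chunk = json.loads(line)
        if "response" in chunk:
            parts.append(chunk["response"])
        elif "message" in chunk:
            parts.append(chunk["message"].get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

Because the object returned by `urllib.request.urlopen` iterates line by line, it can be passed to `collect_ollama_stream` directly.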

Local development (private DHT, two terminals)

# Terminal 1 - DHT bootstrap
openhydra-dht --host 127.0.0.1 --port 8468

# Terminal 2 - node
openhydra-node --peer-id dev-node \
    --dht-url http://127.0.0.1:8468

Docker (full stack: node + Prometheus + Grafana)

docker compose up
# API:        http://localhost:8080
# Prometheus: http://localhost:9090
# Grafana:    http://localhost:3000  (admin / openhydra)

Architecture

  Client (curl / SDK / Open WebUI / Continue.dev)
      |  POST /v1/chat/completions  OR  /api/chat (Ollama)
      v
  Coordinator (HTTP :8080)
      |  Dual-stack DHT lookup (HTTP + Hivemind Kademlia)
      |  Pipeline assembly (sharded or full-model)
      |  Onion route construction + activation encryption
      v
  Peer Pipeline (gRPC :50051)
      peer-A (layers 0-7)  -->  peer-B (layers 8-15)  -->  peer-C (layers 16-31, emits tokens)
      |                          |                          |
      |  KV compaction          |  NF4 quantization       |  Speculative decode
      |  Radix prefix cache     |  Batched inference       |  Token streaming
      v                          v                          v
  DHT Bootstrap (HTTP :8468) + Hivemind Signposts (libp2p :38751)
      EU: 172.105.69.49  |  US: 45.79.190.172  |  AP: 172.104.164.98

Each node bundles two components in a single process:

Peer — a gRPC inference server that loads one model shard (or a full model on capable hardware) and announces itself to the DHT every 60 seconds.
Coordinator — an OpenAI-compatible HTTP API that discovers peers, assembles inference pipelines, enforces verification, and manages the token economy.

Because every participant runs their own coordinator, there is no central authority. Accessing the network means running a node — the same model as BitTorrent.

Dual-Stack Peer Discovery

OpenHydra runs two DHT protocols simultaneously for maximum resilience:

HTTP DHT (port 8468) — lightweight announce/lookup REST API behind nginx. Sub-millisecond lookups. Used for peer discovery today.
Hivemind Kademlia DHT (port 38751) — production libp2p Kademlia network with persistent peer IDs. Three signpost nodes (EU/US/AP) bootstrap the swarm. Peers auto-join via hardcoded multiaddrs.
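
To make the announce/lookup idea concrete, here is an in-memory sketch of a discovery table. It is not the OpenHydra DHT code: the class, its methods, and the 180-second expiry (three missed 60-second announces) are assumptions chosen for illustration.

```python
import time


class AnnounceTable:
    """In-memory stand-in for an announce/lookup service (illustrative only)."""

    def __init__(self, ttl_seconds=180.0, clock=time.monotonic):
        self.ttl = ttl_seconds  # assumed expiry: three missed 60-second announces
        self.clock = clock
        self._peers = {}  # peer_id -> (model_id, address, last_seen)

    def announce(self, peer_id, model_id, address):
        """Record or refresh a peer; peers re-announce periodically."""
        self._peers[peer_id] = (model_id, address, self.clock())

    def lookup(self, model_id):
        """Return addresses of live peers serving model_id, dropping stale entries."""
        now = self.clock()
        self._peers = {
            pid: rec for pid, rec in self._peers.items() if now - rec[2] <= self.ttl
        }
        return sorted(addr for m, addr, _ in self._peers.values() if m == model_id)
```

Expiring entries on lookup keeps the table free of peers that have silently gone offline, without needing a background sweeper.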

Layer Sharding

A 32-layer model can be split across 4 peers, each running 8 layers. The coordinator's LayerCoverageMap detects which layers are available across the swarm and assembles complete pipelines using a greedy O(n*s) algorithm. If a sharded pipeline can't cover all layers, it falls back to a peer running the full model.

# Run only layers 0-7 of a 32-layer model
openhydra-node --peer-id shard-0 \
  --shard-index 0 --total-shards 4 \
  --runtime-backend pytorch_auto \
  --runtime-model-id meta-llama/Llama-3.1-8B
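
The pipeline assembly described above can be sketched as a greedy interval cover. This is an illustration under stated assumptions, not the LayerCoverageMap implementation: it assumes shards split layers evenly, that a peer may start partway through its loaded range, and all function names are hypothetical.

```python
def shard_layer_range(shard_index, total_shards, num_layers):
    """Return the half-open layer range [start, end) for an even split."""
    if num_layers % total_shards:
        raise ValueError("this sketch only handles evenly divisible splits")
    per_shard = num_layers // total_shards
    start = shard_index * per_shard
    return start, start + per_shard


def assemble_pipeline(peers, num_layers):
    """Greedy cover of layers 0..num_layers-1.

    peers maps peer_id -> (start, end) with end exclusive. Returns a list of
    (peer_id, first_layer, end_layer) stages, or None if coverage has a gap.
    """
    pipeline, next_layer = [], 0
    while next_layer < num_layers:
        # Among peers holding next_layer, pick the one that reaches furthest.
        candidates = [
            (end, pid) for pid, (start, end) in peers.items()
            if start <= next_layer < end
        ]
        if not candidates:
            return None  # caller would fall back to a full-model peer
        end, pid = max(candidates)
        pipeline.append((pid, next_layer, end))
        next_layer = end
    return pipeline
```

Each stage scans every peer once, so the cost of this sketch grows with the number of peers times the number of stages. For example, `shard_layer_range(0, 4, 32)` gives the range (0, 8), matching the layers 0-7 shard in the command above.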

The AppChain Economy

Barter Credits (Tier 1)

Every inference request is settled peer-to-peer in barter credits (1,000 tokens served = 1 credit). Credits decay at 5% per day to prevent hoarding. The ledger is a SQLite database in WAL mode with zero external dependencies.
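
The credit arithmetic is easy to work through. The sketch below assumes the 5% decay compounds once per day, which the text does not specify; the function names are illustrative.

```python
TOKENS_PER_CREDIT = 1_000  # from the text: 1,000 tokens served = 1 credit
DAILY_DECAY = 0.05         # from the text: 5% per day


def credits_earned(tokens_served):
    """Convert served tokens to barter credits."""
    return tokens_served / TOKENS_PER_CREDIT


def decayed_balance(balance, days):
    """Balance after `days` of decay, assuming daily compounding."""
    return balance * (1.0 - DAILY_DECAY) ** days
```

Under that assumption an untouched balance halves in roughly 13.5 days, since 0.95 raised to the 13.5th power is about 0.5.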

HYDRA Token (Tier 2)

HYDRA is a capped-supply token (69M max) with Burn-and-Mint Equilibrium:

Mechanism	Purpose
Mint on serve	Peers earn HYDRA for inference work
Burn on use	Clients burn HYDRA for priority access
Stake	Staked peers get priority routing
Slash	Failed audits reduce stake
State channels	Off-chain micro-payments (15-min TTL, 8 channels/peer)

Three-Tier Verification

Mystery Shopper (Tier 1) — probabilistic re-execution (10% default sample rate), output comparison
Redundant Execution (Tier 2) — N-peer majority vote
Auditor Spot-check (Tier 3) — independent Bernoulli sampling when primary matches secondary

Verification outcomes feed into a weighted reputation score (40% verification, 25% uptime, 20% latency consistency, 15% stake factor) that determines routing priority.
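
As a rough illustration of these mechanisms, the sketch below shows a Bernoulli sampling decision, a strict-majority vote, and the weighted reputation sum. It assumes each reputation component is already normalised to the range 0 to 1; the names are hypothetical and the real scoring may differ.

```python
import random
from collections import Counter

# Weights taken from the text above; they sum to 1.0.
WEIGHTS = {"verification": 0.40, "uptime": 0.25, "latency": 0.20, "stake": 0.15}


def should_sample(sample_rate=0.10, rng=random):
    """Bernoulli draw: True for roughly `sample_rate` of requests."""
    return rng.random() < sample_rate


def majority_output(outputs):
    """Return the output produced by a strict majority of peers, else None."""
    if not outputs:
        return None
    winner, votes = Counter(outputs).most_common(1)[0]
    return winner if votes > len(outputs) / 2 else None


def reputation_score(components):
    """Weighted sum of components, each assumed normalised to [0, 1]."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

A peer with perfect uptime and latency, no stake, and half of its audits passing would score 0.65 under these weights.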

Security & Privacy

Layer	Mechanism
Identity	Ed25519 keys at `~/.openhydra/identity.key` (mode 0600)
Transport	X25519 ECDH key agreement + AES-256-GCM per activation
Routing	Concentric onion routing — layered encryption through pipeline stages. Each peer peels one layer and forwards the rest.
Privacy	Differential privacy noise injection with verifiable audit tags (HMAC-SHA256)
Sybil resistance	Geo-challenge: SHA-256 proof-of-work bound to claimed region
Systemd hardening	Non-root user, `ProtectSystem=strict`, `PrivateTmp`, `NoNewPrivileges`, iptables rate limiting

Encryption overhead is about 0.15 ms per activation (roughly 0.02% of total inference latency), so it stays on by default.

KV Cache Compaction

OpenHydra implements Q-Tensor KV Compaction via Attention Matching, enabling unbounded effective context length on fixed-memory hardware.

Four phases, incrementally composable:

Phase	What it does	Output
Phase 1	HAK (Highest Attention Keys) or OMP (greedy residual pursuit) token selection	Standard HF DynamicCache
Phase 2	Beta bias correction (NNLS) + Cv value refit (least-squares)	CompactedKVCache with per-head beta
Phase 3	Per-layer/per-head token budgets from JSON	Non-uniform compression
Phase 4	Online mid-trajectory compaction when seq_len > threshold	Unbounded context on fixed memory

openhydra-node --kv-compaction-mode auto \
  --kv-compaction-ratio 0.5 \
  --kv-compaction-phase 4

Reference: arXiv:2602.16284
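
To give a feel for Phase 1, here is a simplified sketch of attention-based key selection in pure Python. It assumes each cached key is scored by the total attention it received from recent queries and that the compaction ratio is the fraction of keys kept; both are assumptions, and the real pipeline operates on per-head tensors rather than nested lists.

```python
import math


def select_highest_attention_keys(attention, ratio=0.5):
    """Pick which cached positions to keep.

    attention: list of rows, one per recent query, each a list of weights
    over the cached key positions. Returns kept positions in original order.
    """
    num_keys = len(attention[0])
    keep = max(1, math.ceil(num_keys * ratio))
    # Score each key by the total attention it received across the queries.
    scores = [sum(row[j] for row in attention) for j in range(num_keys)]
    ranked = sorted(range(num_keys), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])
```

Returning the survivors in their original order keeps positional structure intact for whatever cache object is rebuilt from them.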

Model Catalog

Graceful degradation is built in: if the requested model lacks peers, the coordinator serves the nearest smaller model and reports the reason in the X-OpenHydra-Degradation-Reason header.

Tier	Model	HuggingFace ID	VRAM	Peers	Quant	Status
Frontier	Qwen 3.5 27B	`Qwen/Qwen3.5-27B`	16 GB × 4	4	int4	✅ Available
Advanced	Qwen 3.5 9B	`Qwen/Qwen3.5-9B`	18 GB × 2	2	int8	✅ Available
Standard	Qwen 3.5 4B	`Qwen/Qwen3.5-4B`	9 GB	1	int4	✅ Available
Basic	Qwen 3.5 2B	`Qwen/Qwen3.5-2B`	5 GB	1	fp32	✅ Available
Basic	Qwen 3.5 0.8B	`Qwen/Qwen3.5-0.8B`	2 GB	1	fp32	✅ Available

Full catalog: models.catalog.json
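
The fallback rule can be sketched as a walk down the catalog. The table data below comes from the rows above, but the function, the peer-count input, and the reason strings are hypothetical; the coordinator's actual selection logic and header values are not documented here.

```python
# (HuggingFace ID, parameters in billions, peers required), largest first
CATALOG = [
    ("Qwen/Qwen3.5-27B", 27.0, 4),
    ("Qwen/Qwen3.5-9B", 9.0, 2),
    ("Qwen/Qwen3.5-4B", 4.0, 1),
    ("Qwen/Qwen3.5-2B", 2.0, 1),
    ("Qwen/Qwen3.5-0.8B", 0.8, 1),
]


def resolve_model(requested, live_peers):
    """Return (model_to_serve, degradation_reason or None).

    live_peers maps model id -> number of peers currently announcing it.
    """
    sizes = {name: size for name, size, _ in CATALOG}
    if requested not in sizes:
        raise KeyError(f"unknown model: {requested}")
    # Walk from the requested size downward; CATALOG is sorted largest first.
    for name, size, needed in CATALOG:
        if size > sizes[requested]:
            continue
        if live_peers.get(name, 0) >= needed:
            reason = None if name == requested else f"insufficient_peers:{requested}"
            return name, reason
    return None, f"no_peers_available:{requested}"
```

A client can compare the model it asked for with the one actually served and surface the reason string to the user.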

REST API

Public (no auth)

Method	Path	Description
`GET`	`/health`	Liveness probe
`GET`	`/readyz`	Readiness probe
`GET`	`/metrics`	Prometheus metrics

OpenAI-compatible

Method	Path	Description
`GET`	`/v1/models`	List available models
`POST`	`/v1/chat/completions`	Chat inference (streaming + non-streaming)
`POST`	`/v1/completions`	Text completion

Ollama-compatible

Method	Path	Description
`POST`	`/api/generate`	Ollama generate
`POST`	`/api/chat`	Ollama chat

Economy & Network

Method	Path	Description
`GET`	`/v1/network/status`	Peer inventory, reputation, economy
`GET`	`/v1/account/balance`	Barter + HYDRA balance
`POST`	`/v1/hydra/stake`	Stake HYDRA
`POST`	`/v1/hydra/channels/open`	Open state channel

Rate Limiting

Every response includes standard rate-limit headers:

X-RateLimit-Limit: 120
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1741474823
Retry-After: 30          # only on 429

Default: 120 requests per 60-second sliding window per IP.
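
A sliding-window limiter of this kind can be sketched in a few lines. This is an illustrative implementation, not the coordinator's code; the class name is hypothetical, and it omits X-RateLimit-Reset for brevity.

```python
import math
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client."""

    def __init__(self, limit=120, window=60.0, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self._hits = defaultdict(deque)  # client ip -> timestamps in the window

    def check(self, client_ip):
        """Return (allowed, headers) and record the request if it is allowed."""
        now = self.clock()
        hits = self._hits[client_ip]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # forget requests that slid out of the window
        allowed = len(hits) < self.limit
        if allowed:
            hits.append(now)
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - len(hits)),
        }
        if not allowed:
            # Seconds until the oldest recorded request leaves the window.
            wait = math.ceil(hits[0] + self.window - now)
            headers["Retry-After"] = str(max(1, wait))
        return allowed, headers
```

On the client side, a 429 response should be retried only after waiting the number of seconds given in Retry-After.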

Project Structure

peer/              Inference engine: gRPC server, model shard, MLX/PyTorch runtimes,
                   KV compaction, request coalescing, P2P model cache, batching
coordinator/       HTTP API, pipeline routing, chain failover, speculative decode,
                   auto-scaler, verification, economy, interactive CLI
dht/               HTTP DHT bootstrap + Hivemind Kademlia signpost daemon
economy/           Barter credits + HYDRA token + state channels (SQLite & Postgres)
verification/      Mystery Shopper, redundant execution, auditor spot-checks, reputation
compression/       LZ4 codec + learned tensor autoencoder
grounding/         DuckDuckGo RAG with local cache fallback
sdk/               Python and TypeScript SDK clients
desktop/           Tauri v2 desktop app (Rust + vanilla JS)
ops/               Terraform, Docker Compose, Prometheus/Grafana, TLS, deploy scripts
scripts/           SLO chaos test, KV benchmark, head-budget optimizer, canary rollout
tests/             867 tests (858 unit + 9 real-model integration)

License

OpenHydra uses a dual open-core license:

Component	Directories	License	Rationale
Inference engine	`peer/`, `dht/`	Apache 2.0	Maximum adoption. Use in proprietary products. Patent grant included.
Network services	`coordinator/`, `economy/`, `verification/`, `desktop/`	AGPL v3	Network copyleft. Running a modified coordinator as a hosted API requires publishing source.

This structure ensures cloud providers cannot close-source the orchestration layer while keeping the peer engine maximally permissive for hardware vendors, embedded systems, and proprietary integrations.

Full details: LICENSE

Commercial licensing: sam@openhydra.co

Contributing

See CONTRIBUTING.md for development setup, testing, and commit conventions.

Good first issues: look for the `good first issue` label.

Security

See SECURITY.md for the vulnerability disclosure policy.

Do not open public issues for security vulnerabilities. Email sam@openhydra.co or use GitHub Security Advisories.

Acknowledgements & Prior Art

OpenHydra stands on the shoulders of remarkable projects and research:

Petals (BigScience) — proved that pipeline-parallel LLM inference over the public internet is viable. Their work on collaborative serving of BLOOM-176B demonstrated that volunteer hardware can collectively run frontier models. OpenHydra's layer sharding pipeline is philosophically descended from Petals.
Exo — demonstrated MLX tensor parallelism for local network clusters and pioneered shard-aware model downloads. Their focus on Apple Silicon performance informed our MLX runtime design.
Apple MLX — the MLX framework and its unified memory architecture make zero-copy inference on Apple Silicon practical. Our MLX runtime achieves ~252 tok/s thanks to their work.
Hivemind — production-grade decentralised training and DHT infrastructure. OpenHydra's Kademlia signpost layer uses hivemind's libp2p DHT implementation.
Kademlia DHT (Maymounkov & Mazières, 2002) — the distributed hash table protocol that underpins peer discovery in BitTorrent, IPFS, and now OpenHydra.
KV Cache Compaction (arXiv:2602.16284) — the attention matching framework that inspired our Q-Tensor compaction pipeline.
Speculative Decoding (Leviathan et al., 2023; Chen et al., 2023) — the draft-then-verify paradigm that OpenHydra extends to distributed multi-peer pipelines.

Roadmap

Status	Milestone
Done	Core inference, DHT, TLS, three-tier verification, barter credits, HYDRA token
Done	Ed25519 identity, PostgreSQL economy, Terraform IaC, Grafana dashboards
Done	KV cache compaction (4 phases + Option A query capture)
Done	MLX backend, NF4 quantization, request coalescing, P2P model cache
Done	Layer sharding, auto-scaler, Hivemind Kademlia DHT, Ollama API
Done	Open-core licensing, rate-limit headers, CI, documentation
Next	On-chain DAO (Solidity state-channel contract on Arbitrum/Base)
Next	SDK v1 (Python + TypeScript, streaming, retry, type-safe)
Next	P2P agentic swarms (agent sessions, tool execution, MCP)
Next	Full documentation site (MkDocs Material + API reference)

Release: openhydra-network 0.1.0 (Mar 19, 2026)

