Physics-based verification of AI-designed protein structures
ProtQC
Catches structural hallucinations before wet-lab
Why ProtQC?
AI protein design tools (AlphaFold, RFdiffusion, ProteinMPNN, BoltzGen) routinely produce structures with high confidence scores (pLDDT > 90) that still fail experimentally. A protein can look perfect by pLDDT yet harbor internal voids, unstable hydrogen bond networks, or thermodynamic instabilities that only surface in solution.
ProtQC combines six physics-based metrics into a composite risk score, catching high-pLDDT hallucinations that no single metric detects on its own.
Quick Start
protqc analyze protein.pdb
The 6 Metrics
| # | Metric | Source | What It Catches |
|---|---|---|---|
| 1 | pLDDT | Structure prediction | Low confidence regions |
| 2 | MD RMSD | OpenMM | Backbone instability under simulation |
| 3 | Cavity Volume | fpocket | Internal voids and packing defects |
| 4 | H-bond Persistence | MDTraj | Weak hydrogen bond networks |
| 5 | SS Preservation | MDTraj DSSP | Secondary structure loss during MD |
| 6 | SASA Polar Ratio | FreeSASA | Abnormal surface accessibility |
Each metric produces a normalized 0–1 sub-score. The composite risk score is a weighted sum, mapped to a verdict:
- PASS (risk < 0.30) — Design is physically plausible
- WARNING (0.30 ≤ risk < 0.50) — Proceed with caution; review flagged metrics
- FAIL (risk ≥ 0.50) — Design has significant structural issues
Risk Scoring Weights
risk_weights:
  plddt: 0.12
  md_rmsd: 0.29
  cavity: 0.12
  hbond_persistence: 0.24
  ss_preservation: 0.18
  sasa_ratio: 0.05
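As a rough illustration of how the weighted sum and the verdict thresholds fit together, the following Python sketch combines six sub-scores using the weights listed above. The function names (`composite_risk`, `verdict`) and the example sub-scores are hypothetical, and the sketch assumes each sub-score is already normalized so that higher means riskier; ProtQC's actual implementation may differ.

```python
# Illustrative sketch only; not ProtQC's actual code.
RISK_WEIGHTS = {
    "plddt": 0.12,
    "md_rmsd": 0.29,
    "cavity": 0.12,
    "hbond_persistence": 0.24,
    "ss_preservation": 0.18,
    "sasa_ratio": 0.05,
}


def composite_risk(sub_scores, weights=RISK_WEIGHTS):
    """Weighted sum of sub-scores (assumed: 0 = no risk, 1 = maximum risk)."""
    missing = set(weights) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= sub_scores[name] <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return sum(weights[name] * sub_scores[name] for name in weights)


def verdict(risk):
    """Map a composite risk score to the documented verdict bands."""
    if risk < 0.30:
        return "PASS"
    if risk < 0.50:
        return "WARNING"
    return "FAIL"


# Hypothetical sub-scores for demonstration; not measured values.
example = {
    "plddt": 0.1,
    "md_rmsd": 0.2,
    "cavity": 0.5,
    "hbond_persistence": 0.3,
    "ss_preservation": 0.2,
    "sasa_ratio": 0.4,
}
risk = composite_risk(example)
print(f"risk={risk:.3f} verdict={verdict(risk)}")
```

Because the weights sum to 1.0, the composite score stays in the same 0–1 range as the sub-scores, which is what lets the fixed 0.30 and 0.50 boundaries apply directly.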
Validated Results
| Protein | Verdict | Risk Score |
|---|---|---|
| Ubiquitin (1UBQ) | PASS | 0.257 |
| GFP (1EMA) | PASS | 0.281 |
| Alpha-synuclein (1XQ8) | FAIL | 0.555 |
Performance
| Protein | MD Duration | Wall Time | GPU |
|---|---|---|---|
| Ubiquitin (76 aa) | 10 ns | ~23 min | RTX 4070 |
| GFP (238 aa) | 10 ns | ~49 min | RTX 4070 |
Usage
ProtQC provides three usage modes:
CLI — Single Protein Analysis
# Analyze a PDB file
protqc analyze protein.pdb
# Enter a PDB ID — auto-downloads from RCSB
protqc analyze 1UBQ
# Skip MD simulation for quick structural checks
protqc analyze protein.pdb --skip-md
# Set MD simulation length
protqc analyze protein.pdb --md-duration 10
# Use pre-computed MD trajectory
protqc analyze protein.pdb --trajectory md_output.csv
# Generate FastQC-style HTML report
protqc analyze protein.pdb --html report.html
# JSON output
protqc analyze protein.pdb --format json
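For scripted use, the CLI can be driven from Python. The sketch below uses only flags shown above (`--skip-md`, `--md-duration`, `--format json`). The helper names are hypothetical, and it assumes the JSON report is written to standard output, which this README does not confirm.

```python
import json
import subprocess


def build_command(target, skip_md=False, md_duration=None):
    """Assemble a `protqc analyze` command line from documented flags."""
    cmd = ["protqc", "analyze", str(target), "--format", "json"]
    if skip_md:
        cmd.append("--skip-md")
    elif md_duration is not None:
        cmd += ["--md-duration", str(md_duration)]
    return cmd


def run_protqc(target, **options):
    """Run the CLI and parse its output (assumption: JSON arrives on stdout)."""
    result = subprocess.run(
        build_command(target, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


# Only builds the command here; call run_protqc(...) once protqc is installed.
print(build_command("1UBQ", skip_md=True))
```

Keeping command construction separate from execution makes the wrapper easy to test without a working ProtQC installation.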
Interactive Mode
# Launch interactive prompt — guides you through analysis
protqc
AI Chat Assistant
# Start AI-powered chat for interpreting results
protqc chat
Chat supports 8 providers via LiteLLM: OpenAI, Anthropic, Google, DeepSeek, OpenRouter, Moonshot, MiniMax, Zhipu.
Installation
PyPI + conda (recommended)
conda create -n protqc python=3.11
conda activate protqc
# OpenMM and fpocket from conda-forge (not available on PyPI)
conda install -c conda-forge openmm fpocket
# Install ProtQC with all dependencies
pip install protqc
# Analyze a protein
protqc analyze protein.pdb
Source install (development)
git clone https://github.com/korayguzel/protqc.git
cd protqc
conda create -n protqc python=3.11
conda activate protqc
conda install -c conda-forge openmm fpocket
pip install -e '.[all,dev]'
Docker (all platforms)
Docker bundles all dependencies (OpenMM, CUDA, fpocket, FreeSASA, MDTraj):
# Build the image
docker build -t protqc .
# Analyze a protein (GPU-accelerated)
docker run --gpus all -v $(pwd)/data:/app/data protqc analyze data/benchmark/ubiquitin.pdb
# Run with MD simulation
docker run --gpus all -v $(pwd)/data:/app/data protqc analyze data/benchmark/ubiquitin.pdb --md-duration 10
# CPU-only (MD will be slow)
docker run -v $(pwd)/data:/app/data -e CUDA_VISIBLE_DEVICES="" protqc analyze protein.pdb --skip-md
Docker Compose:
# GPU-accelerated
docker compose run protqc analyze data/benchmark/ubiquitin.pdb
# CPU-only variant
docker compose run protqc-cpu analyze data/benchmark/ubiquitin.pdb --skip-md
Note: GPU support requires the NVIDIA Container Toolkit. Without a GPU, MD simulations still work but are significantly slower (~10–50x). Use --skip-md for quick checks without MD.
Configuration
All thresholds, weights, and verdict boundaries are defined in configs/thresholds.yaml. Key tunables:
- Intrinsically disordered proteins: Increase physics_verifier.md_rmsd_max_angstrom (e.g., 8.0–10.0), since higher RMSD is expected
- Membrane proteins: Adjust surface.sasa_polar_ratio_min/max for transmembrane segments
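The dotted names above suggest a nested YAML layout. Under that assumption (the real structure of configs/thresholds.yaml may differ), a small helper can apply such overrides to a loaded configuration dictionary. `set_by_path` and the starting value are hypothetical; in practice the dictionary would come from a YAML loader such as PyYAML's `yaml.safe_load`.

```python
import copy


def set_by_path(config, dotted_key, value):
    """Return a copy of config with the nested key 'a.b.c' set to value."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    node = updated
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})  # create intermediate sections if absent
    node[leaf] = value
    return updated


# Placeholder starting value for demonstration; not a ProtQC default.
config = {"physics_verifier": {"md_rmsd_max_angstrom": 3.0}}

# Relax the RMSD ceiling for an intrinsically disordered protein.
idp_config = set_by_path(config, "physics_verifier.md_rmsd_max_angstrom", 8.0)
print(idp_config)
```

Returning a modified copy keeps the default thresholds available for comparison runs on the same protein.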
Limitations
ProtQC is a rapid pre-screening tool, not a substitute for comprehensive computational or experimental validation:
- MD simulation length. The default 10 ns simulation is a rapid pre-screen that catches catastrophic failures (large RMSD drift, complete unfolding). Subtle instabilities (slow conformational changes, partial unfolding events, aggregation-prone intermediates) may require 100–500 ns simulations for reliable detection (Lindorff-Larsen et al. 2011; Ferruz et al. 2022). Treat a ProtQC PASS as "no obvious red flags," not "experimentally validated."
- Cavity detection. fpocket was designed for identifying druggable surface binding pockets, not for internal void quality control (Le Guilloux et al. 2009). The suspicious cavity flagging (volume > 800 Å³, druggability < 0.4) is a literature-informed heuristic (Schmidtke et al. 2010), not a validated structural defect detector. Combine with packing density metrics or Voronoi-based tools for higher confidence.
- Risk score weights. The current weights are expert estimates based on published benchmarks (Dauparas et al. 2022; Ferruz et al. 2022) and will be refined through calibration on larger, more diverse protein sets. Different protein families (membrane proteins, IDPs, repeat proteins) may need substantially different weight profiles.
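The cavity heuristic described above reduces to two comparisons. As a minimal sketch, assume pocket volume and druggability score have already been parsed from fpocket output into a simple record, and that both conditions must hold for a pocket to be flagged. The `Pocket` class, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass

VOLUME_THRESHOLD_A3 = 800.0    # cubic angstroms, from the heuristic above
DRUGGABILITY_THRESHOLD = 0.4   # fpocket druggability score, from the heuristic above


@dataclass
class Pocket:
    pocket_id: int
    volume_a3: float
    druggability: float


def is_suspicious(pocket):
    """Flag large pockets that do not look like druggable binding sites."""
    return (
        pocket.volume_a3 > VOLUME_THRESHOLD_A3
        and pocket.druggability < DRUGGABILITY_THRESHOLD
    )


# Made-up pockets for demonstration; not fpocket output.
pockets = [
    Pocket(1, 950.0, 0.15),  # large and poorly druggable: flagged
    Pocket(2, 950.0, 0.70),  # large but druggable: plausible binding site
    Pocket(3, 300.0, 0.10),  # small: below the volume threshold
]
flagged = [p.pocket_id for p in pockets if is_suspicious(p)]
print(flagged)
```

Since this is a heuristic rather than a validated detector, a flagged pocket is best treated as a prompt for manual inspection.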
Related Tools
| Tool | Focus |
|---|---|
| CHAPERONg | Automated GROMACS MD analysis |
| MolProbity | Stereochemistry validation |
| QMEAN | Statistical potential scoring |
| VoroMQA | Voronoi tessellation quality |
| ProSA | Statistical analysis of protein structures |
| ProteinDJ | AI protein design evaluation |
| BinderFlow | Binder design pipeline |
| OVO | De novo protein design ecosystem |
Roadmap
v0.2.0 — Benchmark dataset (25 proteins, Garcia/Hermosilla/Chevalier), Colab MCP integration, weight calibration, replica runs
v0.3.0 — Thermal stability prediction, MultiQC-style batch reports, Nextflow/Snakemake templates, REST API
License
MIT
Citation
Güzel, Ö.K. (2026). ProtQC: Physics-based verification of AI-designed protein designs.
github.com/korayguzel/protqc