Physics-based verification of AI-designed protein structures

ProtQC

License: MIT | Python 3.11+

Catches structural hallucinations before wet-lab testing


Why ProtQC?

AI protein design and structure prediction tools (AlphaFold, RFdiffusion, ProteinMPNN, BoltzGen) routinely produce structures with high confidence scores (pLDDT > 90) that still fail experimentally. A protein can look perfect by pLDDT yet harbor internal voids, unstable hydrogen bond networks, or thermodynamic instabilities that only surface in solution.

ProtQC combines six physics-based metrics into a composite risk score, catching high-pLDDT hallucinations that no single metric detects on its own.

Quick Start

protqc analyze protein.pdb

The 6 Metrics

#  Metric              Source                What It Catches
1  pLDDT               Structure prediction  Low confidence regions
2  MD RMSD             OpenMM                Backbone instability under simulation
3  Cavity Volume       fpocket               Internal voids and packing defects
4  H-bond Persistence  MDTraj                Weak hydrogen bond networks
5  SS Preservation     MDTraj DSSP           Secondary structure loss during MD
6  SASA Polar Ratio    FreeSASA              Abnormal surface accessibility

Each metric produces a normalized 0–1 sub-score. The composite risk score is a weighted sum, mapped to a verdict:

  • PASS (risk < 0.30) — Design is physically plausible
  • WARNING (0.30 ≤ risk < 0.50) — Proceed with caution; review flagged metrics
  • FAIL (risk ≥ 0.50) — Design has significant structural issues
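
The verdict boundaries above can be written as a small function. This is a minimal sketch and not ProtQC's actual code; the function name `verdict` is hypothetical, and the thresholds are hard-coded from the list above rather than read from the configuration file.

```python
def verdict(risk: float) -> str:
    """Map a composite risk score in [0, 1] to a verdict label.

    Hypothetical helper: boundaries are copied from the README list,
    not taken from ProtQC's source code.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError(f"risk must be in [0, 1], got {risk}")
    if risk < 0.30:
        return "PASS"
    if risk < 0.50:
        return "WARNING"
    return "FAIL"
```

Note that the boundaries are half-open: a score of exactly 0.30 is a WARNING and exactly 0.50 is a FAIL.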

Risk Scoring Weights

risk_weights:
  plddt: 0.12
  md_rmsd: 0.29
  cavity: 0.12
  hbond_persistence: 0.24
  ss_preservation: 0.18
  sasa_ratio: 0.05
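
As an illustration of how these weights combine, here is a minimal sketch of a weighted-sum risk score. It assumes each sub-score is already oriented so that higher means more risk, which the README does not state explicitly; the function name `composite_risk` is hypothetical. Dividing by the weight total is a design choice of this sketch so that the result stays in the 0–1 range even if the weights are edited and no longer sum to 1.

```python
# Weights copied from the risk_weights block above.
RISK_WEIGHTS = {
    "plddt": 0.12,
    "md_rmsd": 0.29,
    "cavity": 0.12,
    "hbond_persistence": 0.24,
    "ss_preservation": 0.18,
    "sasa_ratio": 0.05,
}


def composite_risk(sub_scores, weights=RISK_WEIGHTS):
    """Weighted mean of per-metric risk sub-scores (hypothetical helper).

    Assumes every sub-score is normalized to [0, 1] with higher = riskier.
    """
    missing = set(weights) - set(sub_scores)
    if missing:
        raise KeyError(f"missing sub-scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted = sum(
        weights[name] * min(max(sub_scores[name], 0.0), 1.0)  # clamp to [0, 1]
        for name in weights
    )
    return weighted / total_weight
```

With these weights, a structure whose only problem is a maximal MD RMSD sub-score lands at a risk of about 0.29, so no single metric can push a design past the FAIL boundary on its own.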

Validated Results

Protein                  Verdict  Risk Score
Ubiquitin (1UBQ)         PASS     0.257
GFP (1EMA)               PASS     0.281
Alpha-synuclein (1XQ8)   FAIL     0.555

Performance

Protein            MD Duration  Wall Time  GPU
Ubiquitin (76 aa)  10 ns        ~23 min    RTX 4070
GFP (238 aa)       10 ns        ~49 min    RTX 4070
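
For planning longer runs, the timings above can be converted to throughput with simple arithmetic. The sketch below assumes wall time scales linearly with simulated time on the same hardware, which ignores setup and minimization overhead; both function names are hypothetical.

```python
def ns_per_day(sim_ns: float, wall_minutes: float) -> float:
    """Simulation throughput in ns/day from one benchmark run."""
    return sim_ns / wall_minutes * 24 * 60


def estimated_wall_hours(target_ns: float, sim_ns: float, wall_minutes: float) -> float:
    """Linear extrapolation of wall time (hours) for a longer simulation."""
    return target_ns * (wall_minutes / sim_ns) / 60


# Benchmarks from the table above: (simulated ns, approximate wall minutes)
benchmarks = {"Ubiquitin": (10, 23), "GFP": (10, 49)}
for name, (sim_ns, minutes) in benchmarks.items():
    print(f"{name}: {ns_per_day(sim_ns, minutes):.0f} ns/day, "
          f"100 ns in about {estimated_wall_hours(100, sim_ns, minutes):.1f} h")
```

Because the wall times in the table are approximate, the extrapolated figures are rough estimates only.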

Usage

ProtQC provides three usage modes:

CLI — Single Protein Analysis

# Analyze a PDB file
protqc analyze protein.pdb

# Enter a PDB ID — auto-downloads from RCSB
protqc analyze 1UBQ

# Skip MD simulation for quick structural checks
protqc analyze protein.pdb --skip-md

# Set MD simulation length
protqc analyze protein.pdb --md-duration 10

# Use pre-computed MD trajectory
protqc analyze protein.pdb --trajectory md_output.csv

# Generate FastQC-style HTML report
protqc analyze protein.pdb --html report.html

# JSON output
protqc analyze protein.pdb --format json
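
The JSON output makes it possible to drive the CLI from a script. The following is a sketch under two assumptions that the README does not confirm: that `--format json` writes a single JSON document to standard output, and that `--skip-md` and `--format json` can be combined. The helper names are hypothetical, and no field names are assumed in the parsed result.

```python
import json
import subprocess


def build_command(structure, skip_md=True):
    """Assemble the protqc CLI call using only flags shown in this README."""
    cmd = ["protqc", "analyze", str(structure), "--format", "json"]
    if skip_md:
        cmd.append("--skip-md")
    return cmd


def run_protqc(structure, skip_md=True):
    """Run protqc and parse its output.

    Assumption: the JSON report is printed to stdout as one document.
    Raises subprocess.CalledProcessError if protqc exits with an error.
    """
    result = subprocess.run(
        build_command(structure, skip_md),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

A batch screen would then call `run_protqc(path)` for each PDB file in a directory and inspect the returned dictionary to find out which keys the installed version reports.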

Interactive Mode

# Launch interactive prompt — guides you through analysis
protqc

AI Chat Assistant

# Start AI-powered chat for interpreting results
protqc chat

Chat supports 8 providers via LiteLLM: OpenAI, Anthropic, Google, DeepSeek, OpenRouter, Moonshot, MiniMax, Zhipu.

Installation

PyPI + conda (recommended)

conda create -n protqc python=3.11
conda activate protqc

# OpenMM and fpocket from conda-forge (not available on PyPI)
conda install -c conda-forge openmm fpocket

# Install ProtQC with all dependencies
pip install protqc

# Analyze a protein
protqc analyze protein.pdb

Source install (development)

git clone https://github.com/korayguzel/protqc.git
cd protqc
conda create -n protqc python=3.11
conda activate protqc
conda install -c conda-forge openmm fpocket
pip install -e '.[all,dev]'

Docker (all platforms)

Docker bundles all dependencies (OpenMM, CUDA, fpocket, FreeSASA, MDTraj):

# Build the image
docker build -t protqc .

# Analyze a protein (GPU-accelerated)
docker run --gpus all -v $(pwd)/data:/app/data protqc analyze data/benchmark/ubiquitin.pdb

# Run with MD simulation
docker run --gpus all -v $(pwd)/data:/app/data protqc analyze data/benchmark/ubiquitin.pdb --md-duration 10

# CPU-only (skips MD, which is slow without a GPU)
docker run -v $(pwd)/data:/app/data -e CUDA_VISIBLE_DEVICES="" protqc analyze protein.pdb --skip-md

Docker Compose:

# GPU-accelerated
docker compose run protqc analyze data/benchmark/ubiquitin.pdb

# CPU-only variant
docker compose run protqc-cpu analyze data/benchmark/ubiquitin.pdb --skip-md

Note: GPU support requires the NVIDIA Container Toolkit. Without a GPU, MD simulations still work but are significantly slower (~10–50x). Use --skip-md for quick checks without MD.

Configuration

All thresholds, weights, and verdict boundaries are defined in configs/thresholds.yaml. Key tunables:

  • Intrinsically disordered proteins: Increase physics_verifier.md_rmsd_max_angstrom (e.g., 8.0–10.0) since higher RMSD is expected
  • Membrane proteins: Adjust surface.sasa_polar_ratio_min/max for transmembrane segments
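
The dotted names above suggest a nested mapping. The sketch below shows one way to apply such an override to a configuration that has already been loaded into a Python dictionary (for example with a YAML parser). The nesting, the helper name `with_override`, and the default value in the example are assumptions made for illustration; check configs/thresholds.yaml for the real layout and defaults.

```python
import copy


def with_override(config, dotted_key, value):
    """Return a copy of config with one nested key replaced.

    Unknown sections or keys raise KeyError, so a typo in the dotted
    name fails loudly instead of silently adding a new entry.
    """
    updated = copy.deepcopy(config)
    *sections, leaf = dotted_key.split(".")
    node = updated
    for name in sections:
        node = node[name]
    if leaf not in node:
        raise KeyError(dotted_key)
    node[leaf] = value
    return updated


# Placeholder default; the real value lives in configs/thresholds.yaml.
base = {"physics_verifier": {"md_rmsd_max_angstrom": 3.0}}

# Relax the RMSD ceiling for an intrinsically disordered protein.
idp = with_override(base, "physics_verifier.md_rmsd_max_angstrom", 9.0)
```

Returning a copy keeps the base thresholds intact, which is convenient when the same session screens both folded and disordered designs.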

Limitations

ProtQC is a rapid pre-screening tool, not a substitute for comprehensive computational or experimental validation:

  • MD simulation length. The default 10 ns simulation is a rapid pre-screen that catches catastrophic failures (large RMSD drift, complete unfolding). Subtle instabilities — slow conformational changes, partial unfolding events, aggregation-prone intermediates — may require 100–500 ns simulations for reliable detection (Lindorff-Larsen et al. 2011; Ferruz et al. 2022). Treat a ProtQC PASS as "no obvious red flags," not "experimentally validated."

  • Cavity detection. fpocket was designed for identifying druggable surface binding pockets, not for internal void quality control (Le Guilloux et al. 2009). The suspicious cavity flagging (volume > 800 Å³, druggability < 0.4) is a literature-informed heuristic (Schmidtke et al. 2010), not a validated structural defect detector. Combine with packing density metrics or Voronoi-based tools for higher confidence.

  • Risk score weights. The current weights are expert estimates based on published benchmarks (Dauparas et al. 2022; Ferruz et al. 2022) and will be refined through calibration on larger, more diverse protein sets. Different protein families (membrane proteins, IDPs, repeat proteins) may need substantially different weight profiles.
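
The cavity heuristic described in the list above can be sketched as a simple predicate. This reading assumes that both conditions must hold at once (large volume and low druggability); the `Pocket` class and function name are hypothetical and do not reflect fpocket's output format or ProtQC's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pocket:
    """Minimal stand-in for one detected pocket (hypothetical fields)."""
    volume_A3: float      # pocket volume in cubic angstroms
    druggability: float   # druggability score in [0, 1]


def is_suspicious(pocket, min_volume=800.0, max_druggability=0.4):
    """Flag large pockets that do not look like genuine binding sites.

    Assumption: the two README thresholds are combined with a logical AND.
    """
    return pocket.volume_A3 > min_volume and pocket.druggability < max_druggability
```

For example, `is_suspicious(Pocket(volume_A3=950.0, druggability=0.2))` is true, while a pocket of the same size with a druggability of 0.7 is not flagged.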

Related Tools

Tool        Focus
CHAPERONg   Automated GROMACS MD analysis
MolProbity  Stereochemistry validation
QMEAN       Statistical potential scoring
VoroMQA     Voronoi tessellation quality
ProSA       Statistical analysis of protein structures
ProteinDJ   AI protein design evaluation
BinderFlow  Binder design pipeline
OVO         De novo protein design ecosystem

Roadmap

v0.2.0 — Benchmark dataset (25 proteins, Garcia/Hermosilla/Chevalier), Colab MCP integration, weight calibration, replica runs

v0.3.0 — Thermal stability prediction, MultiQC-style batch reports, Nextflow/Snakemake templates, REST API

License

MIT

Citation

Güzel, Ö.K. (2026). ProtQC: Physics-based verification of AI-designed protein structures.
github.com/korayguzel/protqc
