Physics-based verification of AI-designed protein structures

ProtQC

Catches structural hallucinations before they reach the wet lab

Why ProtQC?

AI structure prediction and protein design tools (AlphaFold, RFdiffusion, ProteinMPNN, BoltzGen) routinely produce structures with high confidence scores (pLDDT > 90) that still fail experimentally. A protein can look perfect by pLDDT yet harbor internal voids, unstable hydrogen bond networks, or thermodynamic instabilities that only surface in solution.

ProtQC combines six physics-based metrics into a composite risk score, catching high-pLDDT hallucinations that no single metric detects on its own.

Quick Start

protqc analyze protein.pdb

The 6 Metrics

#	Metric	Source	What It Catches
1	pLDDT	Structure prediction	Low confidence regions
2	MD RMSD	OpenMM	Backbone instability under simulation
3	Cavity Volume	fpocket	Internal voids and packing defects
4	H-bond Persistence	MDTraj	Weak hydrogen bond networks
5	SS Preservation	MDTraj DSSP	Secondary structure loss during MD
6	SASA Polar Ratio	FreeSASA	Abnormal surface accessibility

Each metric produces a normalized 0–1 sub-score. The composite risk score is a weighted sum, mapped to a verdict:

PASS (risk < 0.30) — Design is physically plausible
WARNING (0.30 ≤ risk < 0.50) — Proceed with caution; review flagged metrics
FAIL (risk ≥ 0.50) — Design has significant structural issues
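
The verdict boundaries above can be sketched as a small function. This is only an illustration of the thresholds listed here, not ProtQC's actual code; the function name `verdict` is hypothetical.

```python
def verdict(risk: float) -> str:
    """Map a composite risk score in [0, 1] to a verdict label.

    Boundaries follow the list above: PASS below 0.30,
    WARNING from 0.30 up to (but not including) 0.50, FAIL otherwise.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError(f"risk must be in [0, 1], got {risk}")
    if risk < 0.30:
        return "PASS"
    if risk < 0.50:
        return "WARNING"
    return "FAIL"
```

Note that both boundaries are inclusive on the upper verdict: a score of exactly 0.30 is a WARNING and exactly 0.50 is a FAIL.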

Risk Scoring Weights

risk_weights:
  plddt: 0.12
  md_rmsd: 0.29
  cavity: 0.12
  hbond_persistence: 0.24
  ss_preservation: 0.18
  sasa_ratio: 0.05
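
As a rough sketch of how these weights combine, the composite score can be written as a plain weighted sum. The function name `composite_risk` is hypothetical, and the sketch assumes each sub-score is risk-oriented (0 = no concern, 1 = worst case), which the text above does not state explicitly.

```python
# Weights copied from the risk_weights block above; they sum to 1.0.
RISK_WEIGHTS = {
    "plddt": 0.12,
    "md_rmsd": 0.29,
    "cavity": 0.12,
    "hbond_persistence": 0.24,
    "ss_preservation": 0.18,
    "sasa_ratio": 0.05,
}


def composite_risk(sub_scores: dict[str, float]) -> float:
    """Weighted sum of normalized 0-1 sub-scores (assumed higher = riskier)."""
    missing = RISK_WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise KeyError(f"missing sub-scores: {sorted(missing)}")
    return sum(RISK_WEIGHTS[name] * sub_scores[name] for name in RISK_WEIGHTS)


# Hypothetical design: flawless pLDDT, but the backbone drifts and the
# hydrogen bond network collapses during MD.
example = dict.fromkeys(RISK_WEIGHTS, 0.0)
example["md_rmsd"] = 1.0
example["hbond_persistence"] = 1.0
example_risk = composite_risk(example)  # 0.29 + 0.24, i.e. about 0.53
```

With these weights, the two MD-derived stability metrics alone can push a design past the 0.50 FAIL boundary even when pLDDT reports no problem, which is the high-pLDDT hallucination case the tool targets.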

Validated Results

Protein	Verdict	Risk Score
Ubiquitin (1UBQ)	PASS	0.257
GFP (1EMA)	PASS	0.281
Alpha-synuclein (1XQ8)	FAIL	0.555

Performance

Protein	MD Duration	Wall Time	GPU
Ubiquitin (76 aa)	10 ns	~23 min	RTX 4070
GFP (238 aa)	10 ns	~49 min	RTX 4070
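
To budget longer runs, the timings above can be extrapolated with simple arithmetic. The sketch below assumes wall time scales linearly with simulated time on the same GPU and ignores setup overhead; that is an assumption, not a measured ProtQC result, and the function name is hypothetical.

```python
def estimated_wall_minutes(
    target_ns: float,
    ref_ns: float = 10.0,
    ref_minutes: float = 23.0,
) -> float:
    """Extrapolate wall time from a reference run, assuming linear scaling.

    Defaults use the ubiquitin row above (10 ns in ~23 min on an RTX 4070).
    """
    if target_ns < 0 or ref_ns <= 0:
        raise ValueError("durations must be positive")
    return target_ns * ref_minutes / ref_ns


ubiquitin_100ns = estimated_wall_minutes(100)               # ~230 min
gfp_100ns = estimated_wall_minutes(100, ref_minutes=49.0)   # ~490 min
```

Under that assumption, a 100 ns run would take roughly 4 hours for ubiquitin and roughly 8 hours for GFP on the same hardware.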

Usage

ProtQC provides three usage modes:

CLI — Single Protein Analysis

# Analyze a PDB file
protqc analyze protein.pdb

# Enter a PDB ID — auto-downloads from RCSB
protqc analyze 1UBQ

# Skip MD simulation for quick structural checks
protqc analyze protein.pdb --skip-md

# Set MD simulation length (value assumed to be in ns, matching the 10 ns default)
protqc analyze protein.pdb --md-duration 10

# Use pre-computed MD trajectory
protqc analyze protein.pdb --trajectory md_output.csv

# Generate FastQC-style HTML report
protqc analyze protein.pdb --html report.html

# JSON output
protqc analyze protein.pdb --format json
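
For scripted use, the CLI can be driven from Python. The following is a minimal sketch that uses only the flags shown above; the helper names are hypothetical, and it assumes that `--format json` writes the JSON document to standard output, which this page does not confirm. The structure of the returned JSON is not documented here, so the sketch returns it unparsed beyond `json.loads`.

```python
import json
import subprocess
from typing import Optional


def build_analyze_command(
    structure: str,
    skip_md: bool = False,
    md_duration: Optional[int] = None,
    html_report: Optional[str] = None,
    json_output: bool = False,
) -> list[str]:
    """Build a `protqc analyze` argument list from the flags listed above."""
    cmd = ["protqc", "analyze", structure]
    if skip_md:
        cmd.append("--skip-md")
    if md_duration is not None:
        cmd += ["--md-duration", str(md_duration)]
    if html_report is not None:
        cmd += ["--html", html_report]
    if json_output:
        cmd += ["--format", "json"]
    return cmd


def run_analysis_json(structure: str, **options) -> dict:
    """Run the analysis and parse stdout as JSON (assumption: JSON goes to stdout)."""
    cmd = build_analyze_command(structure, json_output=True, **options)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)
```

Passing the command as a list rather than a shell string avoids quoting problems with file paths that contain spaces.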

Interactive Mode

# Launch interactive prompt — guides you through analysis
protqc

AI Chat Assistant

# Start AI-powered chat for interpreting results
protqc chat

Chat supports 8 providers via LiteLLM: OpenAI, Anthropic, Google, DeepSeek, OpenRouter, Moonshot, MiniMax, Zhipu.

Installation

PyPI + conda (recommended)

conda create -n protqc python=3.11
conda activate protqc

# OpenMM and fpocket from conda-forge (not available on PyPI)
conda install -c conda-forge openmm fpocket

# Install ProtQC with all dependencies
pip install protqc

# Analyze a protein
protqc analyze protein.pdb

Source install (development)

git clone https://github.com/korayguzel/protqc.git
cd protqc
conda create -n protqc python=3.11
conda activate protqc
conda install -c conda-forge openmm fpocket
pip install -e '.[all,dev]'

Docker (all platforms)

Docker bundles all dependencies (OpenMM, CUDA, fpocket, FreeSASA, MDTraj):

# Build the image
docker build -t protqc .

# Analyze a protein (GPU-accelerated)
docker run --gpus all -v "$(pwd)/data:/app/data" protqc analyze data/benchmark/ubiquitin.pdb

# Run with MD simulation
docker run --gpus all -v "$(pwd)/data:/app/data" protqc analyze data/benchmark/ubiquitin.pdb --md-duration 10

# CPU-only (MD would be slow on CPU, so it is skipped here); protein.pdb must sit in the mounted ./data directory
docker run -v "$(pwd)/data:/app/data" -e CUDA_VISIBLE_DEVICES="" protqc analyze data/protein.pdb --skip-md

Docker Compose:

# GPU-accelerated
docker compose run protqc analyze data/benchmark/ubiquitin.pdb

# CPU-only variant
docker compose run protqc-cpu analyze data/benchmark/ubiquitin.pdb --skip-md

Note: GPU support requires the NVIDIA Container Toolkit. Without a GPU, MD simulations still work but are significantly slower (~10–50x). Use --skip-md for quick checks without MD.

Configuration

All thresholds, weights, and verdict boundaries are defined in configs/thresholds.yaml. Key tunables:

Intrinsically disordered proteins: Increase physics_verifier.md_rmsd_max_angstrom (e.g., to 8.0–10.0 Å), since higher RMSD is expected for these proteins.
Membrane proteins: Adjust surface.sasa_polar_ratio_min and surface.sasa_polar_ratio_max to account for transmembrane segments.
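
The dotted names above suggest a nested layout in configs/thresholds.yaml, although the file itself is not shown here. As a sketch under that assumption, overrides for a protein class could be applied to an already-loaded config dictionary as follows; `set_dotted` is a hypothetical helper, and loading or writing the YAML file (for example with PyYAML) is left out.

```python
def set_dotted(config: dict, dotted_key: str, value) -> None:
    """Set config["a"]["b"] = value for a key written as "a.b".

    Intermediate dictionaries are created when they do not exist yet.
    """
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


# Override for an intrinsically disordered protein, using the lower end
# of the 8.0-10.0 range suggested above.
config: dict = {}
set_dotted(config, "physics_verifier.md_rmsd_max_angstrom", 8.0)
```

Keeping overrides as dotted keys makes it easy to store one small override set per protein family next to the shared defaults.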

Limitations

ProtQC is a rapid pre-screening tool, not a substitute for comprehensive computational or experimental validation:

MD simulation length. The default 10 ns simulation is a rapid pre-screen that catches catastrophic failures (large RMSD drift, complete unfolding). Subtle instabilities — slow conformational changes, partial unfolding events, aggregation-prone intermediates — may require 100–500 ns simulations for reliable detection (Lindorff-Larsen et al. 2011; Ferruz et al. 2022). Treat a ProtQC PASS as "no obvious red flags," not "experimentally validated."
Cavity detection. fpocket was designed for identifying druggable surface binding pockets, not for internal void quality control (Le Guilloux et al. 2009). The suspicious cavity flagging (volume > 800 Å³, druggability < 0.4) is a literature-informed heuristic (Schmidtke et al. 2010), not a validated structural defect detector. Combine with packing density metrics or Voronoi-based tools for higher confidence.
Risk score weights. The current weights are expert estimates based on published benchmarks (Dauparas et al. 2022; Ferruz et al. 2022) and will be refined through calibration on larger, more diverse protein sets. Different protein families (membrane proteins, IDPs, repeat proteins) may need substantially different weight profiles.
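
The cavity heuristic mentioned above can be sketched as a simple predicate. This assumes the two conditions are combined with a logical AND (the text lists them together without saying so) and that both comparisons are strict; the `Pocket` class and `is_suspicious` function are hypothetical names, not part of ProtQC or fpocket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pocket:
    """Minimal stand-in for one detected pocket (hypothetical structure)."""
    volume_A3: float      # pocket volume in cubic angstroms
    druggability: float   # druggability score in [0, 1]


def is_suspicious(
    pocket: Pocket,
    min_volume_A3: float = 800.0,
    max_druggability: float = 0.4,
) -> bool:
    """Flag large pockets that do not look like genuine binding sites."""
    return pocket.volume_A3 > min_volume_A3 and pocket.druggability < max_druggability
```

A large pocket with a high druggability score is treated as a plausible binding site rather than a packing defect, which is why volume alone does not trigger the flag.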

Related Tools

Tool	Focus
CHAPERONg	Automated GROMACS MD analysis
MolProbity	Stereochemistry validation
QMEAN	Statistical potential scoring
VoroMQA	Voronoi tessellation quality
ProSA	Statistical analysis of protein structures
ProteinDJ	AI protein design evaluation
BinderFlow	Binder design pipeline
OVO	De novo protein design ecosystem

Roadmap

v0.2.0 — Benchmark dataset (25 proteins, Garcia/Hermosilla/Chevalier), Colab MCP integration, weight calibration, replica runs

v0.3.0 — Thermal stability prediction, MultiQC-style batch reports, Nextflow/Snakemake templates, REST API

License

MIT

Citation

Güzel, Ö.K. (2026). ProtQC: Physics-based verification of AI-designed protein structures.
github.com/korayguzel/protqc

Release history

0.1.1 (this version): Mar 21, 2026
0.1.0: Mar 21, 2026
