Prevent statistically invalid analyses from being shipped
stat-guard
stat-guard is a production-grade statistical assumption validation library for experiments such as A/B tests and controlled studies.
It acts as a guardrail, validating data integrity and statistical assumptions before any analysis is performed.
If stat-guard reports errors, the experiment must not be analyzed.
🚦 Why stat-guard exists
Most statistical failures do not come from incorrect formulas.
They come from broken data and violated assumptions:
- Duplicate users counted multiple times
- Users appearing in both control and treatment
- Samples too small to be meaningful
- Imbalanced or biased groups
- Metrics with zero variance
- Silent assumption violations in production pipelines
These issues often surface only after results have been shipped.
stat-guard catches them before analysis begins.
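The first two failure modes are mechanical and easy to detect. The following minimal pandas sketch shows one way to find missing IDs, duplicate units, and units that leaked into more than one group. It is not stat-guard's implementation. The function name `check_unit_integrity` and the column names `user_id` and `group` are hypothetical.

```python
import pandas as pd


def check_unit_integrity(df: pd.DataFrame, unit_col: str = "user_id",
                         group_col: str = "group") -> list:
    """Return blocking error messages; an empty list means no ID problems were found."""
    errors = []

    # Missing IDs: rows that cannot be attributed to any unit.
    n_missing = int(df[unit_col].isna().sum())
    if n_missing:
        errors.append(f"{n_missing} row(s) have a missing {unit_col!r}")

    valid = df.dropna(subset=[unit_col])

    # Duplicates: the same unit counted more than once within one group.
    dup_mask = valid.duplicated(subset=[unit_col, group_col])
    duplicated_units = sorted(valid.loc[dup_mask, unit_col].unique().tolist())
    if duplicated_units:
        errors.append(f"units counted more than once: {duplicated_units}")

    # Leakage: the same unit assigned to more than one group.
    groups_per_unit = valid.groupby(unit_col)[group_col].nunique()
    leaked_units = sorted(groups_per_unit[groups_per_unit > 1].index.tolist())
    if leaked_units:
        errors.append(f"units in more than one group: {leaked_units}")

    return errors
```

The duplicate check keys on the (unit, group) pair. As a result, a unit that appears once in each group is reported as leakage and not as a duplicate.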
🧠 What stat-guard does
- Validates unit integrity (missing IDs, duplicates, leakage)
- Checks minimum sample size
- Detects group imbalance
- Measures covariate balance (standardized mean difference, SMD)
- Flags zero-variance metrics
- Diagnoses distribution issues (skewness, normality)
- Separates errors (blocking) from warnings (diagnostic)
- Produces deterministic, machine-readable reports
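Several of these checks reduce to short computations. The sketch below covers minimum sample size, zero variance, covariate balance, and a skewness diagnostic. Covariate balance uses the standardized mean difference SMD = (mean_t − mean_c) / sqrt((s_t² + s_c²) / 2). The function names, the default thresholds (100 rows per group, |SMD| > 0.1, |skewness| > 2), and the assignment of each check to errors or warnings are assumptions for illustration. They are not stat-guard's actual defaults.

```python
import numpy as np
import pandas as pd


def standardized_mean_difference(df, covariate, group_col="group",
                                 control="control", treatment="treatment"):
    """SMD = (mean_treatment - mean_control) / sqrt((var_treatment + var_control) / 2)."""
    t = df.loc[df[group_col] == treatment, covariate]
    c = df.loc[df[group_col] == control, covariate]
    # Pooled standard deviation from the two sample variances (pandas uses ddof=1).
    pooled_sd = np.sqrt((t.var() + c.var()) / 2)
    if not pooled_sd > 0:
        return float("nan")  # undefined when both groups are constant or too small
    return float((t.mean() - c.mean()) / pooled_sd)


def diagnose(df, target_col, group_col="group", covariates=(),
             min_n=100, smd_threshold=0.1, skew_threshold=2.0):
    """Return (errors, warnings) as lists of messages."""
    errors, warnings = [], []

    # Minimum sample size per group; sort_index keeps message order stable.
    for name, n in df[group_col].value_counts().sort_index().items():
        if n < min_n:
            errors.append(f"group {name!r} has {n} rows, below the minimum of {min_n}")

    # Zero variance: the metric takes a single distinct value within a group.
    for name, values in df.groupby(group_col)[target_col]:
        if values.nunique() <= 1:
            errors.append(f"{target_col!r} has zero variance in group {name!r}")

    # Covariate balance: a large |SMD| is diagnostic, not blocking.
    for cov in covariates:
        smd = standardized_mean_difference(df, cov, group_col)
        if abs(smd) > smd_threshold:
            warnings.append(f"covariate {cov!r} is imbalanced (SMD = {smd:.2f})")

    # Distribution shape: strong skew is reported but does not block.
    skew = df[target_col].skew()
    if abs(skew) > skew_threshold:
        warnings.append(f"{target_col!r} is highly skewed (skewness = {skew:.2f})")

    return errors, warnings
```

Only the checks that make a result untrustworthy go into `errors`. Balance and shape findings go into `warnings`, which matches the blocking/diagnostic split described above.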
Designed for:
- CI/CD pipelines
- Experiment gating systems
- Production data workflows
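In CI/CD and gating systems, the contract is usually the process exit code: a non-zero code fails the job. Here is a minimal sketch, assuming the errors and warnings have already been collected as lists of messages. `gate` is a hypothetical helper and is not part of stat-guard.

```python
import sys


def gate(errors: list, warnings: list) -> int:
    """Log findings and return an exit code: 0 lets the pipeline continue, 1 blocks it."""
    for message in warnings:
        print(f"WARNING: {message}", file=sys.stderr)
    for message in errors:
        print(f"ERROR: {message}", file=sys.stderr)
    # Only errors block; warnings are logged and otherwise ignored.
    return 1 if errors else 0
```

A pipeline step would end with `sys.exit(gate(errors, warnings))`. Warnings then appear in the job log, while only errors fail the build.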
🚫 What stat-guard does NOT do
stat-guard is not a statistics engine.
It deliberately does not:
- ❌ Run hypothesis tests
- ❌ Modify or auto-fix data
- ❌ Apply transformations
- ❌ Guess intent or apply heuristics
This keeps behavior explicit, transparent, and reproducible.
🧱 Core Philosophy
- Explicit over implicit
- No automatic corrections
- Errors invalidate experiments; warnings do not
- Deterministic, reproducible behavior
- Production-first, not notebook-first
- Simple, readable, maintainable code
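Two of these rules can be captured in a small report type: errors invalidate an experiment while warnings do not, and output must be deterministic and machine-readable. `ValidationReport` and `to_json` are hypothetical names. Only the `is_valid` attribute mirrors the quick example, and stat-guard's real report object may be shaped differently.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationReport:
    """Hypothetical report shape: blocking errors and non-blocking warnings."""
    errors: tuple = ()
    warnings: tuple = ()

    @property
    def is_valid(self) -> bool:
        # Errors invalidate the experiment; warnings never do.
        return not self.errors

    def to_json(self) -> str:
        # Sorted messages and sorted keys give identical output for identical
        # findings, regardless of the order in which checks reported them.
        payload = {
            "is_valid": self.is_valid,
            "errors": sorted(self.errors),
            "warnings": sorted(self.warnings),
        }
        return json.dumps(payload, sort_keys=True)
```

Making the report immutable (`frozen=True`) means nothing downstream can quietly drop an error after validation has run.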
📦 Installation
From GitHub (current)
stat-guard is currently distributed via GitHub:
```bash
pip install git+https://github.com/aaryansolankii/stat-guard.git
```
🚀 Quick example
```python
import pandas as pd
from stat_guard import validate

# Toy experiment data: one metric value per row, with the row's assigned group.
data = pd.DataFrame({
    "metric": [10, 12, 11, 13, 15, 14],
    "group": ["control", "control", "control", "treatment", "treatment", "treatment"],
})

report = validate(
    data,
    target_col="metric",
    group_col="group",
)

# Errors are blocking: stop the pipeline instead of analyzing invalid data.
if not report.is_valid:
    raise RuntimeError(report)
```
Download files
File details
Details for the file stat_guard-0.2.0.tar.gz.
File metadata
- Download URL: stat_guard-0.2.0.tar.gz
- Upload date:
- Size: 10.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `95e33b6ca2569e745ccbd989acb0802f1eaf1270568c52847bb0e2a5028f9ee7` |
| MD5 | `d46d31cdb3ee2484f47ffa1e1f10a2ae` |
| BLAKE2b-256 | `286e2eab1e91a8bc4915b49fc37cd7f6a1a9b9277f58d3a5796e6ceece30d3c5` |
File details
Details for the file stat_guard-0.2.0-py3-none-any.whl.
File metadata
- Download URL: stat_guard-0.2.0-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `4c8eb5f617bb9da6dc35950dc814a5c5486e0c4692dde6c470991bab4dba4772` |
| MD5 | `025f7a8cc469d9f31e9560d9142bde5a` |
| BLAKE2b-256 | `0792dfd7be52f66619fac3985ad71b606ed8477bda21c3f80fc801f873265e7d` |