ml-experiment-stats

Statistical analysis engine for ML experiments with multi-seed evaluation.

Install

pip install ml-experiment-stats              # core: numpy, scipy, pyyaml
pip install "ml-experiment-stats[parquet]"   # + pyarrow (quotes keep zsh from globbing the brackets)
pip install "ml-experiment-stats[plots]"     # + matplotlib
pip install "ml-experiment-stats[bayesian]"  # + baycomp
pip install "ml-experiment-stats[all]"       # everything

Usage

SDK

from ml_experiment_stats import RunResult, ResultsCollector, ExperimentConfig
from ml_experiment_stats.statistics import run_statistical_analysis
from ml_experiment_stats.report import save_report

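# Collect runs under results/
collector = ResultsCollector("results/")
# One RunResult per (seed, method) pair; repeat with further seeds for multi-seed evaluation
collector.add(RunResult(seed=42, method="baseline", metrics={"mse": 0.12}))
collector.add(RunResult(seed=42, method="proposed", metrics={"mse": 0.08}))
# Persist the collected runs, then build the report from the same directory
collector.save()
save_report("results/")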

CLI

mlstats summary --results-dir results/
mlstats report --results-dir results/
mlstats diff results_new/ results_baseline/
mlstats check --config experiment.yaml --results-dir results/

Orchestrator

from ml_experiment_stats import ExperimentConfig, RunResult, set_seed
from ml_experiment_stats.cli_run import run_with

def run_single(config: ExperimentConfig, seed: int) -> list[RunResult]:
    set_seed(seed)
    # your experiment logic here
    return [RunResult(seed=seed, method="my_method", metrics={"acc": 0.95})]

run_with(run_single)
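
The internals of set_seed are not shown here. As a rough illustration of why each run should start by seeding, the sketch below uses a hypothetical helper, seed_everything, which is not part of ml-experiment-stats. It seeds Python's random module and, if NumPy is installed, NumPy's global generator, so that repeating a seed reproduces the same draws.

```python
import random


def seed_everything(seed: int) -> None:
    """Hypothetical stand-in for a seed helper; NOT the library's set_seed."""
    random.seed(seed)
    try:
        import numpy as np  # optional: only seeded when NumPy is available
        np.random.seed(seed)
    except ImportError:
        pass


def draw(seed: int, n: int = 3) -> list[float]:
    """Seed, then draw n pseudo-random numbers."""
    seed_everything(seed)
    return [random.random() for _ in range(n)]


# The same seed reproduces the same sequence; a different seed does not.
print(draw(42) == draw(42))
print(draw(42) == draw(43))
```

Frameworks with their own generators (for example deep-learning libraries) need to be seeded separately; that is outside the scope of this sketch.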

Statistical Methods

  • Pairwise: Wilcoxon signed-rank, paired t-test, auto (Shapiro-Wilk selection)
  • Omnibus: Friedman test, Nemenyi post-hoc with Critical Difference diagrams
  • Bayesian: Signed-rank test with ROPE (Region of Practical Equivalence)
  • Effect sizes: Cliff's delta (non-parametric), Cohen's d (parametric)
  • Corrections: Holm-Bonferroni, Bonferroni
  • Confidence intervals: BCa bootstrap
  • Power analysis: Post-hoc power with recommended sample size
  • Multi-dataset: Cross-dataset Friedman analysis (Demšar, 2006)
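
To make three of the methods above concrete, here is a minimal standard-library sketch of an exact Wilcoxon signed-rank test, Cliff's delta, and the Holm-Bonferroni correction. It is an illustration only and not the package's implementation: the function names are hypothetical, the per-seed values are made up, and the exact enumeration is only practical for a small number of seeds.

```python
from itertools import product


def wilcoxon_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples (small n)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    if not diffs:
        return 0.0, 1.0
    mags = [abs(d) for d in diffs]
    # Midranks of |d|: tied magnitudes share the average of the ranks they occupy.
    ranks = [sum(o < m for o in mags) + (sum(o == m for o in mags) + 1) / 2
             for m in mags]
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    stat = min(w_plus, total - w_plus)
    # Under H0 every sign pattern is equally likely; enumerate all 2**n of them.
    hits = 0
    for signs in product((False, True), repeat=len(diffs)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        hits += min(w, total - w) <= stat
    return stat, hits / 2 ** len(diffs)


def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all cross pairs, in [-1, 1]."""
    greater = sum(x > y for x, y in product(a, b))
    less = sum(x < y for x, y in product(a, b))
    return (greater - less) / (len(a) * len(b))


def holm_bonferroni(p_values, alpha=0.05):
    """Return one reject flag per hypothesis using Holm's step-down rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The p-value at 0-based rank k is compared to alpha / (m - k).
        if p_values[idx] > alpha / (m - rank):
            break  # the first failure stops the procedure; the rest are kept
        reject[idx] = True
    return reject


# Made-up per-seed MSE values for two methods, paired by seed.
baseline = [0.12, 0.11, 0.13, 0.12, 0.14]
proposed = [0.08, 0.09, 0.07, 0.08, 0.10]

print(wilcoxon_exact(proposed, baseline))    # (0, 0.0625)
print(cliffs_delta(proposed, baseline))      # -1.0
print(holm_bonferroni([0.01, 0.04, 0.03]))   # [True, False, False]
```

Note that with five seeds the smallest two-sided p-value the exact signed-rank test can produce is 2/2^5 = 0.0625, even when one method wins on every seed. This is the kind of limitation that a power analysis with a recommended sample size is meant to surface.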

Output

make run / run_with() produces:

File             Format     Purpose
summary.json     JSON       Per-method mean/std/min/max
statistics.json  JSON       All pairwise tests, Friedman, Bayesian, power
report.json      JSON       Structured report for LLM agents
report.md        Markdown   Human-readable report
figures/         PNG/PDF    Bar plots, per-seed, heatmaps, CD diagrams
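
As an illustration of the per-method mean/std/min/max aggregation that summary.json is described as holding, the sketch below computes those statistics from a list of runs using only the standard library. The function name, the tuple layout, the printed JSON shape, and the choice of the sample standard deviation (n - 1) are all assumptions for this example; the actual file schema is not documented here.

```python
import json
from collections import defaultdict
from statistics import mean, stdev


def summarize(runs, metric):
    """Aggregate (method, seed, metrics) tuples into per-method statistics."""
    by_method = defaultdict(list)
    for method, _seed, metrics in runs:
        by_method[method].append(metrics[metric])
    summary = {}
    for method, values in by_method.items():
        summary[method] = {
            "n": len(values),
            "mean": mean(values),
            # Sample standard deviation (n - 1); undefined for a single seed.
            "std": stdev(values) if len(values) > 1 else None,
            "min": min(values),
            "max": max(values),
        }
    return summary


# Made-up runs: three seeds per method.
runs = [
    ("baseline", 0, {"mse": 0.12}),
    ("baseline", 1, {"mse": 0.11}),
    ("baseline", 2, {"mse": 0.13}),
    ("proposed", 0, {"mse": 0.08}),
    ("proposed", 1, {"mse": 0.09}),
    ("proposed", 2, {"mse": 0.07}),
]
print(json.dumps(summarize(runs, "mse"), indent=2))
```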

License

Apache 2.0
