A framework for reproducible experiments with pipelines, treatments, and hypotheses.
Crystallize 🧪✨
⚠️ Pre-Alpha Notice
This project is in an early experimental phase. Breaking changes may occur at any time. Use at your own risk.
Rigorous, reproducible, and clear data science experiments.
Crystallize is an elegant, lightweight Python framework designed to help data scientists, researchers, and machine learning practitioners turn hypotheses into crystal-clear, reproducible experiments.
Why Crystallize?
- Clarity from Complexity: Easily structure your experiments, making it straightforward to follow best scientific practices.
- Repeatability: Built-in support for reproducible results through immutable contexts, lockfiles, and robust pipeline management.
- Statistical Rigor: Hypothesis-driven experiments with integrated statistical verification.
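One way to picture the "immutable contexts" mentioned above is a mapping that accepts new keys but refuses to overwrite existing ones. The sketch below is only an illustration of that idea: `FrozenContextSketch` and `ContextMutationError` are hypothetical names chosen for this example, and Crystallize's own context class may behave differently.

```python
from collections.abc import Iterator, Mapping
from typing import Any, Dict


class ContextMutationError(Exception):
    """Raised when an existing key would be overwritten (hypothetical name)."""


class FrozenContextSketch(Mapping):
    """Write-once mapping: new keys may be added, existing keys never change."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        # Copy so later changes to `initial` do not leak into the context.
        self._data = dict(initial)

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def add(self, key: str, value: Any) -> None:
        """Store a new key, rejecting any attempt to replace an existing one."""
        if key in self._data:
            raise ContextMutationError(f"cannot overwrite existing key: {key!r}")
        self._data[key] = value


ctx = FrozenContextSketch({"seed": 42})
ctx.add("learning_rate", 0.001)  # adding a new key is allowed

try:
    ctx.add("seed", 7)  # overwriting an existing key is rejected
except ContextMutationError as exc:
    overwrite_error = str(exc)
```

Because a value cannot change once it is set, every later step sees the same inputs, which is what makes a run repeatable.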
Core Concepts
Crystallize revolves around several key abstractions:
- DataSource: Flexible data fetching and generation.
- Pipeline & PipelineSteps: Deterministic data transformations. Steps may be synchronous or async functions and are awaited automatically.
- Hypothesis & Treatments: Quantifiable assertions and experimental variations.
- Statistical Tests: Built-in support for rigorous validation of experiment results.
- Optimizer: Iterative search over treatments using an ask/tell loop.
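To make the "synchronous or async" point concrete, here is a minimal, framework-independent sketch of how a runner can accept both kinds of step and await only the results that need it. The names `run_steps`, `double`, and `add_one` are hypothetical and are not part of Crystallize's API; the library's own mechanism may differ.

```python
import asyncio
import inspect
from typing import Any, Callable, Sequence


def double(data: int) -> int:
    """Synchronous step."""
    return data * 2


async def add_one(data: int) -> int:
    """Asynchronous step; the sleep stands in for real I/O."""
    await asyncio.sleep(0)
    return data + 1


async def run_steps(steps: Sequence[Callable[[Any], Any]], data: Any) -> Any:
    """Apply each step in order, awaiting any result that is awaitable."""
    for step in steps:
        result = step(data)
        if inspect.isawaitable(result):
            result = await result
        data = result
    return data


final_value = asyncio.run(run_steps([double, add_one, double], 5))
print(final_value)  # 22
```

Checking the returned value with `inspect.isawaitable` rather than inspecting the function itself also covers callables that merely return a coroutine.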
Getting Started
Installation
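Crystallize uses pixi for managing dependencies and environments. The package is published on PyPI as crystallize-ml, so it can be added to an existing pixi project with:
pixi add --pypi crystallize-ml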
Quick Example
from crystallize import (
    DataSource,
    Hypothesis,
    Pipeline,
    Treatment,
    Experiment,
    SeedPlugin,
    ParallelExecution,
    hypothesis,  # decorator used below; assumed to be re-exported like the classes
)

# Example setup (simple)
pipeline = Pipeline([...])
datasource = DataSource(...)

# WelchTTest is not imported above; it stands for a statistical test
# defined or imported elsewhere in your project.
t_test = WelchTTest()

@hypothesis(verifier=t_test, metrics="accuracy")
def rank_by_p(result):
    return result["p_value"]

# Named `hyp` so the `hypothesis` decorator is not shadowed.
hyp = rank_by_p()

treatment = Treatment(
    name="experiment_variant",
    apply_fn=lambda ctx: ctx.update({"learning_rate": 0.001}),
)

experiment = Experiment(
    datasource=datasource,
    pipeline=pipeline,
    plugins=[SeedPlugin(seed=42), ParallelExecution(max_workers=4)],
)
experiment.validate()

result = experiment.run(
    treatments=[treatment],
    hypotheses=[hyp],
    replicates=3,
)

print(result.metrics)
print(result.hypothesis_result)
result.print_tree()
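The example refers to a `WelchTTest` verifier without showing what such a test computes. As a rough, standard-library-only sketch, Welch's t-test compares two sample means without assuming equal variances. The function name `welch_t_statistic` and the accuracy values below are made up for illustration and are not taken from Crystallize. The p-value is omitted because it requires a t-distribution CDF, which the standard library does not provide.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(a), len(b)
    # Per-group squared standard errors, using the unbiased sample variance.
    se_a = variance(a) / n_a
    se_b = variance(b) / n_b
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a**2 / (n_a - 1) + se_b**2 / (n_b - 1))
    return t, df


baseline = [0.80, 0.82, 0.79, 0.81]  # made-up accuracy values
treated = [0.85, 0.86, 0.84, 0.87]   # made-up accuracy values
t_stat, dof = welch_t_statistic(treated, baseline)
```

With these inputs both groups have the same variance and size, so the statistic comes out at roughly 5.48 with 6 degrees of freedom. A real verifier would then turn these into a p-value, which is what `rank_by_p` reads from the result.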
Command Line Interface
Crystallize ships with an interactive CLI for discovering and executing experiments or experiment graphs.
# Discover and run a single experiment
crystallize run experiment
# Discover and run a graph from a specific directory
crystallize run graph --path ./my_project/experiments
# Preview actions without executing
crystallize run graph --dry-run
Project Structure
crystallize/
├── datasources/
├── experiments/
├── pipelines/
├── plugins/
└── utils/
Key classes and decorators are re-exported in the crystallize package for concise imports:
from crystallize import Experiment, Pipeline, ArtifactPlugin
This layout keeps implementation details organized while exposing a clean, flat public API.
Roadmap
- Advanced features: Adaptive experimentation, intelligent meta-learning
- Collaboration: Experiment sharing, templates, and community contributions
Contributing
Contributions are very welcome! Please see CONTRIBUTING.md for guidelines.
Use code2prompt to generate LLM-powered docs:
code2prompt crystallize --exclude="*.lock" --exclude="**/docs/src/content/docs/reference/*" --exclude="**package-lock.json" --exclude="**CHANGELOG.md"
License
Crystallize is licensed under the Apache 2.0 License. See LICENSE for details.
File details
Details for the file crystallize_ml-0.20.0.tar.gz.
File metadata
- Download URL: crystallize_ml-0.20.0.tar.gz
- Upload date:
- Size: 60.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2cf7b452a17ef213f60277b2112ec7fdbd1b1557384ad14250c97d21a69a836 |
| MD5 | edae52ec78bdfbe9bbd70e94b1a05e95 |
| BLAKE2b-256 | ed0c059694664a5af16e4a41f3d7413f5f67e3ce61232c2b7734c0ef2ff6abee |
Provenance
The following attestation bundles were made for crystallize_ml-0.20.0.tar.gz:
Publisher: publish_pypi.yml on brysontang/crystallize
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: crystallize_ml-0.20.0.tar.gz
- Subject digest: a2cf7b452a17ef213f60277b2112ec7fdbd1b1557384ad14250c97d21a69a836
- Sigstore transparency entry: 307679832
- Sigstore integration time:
- Permalink: brysontang/crystallize@a51d1309b8b6d1f4a7bae00e795bb1e55fc61ed0
- Branch / Tag: refs/tags/crystallize-ml@v0.20.0
- Owner: https://github.com/brysontang
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_pypi.yml@a51d1309b8b6d1f4a7bae00e795bb1e55fc61ed0
- Trigger Event: release
File details
Details for the file crystallize_ml-0.20.0-py3-none-any.whl.
File metadata
- Download URL: crystallize_ml-0.20.0-py3-none-any.whl
- Upload date:
- Size: 50.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d92514d6b06b1b6f84bb9a9ffaa6d2ba2612ed1a940c199ac27fe6d074f1756 |
| MD5 | 5447e005d2c144e0ac9a2e8006483373 |
| BLAKE2b-256 | 1788e7ec5b1aefe976afa9db4fd47c517b3e283531e699a211d077509e0f9938 |
Provenance
The following attestation bundles were made for crystallize_ml-0.20.0-py3-none-any.whl:
Publisher: publish_pypi.yml on brysontang/crystallize
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: crystallize_ml-0.20.0-py3-none-any.whl
- Subject digest: 2d92514d6b06b1b6f84bb9a9ffaa6d2ba2612ed1a940c199ac27fe6d074f1756
- Sigstore transparency entry: 307679852
- Sigstore integration time:
- Permalink: brysontang/crystallize@a51d1309b8b6d1f4a7bae00e795bb1e55fc61ed0
- Branch / Tag: refs/tags/crystallize-ml@v0.20.0
- Owner: https://github.com/brysontang
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_pypi.yml@a51d1309b8b6d1f4a7bae00e795bb1e55fc61ed0
- Trigger Event: release