A framework for reproducible experiments with pipelines, treatments, and hypotheses.
Crystallize 🧪✨
⚠️ Pre-Alpha Notice
This project is in an early experimental phase. Breaking changes may occur at any time. Use at your own risk.
Rigorous, reproducible, and clear data science experiments.
Crystallize is an elegant, lightweight Python framework designed to help data scientists, researchers, and machine learning practitioners turn hypotheses into crystal-clear, reproducible experiments.
Why Crystallize?
- Clarity from Complexity: Easily structure your experiments, making it straightforward to follow best scientific practices.
- Repeatability: Built-in support for reproducible results through immutable contexts, lockfiles, and robust pipeline management.
- Statistical Rigor: Hypothesis-driven experiments with integrated statistical verification.
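To make "integrated statistical verification" concrete, here is a minimal sketch of the kind of check a hypothesis might run: a two-sided permutation test comparing a metric between a control group and a treated group. It uses only the standard library and is not Crystallize's own API; the function name `permutation_test` and the sample values are illustrative assumptions.

```python
import random
from statistics import mean


def permutation_test(control, treated, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns the observed difference (treated - control) and a p-value.
    """
    rng = random.Random(seed)  # fixed seed so the p-value is reproducible
    observed = mean(treated) - mean(control)
    pooled = list(control) + list(treated)
    n_control = len(control)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one smoothing keeps the p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Illustrative metric values, not real experiment results.
control = [0.71, 0.69, 0.73, 0.70, 0.72, 0.68, 0.71, 0.70]
treated = [0.78, 0.80, 0.77, 0.79, 0.81, 0.76, 0.79, 0.78]
diff, p = permutation_test(control, treated)
print(f"difference in means: {diff:.3f}, p-value: {p:.4f}")
```

Seeding the random generator inside the test means the p-value is the same on every run, which is the property the repeatability bullet above is after.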
Core Concepts
Crystallize revolves around several key abstractions:
- DataSource: Flexible data fetching and generation.
- Pipeline & PipelineSteps: Deterministic data transformations. Steps may be synchronous or async functions and are awaited automatically.
- Hypothesis & Treatments: Quantifiable assertions and experimental variations.
- Statistical Tests: Built-in support for rigorous validation of experiment results.
- Optimizer: Iterative search over treatments using an ask/tell loop.
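The idea that pipeline steps "may be synchronous or async functions and are awaited automatically" can be sketched in plain Python. This is not Crystallize's implementation; `run_steps`, `scale`, and `add_offset` are hypothetical names used only to show the general pattern of awaiting a step's result when, and only when, it is awaitable.

```python
import asyncio
import inspect


def scale(data, factor=2):
    """A synchronous step."""
    return [x * factor for x in data]


async def add_offset(data, offset=1):
    """An async step; the sleep stands in for real I/O."""
    await asyncio.sleep(0)
    return [x + offset for x in data]


async def run_steps(steps, data):
    """Apply each step in order, awaiting the result only when needed."""
    for step in steps:
        result = step(data)
        if inspect.isawaitable(result):
            result = await result
        data = result
    return data


output = asyncio.run(run_steps([scale, add_offset], [1, 2, 3]))
print(output)  # [3, 5, 7]
```

Checking the returned value with `inspect.isawaitable` lets the caller mix both kinds of step in one list without marking them differently.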
Getting Started
Crystallize can be used in two ways: through the interactive CLI for managing file-based experiments, or as a Python library for full programmatic control.
Installation
Install the framework and its CLI by adding it to your pixi project as a PyPI dependency:
pixi add --pypi crystallize-ml
Option 1: The Interactive CLI (Recommended Workflow)
This is the fastest way to create, manage, and run a suite of experiments.
Launch the interactive terminal UI:
crystallize
Scaffold a new experiment:
Inside the UI, press the n key to open the "Create New Experiment" screen. Fill out the details to generate a new experiment folder under experiments/.
Run your experiment:
The UI will automatically discover your new experiment. Highlight it in the list and press Enter to run it.
Option 2: The Python Library (Programmatic Workflow)
Use the library directly in your Python scripts for advanced use cases and integrations.
from crystallize import (
    DataSource,  # assumed to be re-exported at the top level like the classes below
    Experiment,
    Pipeline,
    Treatment,
    Hypothesis,
    SeedPlugin,
    ParallelExecution,
)
# Define your datasource, pipeline, treatments, etc.
pipeline = Pipeline([...])
datasource = DataSource(...)
treatment = Treatment(...)
hypothesis = Hypothesis(...)
# Build and run the experiment programmatically
experiment = Experiment(
    datasource=datasource,
    pipeline=pipeline,
    plugins=[SeedPlugin(seed=42), ParallelExecution(max_workers=4)],
)
result = experiment.run(
    treatments=[treatment],
    hypotheses=[hypothesis],
    replicates=10,
)
print(result.metrics)
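The combination of a fixed seed and several replicates is what makes a run repeatable. How SeedPlugin derives its seeds is not described here, so the following is only a generic sketch of the idea, assuming each replicate gets its own seed computed from a base seed; `run_replicate` and `run_replicates` are hypothetical helpers, not library functions.

```python
import random


def run_replicate(seed):
    """Stand-in for one experiment replicate: returns a noisy metric."""
    rng = random.Random(seed)  # local generator, no global state touched
    return round(0.75 + rng.gauss(0, 0.01), 4)


def run_replicates(base_seed, replicates):
    """Run every replicate with its own seed derived from base_seed."""
    return [run_replicate(base_seed + i) for i in range(replicates)]


first = run_replicates(base_seed=42, replicates=10)
second = run_replicates(base_seed=42, replicates=10)
print(first == second)  # True: same base seed, same metrics
```

Giving each replicate a separate local generator keeps results independent of execution order, which matters once replicates run in parallel.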
Command Line Interface
The crystallize command opens a terminal UI for browsing and executing experiments. Highlight an experiment or graph to view its details and press Enter to run it. The details panel includes a live config editor so you can adjust values directly in config.yaml.
Experiments can define a cli section in config.yaml to control grouping and style:
cli:
  group: 'Data Preprocessing'
  priority: 1
  icon: '📊'
  color: '#85C1E9'
  hidden: false
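As a rough illustration of how such settings could drive a menu, the sketch below groups experiments by `group`, orders them by `priority`, and drops `hidden` entries. The exact semantics in Crystallize are not stated here, so treat the ordering rule (lower priority first), the experiment names, and the `build_menu` helper as assumptions. The dictionaries stand in for `cli` sections that have already been parsed from each config.yaml.

```python
from itertools import groupby

# Hypothetical parsed `cli` sections, keyed by experiment name.
cli_sections = {
    "clean_data": {"group": "Data Preprocessing", "priority": 1, "hidden": False},
    "split_data": {"group": "Data Preprocessing", "priority": 2, "hidden": False},
    "scratch": {"group": "Data Preprocessing", "priority": 3, "hidden": True},
    "train_model": {"group": "Modeling", "priority": 1, "hidden": False},
}


def build_menu(sections):
    """Return {group: [experiment names]} with hidden entries removed."""
    visible = [(name, cfg) for name, cfg in sections.items() if not cfg["hidden"]]
    # Assumption: lower priority numbers are listed first within a group.
    visible.sort(key=lambda item: (item[1]["group"], item[1]["priority"]))
    return {
        group: [name for name, _ in items]
        for group, items in groupby(visible, key=lambda item: item[1]["group"])
    }


menu = build_menu(cli_sections)
print(menu)
```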
You can also run experiments without the UI:
python -m experiments.<experiment_name>.main
Project Structure
crystallize/
├── datasources/
├── experiments/
├── pipelines/
├── plugins/
└── utils/
Key classes and decorators are re-exported from the top-level crystallize package for concise imports:
from crystallize import Experiment, Pipeline, ArtifactPlugin
This layout keeps implementation details organized while exposing a clean, flat public API.
Roadmap
- Advanced features: Adaptive experimentation, intelligent meta-learning
- Collaboration: Experiment sharing, templates, and community contributions
Contributing
Contributions are very welcome! Please see CONTRIBUTING.md for guidelines.
Use code2prompt to generate LLM-powered docs:
code2prompt crystallize --exclude="*.lock" --exclude="**/docs/src/content/docs/reference/*" --exclude="**package-lock.json" --exclude="**CHANGELOG.md"
License
Crystallize is licensed under the Apache 2.0 License. See LICENSE for details.