# Crystallize 🧪✨

A framework for reproducible experiments with pipelines, treatments, and hypotheses.
> ⚠️ **Alpha status:** Crystallize ≥0.25.1 is in active development. Interfaces are stable enough for daily use, but minor breaking changes may occur between pre-releases. Install the latest build with:

```shell
pip install --upgrade --pre crystallize-ml
```
Crystallize is a lightweight Python framework for running reproducible data-science experiments. It couples immutable execution contexts, deterministic pipeline steps, pluggable execution backends, and first-class statistical verification. Use it either as a Python library or through a fully interactive terminal UI that discovers experiments from declarative `config.yaml` files.
## Why Crystallize?

- **Reproducible by default** – Every run executes inside a `FrozenContext` that records metrics, artifacts, and provenance. The default plugins automatically seed Python's RNG, persist artifacts, and stream structured logs.
- **Deterministic pipelines** – Build pipelines from `@pipeline_step` factories. Parameter injection pulls values directly from the context, and optional caching skips work when code, parameters, and inputs are unchanged.
- **Treatments & hypotheses** – Express experimental variations with `treatment()` helpers and verify outcomes with `@verifier` + `@hypothesis` pairs.
- **DAG orchestration** – Stitch experiments together with `ExperimentGraph` and reuse artifacts produced by upstream runs.
- **Batteries-included CLI** – Launch `crystallize` for a Textual-powered TUI that scaffolds experiments, runs them live, toggles caching, previews summaries, and opens source files in `$EDITOR`.
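The parameter-injection idea behind pipeline steps can be sketched in plain Python. This is an illustration of the concept only, not Crystallize's actual implementation; the decorator and context here are simplified stand-ins:

```python
import functools
import inspect

def pipeline_step(func):
    """Sketch: fill a step's keyword-only parameters from the context."""
    @functools.wraps(func)
    def wrapper(data, ctx):
        params = {}
        for name, param in inspect.signature(func).parameters.items():
            if param.kind is inspect.Parameter.KEYWORD_ONLY:
                # Pull the value from the context, falling back to the default.
                params[name] = ctx.get(name, param.default)
        return func(data, ctx, **params)
    return wrapper

@pipeline_step
def add_delta(data, ctx, *, delta=0.0):
    return [x + delta for x in data]

# A treatment that sets {"delta": 10.0} in the context changes the step's
# behavior without touching its code.
print(add_delta([0, 0, 0], {"delta": 10.0}))  # [10.0, 10.0, 10.0]
```

Because parameters flow through the context rather than call sites, the same pipeline runs unchanged under the baseline and every treatment.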
## Installation

Crystallize supports Python 3.10+. Install from PyPI:

```shell
pip install --upgrade --pre crystallize-ml
```

Optional extras are published under `crystallize-extras` and can be pulled in all at once with:

```shell
pip install --upgrade --pre "crystallize-extras[all]"
```

For local development:

```shell
git clone https://github.com/brysontang/crystallize.git
cd crystallize
pip install -e .
```
## Quick Start (Library)

```python
from crystallize import (
    Experiment,
    Pipeline,
    ParallelExecution,
    FrozenContext,
    data_source,
    pipeline_step,
    treatment,
    hypothesis,
    verifier,
)
from scipy.stats import ttest_ind


@data_source
def source(ctx: FrozenContext) -> list[int]:
    return [0, 0, 0]


@pipeline_step()
def add_delta(data: list[int], ctx: FrozenContext, *, delta: float = 0.0) -> list[float]:
    return [x + delta for x in data]


@pipeline_step()
def record_metric(data: list[float], ctx: FrozenContext):
    return data, {"total": sum(data)}


add_ten = treatment("add_ten", {"delta": 10.0})


@verifier
def welch_t_test(baseline, treatment, alpha: float = 0.05):
    stat, p_value = ttest_ind(
        treatment["total"], baseline["total"], equal_var=False
    )
    return {"p_value": p_value, "significant": p_value < alpha}


@hypothesis(verifier=welch_t_test(), metrics="total")
def by_p_value(result: dict[str, float]) -> float:
    return result.get("p_value", 1.0)


experiment = (
    Experiment.builder("demo")
    .datasource(source())
    .add_step(add_delta())
    .add_step(record_metric())
    .plugins([ParallelExecution(max_workers=4)])
    .treatments([add_ten()])
    .hypotheses([by_p_value])
    .replicates(10)
    .build()
)

result = experiment.run()
print(result.get_hypothesis("by_p_value").results)
```
The builder ensures the default `ArtifactPlugin`, `SeedPlugin`, and `LoggingPlugin` are attached. Seeds are derived from the replicate index unless you supply a fixed `SeedPlugin(seed=42)`.

See `examples/minimal_experiment/main.py` for a full runnable script with logging enabled.
## Quick Start (CLI)

Launch the TUI:

```shell
crystallize
```

- The selection screen discovers every `experiments/**/config.yaml`. Key bindings: `n` creates a new experiment scaffold (choose files, optional example code, and reuse outputs from other experiments), `r` refreshes discovery, `e` inspects load errors, `q` quits.
- Highlight an experiment or graph and press `Enter` to open the run screen.
- Run screen highlights:
  - `R` toggles between Run and Cancel.
  - `S` jumps to the summary tab; `t` switches the log/summary pane between Rich rendering and plain text.
  - `l` toggles caching for the selected experiment/step; `x` enables or disables individual treatments.
  - `e` opens the highlighted step or experiment in `$CRYSTALLIZE_EDITOR`, `$EDITOR`, or `$VISUAL`.
  - The summary tab lists metrics, hypotheses, and artifacts (with version information) for both current and historical runs.
- Treatment state (`.state.json`) is persisted so the next run remembers which variants were disabled.

To configure experiments, edit the live `config.yaml` tree in the right pane or open the file in your editor.
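The persisted treatment state might look something like the following. This is a hypothetical example: the actual schema of `.state.json` is internal and may differ — the point is simply that each treatment's enabled/disabled flag survives between runs:

```json
{
  "treatments": {
    "add_ten": true,
    "add_twenty": false
  }
}
```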
## Declarative config.yaml

Folder-based experiments mirror the structure used across the examples:

```text
experiments/
└── titanic_survival/
    ├── config.yaml
    ├── datasources.py
    ├── steps.py
    ├── verifiers.py
    └── outputs.py
```
Key sections inside `config.yaml`:

| Section | Purpose |
|---|---|
| `name` | Overrides the experiment identifier (defaults to the folder name). |
| `replicates` | Baseline and treatment replicate count (default 1). |
| `cli` | Controls grouping, priority, icon, color, and visibility in the CLI discovery screen. |
| `datasource` | Maps aliases to `@data_source` factories or to `experiment#artifact` references for DAG inputs. |
| `steps` | Ordered list of pipeline factories. Dictionaries allow passing keyword arguments. |
| `outputs` | Declares named `Artifact` handles. Loader/writer symbols are resolved from `outputs.py`. |
| `treatments` | Context values injected before each run. Nested dicts are merged with the baseline context. |
| `hypotheses` | Hook into verifier functions defined in `verifiers.py`. Supports multiple metrics per hypothesis. |
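Putting those sections together, a `config.yaml` might look like the sketch below. All names and values here are illustrative assumptions (the exact schema is defined by the loader), but the keys follow the table above:

```yaml
name: titanic_survival
replicates: 20

datasource:
  passengers: load_passengers      # a @data_source factory in datasources.py

steps:
  - clean_features
  - fit_model:                     # dictionaries pass keyword arguments
      max_depth: 4

treatments:
  deeper_tree:
    max_depth: 8                   # merged into the baseline context

hypotheses:
  - verifier: welch_t_test         # defined in verifiers.py
    metrics: accuracy
```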
Load a folder or individual config with `Experiment.from_yaml(...)` or `ExperimentGraph.from_yaml(...)`. The loader hot-reloads `datasources.py`, `steps.py`, and friends so you can iterate without restarting the CLI.
## Extras

`crystallize-extras` adds optional integrations:

- `RayExecution` – parallelize replicates on a Ray cluster.
- `initialize_ollama_client` / `initialize_async_ollama_client` – populate the context with reusable Ollama clients.
- `OpenAIChatStep` and `VLLMStep` – opinionated pipeline steps for LLM workloads.

Install via `pip install --upgrade --pre "crystallize-extras[ray]"` (or `ollama`, `openai`, `vllm`, `all`).
## Learning More

- **Documentation:** The `/docs` site (Astro + Starlight) mirrors the Diátaxis structure: tutorials, how-to guides, explanations, and API reference. Run `npm install` and `npm run dev` inside `docs/` to preview locally.
- **Examples:** Browse `examples/` for runnable pipelines covering CSV ingestion, DAG chaining, optimization loops, and YAML-driven workflows.
- **Tests:** Unit tests under `tests/` double as usage examples for the CLI, YAML loader, and plugin architecture.
Use `code2prompt` to generate context prompts for large language models:

```shell
code2prompt crystallize \
  --exclude="*.lock" \
  --exclude="**/docs/src/content/docs/reference/*" \
  --exclude="**/package-lock.json" \
  --exclude="**/CHANGELOG.md"
```
## Contributing

Contributions, issues, and feature requests are welcome! Please read `docs/src/content/docs/contributing.md` for environment setup, testing commands (`pixi run lint`, `pixi run test`, `pixi run cov`, `pixi run diff-cov`), and review guidelines.
## License

Crystallize is distributed under the Apache 2.0 License.