
A framework for reproducible experiments with pipelines, treatments, and hypotheses.


Crystallize 🧪✨


⚠️ Pre-Alpha Notice
This project is in an early experimental phase. Breaking changes may occur at any time. Use at your own risk.


Rigorous, reproducible, and clear data science experiments.

Crystallize is an elegant, lightweight Python framework designed to help data scientists, researchers, and machine learning practitioners turn hypotheses into crystal-clear, reproducible experiments.


Why Crystallize?

  • Clarity from Complexity: Easily structure your experiments, making it straightforward to follow best scientific practices.
  • Repeatability: Built-in support for reproducible results through immutable contexts, lockfiles, and robust pipeline management.
  • Statistical Rigor: Hypothesis-driven experiments with integrated statistical verification.
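To make the idea of an immutable context concrete, here is a minimal sketch in plain Python. It does not use Crystallize's own classes: the name WriteOnceContext and its methods are hypothetical, chosen only for illustration. The sketch accepts a key once and rejects any later attempt to overwrite it, which is one way to keep earlier experiment state from being changed silently.

```python
from types import MappingProxyType


class WriteOnceContext:
    """Hypothetical illustration: keys can be added once, never overwritten."""

    def __init__(self, initial=None):
        self._data = dict(initial or {})

    def add(self, key, value):
        # Refuse to overwrite an existing key so earlier state stays intact.
        if key in self._data:
            raise KeyError(f"cannot overwrite existing key: {key!r}")
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def as_dict(self):
        # Read-only view: callers can inspect but not modify the contents.
        return MappingProxyType(self._data)


ctx = WriteOnceContext({"seed": 42})
ctx.add("learning_rate", 0.001)

try:
    ctx.add("seed", 7)  # attempt to overwrite an existing key
except KeyError as err:
    print("rejected:", err)

print(dict(ctx.as_dict()))
```

Rejecting overwrites means a treatment can add new parameters but cannot quietly replace values that an earlier stage already recorded.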

Core Concepts

Crystallize revolves around several key abstractions:

  • DataSource: Flexible data fetching and generation.
  • Pipeline & PipelineSteps: Deterministic data transformations. Steps may be synchronous or async functions and are awaited automatically.
  • Hypothesis & Treatments: Quantifiable assertions and experimental variations.
  • Statistical Tests: Built-in support for rigorous validation of experiment results.
  • Optimizer: Iterative search over treatments using an ask/tell loop.
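The note that steps may be synchronous or async and are awaited automatically can be pictured with a small standalone sketch. It uses only the standard library, not Crystallize itself, and the function names (double, add_one, run_steps) are hypothetical. The runner calls each step and awaits the result only when it is awaitable, so both kinds of step can sit in the same list.

```python
import asyncio
import inspect


def double(data):
    # Ordinary synchronous step.
    return [x * 2 for x in data]


async def add_one(data):
    # Asynchronous step; the sleep stands in for real I/O.
    await asyncio.sleep(0)
    return [x + 1 for x in data]


async def run_steps(steps, data):
    """Run steps in order, awaiting any result that is awaitable."""
    for step in steps:
        result = step(data)
        if inspect.isawaitable(result):
            result = await result
        data = result
    return data


output = asyncio.run(run_steps([double, add_one], [1, 2, 3]))
print(output)  # [3, 5, 7]
```

Checking the returned value with inspect.isawaitable, rather than inspecting the function beforehand, also covers callables that return a coroutine without being declared with async def.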

Getting Started

Installation

Crystallize uses pixi for managing dependencies and environments. To add the published package to a pixi project:

pixi add --pypi crystallize-ml

Quick Example

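from crystallize import (
    DataSource,
    Hypothesis,
    Pipeline,
    Treatment,
    Experiment,
    SeedPlugin,
    ParallelExecution,
    hypothesis,  # decorator used below; assumed to be re-exported at the top level
)

# Example setup (simple): fill in your own steps and data source.
pipeline = Pipeline([...])
datasource = DataSource(...)
# WelchTTest is not imported above; it stands for a statistical test
# that you define or import yourself.
t_test = WelchTTest()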

@hypothesis(verifier=t_test, metrics="accuracy")
def rank_by_p(result):
    return result["p_value"]

hypothesis = rank_by_p()

treatment = Treatment(name="experiment_variant", apply_fn=lambda ctx: ctx.update({"learning_rate": 0.001}))

experiment = Experiment(
    datasource=datasource,
    pipeline=pipeline,
    plugins=[SeedPlugin(seed=42), ParallelExecution(max_workers=4)],
)
experiment.validate()
result = experiment.run(
    treatments=[treatment],
    hypotheses=[hypothesis],
    replicates=3,
)
print(result.metrics)
print(result.hypothesis_result)
result.print_tree()
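The example above hands a WelchTTest to the hypothesis without showing what such a test computes. As a rough illustration only, and not a description of how Crystallize implements its verifier, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom with the standard library. The function name welch_t and the accuracy values are made up for the example. Turning the statistic into a p-value needs the t distribution, which the standard library does not provide; scipy.stats.ttest_ind with equal_var=False is one existing option.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variances (n - 1 denominator) divided by sample size.
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a**2 / (n_a - 1) + se_b**2 / (n_b - 1))
    return t_stat, dof


# Made-up accuracy values for three replicates per condition.
baseline = [0.80, 0.82, 0.81]
treated = [0.85, 0.86, 0.88]

t_stat, dof = welch_t(treated, baseline)
print(f"t = {t_stat:.3f}, dof = {dof:.2f}")
```

Unlike Student's t-test, Welch's version does not assume the two groups share a variance, which is why the degrees of freedom are estimated from the data instead of being fixed at n_a + n_b - 2.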

Command Line Interface

Crystallize ships with an interactive CLI for discovering and executing experiments or experiment graphs.

# Discover and run a single experiment
crystallize run experiment

# Discover and run a graph from a specific directory
crystallize run graph --path ./my_project/experiments

# Preview actions without executing
crystallize run graph --dry-run

Project Structure

crystallize/
├── datasources/
├── experiments/
├── pipelines/
├── plugins/
└── utils/

Key classes and decorators are re-exported from the top-level crystallize module for concise imports:

from crystallize import Experiment, Pipeline, ArtifactPlugin

This layout keeps implementation details organized while exposing a clean, flat public API.


Roadmap

  • Advanced features: Adaptive experimentation, intelligent meta-learning
  • Collaboration: Experiment sharing, templates, and community contributions

Contributing

Contributions are very welcome! Please see CONTRIBUTING.md for guidelines.

Use code2prompt to generate LLM-powered docs:

code2prompt crystallize --exclude="*.lock" --exclude="**/docs/src/content/docs/reference/*" --exclude="**package-lock.json" --exclude="**CHANGELOG.md"

License

Crystallize is licensed under the Apache 2.0 License. See LICENSE for details.
