
Sibyl — configuration-driven multi-task deep learning experiment platform (time-series forecasting + gene expression prediction)


English | 简体中文

Sibyl

License: Apache 2.0 · Python 3.10+ · PyTorch · Hydra

Renamed from DeepTS-Flow-Wheat (v0.x). PyPI: sibyl-ml. Python module: sibyl. CLI: sibyl-train / sibyl-bench / sibyl-genex.

Configuration-driven deep learning experiment platform — 12 time-series forecasting models from 2017–2025 plus a multi-omics gene-expression pipeline, all one Hydra config away.

Sibyl covers two task domains on a single Hydra+PyTorch core: time-series forecasting with a unified, fair-comparison protocol across 12 baselines, and gene-expression prediction from DNA sequence + epigenetic marks via the CrossMark architecture. One Trainer, one config system, one CLI. Drop in a new model file, add a YAML, send a PR — LazyModelDict auto-discovers it. AutoML mode profiles your data, recommends the top-3 models, trains them, and returns an ensemble report. Resume, mixed precision, GPU peak tracking, deterministic seeding, and Optuna sweeps come built in.
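
The discovery mechanic is simple enough to sketch. A minimal illustration of a LazyModelDict-style registry (illustrative only, not Sibyl's actual implementation; just the sibyl/models layout and the class Model convention are taken from this README):

import importlib
import pkgutil

class LazyModelDict:
    """Map model names to modules under a package; import on first access."""

    def __init__(self, package="sibyl.models"):
        pkg = importlib.import_module(package)
        self._paths = {m.name: f"{package}.{m.name}"
                       for m in pkgutil.iter_modules(pkg.__path__)}
        self._cache = {}

    def __getitem__(self, name):
        if name not in self._cache:
            # Convention: every model file exposes `class Model`.
            self._cache[name] = importlib.import_module(self._paths[name]).Model
        return self._cache[name]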

Quickstart (30 seconds, CPU-only)

git clone https://github.com/leehom0123/sibyl
cd sibyl
pip install -e ".[dev]"
python main.py experiment=demo

The shipped 12 KB toy dataset (data/demo_sine.npz, regenerable via python scripts/forecast/make_demo_data.py) trains a DLinear for 5 epochs on CPU. Expect outputs/forecast/<timestamp>_demo/ containing training curves, predictions, error analysis, SHAP attributions, and a leaderboard row — all in well under a minute.
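
For reference, a toy dataset of the same flavour takes only a few lines of NumPy (a hedged sketch, not the shipped script; the series key and length here are assumptions):

import os
import numpy as np

os.makedirs("data", exist_ok=True)
rng = np.random.default_rng(0)
t = np.arange(2000, dtype=np.float32)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.shape)
np.savez("data/demo_sine.npz", series=series.astype(np.float32))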

Models

Time-series forecasting (12)

Year Model Paper Type
2017 Transformer Vaswani et al. (NeurIPS) Encoder-decoder
2021 Autoformer Wu et al. (NeurIPS) Decomposition + auto-correlation
2022 FEDformer Zhou et al. (ICML) Frequency-domain attention
2023 DLinear Zeng et al. (AAAI) Pure linear
2023 PatchTST Nie et al. (ICLR) Patch + channel-independent
2023 TimesNet Wu et al. (ICLR) 2D periodicity
2024 iTransformer Liu et al. (ICLR) Inverted attention
2024 TimeMixer Wang et al. (ICLR) Multi-scale MLP
2024 TimeXer Wang et al. (NeurIPS) Patch + exogenous
2024 SOFTS Han et al. (NeurIPS) STar aggregation
2025 TimeFilter Hu et al. (ICLR) Filter-based
2025 DUET Qiu et al. (KDD) Dual clustering + Mahalanobis mask

All 12 models are ported from each paper's reference implementation (10 from Time-Series-Library; SOFTS and DUET from their official author repos).

Gene expression / multi-omics (1)

Year Model Description Task
2025 CrossMark Multimodal cross-attention over DNA sequence + epigenetic marks Multi-omics gene expression
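
The core mechanic is standard cross-attention, with one modality querying the other. In PyTorch terms (a generic sketch of the technique, not CrossMark's actual code; all dimensions are made up):

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
dna = torch.randn(2, 1000, 128)    # DNA-sequence embeddings (batch, length, dim)
marks = torch.randn(2, 1000, 128)  # epigenetic-mark embeddings, e.g. an ATAC track
fused, weights = attn(query=dna, key=marks, value=marks)  # sequence attends over marks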

Run a benchmark

# Single time-series experiment
python main.py experiment=etth1 model=patchtst

# Full forecast benchmark (10 datasets x 12 models)
python scripts/forecast/run_benchmark.py --epochs 50

# Multi-seed (mean ± std reporting)
python scripts/forecast/run_benchmark.py --seeds 42 2024 2025 --datasets etth1 --models dlinear

# Hyperparameter search (Optuna, resumable from SQLite)
python main.py -m hparams_search=dam_optuna

# AutoML: profile data → recommend top-3 → train → ensemble report
python main.py mode=auto

Resume an interrupted run by re-running the same command: COMPLETED markers and saved random states make resumption bit-for-bit deterministic.
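
"Saved random states" means all four RNG streams are captured at checkpoint time and restored on resume. The technique in a minimal sketch (illustrative; Sibyl's actual checkpoint code lives in the repo):

import random
import numpy as np
import torch

def capture_rng_states():
    return {
        "python": random.getstate(),
        "numpy": np.random.get_state(),
        "torch": torch.get_rng_state(),
        "cuda": torch.cuda.get_rng_state_all() if torch.cuda.is_available() else None,
    }

def restore_rng_states(states):
    random.setstate(states["python"])
    np.random.set_state(states["numpy"])
    torch.set_rng_state(states["torch"])
    if states["cuda"] is not None:
        torch.cuda.set_rng_state_all(states["cuda"])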

Gene expression (multi-omics)

# Smoke run on the spike dataset (1 epoch, single mark)
python main.py launcher=gene_expr experiment=spike model=crossmark \
    model.active_marks='[ATAC]'

# 32-combination interpretability sweep (5-fold CV)
python scripts/gene_expr/run_sweep.py --n_folds 5 --epochs 200 --resume

# Post-sweep Shapley + interaction analysis
python scripts/gene_expr/run_analysis.py

The gene-expression pipeline reuses the same Trainer, callbacks, and config groups as the forecasting pipeline; only the Task, Model, and DataProvider differ. See docs/gene_expr.md for the data layout and how to extend to a new organ or species without code changes.

Adding a new model (5 steps)

  1. Create sibyl/models/<name>.py with class Model(BaseModel) (a minimal sketch follows this list).
  2. Create configs/model/<name>.yaml.
  3. Run python main.py experiment=etth1 model=<name>.
  4. Add tests/unit/test_<name>.py.
  5. Send the PR: LazyModelDict auto-discovers the model, and ModelRecommender picks up any recommender: rules in your YAML.
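
A minimal sketch of step 1's file (the real BaseModel interface and config fields are defined in the repo; the import path and cfg attributes below are assumptions):

import torch.nn as nn
from sibyl.models.base import BaseModel  # assumed import path

class Model(BaseModel):
    """Toy linear baseline: project seq_len input steps to pred_len outputs."""

    def __init__(self, cfg):
        super().__init__(cfg)
        self.proj = nn.Linear(cfg.seq_len, cfg.pred_len)

    def forward(self, x):  # x: (batch, seq_len, channels)
        return self.proj(x.transpose(1, 2)).transpose(1, 2)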

See CONTRIBUTING.md for the full path, including code style and PR conventions.

Architecture (one-liner)

main.py → Hydra composes cfg.launcher → Launcher builds Task + Model + DataProvider → shared Trainer.fit() / .test() → callbacks (visualisation, leaderboard, SHAP, attention export, …) write everything under outputs/<pipeline>/. Every component is auto-discovered from its directory; adding one is a YAML and a Python file — zero registration.
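
In hedged pseudocode (only the two Hydra calls are real API; every other name is an assumption about the shape of the flow):

import hydra
from hydra.utils import instantiate

@hydra.main(config_path="configs", config_name="main", version_base=None)
def main(cfg):
    launcher = instantiate(cfg.launcher)     # component selected by the config
    task, model, data = launcher.build(cfg)  # Task + Model + DataProvider
    trainer = launcher.build_trainer(cfg)    # shared Trainer for both pipelines
    trainer.fit(task, model, data)
    trainer.test(task, model, data)          # callbacks write under outputs/

if __name__ == "__main__":
    main()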

Reproducibility

  • Unified protocol across all 12 forecasting baselines: 50 epochs, patience 10, cosine annealing, Adam lr=1e-4, batch_size=32, seed=2021 (spelled out as plain PyTorch after this list). One protocol and one set of architecture hyperparameters per model, so performance differences come from data, not from per-dataset tuning.
  • Multi-seed benchmarking via --seeds 42 2024 2025 (auto-suffixed run dirs and leaderboard rows).
  • Deterministic resume: Python / NumPy / PyTorch / CUDA random states all serialised into checkpoint_last.pt alongside the scheduler state.
  • AMP off by default (cuFFT's fp16 path only supports power-of-two signal lengths, and the default seq_len=96 is not one); flip training.use_amp=true once your model has no FFT-on-fp16 path.
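
The first bullet's protocol, spelled out as plain PyTorch for concreteness (illustrative only; Sibyl drives all of this from Hydra YAML, and the dummy linear model stands in for any baseline):

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(2021)                               # seed=2021
model = nn.Linear(96, 96)                             # stand-in for a baseline
opt = torch.optim.Adam(model.parameters(), lr=1e-4)   # Adam, lr=1e-4
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=50)

best, bad = float("inf"), 0
for epoch in range(50):                               # 50 epochs max
    x = torch.randn(32, 96)                           # batch_size=32
    loss = F.mse_loss(model(x), x)
    opt.zero_grad(); loss.backward(); opt.step()
    sched.step()                                      # cosine annealing
    val = loss.item()                                 # stand-in validation loss
    if val < best:
        best, bad = val, 0
    else:
        bad += 1
        if bad >= 10:                                 # early stop, patience 10
            break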

Documentation

Task-specific guides live under docs/; see docs/gene_expr.md for the gene-expression data layout and how to extend it.

Contributing

See CONTRIBUTING.md. New model contributions especially welcome — the bar is one model file, one config, and one test.

License

Apache-2.0. See LICENSE.

Citing

If you use Sibyl in academic work, please cite via CITATION.cff.

Acknowledgements

  • Time-Series-Library (THUML) for the reference implementations of the 10 earlier forecasting models.
  • The SOFTS, TimeFilter, and DUET authors for releasing their official codebases.
  • The Hydra and PyTorch communities for the foundations the platform sits on.
