Bayesian RNA-seq transcript quantification tool
Bayesian RNA-seq quantification with joint mRNA, nascent RNA, and genomic DNA modeling
Overview
Rigel quantifies RNA-seq alignments while explicitly modeling three sources of signal in the same library:
- Mature RNA (mRNA)
- Nascent RNA (nRNA)
- Genomic DNA contamination (gDNA)
The implementation is built around a single-pass native BAM scan plus a
locus-level EM solver. A key architectural change in the current codebase is
that nRNA is no longer represented as one shadow per transcript. Instead,
Rigel builds a global table of unique nRNA spans keyed by (ref, strand, start, end) and shares each nRNA component across transcripts with the same
genomic span. This reduces redundant nRNA states in loci with many isoforms
that start and end at the same coordinates.
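The shared-span idea can be illustrated with a small sketch. This is not Rigel's code: the `Transcript` record and the `build_nrna_span_table` helper are hypothetical names, and only the key `(ref, strand, start, end)` is taken from the description above.

```python
from collections import defaultdict
from typing import NamedTuple


class Transcript(NamedTuple):
    # Hypothetical minimal record; not Rigel's actual data model.
    transcript_id: str
    ref: str
    strand: str
    start: int
    end: int


def build_nrna_span_table(transcripts):
    """Map each unique (ref, strand, start, end) span to the transcripts sharing it."""
    spans = defaultdict(list)
    for t in transcripts:
        spans[(t.ref, t.strand, t.start, t.end)].append(t.transcript_id)
    return dict(spans)


transcripts = [
    Transcript("tx1", "chr1", "+", 1000, 5000),
    Transcript("tx2", "chr1", "+", 1000, 5000),  # same span as tx1, e.g. different exons
    Transcript("tx3", "chr1", "+", 1000, 4200),
]
span_table = build_nrna_span_table(transcripts)
print(len(transcripts), "transcripts ->", len(span_table), "nRNA components")
```

Here three isoforms collapse to two nRNA components, because `tx1` and `tx2` cover the same genomic interval.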
Key features
- Joint mRNA, nRNA, and gDNA quantification in one locus-level model
- Shared-span nRNA architecture with one component per unique genomic span (ref, strand, start, end)
- Single-pass C++ BAM scanner using htslib, with memory-bounded buffering and spill-to-disk support
- Automatic strand-model training from annotated spliced fragments; protocol auto-detection (R1-sense/R1-antisense)
- Aggregate-first gDNA calibration using density, strand balance (Beta-Binomial), and fragment-length signals
- Empirical Bayes priors for nRNA fractions and gDNA rates; calibrated per-locus gDNA initialization
- MAP-EM and Variational Bayes EM (VBEM, default) solver modes with SQUAREM acceleration
- Discrete fragment assignment: fractional, map, or sample (default) post-EM assignment modes
- Parallel BAM scanning and parallel locus EM controlled through one --threads setting
- Feather and TSV outputs plus optional annotated BAM output with per-fragment assignment tags
Installation
Bioconda
conda install -c conda-forge -c bioconda rigel
PyPI
pip install rigel-rnaseq
The PyPI package name is rigel-rnaseq because rigel is already taken on
PyPI. The import name and CLI stay rigel.
From source
git clone https://github.com/mkiyer/rigel.git
cd rigel
mamba env create -f mamba_env.yaml
conda activate rigel
pip install --no-build-isolation -e .
Requirements
- Python 3.12+
- C++17-capable compiler
- Runtime dependencies: numpy, pandas, pyarrow, pysam, pyyaml
On macOS, install Xcode Command Line Tools first:
xcode-select --install
Quick start
1. Build an index
rigel index \
--fasta genome.fa \
--gtf annotation.gtf \
-o index/
The FASTA must have a .fai index. If needed:
samtools faidx genome.fa
2. Quantify a BAM
rigel quant \
--bam sample.bam \
--index index/ \
-o results/
Input BAM requirements:
- Name-sorted or collated
- NH tag present for multimapper handling
- Splice-junction strand tag available for best strand-model training (XS or ts, or let Rigel auto-detect)
3. Inspect outputs
head results/quant.tsv
head results/gene_quant.tsv
head results/nrna_quant.tsv
head results/loci.tsv
cat results/summary.json
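For a programmatic look at the transcript table, a minimal sketch with the standard library follows. The column names `mrna`, `nrna`, and `rna_total` come from the output description; the `transcript_id` column name, the tab delimiter with a header row, and the sample values are assumptions made for illustration.

```python
import csv
import io

# Synthetic stand-in for results/quant.tsv; values are made up and the
# real file has more columns. "transcript_id" is an assumed column name.
sample_tsv = (
    "transcript_id\tmrna\tnrna\trna_total\ttpm\n"
    "tx1\t90.0\t10.0\t100.0\t600000.0\n"
    "tx2\t30.0\t20.0\t50.0\t400000.0\n"
)


def nascent_fraction(handle):
    """Return nrna / rna_total per transcript from a tab-separated table."""
    fractions = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        total = float(row["rna_total"])
        fractions[row["transcript_id"]] = float(row["nrna"]) / total if total > 0 else 0.0
    return fractions


fractions = nascent_fraction(io.StringIO(sample_tsv))
print(fractions)
```

To run it on a real result, pass an open file instead of the in-memory sample, for example `open("results/quant.tsv", newline="")`, after checking the actual header.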
Output files
| File | Description |
|---|---|
| quant.feather / quant.tsv | Transcript-level abundance table with mrna, nrna, rna_total, tpm, and QC columns |
| gene_quant.feather / gene_quant.tsv | Gene-level aggregates derived from transcript estimates |
| nrna_quant.feather / nrna_quant.tsv | nRNA-span-level abundance estimates (one row per unique genomic nRNA span) |
| loci.feather / loci.tsv | Per-locus EM summary with mrna, nrna, gdna, and gdna_init |
| summary.json | Library protocol, strand specificity, fragment-length histograms, calibration results, alignment counts, and global quantification totals |
| config.yaml | Resolved run configuration (parameters, I/O paths). Rerun with rigel quant --config config.yaml |
| annotated.bam | Optional annotated BAM with ZT, ZG, ZR, ZI, ZJ, ZF, ZW, ZC, ZH, ZN, ZS, ZL, ZB tags. Rigel guarantees a collated-in → collated-out contract: the output contains exactly the same records as the input (no drops, no duplications). |
The nrna values in transcript- and gene-level tables are derived from shared
nRNA-span counts that are pro-rated across transcripts sharing the same span.
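Pro-rating a shared span count can be sketched as a proportional split. The weighting rule Rigel uses is not stated here, so the per-transcript weights below are purely illustrative and `prorate_span_count` is a hypothetical helper.

```python
def prorate_span_count(span_count, transcript_weights):
    """Split one nRNA span count across transcripts in proportion to their weights.

    The weights are an assumption for illustration; they are not Rigel's rule.
    """
    if not transcript_weights:
        return {}
    total = sum(transcript_weights.values())
    if total <= 0:
        # Fall back to an equal split when no weight information is available.
        share = span_count / len(transcript_weights)
        return {tid: share for tid in transcript_weights}
    return {tid: span_count * w / total for tid, w in transcript_weights.items()}


shares = prorate_span_count(120.0, {"tx1": 3.0, "tx2": 1.0})
print(shares)  # tx1 receives 90.0 and tx2 receives 30.0
```

Whatever weights are used, the shares add back up to the span-level count, so transcript-level nrna values stay consistent with the nRNA-span table.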
How it works
Rigel builds an index once, then quantifies each BAM through the five stages shown in the architecture diagram.
Architecture
FASTA + GTF ──▶ Index Build (index.py) ──▶ Feather index files
│
BAM file ──────────────────────────────────────────┤
▼
┌──────────────────────────────────┐
│ Stage 1: BAM Scan & Training │
│ C++: BamScanner → Resolver │
│ Py: buffer.py, strand_model.py │
└──────────────┬───────────────────┘
│ FragmentBuffer + models
▼
┌──────────────────────────────────┐
│ Stage 2: Score & Route │
│ C++: fused_score_buffer │
│ Py: scan.py → ScoredFragments │
└──────────────┬───────────────────┘
│ CSR arrays
▼
┌──────────────────────────────────┐
│ Stage 3: gDNA Calibration │
│ Py: calibration.py, locus.py │
└──────────────┬───────────────────┘
│ per-locus γ (gDNA fraction)
▼
┌──────────────────────────────────┐
│ Stage 4: Locus-Level EM │
│ C++: batch_locus_em (SQUAREM) │
│ Py: estimator.py dispatch │
└──────────────┬───────────────────┘
│ posterior counts
▼
┌──────────────────────────────────┐
│ Stage 5: Output │
│ Py: cli.py → Feather/TSV/JSON │
└──────────────────────────────────┘
BAM scan and model training
A native scanner reads the BAM once, resolves fragments against the indexed annotation, classifies splice structure, trains strand and fragment-length models, and writes resolved fragment data into a columnar buffer.
The main strand model is trained from annotated spliced fragments with
unambiguous gene assignment. Diagnostic exonic and intergenic strand models are
also retained for reporting, but gDNA itself is always scored with strand
probability 0.5.
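A rough sketch of how such a strand model could be trained and applied is given below. The function names, the pseudocount smoothing, and the example counts are assumptions; only the R1-sense/R1-antisense protocol labels and the fixed 0.5 strand probability for gDNA come from the text.

```python
def train_strand_model(n_r1_sense, n_r1_antisense, pseudocount=1.0):
    """Estimate P(read 1 is on the gene strand) from unambiguous spliced fragments.

    The pseudocount smoothing is an illustrative choice, not Rigel's estimator.
    """
    total = n_r1_sense + n_r1_antisense + 2.0 * pseudocount
    p_r1_sense = (n_r1_sense + pseudocount) / total
    protocol = "R1-sense" if p_r1_sense >= 0.5 else "R1-antisense"
    return p_r1_sense, protocol


def strand_likelihood(is_r1_sense, p_r1_sense, component):
    """Strand term of the fragment likelihood for one candidate component."""
    if component == "gdna":
        # gDNA carries no strand information, so both orientations are equally likely.
        return 0.5
    return p_r1_sense if is_r1_sense else 1.0 - p_r1_sense


# Example: a library where read 1 is almost always antisense to the gene.
p, protocol = train_strand_model(n_r1_sense=20, n_r1_antisense=980)
print(protocol, round(p, 4))
print(strand_likelihood(False, p, "mrna"), strand_likelihood(False, p, "gdna"))
```

In a strongly stranded library, an antisense fragment that is consistent with the protocol scores near 1 under an RNA component but only 0.5 under gDNA, which is what lets strand balance help separate the two.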
gDNA calibration
Before per-locus EM, Rigel runs an aggregate-first calibration pass over genomic regions. Each region is classified using three signals: fragment density (Gaussian mixture), strand balance (Beta-Binomial with shared κ), and fragment length (separate RNA and gDNA models). Any region with spliced fragments is forced to zero gDNA probability. The resulting per-region posteriors (γ) are fragment-count-weighted to the locus level and used to initialize the gDNA component in the EM.
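Two pieces of this pass are easy to sketch: the Beta-Binomial strand-balance likelihood and the fragment-count-weighted roll-up of region posteriors to a locus. The function names, the mean/κ parameterization, and all numeric values are assumptions for illustration; the density and fragment-length signals are left out.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_logpmf(k, n, mean, kappa):
    """Log-probability of k sense fragments out of n, with alpha = mean * kappa
    and beta = (1 - mean) * kappa."""
    alpha = mean * kappa
    beta = (1.0 - mean) * kappa
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def locus_gamma(regions):
    """Fragment-count-weighted mean of region-level gDNA posteriors.

    Each region is (n_fragments, gamma, has_spliced); regions with spliced
    fragments contribute zero gDNA probability, as described above.
    """
    weighted_sum = 0.0
    n_total = 0.0
    for n_fragments, gamma, has_spliced in regions:
        if has_spliced:
            gamma = 0.0
        weighted_sum += n_fragments * gamma
        n_total += n_fragments
    return weighted_sum / n_total if n_total > 0 else 0.0


# A region with 52 of 100 sense fragments looks balanced, hence gDNA-like.
kappa = 50.0  # shared concentration; the value is illustrative
ll_gdna = beta_binomial_logpmf(52, 100, mean=0.5, kappa=kappa)
ll_rna = beta_binomial_logpmf(52, 100, mean=0.95, kappa=kappa)
print(ll_gdna > ll_rna)

gamma_locus = locus_gamma([(100, 0.8, False), (300, 0.4, True), (100, 0.2, False)])
print(gamma_locus)
```

The spliced region in the example holds most of the fragments, so forcing its γ to zero pulls the locus-level value well below the unspliced regions' average.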
Locus-level EM
Ambiguous fragments are routed into CSR form and grouped into connected
components of overlapping transcripts. For a locus with T transcripts and N
unique nRNA spans, Rigel solves a T + N + 1 component problem:
- T mRNA components
- N shared nRNA components
- 1 merged gDNA component for the locus
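The grouping and the component count can be sketched with a union-find. Linking transcripts through the fragments they share is an assumption about how "overlapping" is determined, and `group_loci`, `component_count`, and the toy inputs are hypothetical.

```python
def find(parent, x):
    """Return the root of x, shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_loci(n_transcripts, fragment_hits):
    """Group transcripts into loci: transcripts hit by the same fragment are linked."""
    parent = list(range(n_transcripts))
    for hits in fragment_hits:
        for other in hits[1:]:
            root_a, root_b = find(parent, hits[0]), find(parent, other)
            if root_a != root_b:
                parent[root_b] = root_a
    loci = {}
    for t in range(n_transcripts):
        loci.setdefault(find(parent, t), []).append(t)
    return sorted(loci.values())


def component_count(locus, span_of):
    """T mRNA components + N unique nRNA spans + 1 gDNA component."""
    return len(locus) + len({span_of[t] for t in locus}) + 1


# Five transcripts; each inner list holds the transcripts one fragment is compatible with.
fragment_hits = [[0, 1], [1, 2], [3], [3, 4]]
span_of = [0, 0, 1, 2, 2]  # index of the nRNA span each transcript belongs to
loci = group_loci(5, fragment_hits)
print(loci)
print([component_count(locus, span_of) for locus in loci])
```

The first locus has three transcripts on two spans (3 + 2 + 1 = 6 components); the second has two transcripts on one span (2 + 1 + 1 = 4).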
The solver runs VBEM (default) or MAP-EM with SQUAREM acceleration. A
tripartite prior (coverage-weighted OVR for mRNA, sparsifying Dirichlet for
nRNA, calibrated γ for gDNA) is applied. Post-EM fragments are assigned using
the configured assignment mode (sample by default).
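To make the solver step concrete, here is a plain MAP-EM sketch for mixture weights with fixed per-fragment likelihoods and a Dirichlet prior. It is deliberately simpler than what is described above: there is no VBEM and no SQUAREM acceleration, the prior is a flat placeholder rather than the tripartite prior, and the reading of the three assignment modes (keep posteriors, take the argmax, or draw one component) is an assumption based on their names.

```python
import random


def posterior(theta, row):
    """Posterior over components for one fragment given weights and likelihoods."""
    weighted = [t * l for t, l in zip(theta, row)]
    z = sum(weighted)
    return [w / z for w in weighted] if z > 0 else [0.0] * len(row)


def map_em(likelihoods, prior_alpha, n_iter=500, tol=1e-10):
    """MAP-EM for mixture weights; the update is theta_k ∝ max(n_k + alpha_k - 1, 0)."""
    k = len(prior_alpha)
    theta = [1.0 / k] * k
    for _ in range(n_iter):
        counts = [0.0] * k
        for row in likelihoods:  # E-step: accumulate expected counts
            for j, p in enumerate(posterior(theta, row)):
                counts[j] += p
        raw = [max(c + a - 1.0, 0.0) for c, a in zip(counts, prior_alpha)]
        total = sum(raw)
        if total == 0:
            break
        new_theta = [r / total for r in raw]  # M-step
        delta = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


def assign(post, mode, rng):
    """Turn a posterior into an assignment vector under an assumed mode meaning."""
    if mode == "fractional":
        return list(post)
    if mode == "map":
        winner = max(range(len(post)), key=lambda j: post[j])
    elif mode == "sample":
        winner = rng.choices(range(len(post)), weights=post)[0]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [1.0 if j == winner else 0.0 for j in range(len(post))]


# Components: [one mRNA transcript, one nRNA span, gDNA]; likelihoods are made up.
likelihoods = [[1.0, 0.0, 0.0]] * 6 + [[0.0, 1.0, 0.0]] * 2 + [[0.5, 0.5, 0.5]] * 2
theta = map_em(likelihoods, prior_alpha=[1.0, 1.0, 1.0])
rng = random.Random(0)
sampled = assign(posterior(theta, likelihoods[-1]), "sample", rng)
print([round(t, 3) for t in theta], sampled)
```

In this toy locus the gDNA component is supported only by ambiguous fragments, so its weight decays toward zero and the ambiguous fragments are shared between the mRNA and nRNA components in a 3:1 ratio.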
Documentation
| Document | Description |
|---|---|
| docs/MANUAL.md | CLI reference, parameter defaults, configuration rules, and output schema |
| docs/METHODS.md | Algorithmic description of the implemented model and priors |
| docs/PUBLISHING.md | Release workflow for PyPI and Bioconda |
| docs/PARAMETERS.md | Complete parameter reference with defaults and config dataclass mapping |
Citing Rigel
If you use Rigel in research, cite the repository for now:
Iyer MK. Rigel: Bayesian RNA-seq quantification with joint mRNA, nascent RNA, and genomic DNA modeling. 2026. https://github.com/mkiyer/rigel
License
Rigel is distributed under the GNU General Public License v3.0.
Development
pytest tests/ -v
pytest tests/ --cov=rigel --cov-report=term-missing