Bayesian RNA-seq transcript quantification tool

Rigel

Bayesian RNA-seq quantification with joint mRNA, nascent RNA, and genomic DNA modeling


Overview

Rigel quantifies RNA-seq alignments while explicitly modeling three sources of signal in the same library:

  • Mature RNA (mRNA)
  • Nascent RNA (nRNA)
  • Genomic DNA contamination (gDNA)

The implementation is built around a single-pass native BAM scan plus a locus-level EM solver. A key architectural change in the current codebase is that nRNA is no longer represented as one shadow per transcript. Instead, Rigel builds a global table of unique nRNA spans keyed by (ref, strand, start, end) and shares each nRNA component across transcripts with the same genomic span. This reduces redundant nRNA states in loci with many isoforms that start and end at the same coordinates.
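The shared-span idea can be illustrated with a small sketch. This is not Rigel's internal code: the function name `build_span_table` and the tuple layout of the input are assumptions made for illustration, and only the key (ref, strand, start, end) comes from the description above.

```python
from typing import NamedTuple


class Span(NamedTuple):
    ref: str
    strand: str
    start: int
    end: int


def build_span_table(transcripts):
    """Collapse transcripts onto unique genomic spans.

    `transcripts` is an iterable of (transcript_id, ref, strand, start, end).
    Returns (spans, span_of_transcript): the unique spans in first-seen order,
    and a dict mapping each transcript_id to the index of its span.
    """
    span_ids = {}            # Span -> index into `spans`
    spans = []
    span_of_transcript = {}
    for tid, ref, strand, start, end in transcripts:
        key = Span(ref, strand, start, end)
        if key not in span_ids:
            span_ids[key] = len(spans)
            spans.append(key)
        span_of_transcript[tid] = span_ids[key]
    return spans, span_of_transcript


# Hypothetical locus: tx1 and tx2 share start and end, tx3 starts later.
transcripts = [
    ("tx1", "chr1", "+", 1000, 5000),
    ("tx2", "chr1", "+", 1000, 5000),
    ("tx3", "chr1", "+", 1200, 5000),
]
spans, span_of_transcript = build_span_table(transcripts)
print(len(transcripts), "transcripts ->", len(spans), "nRNA spans")
```

In this toy locus the three isoforms need only two nRNA components instead of three, which is the reduction the paragraph above describes.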

Key features

  • Joint mRNA, nRNA, and gDNA quantification in one locus-level model
  • Shared-span nRNA architecture with one component per unique genomic span (ref, strand, start, end)
  • Single-pass C++ BAM scanner using htslib, with memory-bounded buffering and spill-to-disk support
  • Automatic strand-model training from annotated spliced fragments; protocol auto-detection (R1-sense / R1-antisense)
  • Aggregate-first gDNA calibration using density, strand balance (Beta-Binomial), and fragment-length signals
  • Empirical Bayes priors for nRNA fractions and gDNA rates; calibrated per-locus gDNA initialization
  • MAP-EM and Variational Bayes EM (VBEM, default) solver modes with SQUAREM acceleration
  • Discrete fragment assignment: fractional, map, or sample (default) post-EM assignment modes
  • Parallel BAM scanning and parallel locus EM controlled through one --threads setting
  • Feather and TSV outputs plus optional annotated BAM output with per-fragment assignment tags

Installation

Bioconda

conda install -c conda-forge -c bioconda rigel

PyPI

pip install rigel-rnaseq

The PyPI package name is rigel-rnaseq because rigel is already taken on PyPI. The import name and CLI stay rigel.

From source

git clone https://github.com/mkiyer/rigel.git
cd rigel

mamba env create -f mamba_env.yaml
conda activate rigel

pip install --no-build-isolation -e .

Requirements

  • Python 3.12+
  • C++17-capable compiler
  • Runtime dependencies: numpy, pandas, pyarrow, pysam, pyyaml

On macOS, install Xcode Command Line Tools first:

xcode-select --install

Quick start

1. Build an index

rigel index \
    --fasta genome.fa \
    --gtf annotation.gtf \
    -o index/

The FASTA must have a .fai index. If needed:

samtools faidx genome.fa

2. Quantify a BAM

rigel quant \
    --bam sample.bam \
    --index index/ \
    -o results/

Input BAM requirements:

  • Name-sorted or collated
  • NH tag present for multimapper handling
  • Splice-junction strand tag available for best strand-model training (XS or ts, or let Rigel auto-detect)
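A quick pre-flight check of the first requirement can be sketched as below. The helper `declares_name_grouping` is hypothetical and not part of Rigel. It only inspects the `@HD` header line, using the `SO` and `GO` fields defined by the SAM specification, so a BAM that is collated but does not declare it in the header would be reported as False even though it may be usable.

```python
def declares_name_grouping(header: dict) -> bool:
    """Return True if the @HD line declares reads grouped by query name.

    `header` is a SAM header as a nested dict, for example the result of
    pysam.AlignmentFile(path).header.to_dict().
    """
    hd = header.get("HD", {})
    return hd.get("SO") == "queryname" or hd.get("GO") == "query"


# Usage with pysam (one of the listed runtime dependencies), shown as
# comments so the block runs without a BAM file:
#
#   import pysam
#   with pysam.AlignmentFile("sample.bam", "rb") as bam:
#       print(declares_name_grouping(bam.header.to_dict()))
#       first = next(bam)
#       print(first.has_tag("NH"))   # NH is needed for multimapper handling

print(declares_name_grouping({"HD": {"VN": "1.6", "SO": "queryname"}}))
```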

3. Inspect outputs

head results/quant.tsv
head results/gene_quant.tsv
head results/nrna_quant.tsv
head results/loci.tsv
cat results/summary.json
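For a slightly more targeted look than head, the transcript table can be ranked by abundance with the standard library alone. This is a sketch under assumptions: it takes quant.tsv to be tab-delimited with a header row and a numeric `tpm` column (the column name comes from the output description below), and the identifier column name `transcript_id` used in the demo rows is made up.

```python
import csv


def top_by_tpm(lines, n=5):
    """Return the n rows with the highest `tpm` value.

    `lines` is any iterable of text lines, such as an open file handle.
    Rows are returned as dicts keyed by the header names.
    """
    rows = list(csv.DictReader(lines, delimiter="\t"))
    rows.sort(key=lambda row: float(row["tpm"]), reverse=True)
    return rows[:n]


# Usage on a real run (commented out so the block runs without the file):
#
#   with open("results/quant.tsv", newline="") as handle:
#       for row in top_by_tpm(handle, n=10):
#           print(row)

# Small in-memory demo with a hypothetical identifier column.
demo = ["transcript_id\ttpm\n", "A\t1.5\n", "B\t10\n", "C\t3\n"]
print([row["transcript_id"] for row in top_by_tpm(demo, n=2)])
```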

Output files

| File | Description |
| --- | --- |
| quant.feather / quant.tsv | Transcript-level abundance table with mrna, nrna, rna_total, tpm, and QC columns |
| gene_quant.feather / gene_quant.tsv | Gene-level aggregates derived from transcript estimates |
| nrna_quant.feather / nrna_quant.tsv | nRNA-span-level abundance estimates (one row per unique genomic nRNA span) |
| loci.feather / loci.tsv | Per-locus EM summary with mrna, nrna, gdna, and gdna_init |
| summary.json | Library protocol, strand specificity, fragment-length histograms, calibration results, alignment counts, and global quantification totals |
| config.yaml | Resolved run configuration (parameters, I/O paths). Rerun with rigel quant --config config.yaml |
| annotated.bam | Optional annotated BAM with ZT, ZG, ZR, ZI, ZJ, ZF, ZW, ZC, ZH, ZN, ZS, ZL, ZB tags. Rigel guarantees a collated-in → collated-out contract: the output contains exactly the same records as the input (no drops, no duplications). |

The nrna values in transcript- and gene-level tables are derived from shared nRNA-span counts that are pro-rated across transcripts sharing the same span.
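Pro-rating a shared span count can be sketched as a proportional split. The weighting rule Rigel actually uses is not stated here, so the `weights` argument below is an assumption (for instance equal weights, or each transcript's mRNA estimate), and `prorate_span_count` is a hypothetical name.

```python
def prorate_span_count(span_count, weights):
    """Split one nRNA span count across the transcripts that share the span.

    `weights` maps transcript_id -> non-negative weight. Shares are
    proportional to the weights; if all weights are zero the count is
    split equally. The per-transcript shares sum to `span_count`.
    """
    if not weights:
        return {}
    total = sum(weights.values())
    if total <= 0:
        equal_share = span_count / len(weights)
        return {tid: equal_share for tid in weights}
    return {tid: span_count * w / total for tid, w in weights.items()}


# Two isoforms share a span that received 30 nRNA fragments.
print(prorate_span_count(30.0, {"tx1": 1.0, "tx2": 2.0}))
```

Whatever the weights, summing the transcript-level shares recovers the span-level count reported in nrna_quant.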


How it works

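Once the index is built, Rigel runs in five logical stages: BAM scan and model training, scoring and routing, gDNA calibration, locus-level EM, and output.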

Architecture

 FASTA + GTF ──▶ Index Build (index.py) ──▶ Feather index files
                                                    │
 BAM file ──────────────────────────────────────────┤
                                                    ▼
                              ┌──────────────────────────────────┐
                              │  Stage 1: BAM Scan & Training    │
                              │  C++: BamScanner → Resolver      │
                              │  Py:  buffer.py, strand_model.py │
                              └──────────────┬───────────────────┘
                                             │ FragmentBuffer + models
                                             ▼
                              ┌──────────────────────────────────┐
                              │  Stage 2: Score & Route          │
                              │  C++: fused_score_buffer         │
                              │  Py:  scan.py → ScoredFragments  │
                              └──────────────┬───────────────────┘
                                             │ CSR arrays
                                             ▼
                              ┌──────────────────────────────────┐
                              │  Stage 3: gDNA Calibration       │
                              │  Py:  calibration.py, locus.py   │
                              └──────────────┬───────────────────┘
                                             │ per-locus γ (gDNA fraction)
                                             ▼
                              ┌──────────────────────────────────┐
                              │  Stage 4: Locus-Level EM         │
                              │  C++: batch_locus_em (SQUAREM)   │
                              │  Py:  estimator.py dispatch      │
                              └──────────────┬───────────────────┘
                                             │ posterior counts
                                             ▼
                              ┌──────────────────────────────────┐
                              │  Stage 5: Output                 │
                              │  Py:  cli.py → Feather/TSV/JSON  │
                              └──────────────────────────────────┘

BAM scan and model training

A native scanner reads the BAM once, resolves fragments against the indexed annotation, classifies splice structure, trains strand and fragment-length models, and writes resolved fragment data into a columnar buffer.

The main strand model is trained from annotated spliced fragments with unambiguous gene assignment. Diagnostic exonic and intergenic strand models are also retained for reporting, but gDNA itself is always scored with strand probability 0.5.
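A minimal sketch of this strand logic follows, assuming the model reduces to a single probability that read 1 matches the annotated transcript strand. The pseudocount smoothing, the 0.5 decision threshold, and all function names are illustrative assumptions. Only the protocol labels and the fixed gDNA probability of 0.5 come from the text above.

```python
def strand_specificity(n_r1_sense, n_r1_antisense, pseudocount=1.0):
    """Estimate P(read 1 is on the transcript strand) from spliced fragments.

    Counts come from annotated spliced fragments with unambiguous gene
    assignment. The pseudocount keeps the estimate away from 0 and 1.
    """
    return (n_r1_sense + pseudocount) / (
        n_r1_sense + n_r1_antisense + 2.0 * pseudocount
    )


def detect_protocol(p_r1_sense):
    """Label the library by which orientation dominates."""
    return "R1-sense" if p_r1_sense >= 0.5 else "R1-antisense"


def strand_likelihood(r1_is_sense, p_r1_sense, is_gdna=False):
    """Strand term for one fragment under an RNA or gDNA component."""
    if is_gdna:
        return 0.5  # gDNA has no strand preference
    return p_r1_sense if r1_is_sense else 1.0 - p_r1_sense


# Hypothetical training counts from a dUTP-style library.
p = strand_specificity(n_r1_sense=20, n_r1_antisense=980)
print(detect_protocol(p), round(p, 4))
```

Under this sketch, an antisense fragment in a strongly stranded library scores far lower for RNA than for gDNA, which is what lets strand balance inform the gDNA estimate.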

gDNA calibration

Before per-locus EM, Rigel runs an aggregate-first calibration pass over genomic regions. Each region is classified using three signals: fragment density (Gaussian mixture), strand balance (Beta-Binomial with shared κ), and fragment length (separate RNA and gDNA models). Any region with spliced fragments is forced to zero gDNA probability. The resulting per-region posteriors (γ) are fragment-count-weighted to the locus level and used to initialize the gDNA component in the EM.
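The final aggregation step can be sketched as a weighted mean. This assumes each region is summarized by its fragment count, its posterior γ, and its spliced-fragment count, and that spliced regions still contribute their fragments to the denominator. Both are assumptions for illustration, as is the function name `locus_gamma`.

```python
def locus_gamma(regions):
    """Fragment-count-weighted gDNA fraction for one locus.

    `regions` is an iterable of (n_fragments, gamma, n_spliced) tuples.
    Regions containing spliced fragments are forced to gamma = 0.
    """
    weighted_sum = 0.0
    total_fragments = 0.0
    for n_fragments, gamma, n_spliced in regions:
        if n_spliced > 0:
            gamma = 0.0  # spliced evidence rules out gDNA for this region
        weighted_sum += n_fragments * gamma
        total_fragments += n_fragments
    return weighted_sum / total_fragments if total_fragments > 0 else 0.0


# Hypothetical locus made of three regions; the middle one has spliced reads.
regions = [(100, 0.8, 0), (300, 0.4, 5), (100, 0.2, 0)]
print(locus_gamma(regions))
```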

Locus-level EM

Ambiguous fragments are routed into CSR form and grouped into connected components of overlapping transcripts. For a locus with T transcripts and N unique nRNA spans, Rigel solves a T + N + 1 component problem:

  • T mRNA components
  • N shared nRNA components
  • 1 merged gDNA component for the locus
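The grouping and the T + N + 1 layout can be sketched with a union-find. The sketch assumes transcripts are linked whenever one fragment is compatible with both, and that components are ordered mRNA first, then nRNA spans, then gDNA. The function names, the linking rule, and the ordering are all illustrative assumptions.

```python
def connected_loci(n_transcripts, fragment_hits):
    """Group transcript indices into loci with a union-find.

    `fragment_hits` is an iterable of lists, each holding the transcript
    indices one fragment is compatible with.
    """
    parent = list(range(n_transcripts))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for hits in fragment_hits:
        if not hits:
            continue
        root = find(hits[0])
        for other in hits[1:]:
            parent[find(other)] = root

    groups = {}
    for t in range(n_transcripts):
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


def component_layout(locus_transcripts, span_of_transcript):
    """Index ranges for the T mRNA, N nRNA, and 1 gDNA components."""
    spans = sorted({span_of_transcript[t] for t in locus_transcripts})
    n_mrna, n_nrna = len(locus_transcripts), len(spans)
    return {
        "mrna": range(0, n_mrna),
        "nrna": range(n_mrna, n_mrna + n_nrna),
        "gdna": n_mrna + n_nrna,
        "size": n_mrna + n_nrna + 1,
    }


# Four transcripts; fragments link 0-1 and 1-2, transcript 3 stands alone.
loci = connected_loci(4, [[0, 1], [1, 2], [3]])
span_of_transcript = {0: 0, 1: 0, 2: 1, 3: 2}  # transcripts 0 and 1 share a span
layout = component_layout(loci[0], span_of_transcript)
print(loci, layout["size"])
```

For the first locus this gives T = 3 and N = 2, so the solver would see 6 components rather than the 7 a one-shadow-per-transcript design would need.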

The solver runs VBEM (default) or MAP-EM, both with SQUAREM acceleration. It applies a tripartite prior: coverage-weighted OVR for mRNA, a sparsifying Dirichlet for nRNA, and the calibrated γ for gDNA. After EM, fragments are assigned using the configured assignment mode (sample by default).
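To make the solver and the three assignment modes concrete, here is a deliberately simplified sketch. It is plain MAP-EM with a single Dirichlet prior. It does not implement SQUAREM, VBEM, or the tripartite prior, and it assumes every fragment has at least one positive likelihood. The sparse per-fragment lists stand in for CSR rows, and all names are hypothetical.

```python
import random


def map_em(fragments, n_components, prior, n_iter=200, tol=1e-9):
    """Estimate component proportions by MAP-EM.

    `fragments` is a list of sparse rows [(component, likelihood), ...].
    `prior` holds one Dirichlet alpha per component (alpha = 1 is flat).
    """
    theta = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        counts = [0.0] * n_components
        for row in fragments:                      # E-step
            z = sum(theta[k] * lik for k, lik in row)
            if z <= 0.0:
                continue
            for k, lik in row:
                counts[k] += theta[k] * lik / z
        # M-step: Dirichlet posterior mode, clipped at zero.
        raw = [max(c + a - 1.0, 0.0) for c, a in zip(counts, prior)]
        total = sum(raw)
        if total <= 0.0:
            break
        new_theta = [r / total for r in raw]
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


def assign(row, theta, mode="sample", rng=random):
    """Assign one fragment after EM: fractional, map, or sample."""
    weights = [theta[k] * lik for k, lik in row]
    z = sum(weights)
    post = [(k, w / z) for (k, _), w in zip(row, weights)]
    if mode == "fractional":
        return post
    if mode == "map":
        return [(max(post, key=lambda kp: kp[1])[0], 1.0)]
    if mode == "sample":
        chosen = rng.choices([k for k, _ in post], weights=[p for _, p in post])[0]
        return [(chosen, 1.0)]
    raise ValueError(f"unknown assignment mode: {mode}")


# 8 fragments unique to component 0, 2 unique to component 1, 10 ambiguous.
fragments = [[(0, 1.0)]] * 8 + [[(1, 1.0)]] * 2 + [[(0, 1.0), (1, 1.0)]] * 10
theta = map_em(fragments, n_components=2, prior=[1.0, 1.0])
print([round(t, 3) for t in theta])
print(assign([(0, 1.0), (1, 1.0)], theta, mode="map"))
```

The fractional mode keeps the posterior split, map gives the whole fragment to the most probable component, and sample draws one component in proportion to the posterior, so only the latter two yield integer per-fragment assignments.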


Documentation

| Document | Description |
| --- | --- |
| docs/MANUAL.md | CLI reference, parameter defaults, configuration rules, and output schema |
| docs/METHODS.md | Algorithmic description of the implemented model and priors |
| docs/PUBLISHING.md | Release workflow for PyPI and Bioconda |
| docs/PARAMETERS.md | Complete parameter reference with defaults and config dataclass mapping |

Citing Rigel

If you use Rigel in research, cite the repository for now:

Iyer MK. Rigel: Bayesian RNA-seq quantification with joint mRNA, nascent RNA, and genomic DNA modeling. 2026. https://github.com/mkiyer/rigel


License

Rigel is distributed under the GNU General Public License v3.0.


Development

pytest tests/ -v
pytest tests/ --cov=rigel --cov-report=term-missing
