Python bindings for the VCFX toolkit - comprehensive VCF manipulation and analysis

VCFX Python Package

Python bindings for the VCFX toolkit - a comprehensive VCF manipulation toolkit with 60+ specialized tools for genomic variant analysis.

Installation

pip install vcfx

For development or building from source:

git clone https://github.com/ieeta-pt/VCFX.git
cd VCFX
pip install -e ./python

Quick Start

import vcfx

# Basic helper functions
text = vcfx.trim("  hello  ")  # Returns "hello"
parts = vcfx.split("A,B,C", ",")  # Returns ["A", "B", "C"]

# Read compressed files
content = vcfx.read_file_maybe_compressed("variants.vcf.gz")

# Get version
version = vcfx.get_version()
print(f"VCFX version: {version}")
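
To illustrate what a "maybe compressed" reader has to do, here is a minimal standard-library sketch. It is not the VCFX implementation: the function name `read_text_maybe_gzip` is hypothetical, and the sketch assumes the only compression to detect is gzip (which also covers bgzip output, since that is a series of gzip members).

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def read_text_maybe_gzip(path: str) -> str:
    """Return the file's text, decompressing first if it starts with the gzip magic bytes.

    Hypothetical helper for illustration only; not part of the vcfx package.
    """
    raw = Path(path).read_bytes()
    if raw[:2] == GZIP_MAGIC:
        # gzip.decompress also handles multi-member streams
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")
```

Checking the magic bytes rather than the file extension means a mislabeled file is still read correctly. The whole file is loaded into memory, so a streaming reader would suit large VCFs better.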

Tool Wrappers

VCFX provides Python wrappers for all command-line tools. These wrappers execute the tools and parse their output into structured Python objects:

# Count variants
count = vcfx.variant_counter("input.vcf")
print(f"Total variants: {count}")

# Calculate allele frequencies
freqs = vcfx.allele_freq_calc("input.vcf")
for freq in freqs:
    print(f"Position {freq.Pos}: AF={freq.Allele_Frequency}")

# Check sample concordance
concordance = vcfx.concordance_checker("input.vcf", "SAMPLE1", "SAMPLE2")
for row in concordance:
    print(f"{row.Position}: {row.Concordance}")

Structured Data Types

Many tool wrappers return dataclass objects with typed fields for easy access:

import vcfx
from vcfx.results import AlleleFrequency, VariantClassification

# Allele frequency calculations return AlleleFrequency objects
freqs = vcfx.allele_freq_calc("variants.vcf")
# Access fields with type safety and IDE completion
print(freqs[0].Chromosome)  # str
print(freqs[0].Allele_Frequency)  # float

# Variant classification returns VariantClassification objects
classes = vcfx.variant_classifier("variants.vcf")
print(classes[0].Classification)  # 'SNP', 'INDEL', 'MNV', or 'STRUCTURAL'
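
The idea of turning a tool's tab-separated output into typed records can be sketched with the standard library alone. The example below is an assumption-laden illustration, not the package's code: `FrequencyRow` and `parse_rows` are hypothetical names, and the three-column layout is invented for the example (only the field names `Chromosome`, `Pos`, and `Allele_Frequency` come from the snippets above).

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class FrequencyRow:
    """Hypothetical stand-in for a result type; not vcfx.results.AlleleFrequency."""
    Chromosome: str
    Pos: int
    Allele_Frequency: float


def parse_rows(tsv_text: str) -> list[FrequencyRow]:
    """Convert header-bearing TSV text into typed dataclass rows."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        FrequencyRow(
            Chromosome=row["Chromosome"],
            Pos=int(row["Pos"]),  # text converted to int
            Allele_Frequency=float(row["Allele_Frequency"]),  # text converted to float
        )
        for row in reader
    ]


# Invented sample output, used only to exercise the parser
sample = "Chromosome\tPos\tAllele_Frequency\nchr1\t100\t0.25\nchr1\t200\t0.5\n"
rows = parse_rows(sample)
print(rows[0].Chromosome, rows[0].Pos, rows[0].Allele_Frequency)
```

Converting each column once at parse time is what lets later code compare `Allele_Frequency` numerically instead of as text.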

Common Workflows

Quality Control Pipeline

import vcfx

# Validate VCF format
validation_report = vcfx.validator("input.vcf")
if "ERROR" in validation_report:
    print("VCF validation failed!")
    
# Detect missing data
missing_flagged = vcfx.missing_detector("input.vcf")

# Check concordance across samples
cross_concordance = vcfx.cross_sample_concordance("input.vcf")
discordant = [r for r in cross_concordance if r.Concordance_Status != 'CONCORDANT']

Filtering and Analysis

# Filter by allele frequency
af_filtered = vcfx.af_subsetter("input.vcf", "0.01-0.1")

# Extract specific samples
sample_vcf = vcfx.sample_extractor("input.vcf", ["SAMPLE1", "SAMPLE2"])

# Calculate Hardy-Weinberg equilibrium
hwe_results = vcfx.hwe_tester("input.vcf")
significant = [r for r in hwe_results if r.HWE_pvalue < 0.05]
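
For readers unfamiliar with what a Hardy-Weinberg p-value measures, the following sketch computes a textbook chi-square test from genotype counts. It is a general illustration and makes no claim about which test `hwe_tester` uses internally; the function name `hwe_chi_square` is hypothetical.

```python
import math


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> tuple[float, float]:
    """Return (chi-square statistic, p-value) for a biallelic site.

    Textbook 1-degree-of-freedom chi-square test, shown for illustration only.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternate allele frequency
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip classes with zero expectation (monomorphic sites)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Counts of 25, 50, and 25 match the expected proportions exactly, so the statistic is 0 and the p-value is 1. Counts of 50, 0, and 50 show a complete lack of heterozygotes and give a very small p-value.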

Population Genetics

# Infer ancestry
ancestry = vcfx.ancestry_inferrer("samples.vcf", "population_freqs.txt")
for sample in ancestry:
    print(f"{sample.Sample}: {sample.Inferred_Population}")

# Calculate inbreeding coefficients
inbreeding = vcfx.inbreeding_calculator("input.vcf", freq_mode="excludeSample")

Error Handling

Tool wrappers raise subprocess.CalledProcessError if the underlying tool fails:

import subprocess

import vcfx

try:
    result = vcfx.variant_counter("nonexistent.vcf")
except subprocess.CalledProcessError as e:
    print(f"Tool failed with exit code {e.returncode}")
    print(f"Error output: {e.stderr}")
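
The exception can be observed without VCFX installed. This sketch uses a Python child process as a stand-in for a failing tool. The command and its error message are invented for the example, and it is an assumption that the wrappers call `subprocess` in a similar way.

```python
import subprocess
import sys

# Stand-in for a failing tool: writes a message to stderr and exits with code 3
failing_command = [
    sys.executable,
    "-c",
    "import sys; print('cannot open file', file=sys.stderr); sys.exit(3)",
]

try:
    # check=True turns a non-zero exit status into CalledProcessError
    subprocess.run(failing_command, check=True, capture_output=True, text=True)
except subprocess.CalledProcessError as e:
    exit_code = e.returncode
    error_text = e.stderr.strip()
    print(f"exit code {exit_code}: {error_text}")
```

Note that `e.stderr` is only populated when the child's stderr was captured. If it was not, the attribute is `None`, so code that formats it should allow for that case.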

Requirements

  • Python 3.10+
  • For tool wrappers: VCFX command-line tools must be installed and available in PATH
    • Install via conda: conda install -c bioconda vcfx
    • Or build from source (see documentation)

Available Tools

VCFX includes 60+ specialized tools organized into categories:

  • Analysis: allele frequencies, variant classification, HWE testing, LD calculation
  • Filtering: quality filters, population filters, missing data filters
  • Transformation: sample extraction, multiallelic splitting, normalization
  • Quality Control: concordance checking, validation, outlier detection
  • File Management: indexing, compression, merging, splitting
  • Annotation: custom annotation, INFO field processing

Use vcfx.available_tools() to list tools accessible in your environment.

Documentation

License

MIT License - see LICENSE file for details.


Source Distribution

vcfx-1.1.4.tar.gz (16.5 kB)

Uploaded Source

File details

Details for the file vcfx-1.1.4.tar.gz.

File metadata

  • Download URL: vcfx-1.1.4.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vcfx-1.1.4.tar.gz

  • SHA256: 1548b3ee08e27880059f759e0dfa88b918fbbc02507f770cf500e72309bd5a96
  • MD5: 2c7dd69249dc5b9a7b5be733b025d197
  • BLAKE2b-256: e3627360286c2b9e1302980e585283a463f8baa4f7800be25444bfec99406078


Provenance

The following attestation bundles were made for vcfx-1.1.4.tar.gz:

Publisher: publish-pypi.yml on jorgeMFS/VCFX

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
