High-performance transit data parser with TXC to GTFS conversion

Transit Parser

High-performance Python+Rust library for parsing transit data formats with TXC to GTFS conversion.

Features

  • GTFS Static - Parse and write GTFS feeds (CSV-based)
  • TransXChange (TXC) - Parse UK XML transit format
  • TXC to GTFS - Convert TransXChange to GTFS
  • Schedule Validation - Validate operational schedules against GTFS
  • Deadhead Inference - Infer missing pull-out, pull-in, and interlining movements
  • Generic CSV/JSON - Parse any CSV/JSON with schema inference

Installation

Prerequisites

  • Python 3.9+
  • Rust 1.75+ (with cargo)
  • uv (recommended) or pip

Development Setup

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Enter the project directory (after cloning the repository)
cd parser

# Create virtual environment and install in dev mode
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Build and install with maturin (using uv)
uv pip install maturin
maturin develop

# Or, with pip instead of uv
pip install maturin
maturin develop

Building for Release

maturin build --release

Usage

Parse GTFS Feed

from transit_parser import GtfsFeed

# From ZIP file
feed = GtfsFeed.from_zip("path/to/gtfs.zip")

# From directory
feed = GtfsFeed.from_path("path/to/gtfs/")

# Access data
print(f"Agencies: {len(feed.agencies)}")
print(f"Routes: {len(feed.routes)}")
print(f"Stops: {len(feed.stops)}")
print(f"Trips: {len(feed.trips)}")

# Write to ZIP
feed.to_zip("output.zip")
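
Since GTFS Static is a ZIP archive of CSV tables, it can help to see what a feed looks like underneath the `GtfsFeed` API. The following is a minimal standard-library sketch, not the library's implementation: `read_table` and the sample `stops.txt` rows are hypothetical and exist only for illustration.

```python
import csv
import io
import zipfile

# Build a tiny in-memory GTFS-like archive so the sketch is self-contained.
# The stop rows are invented sample data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "stops.txt",
        "stop_id,stop_name,stop_lat,stop_lon\n"
        "S1,High Street,51.5007,-0.1246\n"
        "S2,Station Road,51.5033,-0.1196\n",
    )


def read_table(zip_bytes, member):
    """Read one CSV table from a GTFS ZIP into a list of dicts (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            # utf-8-sig drops a leading byte-order mark if one is present.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


stops = read_table(buffer.getvalue(), "stops.txt")
print(f"Stops: {len(stops)}")
```

Every value comes back as a string here; a real parser would also type the fields and validate required columns.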

Parse TransXChange

from transit_parser import TxcDocument

# From file
doc = TxcDocument.from_path("path/to/file.xml")

# Or from a string that already holds the XML content
doc = TxcDocument.from_string(xml_string)

# Inspect document
print(f"Schema version: {doc.schema_version}")
print(f"Operators: {doc.operator_count}")
print(f"Services: {doc.service_count}")
print(f"Vehicle journeys: {doc.vehicle_journey_count}")

Convert TXC to GTFS

from transit_parser import TxcDocument, TxcToGtfsConverter, ConversionOptions

# Parse TXC
doc = TxcDocument.from_path("input.xml")

# Configure conversion
options = ConversionOptions(
    include_shapes=True,
    region="england",  # For bank holiday handling
    calendar_start="2024-01-01",
    calendar_end="2024-12-31",
)

# Convert
converter = TxcToGtfsConverter(options)
result = converter.convert(doc)

# Check results
print(f"Converted {result.stats.trips_converted} trips")
print(f"Warnings: {len(result.warnings)}")

# Save GTFS
result.feed.to_zip("output.zip")
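
The options above take ISO dates, while GTFS `calendar.txt` stores dates as `YYYYMMDD`. As a rough illustration of that mapping, here is a standard-library sketch; `to_gtfs_date` is a hypothetical helper, and how the converter handles dates internally is an assumption, not something the source states.

```python
from datetime import date


def to_gtfs_date(iso_date):
    """Convert an ISO date such as "2024-01-01" to the GTFS YYYYMMDD form."""
    return date.fromisoformat(iso_date).strftime("%Y%m%d")


calendar_start = "2024-01-01"
calendar_end = "2024-12-31"

start = to_gtfs_date(calendar_start)
end = to_gtfs_date(calendar_end)

# Number of calendar days covered, counting both endpoints.
service_days = (
    date.fromisoformat(calendar_end) - date.fromisoformat(calendar_start)
).days + 1

print(start, end, service_days)
```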

Batch Conversion

from pathlib import Path
from transit_parser import TxcDocument, TxcToGtfsConverter

# Parse multiple TXC files (sorted so the input order is deterministic)
docs = []
for xml_file in sorted(Path("txc_files/").glob("*.xml")):
    docs.append(TxcDocument.from_path(str(xml_file)))

# Convert all to single GTFS
converter = TxcToGtfsConverter()
result = converter.convert_batch(docs)
result.feed.to_zip("combined.zip")

Generic CSV Parsing

from transit_parser import CsvDocument

# Parse with automatic type inference
doc = CsvDocument.from_path("data.csv")

print(f"Columns: {doc.columns}")
print(f"Rows: {len(doc)}")

# Access rows as dicts
for row in doc.rows:
    print(row)
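
To make "automatic type inference" concrete, here is a minimal pure-Python sketch of one common approach: pick the narrowest of `int`, `float`, and `str` that fits every value in a column. This is an assumption about the general technique, not a description of how `CsvDocument` infers types; `infer_type` and the sample data are hypothetical.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of int, float, str that parses every non-empty value."""
    for cast in (int, float):
        try:
            for value in values:
                if value != "":
                    cast(value)
            return cast
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return str


# Invented sample data for illustration.
sample = (
    "stop_id,stop_lat,wheelchair_boarding\n"
    "S1,51.5007,1\n"
    "S2,51.5033,0\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))

# Infer one type per column, then convert each row (empty cells become None).
schema = {col: infer_type([row[col] for row in rows]) for col in rows[0]}
typed_rows = [
    {col: (schema[col](val) if val != "" else None) for col, val in row.items()}
    for row in rows
]

print(schema)
print(typed_rows[0])
```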

JSON Parsing

from transit_parser import JsonDocument

# Parse JSON
doc = JsonDocument.from_path("data.json")

# Access root value
data = doc.root

# Use JSON pointer for nested access
value = doc.pointer("/data/items/0/name")
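
For readers unfamiliar with JSON pointers (RFC 6901), the sketch below shows how a pointer such as `/data/items/0/name` walks a parsed document. It uses only the standard library; `resolve_pointer` is a hypothetical helper, and how `JsonDocument.pointer` behaves for missing paths is not covered by the source.

```python
import json


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON pointer against parsed JSON data."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in the order the RFC requires: "~1" first, then "~0".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array index
        else:
            current = current[token]  # object member
    return current


# Invented sample document for illustration.
data = json.loads('{"data": {"items": [{"name": "Route 42"}]}}')
value = resolve_pointer(data, "/data/items/0/name")
print(value)
```

This sketch raises `KeyError` or `IndexError` on a missing path; a library may choose to return a null value instead.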

Schedule Validation

from transit_parser import GtfsFeed, Schedule, ValidationConfig

# Load GTFS and schedule
gtfs = GtfsFeed.from_path("gtfs/")
schedule = Schedule.from_csv("schedule.csv")

# Validate with custom rules
config = ValidationConfig(
    gtfs_compliance="standard",
    min_layover_seconds=300,
    max_duty_length_seconds=32400,
)
result = schedule.validate(gtfs, config)

if not result.is_valid:
    for error in result.errors:
        print(f"Error: {error['message']}")

# Infer missing deadheads
inference = schedule.infer_deadheads(gtfs, default_depot="MAIN")
print(f"Inferred {inference.total_count} deadheads")

# Export
schedule.to_csv("output.csv", preset="optibus")
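
The thresholds in `ValidationConfig` are in seconds: 300 is 5 minutes and 32400 is 9 hours. The sketch below shows what a layover and duty-length check could look like in plain Python. The interpretation (layover as the gap between consecutive trips on one vehicle, duty length as first start to last end) is an assumption, and the trips and `to_seconds` helper are hypothetical.

```python
MIN_LAYOVER_SECONDS = 300        # 5 minutes
MAX_DUTY_LENGTH_SECONDS = 32400  # 9 hours


def to_seconds(hms):
    """Convert HH:MM:SS to seconds; hours may exceed 23, as in GTFS times."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Invented trips for one vehicle: (trip_id, start, end), in running order.
trips = [
    ("T1", "08:00:00", "08:40:00"),
    ("T2", "08:43:00", "09:20:00"),  # starts 3 minutes after T1 ends
    ("T3", "09:30:00", "10:05:00"),  # starts 10 minutes after T2 ends
]

errors = []

# Compare each trip's end with the next trip's start.
for (prev_id, _, prev_end), (next_id, next_start, _) in zip(trips, trips[1:]):
    layover = to_seconds(next_start) - to_seconds(prev_end)
    if layover < MIN_LAYOVER_SECONDS:
        errors.append(
            {"message": f"Layover of {layover}s between {prev_id} and {next_id} "
                        f"is below {MIN_LAYOVER_SECONDS}s"}
        )

# Duty length measured from the first start to the last end.
duty_length = to_seconds(trips[-1][2]) - to_seconds(trips[0][1])
if duty_length > MAX_DUTY_LENGTH_SECONDS:
    errors.append({"message": f"Duty length {duty_length}s exceeds the maximum"})

for error in errors:
    print(f"Error: {error['message']}")
```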

Project Structure

parser/
├── pyproject.toml          # Python project config (maturin backend)
├── Cargo.toml              # Rust workspace root
├── rust/
│   ├── transit-core/       # Core data models and traits
│   ├── gtfs-parser/        # GTFS Static parser
│   ├── txc-parser/         # TransXChange parser
│   ├── txc-gtfs-adapter/   # TXC→GTFS conversion
│   ├── schedule-parser/    # Schedule validation & generation
│   ├── csv-parser/         # Generic CSV parser
│   ├── json-parser/        # Generic JSON parser
│   └── transit-bindings/   # PyO3 Python bindings
└── python/
    └── transit_parser/     # Python package

Performance

The Rust core improves performance in four ways:

  • Streaming XML parsing - Processes large TXC files without loading the entire DOM into memory
  • Zero-copy CSV parsing - Efficient GTFS file reading
  • Parallel processing - Batch conversion uses multiple cores
  • GIL release - Other Python threads can run during long operations
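
As an illustration of the streaming idea, the sketch below counts `VehicleJourney` elements with the standard library's `xml.etree.ElementTree.iterparse`, clearing each element once it has been handled. It shows the general technique only and is not the Rust implementation; the sample document and `count_vehicle_journeys` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

TXC_NS = "http://www.transxchange.org.uk/"

# Invented, heavily trimmed TXC-like document for illustration.
sample = f"""<TransXChange xmlns="{TXC_NS}">
  <VehicleJourneys>
    <VehicleJourney><VehicleJourneyCode>VJ1</VehicleJourneyCode></VehicleJourney>
    <VehicleJourney><VehicleJourneyCode>VJ2</VehicleJourneyCode></VehicleJourney>
  </VehicleJourneys>
</TransXChange>""".encode("utf-8")


def count_vehicle_journeys(stream):
    """Count VehicleJourney elements while parsing the stream incrementally."""
    target = f"{{{TXC_NS}}}VehicleJourney"
    count = 0
    # "end" events fire when an element's closing tag has been read.
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == target:
            count += 1
            element.clear()  # discard the element's children and text
    return count


journey_count = count_vehicle_journeys(io.BytesIO(sample))
print(f"Vehicle journeys: {journey_count}")
```

Because each journey is cleared after it is counted, its contents do not accumulate in memory as the file grows.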

License

MIT OR Apache-2.0

Download files

Source Distribution

  • transit_parser-0.2.0.tar.gz (98.9 kB) - Source

Built Distributions

  • transit_parser-0.2.0-cp39-abi3-win_amd64.whl (1.3 MB) - CPython 3.9+, Windows x86-64
  • transit_parser-0.2.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB) - CPython 3.9+, manylinux: glibc 2.17+ x86-64
  • transit_parser-0.2.0-cp39-abi3-macosx_11_0_arm64.whl (1.4 MB) - CPython 3.9+, macOS 11.0+ ARM64

File details

Details for the file transit_parser-0.2.0.tar.gz.

File metadata

  • Download URL: transit_parser-0.2.0.tar.gz
  • Size: 98.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for transit_parser-0.2.0.tar.gz
Algorithm Hash digest
SHA256 de40465b92ccbe0533be70c5044f59386cde1f898fe8e6867591b4359d125633
MD5 d807b95dd4c3f998178150d47044e013
BLAKE2b-256 f51700ba1f63051f078702066bcbdd7856df7d65d919c152754cd10a91fea2c9

See more details on using hashes here.

Provenance

The following attestation bundles were made for transit_parser-0.2.0.tar.gz:

Publisher: publish.yml on alexogeny/transit-parser

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file transit_parser-0.2.0-cp39-abi3-win_amd64.whl.

File hashes

Hashes for transit_parser-0.2.0-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 7b3645efee7732d3d4580f9860bb4878069700f14c52344baf592de67f36ba27
MD5 5b79acdd787aa2eb2112e6554a11c3c9
BLAKE2b-256 db56b476a0661cb276ce9ca6859bb7f660b0b1dfa2e8121a68f5ccdaa6518044

Provenance

The following attestation bundles were made for transit_parser-0.2.0-cp39-abi3-win_amd64.whl:

Publisher: publish.yml on alexogeny/transit-parser

File details

Details for the file transit_parser-0.2.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for transit_parser-0.2.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8b0cb6aeaee0a5c3b27d83c07a80c7b7a02051c63a609d3bb98d0352d87550dd
MD5 ee5e2632cdb82110c7435eac8057d080
BLAKE2b-256 18abbacdd84a27d4234cf79c42fadc6cd843cfd7140a0d341843d47f725ee5f7

Provenance

The following attestation bundles were made for transit_parser-0.2.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on alexogeny/transit-parser

File details

Details for the file transit_parser-0.2.0-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for transit_parser-0.2.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5c6dca0bf3832b63dc42a27a9f4c203748adde447a5906f194c20ca6c68708fd
MD5 8aeb77fa2f6d0c88364fbd2b497b2ae6
BLAKE2b-256 fa89b810e8f465949bcaebc5b80532a9277abaa60d6d5ca6da327a15bc4725cd

Provenance

The following attestation bundles were made for transit_parser-0.2.0-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: publish.yml on alexogeny/transit-parser
