High-performance transit data parser with TXC to GTFS conversion
Transit Parser
High-performance Python+Rust library for parsing transit data formats with TXC to GTFS conversion.
Features
- GTFS Static - Parse and write GTFS feeds (CSV-based)
- TransXChange (TXC) - Parse UK XML transit format
- TXC to GTFS - Convert TransXChange to GTFS
- Schedule Validation - Validate operational schedules against GTFS
- Deadhead Inference - Infer missing pull-out, pull-in, and interlining movements
- Generic CSV/JSON - Parse any CSV/JSON with schema inference
Installation
Prerequisites
- Python 3.9+
- Rust 1.75+ (with cargo)
- uv (recommended) or pip
Development Setup
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone the repository, then enter its directory
cd parser
# Create virtual environment and install in dev mode
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Build and install with maturin
uv pip install maturin
maturin develop
# Or use pip directly
pip install maturin
maturin develop
Building for Release
maturin build --release
Usage
Parse GTFS Feed
from transit_parser import GtfsFeed
# From ZIP file
feed = GtfsFeed.from_zip("path/to/gtfs.zip")
# From directory
feed = GtfsFeed.from_path("path/to/gtfs/")
# Access data
print(f"Agencies: {len(feed.agencies)}")
print(f"Routes: {len(feed.routes)}")
print(f"Stops: {len(feed.stops)}")
print(f"Trips: {len(feed.trips)}")
# Write to ZIP
feed.to_zip("output.zip")
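For readers new to the format: a GTFS Static feed is a ZIP archive of CSV text files such as `stops.txt` and `routes.txt`. The sketch below shows that layout using only the Python standard library. It is an illustration, not how `transit_parser` is implemented; the helper names `read_gtfs_table` and `build_sample_feed` and the sample rows are made up for this example.

```python
import csv
import io
import zipfile


def read_gtfs_table(zip_bytes: bytes, table: str) -> list[dict[str, str]]:
    """Return the rows of one GTFS table (e.g. 'stops.txt') as dicts."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(table) as raw:
            # utf-8-sig drops the optional byte-order mark some feeds include.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def build_sample_feed() -> bytes:
    """Build a tiny in-memory feed so the sketch runs without any files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,High Street,51.5000,-0.1200\n"
            "S2,Station Road,51.5100,-0.1300\n",
        )
    return buffer.getvalue()


stops = read_gtfs_table(build_sample_feed(), "stops.txt")
print(f"Stops: {len(stops)}")
```

Every value comes back as a string here; typed access such as `feed.stops` is what the library adds on top.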
Parse TransXChange
from transit_parser import TxcDocument
# From file
doc = TxcDocument.from_path("path/to/file.xml")
# From string
doc = TxcDocument.from_string(xml_string)
# Inspect document
print(f"Schema version: {doc.schema_version}")
print(f"Operators: {doc.operator_count}")
print(f"Services: {doc.service_count}")
print(f"Vehicle journeys: {doc.vehicle_journey_count}")
Convert TXC to GTFS
from transit_parser import TxcDocument, TxcToGtfsConverter, ConversionOptions
# Parse TXC
doc = TxcDocument.from_path("input.xml")
# Configure conversion
options = ConversionOptions(
    include_shapes=True,
    region="england",  # For bank holiday handling
    calendar_start="2024-01-01",
    calendar_end="2024-12-31",
)
# Convert
converter = TxcToGtfsConverter(options)
result = converter.convert(doc)
# Check results
print(f"Converted {result.stats.trips_converted} trips")
print(f"Warnings: {len(result.warnings)}")
# Save GTFS
result.feed.to_zip("output.zip")
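One step any TXC to GTFS conversion has to handle is timing. TXC describes a journey as a departure time plus ISO 8601 run-time durations (for example `PT5M`), while GTFS `stop_times.txt` needs absolute `HH:MM:SS` values, which may exceed `24:00:00` for trips that run past midnight. The following is a minimal sketch of that arithmetic, assuming durations with only hour, minute, and second parts. The function names are hypothetical and this is not the converter's actual code.

```python
import re

# Matches ISO 8601 durations limited to hours, minutes and seconds, e.g. PT1H5M.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_seconds(value: str) -> int:
    """Convert a duration such as 'PT1M30S' to a number of seconds."""
    match = _DURATION.match(value)
    if match is None or value == "PT":
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def gtfs_time(total_seconds: int) -> str:
    """Format seconds after midnight as HH:MM:SS; hours are not wrapped at 24."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def stop_times(departure: str, run_times: list[str]) -> list[str]:
    """Accumulate run times onto a departure time, one entry per stop."""
    hours, minutes, seconds = (int(part) for part in departure.split(":"))
    clock = hours * 3600 + minutes * 60 + seconds
    times = [gtfs_time(clock)]
    for run_time in run_times:
        clock += duration_seconds(run_time)
        times.append(gtfs_time(clock))
    return times


times = stop_times("23:50:00", ["PT5M", "PT10M", "PT1M30S"])
print(times)  # ['23:50:00', '23:55:00', '24:05:00', '24:06:30']
```

Keeping the clock as a running total of seconds is what lets the past-midnight stops come out as `24:05:00` rather than wrapping back to `00:05:00`.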
Batch Conversion
from pathlib import Path
from transit_parser import TxcDocument, TxcToGtfsConverter
# Parse multiple TXC files
docs = []
for xml_file in Path("txc_files/").glob("*.xml"):
    docs.append(TxcDocument.from_path(str(xml_file)))
# Convert all to single GTFS
converter = TxcToGtfsConverter()
result = converter.convert_batch(docs)
result.feed.to_zip("combined.zip")
Generic CSV Parsing
from transit_parser import CsvDocument
# Parse with automatic type inference
doc = CsvDocument.from_path("data.csv")
print(f"Columns: {doc.columns}")
print(f"Rows: {len(doc)}")
# Access rows as dicts
for row in doc.rows:
    print(row)
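To make "automatic type inference" concrete, here is a small standard-library sketch of one common approach: try the narrowest type first and fall back to a wider one when any value fails to parse. The rules `CsvDocument` actually applies are not documented here, so treat `infer_type` and `infer_schema` as hypothetical names for illustration only.

```python
import csv
import io


def infer_type(values: list[str]) -> str:
    """Pick the narrowest of int, float, str that fits every non-empty value."""
    present = [value for value in values if value != ""]
    if not present:
        return "str"
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in present:
                cast(value)
        except ValueError:
            continue  # at least one value did not parse; try the next type
        return name
    return "str"


def infer_schema(text: str) -> dict[str, str]:
    """Map each CSV column name to its inferred type name."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return {
        column: infer_type([(row[column] or "") for row in rows])
        for column in reader.fieldnames or []
    }


sample = (
    "route_id,headway_min,avg_speed_kmh,name\n"
    "10,12,18.5,Airport\n"
    "11,,21,Harbour\n"
)
schema = infer_schema(sample)
print(schema)
```

Empty cells are ignored during inference, so a column with missing values can still be typed as numeric.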
JSON Parsing
from transit_parser import JsonDocument
# Parse JSON
doc = JsonDocument.from_path("data.json")
# Access root value
data = doc.root
# Use JSON pointer for nested access
value = doc.pointer("/data/items/0/name")
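JSON Pointer syntax is defined by RFC 6901: each `/`-separated token selects an object key or an array index, with `~1` standing for `/` and `~0` for `~`. The sketch below resolves a pointer against plain Python data to show what a path like `/data/items/0/name` means. `resolve_pointer` is a hypothetical helper, not part of `transit_parser`, and it skips some RFC details such as rejecting array indexes with leading zeros.

```python
import json
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Follow an RFC 6901 JSON pointer through nested dicts and lists."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in this order so that '~01' becomes '~1', not '/'.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


data = json.loads('{"data": {"items": [{"name": "Route 10"}, {"name": "Route 11"}]}}')
print(resolve_pointer(data, "/data/items/0/name"))  # Route 10
```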
Schedule Validation
from transit_parser import GtfsFeed, Schedule, ValidationConfig
# Load GTFS and schedule
gtfs = GtfsFeed.from_path("gtfs/")
schedule = Schedule.from_csv("schedule.csv")
# Validate with custom rules
config = ValidationConfig(
    gtfs_compliance="standard",
    min_layover_seconds=300,
    max_duty_length_seconds=32400,
)
result = schedule.validate(gtfs, config)
if not result.is_valid:
    for error in result.errors:
        print(f"Error: {error['message']}")
# Infer missing deadheads
inference = schedule.infer_deadheads(gtfs, default_depot="MAIN")
print(f"Inferred {inference.total_count} deadheads")
# Export
schedule.to_csv("output.csv", preset="optibus")
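The two ideas used above, a minimum layover check and deadhead inference, can be sketched on plain data as follows. This is a simplified illustration under stated assumptions, not the library's algorithm: the `Trip` record, the function names, and the rule "add a movement whenever consecutive locations differ" are all hypothetical, and a single vehicle block is assumed.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    start_stop: str
    end_stop: str
    start_s: int  # departure, in seconds after midnight
    end_s: int    # arrival, in seconds after midnight


def short_layovers(block: list[Trip], min_layover_seconds: int) -> list[tuple[Trip, Trip]]:
    """Return consecutive trip pairs whose gap is below the minimum layover."""
    ordered = sorted(block, key=lambda trip: trip.start_s)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later.start_s - earlier.end_s < min_layover_seconds
    ]


def infer_deadheads(block: list[Trip], depot: str) -> list[tuple[str, str, str]]:
    """List the non-revenue movements needed to connect a block to the depot."""
    ordered = sorted(block, key=lambda trip: trip.start_s)
    if not ordered:
        return []
    moves = []
    if ordered[0].start_stop != depot:
        moves.append(("pull_out", depot, ordered[0].start_stop))
    for earlier, later in zip(ordered, ordered[1:]):
        if earlier.end_stop != later.start_stop:
            moves.append(("interlining", earlier.end_stop, later.start_stop))
    if ordered[-1].end_stop != depot:
        moves.append(("pull_in", ordered[-1].end_stop, depot))
    return moves


block = [
    Trip("A", "B", 21600, 23400),  # 06:00 to 06:30
    Trip("B", "A", 23520, 25200),  # 06:32 to 07:00, only 120 s after the first
    Trip("C", "A", 25800, 27600),  # 07:10 to 07:40, starts at a different stop
]
print(len(short_layovers(block, 300)))
print(infer_deadheads(block, "MAIN"))
```

With these sample trips, one layover falls short of 300 seconds and three movements are inferred: a pull-out from `MAIN` to `A`, an interlining move from `A` to `C`, and a pull-in from `A` back to `MAIN`.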
Project Structure
parser/
├── pyproject.toml            # Python project config (maturin backend)
├── Cargo.toml                # Rust workspace root
├── rust/
│   ├── transit-core/         # Core data models and traits
│   ├── gtfs-parser/          # GTFS Static parser
│   ├── txc-parser/           # TransXChange parser
│   ├── txc-gtfs-adapter/     # TXC→GTFS conversion
│   ├── schedule-parser/      # Schedule validation & generation
│   ├── csv-parser/           # Generic CSV parser
│   ├── json-parser/          # Generic JSON parser
│   └── transit-bindings/     # PyO3 Python bindings
└── python/
    └── transit_parser/       # Python package
Performance
The Rust core gets its performance from:
- Streaming XML parsing - Processes large TXC files without loading the entire DOM
- Zero-copy CSV parsing - Reads GTFS files efficiently
- Parallel processing - Batch conversion uses multiple cores
- GIL release - Python can do other work during long operations
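To illustrate what streaming XML parsing means in practice, the sketch below counts `VehicleJourney` elements in a TransXChange document with the standard library's `iterparse`, clearing each element once it has been handled instead of building the whole tree first. It shows the general technique only; the Rust parser's internals are not shown in this README, and `count_vehicle_journeys` and the sample document are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# TransXChange elements live in this XML namespace.
TXC_NS = "{http://www.transxchange.org.uk/}"


def count_vehicle_journeys(source) -> int:
    """Count VehicleJourney elements while reading the document incrementally."""
    count = 0
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == TXC_NS + "VehicleJourney":
            count += 1
            # Drop the finished journey's children and text to limit memory use.
            element.clear()
    return count


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<TransXChange xmlns="http://www.transxchange.org.uk/">
  <VehicleJourneys>
    <VehicleJourney><DepartureTime>07:00:00</DepartureTime></VehicleJourney>
    <VehicleJourney><DepartureTime>07:30:00</DepartureTime></VehicleJourney>
  </VehicleJourneys>
</TransXChange>"""

journey_count = count_vehicle_journeys(io.BytesIO(sample))
print(f"Vehicle journeys: {journey_count}")
```

`iterparse` also accepts a file path, so the same function works on a large file on disk without reading it into memory up front.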
License
MIT OR Apache-2.0