CDC pipeline configuration generator for Redpanda Connect
CDC Pipeline Generator
Generate Redpanda Connect pipeline configurations for Change Data Capture (CDC) workflows.
A CLI-first tool for managing CDC pipelines with automatic Docker dev container setup, supporting both db-per-tenant (one database per customer) and db-shared (single database, multi-tenant) patterns.
Features
- Zero-config setup: pip install → cdc init → ready to develop
- Docker dev container: Automatic environment setup with all dependencies
- Multi-tenant patterns: Support for db-per-tenant and db-shared architectures
- Template-based generation: Jinja2 templates for flexible pipeline configuration
- CLI-first philosophy: All operations via cdc commands, no manual YAML editing
- Database integration: Auto-updates docker-compose.yml with database services
Installation
pip install cdc-pipeline-generator
That's it! The cdc command is now available globally.
Quick Start (Recommended Workflow)
⚠️ CLI-First Philosophy: All configuration is managed through cdc commands. Never edit YAML files manually; the CLI is the sole interface for configuration management.
1. Initialize New Project
# Create project directory
mkdir my-cdc-project
cd my-cdc-project
# Initialize with dev container
cdc init
# ✅ Creates docker-compose.yml, Dockerfile.dev, and the project structure
# ✅ Builds the dev container with Python, Fish shell, and all dependencies
# ✅ Prompts to start the container automatically
2. Enter Dev Container
docker compose exec dev fish
# Now inside container with cdc commands ready to use
3. Create Server Group (Auto-configures Docker Compose)
# For MSSQL source (db-per-tenant pattern)
cdc manage-server-group --create my-group \
--pattern db-per-tenant \
--source-type mssql \
--extraction-pattern '(?P<customer_id>\w+)_(?P<env>\w+)' \
--host '${MSSQL_HOST}' \
--port 1433 \
--user '${MSSQL_USER}' \
--password '${MSSQL_PASSWORD}'
# ✅ Creates server-groups.yaml
# ✅ Auto-updates docker-compose.yml with MSSQL + PostgreSQL services
# ✅ Adds volume definitions and service dependencies
Or for PostgreSQL source (db-shared pattern):
cdc manage-server-group --create my-group \
--pattern db-shared \
--source-type postgresql \
--extraction-pattern '(?P<customer_id>\w+)' \
--environment-aware \
--host '${POSTGRES_SOURCE_HOST}' \
--port 5432 \
--user '${POSTGRES_SOURCE_USER}' \
--password '${POSTGRES_SOURCE_PASSWORD}'
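The --extraction-pattern values above use Python's named-group syntax, (?P<name>...). The following is a minimal sketch of how such a pattern splits a database name into a customer and an environment, assuming the tool matches the whole database name against the pattern; split_database_name is a hypothetical helper, not part of the cdc CLI.

```python
from __future__ import annotations

import re

# Pattern from the db-per-tenant example above.
DB_PER_TENANT = r"(?P<customer_id>\w+)_(?P<env>\w+)"


def split_database_name(pattern: str, database: str) -> dict[str, str] | None:
    """Hypothetical helper: return the named groups if the WHOLE name matches."""
    match = re.fullmatch(pattern, database)
    return match.groupdict() if match else None


# \w also matches "_", so the greedy customer_id group backtracks only as far
# as the LAST underscore; env receives the final segment.
print(split_database_name(DB_PER_TENANT, "customer_a_prod"))
# {'customer_id': 'customer_a', 'env': 'prod'}
print(split_database_name(DB_PER_TENANT, "standalone"))
# None (no underscore, so there is no env segment)
```

If customer IDs never contain underscores, [^_]+ in place of \w+ for customer_id would make the split stricter and easier to reason about.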
4. Configure Environment Variables
# Copy example and edit with your credentials
cp .env.example .env
nano .env # or use your preferred editor
Example .env:
# Source Database (MSSQL)
MSSQL_HOST=mssql
MSSQL_PORT=1433
MSSQL_USER=sa
MSSQL_PASSWORD=YourPassword123!
# Target Database (PostgreSQL)
POSTGRES_TARGET_HOST=postgres-target
POSTGRES_TARGET_PORT=5432
POSTGRES_TARGET_USER=postgres
POSTGRES_TARGET_PASSWORD=postgres
POSTGRES_TARGET_DB=cdc_target
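Before starting services, it can help to confirm that every variable from the example .env has a value. This is a minimal sketch that handles only plain KEY=VALUE lines and # comments (no quoting, export prefixes, or multi-line values; a library such as python-dotenv covers the full syntax). parse_env and missing_variables are hypothetical helpers, and the REQUIRED list simply mirrors the example above.

```python
from __future__ import annotations

# Variables from the example .env above; adjust to match your server group.
REQUIRED = (
    "MSSQL_HOST", "MSSQL_PORT", "MSSQL_USER", "MSSQL_PASSWORD",
    "POSTGRES_TARGET_HOST", "POSTGRES_TARGET_PORT", "POSTGRES_TARGET_USER",
    "POSTGRES_TARGET_PASSWORD", "POSTGRES_TARGET_DB",
)


def parse_env(text: str) -> dict[str, str]:
    """Hypothetical parser: read KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()
    return values


def missing_variables(text: str, required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Return the required names that are absent or empty."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]


sample = "# Source Database (MSSQL)\nMSSQL_HOST=mssql\nMSSQL_PASSWORD=YourPassword123!\n"
print(missing_variables(sample))
```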
5. Start All Services
# Exit container temporarily
exit
# Start databases and dev container
docker compose up -d
# Re-enter dev container
docker compose exec dev fish
6. Create Service and Add Tables
# Create service
cdc manage-service --create my-service --server-group my-group
# Add tables to track
cdc manage-service --service my-service --add-table Users --primary-key id
cdc manage-service --service my-service --add-table Orders --primary-key order_id
# Inspect available tables (optional)
cdc manage-service --service my-service --inspect --schema dbo
7. Update Server Group (Populate Databases)
# Inspect source database and populate server-groups.yaml
cdc manage-server-group --update
# ✅ Auto-discovers databases
# ✅ Maps databases to environments (dev/stage/prod)
# ✅ Populates table counts and statistics
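The update step maps discovered databases to environments. How the tool does this internally is not described here; the sketch below shows one plausible approach, assuming the env named group from the db-per-tenant extraction pattern identifies the environment. group_by_environment and the sample database names are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict

# db-per-tenant extraction pattern from step 3.
PATTERN = re.compile(r"(?P<customer_id>\w+)_(?P<env>\w+)")


def group_by_environment(databases: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: map each environment to its sorted customer IDs."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for name in databases:
        match = PATTERN.fullmatch(name)
        if match is None:
            continue  # names without "<customer>_<env>" (e.g. "master") are skipped
        grouped[match["env"]].append(match["customer_id"])
    return {env: sorted(ids) for env, ids in grouped.items()}


print(group_by_environment(["acme_dev", "acme_prod", "globex_prod", "master"]))
# {'dev': ['acme'], 'prod': ['acme', 'globex']}
```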
8. Generate CDC Pipelines
# Generate pipelines for development environment
cdc generate --service my-service --environment dev
# Check generated files
ls generated/pipelines/
ls generated/schemas/
9. Deploy Pipelines
Generated pipeline files in generated/pipelines/ are ready to deploy to your Redpanda Connect infrastructure.
Complete Command Reference
Project Initialization
cdc init # Initialize new CDC project with dev container
Service Management
# Create service
cdc manage-service --create <name> --server-group <group-name>
# Add tables
cdc manage-service --service <name> --add-table <TableName> --primary-key <column>
# Remove tables
cdc manage-service --service <name> --remove-table <TableName>
# Inspect database schema
cdc manage-service --service <name> --inspect --schema <schema-name>
Server Group Management
# Create server group (auto-updates docker-compose.yml)
cdc manage-server-group --create <name> \
--pattern <db-per-tenant|db-shared> \
--source-type <mssql|postgresql> \
--extraction-pattern '<regex>' \
[--environment-aware] # Required for db-shared
# Update from database inspection
cdc manage-server-group --update
# Show server group info
cdc manage-server-group --info
# List all server groups
cdc manage-server-group --list
Pipeline Generation
# Generate for specific service
cdc generate --service <name> --environment <dev|stage|prod>
# Generate for all services
cdc generate --all --environment <env>
Validation
# Validate all configurations
cdc validate
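The exact rules behind cdc validate are not listed in this README. The sketch below illustrates two checks the README implies for a server group, assuming a group is described by its pattern, its extraction regex, and the --environment-aware flag: the regex must compile and define a customer_id group (as both examples do), and db-shared requires --environment-aware. check_server_group is hypothetical and is not the tool's API.

```python
from __future__ import annotations

import re

PATTERNS = {"db-per-tenant", "db-shared"}


def check_server_group(pattern: str, extraction_pattern: str,
                       environment_aware: bool = False) -> list[str]:
    """Hypothetical check: return a list of problems (empty when consistent)."""
    problems: list[str] = []
    if pattern not in PATTERNS:
        problems.append(f"unknown pattern: {pattern!r}")
    try:
        compiled = re.compile(extraction_pattern)
    except re.error as exc:
        problems.append(f"invalid extraction pattern: {exc}")
    else:
        # Assumption: every group must identify the customer.
        if "customer_id" not in compiled.groupindex:
            problems.append("extraction pattern has no (?P<customer_id>...) group")
    if pattern == "db-shared" and not environment_aware:
        problems.append("db-shared requires --environment-aware")
    return problems


print(check_server_group("db-shared", r"(?P<customer_id>\w+)"))
# ['db-shared requires --environment-aware']
```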
Architecture Patterns
db-per-tenant (One database per customer)
Use case: Each customer has a dedicated source database.
Example: A SaaS application with isolated customer databases (customer_a_prod, customer_b_prod, etc.). The AdOpus system, with 26 customer databases, follows this pattern.
Pipeline generation: Creates one source + sink pipeline per customer database.
Setup:
cdc manage-server-group --create my-group \
--pattern db-per-tenant \
--source-type mssql \
--extraction-pattern '(?P<customer_id>\w+)_(?P<env>\w+)'
db-shared (Single database, multi-tenant)
Use case: All customers share one database and are differentiated by a customer_id column or by schema.
Example: A multi-tenant application with customer isolation via a tenant_id field. The ASMA directory service, which isolates customers by schema or column, follows this pattern.
Pipeline generation: Creates one source + sink pipeline for all customers, with customer filtering.
See: examples/db-shared/
Setup:
cdc manage-server-group --create my-group \
--pattern db-shared \
--source-type postgresql \
--extraction-pattern '(?P<customer_id>\w+)' \
--environment-aware
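The practical difference between the two patterns is how many pipelines are generated. This is a minimal sketch, assuming "source + sink" means two separate pipeline files per unit; plan_pipelines and the <service>-<customer>-<kind> naming scheme are hypothetical and are not the generator's actual file names.

```python
from __future__ import annotations


def plan_pipelines(pattern: str, service: str, customers: list[str]) -> list[str]:
    """Hypothetical planner: list pipeline names for one service."""
    if pattern == "db-per-tenant":
        # One source + sink pair per customer database.
        return [f"{service}-{c}-{kind}" for c in customers for kind in ("source", "sink")]
    if pattern == "db-shared":
        # One shared source + sink pair; customer filtering happens inside it.
        return [f"{service}-{kind}" for kind in ("source", "sink")]
    raise ValueError(f"unknown pattern: {pattern!r}")


print(len(plan_pipelines("db-per-tenant", "my-service", ["acme", "globex"])))  # 4
print(plan_pipelines("db-shared", "my-service", ["acme", "globex"]))
# ['my-service-source', 'my-service-sink']
```

Under this assumption, 26 customer databases would yield 52 pipelines with db-per-tenant, versus 2 with db-shared.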
Library Repository Structure
cdc-pipeline-generator/
├── cdc_generator/        # Core library
│   ├── core/             # Pipeline generation logic
│   ├── helpers/          # Utility functions
│   ├── validators/       # Configuration validation
│   └── cli/              # Command-line interface
└── examples/             # Reference implementations
    ├── db-per-tenant/    # Multi-database pattern
    └── db-shared/        # Single-database pattern
Docker Container Workflow
The recommended way to use this tool is inside the auto-generated dev container:
Why Use the Container?
- Isolated environment: No conflicts with host Python packages
- All dependencies pre-installed: Python 3.11, Fish shell, database clients
- Database services included: MSSQL/PostgreSQL auto-configured
- Consistent across the team: Same environment for everyone
Container Commands
# Start all services (databases + dev container)
docker compose up -d
# Enter dev container
docker compose exec dev fish
# Stop all services
docker compose down
# Rebuild container (after updating generator version)
docker compose up -d --build
# View logs
docker compose logs -f dev
docker compose logs -f mssql
docker compose logs -f postgres-target
Working Inside Container
Once inside (docker compose exec dev fish), you have:
- cdc command available
- Access to source and target databases
- Fish shell with auto-completions
- Git configured (via volume mount)
- SSH keys available (via volume mount)
All your project files are mounted at /workspace, so changes are reflected immediately.
Project Structure
After running cdc init, your project will have:
my-cdc-project/
├── docker-compose.yml        # Dev container + database services
├── Dockerfile.dev            # Container image definition
├── .env.example              # Environment variables template
├── .env                      # Your credentials (git-ignored)
├── .gitignore                # Git ignore rules
├── server-groups.yaml        # Server group config (generated by cdc)
├── README.md                 # Quick start guide
├── 2-services/               # Service definitions (generated by cdc)
│   └── my-service.yaml
├── 2-customers/              # Customer configs (for db-per-tenant)
├── 3-pipeline-templates/     # Custom pipeline templates (optional)
└── generated/                # Generated output (git-ignored)
    ├── pipelines/            # Redpanda Connect pipeline YAML
    ├── schemas/              # PostgreSQL schemas
    └── table-definitions/    # Table metadata
Advanced Usage
Using as Python Library
from cdc_generator.core.pipeline_generator import generate_pipelines
# Generate pipelines programmatically
generate_pipelines(
service='my-service',
environment='dev',
output_dir='./generated/pipelines'
)
Custom Pipeline Templates
Place custom Jinja2 templates in 3-pipeline-templates/:
# 3-pipeline-templates/source-pipeline.yaml
input:
mssql_cdc:
dsn: "{{ dsn }}"
tables: {{ tables | tojson }}
# Your custom configuration
Environment-Specific Configuration
Use environment variables in server-groups.yaml:
server:
host: ${MSSQL_HOST} # Replaced at runtime
port: ${MSSQL_PORT}
user: ${MSSQL_USER}
password: ${MSSQL_PASSWORD}
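The ${VAR} placeholders above are filled in from the environment at runtime. The sketch below reproduces that substitution with Python's standard string.Template, assuming values come from os.environ (or an explicit mapping); resolve is a hypothetical helper that illustrates the placeholder syntax, not the tool's own loader.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from string import Template

SERVER_BLOCK = """\
server:
  host: ${MSSQL_HOST}
  port: ${MSSQL_PORT}
  user: ${MSSQL_USER}
  password: ${MSSQL_PASSWORD}
"""


def resolve(text: str, env: Mapping[str, str] | None = None) -> str:
    """Hypothetical helper: replace ${NAME} placeholders, KeyError if one is unset."""
    return Template(text).substitute(os.environ if env is None else env)


print(resolve(SERVER_BLOCK, {
    "MSSQL_HOST": "mssql", "MSSQL_PORT": "1433",
    "MSSQL_USER": "sa", "MSSQL_PASSWORD": "YourPassword123!",
}))
```

Using substitute rather than safe_substitute makes a missing variable fail loudly instead of leaving a literal ${...} in the resolved configuration.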
Contributing
For Library Contributors
If you want to contribute to the cdc-pipeline-generator library itself:
# Clone repository
git clone https://github.com/Relaxe111/cdc-pipeline-generator.git
cd cdc-pipeline-generator
# Install in editable mode with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black .
ruff check .
For Users
If you're using the library in your project, just install from PyPI as shown in Installation.