
CDC Pipeline Generator

Generate Redpanda Connect pipeline configurations for Change Data Capture (CDC) workflows.

A CLI-first tool for managing CDC pipelines with automatic Docker dev container setup, supporting both db-per-tenant (one database per customer) and db-shared (single database, multi-tenant) patterns.

✨ Features

  • 🚀 Zero-config setup: pip install → cdc init → ready to develop
  • 🐳 Docker dev container: Automatic environment setup with all dependencies
  • 🔄 Multi-tenant patterns: Support for db-per-tenant and db-shared architectures
  • 📝 Template-based generation: Jinja2 templates for flexible pipeline configuration
  • ✅ CLI-first philosophy: All operations via cdc commands, no manual YAML editing
  • 🛠️ Database integration: Auto-updates docker-compose.yml with database services

📦 Installation

pip install cdc-pipeline-generator

That's it! The cdc command is now available globally.

🚀 Quick Start (Recommended Workflow)

โš ๏ธ CLI-First Philosophy: All configuration is managed through cdc commands. Never edit YAML files manually. The CLI is the sole interface for configuration management.

1. Initialize New Project

# Create project directory
mkdir my-cdc-project
cd my-cdc-project

# Initialize with dev container
cdc init
# ✅ Creates docker-compose.yml, Dockerfile.dev, project structure
# ✅ Builds dev container with Python, Fish shell, all dependencies
# ✅ Prompts to start container automatically

2. Enter Dev Container

docker compose exec dev fish
# Now inside container with cdc commands ready to use

3. Create Server Group (Auto-configures Docker Compose)

# For MSSQL source (db-per-tenant pattern)
cdc manage-server-group --create my-group \
  --pattern db-per-tenant \
  --source-type mssql \
  --extraction-pattern '(?P<customer_id>\w+)_(?P<env>\w+)' \
  --host '${MSSQL_HOST}' \
  --port 1433 \
  --user '${MSSQL_USER}' \
  --password '${MSSQL_PASSWORD}'

# ✅ Creates server-groups.yaml
# ✅ Auto-updates docker-compose.yml with MSSQL + PostgreSQL services
# ✅ Adds volume definitions and service dependencies

Or for PostgreSQL source (db-shared pattern):

cdc manage-server-group --create my-group \
  --pattern db-shared \
  --source-type postgresql \
  --extraction-pattern '(?P<customer_id>\w+)' \
  --environment-aware \
  --host '${POSTGRES_SOURCE_HOST}' \
  --port 5432 \
  --user '${POSTGRES_SOURCE_USER}' \
  --password '${POSTGRES_SOURCE_PASSWORD}'
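The --extraction-pattern values above are ordinary Python regular expressions with named groups. How a source database name is decomposed can be sketched directly (illustrative only; the database name acme_prod is invented, and this is not the tool's internal code):

```python
import re

# db-per-tenant pattern: customer id and environment are encoded in the database name
pattern = re.compile(r'(?P<customer_id>\w+)_(?P<env>\w+)')

match = pattern.fullmatch('acme_prod')
assert match is not None
print(match.groupdict())  # {'customer_id': 'acme', 'env': 'prod'}
```

Because \w also matches underscores, the greedy first group backtracks to the last underscore, which is why multi-word customer ids like customer_a_prod still split correctly.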

4. Configure Environment Variables

# Copy example and edit with your credentials
cp .env.example .env
nano .env  # or use your preferred editor

Example .env:

# Source Database (MSSQL)
MSSQL_HOST=mssql
MSSQL_PORT=1433
MSSQL_USER=sa
MSSQL_PASSWORD=YourPassword123!

# Target Database (PostgreSQL)
POSTGRES_TARGET_HOST=postgres-target
POSTGRES_TARGET_PORT=5432
POSTGRES_TARGET_USER=postgres
POSTGRES_TARGET_PASSWORD=postgres
POSTGRES_TARGET_DB=cdc_target

5. Start All Services

# Exit container temporarily
exit

# Start databases and dev container
docker compose up -d

# Re-enter dev container
docker compose exec dev fish

6. Create Service and Add Tables

# Create service
cdc manage-service --create my-service --server-group my-group

# Add tables to track
cdc manage-service --service my-service --add-table Users --primary-key id
cdc manage-service --service my-service --add-table Orders --primary-key order_id

# Inspect available tables (optional)
cdc manage-service --service my-service --inspect --schema dbo

7. Update Server Group (Populate Databases)

# Inspect source database and populate server-groups.yaml
cdc manage-server-group --update
# ✅ Auto-discovers databases
# ✅ Maps databases to environments (dev/stage/prod)
# ✅ Populates table counts and statistics
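The environment mapping step can be pictured as grouping discovered database names by their env capture group. A minimal sketch, with invented database names and the db-per-tenant pattern from earlier (not the generator's actual implementation):

```python
import re
from collections import defaultdict

pattern = re.compile(r'(?P<customer_id>\w+)_(?P<env>\w+)')

# Hypothetical database names discovered on the source server
databases = ['acme_dev', 'acme_prod', 'globex_dev', 'globex_stage']

# Group customer ids by the environment captured from each name
by_env = defaultdict(list)
for name in databases:
    match = pattern.fullmatch(name)
    if match:
        by_env[match['env']].append(match['customer_id'])

print(dict(by_env))  # {'dev': ['acme', 'globex'], 'prod': ['acme'], 'stage': ['globex']}
```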

8. Generate CDC Pipelines

# Generate pipelines for development environment
cdc generate --service my-service --environment dev

# Check generated files
ls generated/pipelines/
ls generated/schemas/

9. Deploy Pipelines

Generated pipeline files in generated/pipelines/ are ready to deploy to your Redpanda Connect infrastructure.


📋 Complete Command Reference

Project Initialization

cdc init                      # Initialize new CDC project with dev container

Service Management

# Create service
cdc manage-service --create <name> --server-group <group-name>

# Add tables
cdc manage-service --service <name> --add-table <TableName> --primary-key <column>

# Remove tables
cdc manage-service --service <name> --remove-table <TableName>

# Inspect database schema
cdc manage-service --service <name> --inspect --schema <schema-name>

Server Group Management

# Create server group (auto-updates docker-compose.yml)
cdc manage-server-group --create <name> \
  --pattern <db-per-tenant|db-shared> \
  --source-type <mssql|postgresql> \
  --extraction-pattern '<regex>' \
  [--environment-aware]  # Required for db-shared

# Update from database inspection
cdc manage-server-group --update

# Show server group info
cdc manage-server-group --info

# List all server groups
cdc manage-server-group --list

Pipeline Generation

# Generate for specific service
cdc generate --service <name> --environment <dev|stage|prod>

# Generate for all services
cdc generate --all --environment <env>

Validation

# Validate all configurations
cdc validate


๐Ÿ—๏ธ Architecture Patterns

db-per-tenant (One database per customer)

Use case: Each customer has a dedicated source database.

Example: SaaS application with isolated customer databases (customer_a_prod, customer_b_prod, etc.)

Pipeline generation: Creates one source + sink pipeline per customer database.

Setup:

cdc manage-server-group --create my-group \
  --pattern db-per-tenant \
  --source-type mssql \
  --extraction-pattern '(?P<customer_id>\w+)_(?P<env>\w+)'

See: examples/db-per-tenant/

db-shared (Single database, multi-tenant)

Use case: All customers share one database, differentiated by customer_id column or schema.

Example: Multi-tenant application with customer isolation via tenant_id field

Pipeline generation: Creates one source + sink pipeline for all customers, with customer filtering.

Setup:

cdc manage-server-group --create my-group \
  --pattern db-shared \
  --source-type postgresql \
  --extraction-pattern '(?P<customer_id>\w+)' \
  --environment-aware

See: examples/db-shared/
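With db-shared, a single pipeline serves every tenant, so per-customer isolation happens inside the pipeline rather than per database. One way to express such filtering in Redpanda Connect is a Bloblang mapping processor; the fragment below is an illustrative sketch (the customer_id field and CUSTOMER_ID variable are assumptions, not something the generator emits verbatim):

```yaml
pipeline:
  processors:
    - mapping: |
        # Drop rows that belong to a different tenant than the one
        # this pipeline instance serves (tenant id from the environment).
        root = if this.customer_id != env("CUSTOMER_ID") { deleted() }
```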

๐Ÿณ Docker Container Workflow

cdc-pipeline-generator/
├── cdc_generator/          # Core library
│   ├── core/               # Pipeline generation logic
│   ├── helpers/            # Utility functions
│   ├── validators/         # Configuration validation
│   └── cli/                # Command-line interface
└── examples/               # Reference implementations
    ├── db-per-tenant/      # Multi-database pattern
    └── db-shared/          # Single-database pattern

๐Ÿณ Docker Container Workflow

The recommended way to use this tool is inside the auto-generated dev container:

Why Use the Container?

✅ Isolated environment - No conflicts with host Python/packages
✅ All dependencies pre-installed - Python 3.11, Fish shell, database clients
✅ Database services included - MSSQL/PostgreSQL auto-configured
✅ Consistent across team - Same environment for everyone

Container Commands

# Start all services (databases + dev container)
docker compose up -d

# Enter dev container
docker compose exec dev fish

# Stop all services
docker compose down

# Rebuild container (after updating generator version)
docker compose up -d --build

# View logs
docker compose logs -f dev
docker compose logs -f mssql
docker compose logs -f postgres-target

Working Inside Container

Once inside (docker compose exec dev fish), you have:

  • ✅ cdc command available
  • ✅ Access to source and target databases
  • ✅ Fish shell with auto-completions
  • ✅ Git configured (via volume mount)
  • ✅ SSH keys available (via volume mount)

All your project files are mounted at /workspace, so changes are reflected immediately.


๐Ÿ“ Project Structure


๐Ÿ“ Project Structure

After running cdc init, your project will have:

my-cdc-project/
├── docker-compose.yml           # Dev container + database services
├── Dockerfile.dev               # Container image definition
├── .env.example                 # Environment variables template
├── .env                         # Your credentials (git-ignored)
├── .gitignore                   # Git ignore rules
├── server-groups.yaml           # Server group config (generated by cdc)
├── README.md                    # Quick start guide
├── 2-services/                  # Service definitions (generated by cdc)
│   └── my-service.yaml
├── 2-customers/                 # Customer configs (for db-per-tenant)
├── 3-pipeline-templates/        # Custom pipeline templates (optional)
└── generated/                   # Generated output (git-ignored)
    ├── pipelines/               # Redpanda Connect pipeline YAML
    ├── schemas/                 # PostgreSQL schemas
    └── table-definitions/       # Table metadata

🔧 Advanced Usage

Using as Python Library

from cdc_generator.core.pipeline_generator import generate_pipelines

# Generate pipelines programmatically
generate_pipelines(
    service='my-service',
    environment='dev',
    output_dir='./generated/pipelines'
)

Custom Pipeline Templates

Place custom Jinja2 templates in 3-pipeline-templates/:

# 3-pipeline-templates/source-pipeline.yaml
input:
  mssql_cdc:
    dsn: "{{ dsn }}"
    tables: {{ tables | tojson }}
    # Your custom configuration
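Custom templates are rendered with standard Jinja2 semantics, so constructs like the tojson filter behave exactly as in plain Jinja2. A quick way to preview what such a template expands to (values invented for illustration; assumes the jinja2 package is installed):

```python
from jinja2 import Template

# Mirrors the structure of the custom source-pipeline template above
template = Template(
    'input:\n'
    '  mssql_cdc:\n'
    '    dsn: "{{ dsn }}"\n'
    '    tables: {{ tables | tojson }}\n'
)

# tojson serializes the Python list into a JSON array in the rendered YAML
print(template.render(dsn='sqlserver://sa:***@mssql:1433', tables=['Users', 'Orders']))
```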

Environment-Specific Configuration

Use environment variables in server-groups.yaml:

server:
  host: ${MSSQL_HOST}        # Replaced at runtime
  port: ${MSSQL_PORT}
  user: ${MSSQL_USER}
  password: ${MSSQL_PASSWORD}
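The ${VAR} placeholders follow ordinary shell-style expansion against the process environment. The behaviour can be sketched in Python with os.path.expandvars (illustrative only; not necessarily the substitution mechanism the generator or Redpanda Connect uses internally):

```python
import os

# Values that would normally come from your .env file
os.environ['MSSQL_HOST'] = 'mssql'
os.environ['MSSQL_PORT'] = '1433'

snippet = 'host: ${MSSQL_HOST}\nport: ${MSSQL_PORT}'

# Each ${VAR} is replaced with the corresponding environment variable
print(os.path.expandvars(snippet))
```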

๐Ÿค Contributing

For Library Contributors

If you want to contribute to the cdc-pipeline-generator library itself:

# Clone repository
git clone https://github.com/Relaxe111/cdc-pipeline-generator.git
cd cdc-pipeline-generator

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black .
ruff check .

For Users

If you're using the library in your project, just install from PyPI as shown in Installation.

