
CDC Pipeline Generator

Generate Redpanda Connect pipeline configurations for Change Data Capture (CDC) workflows.

A CLI-first tool for managing CDC pipelines with automatic Docker dev container setup, supporting both db-per-tenant (one database per customer) and db-shared (single database, multi-tenant) patterns.

✨ Features

  • 🚀 Zero-config setup: pip install → cdc init → ready to develop
  • 🐳 Docker dev container: Automatic environment setup with all dependencies
  • 🔄 Multi-tenant patterns: Support for db-per-tenant and db-shared architectures
  • 📝 Template-based generation: Jinja2 templates for flexible pipeline configuration
  • ✅ CLI-first philosophy: All operations via cdc commands, no manual YAML editing
  • 🛠️ Database integration: Auto-updates docker-compose.yml with database services

📦 Installation

pip install cdc-pipeline-generator

That's it! The cdc command is now available globally.

🚀 Quick Start (Recommended Workflow)

โš ๏ธ CLI-First Philosophy: All configuration is managed through cdc commands. Never edit YAML files manually. The CLI is the sole interface for configuration management.

1. Initialize New Project

# Create project directory
mkdir my-cdc-project
cd my-cdc-project

# Initialize with dev container
cdc init
# ✅ Creates docker-compose.yml, Dockerfile.dev, project structure
# ✅ Builds dev container with Python, Fish shell, all dependencies
# ✅ Prompts to start container automatically

2. Enter Dev Container

docker compose exec dev fish
# Now inside container with cdc commands ready to use

3. Create Server Group (Auto-configures Docker Compose)

# For MSSQL source (db-per-tenant pattern)
cdc manage-server-group --create my-group \
  --pattern db-per-tenant \
  --source-type mssql \
  --extraction-pattern '(?P<customer_id>\w+)_(?P<env>\w+)' \
  --host '${MSSQL_HOST}' \
  --port 1433 \
  --user '${MSSQL_USER}' \
  --password '${MSSQL_PASSWORD}'

# ✅ Creates server-groups.yaml
# ✅ Auto-updates docker-compose.yml with MSSQL + PostgreSQL services
# ✅ Adds volume definitions and service dependencies

Or for PostgreSQL source (db-shared pattern):

cdc manage-server-group --create my-group \
  --pattern db-shared \
  --source-type postgresql \
  --extraction-pattern '(?P<customer_id>\w+)' \
  --environment-aware \
  --host '${POSTGRES_SOURCE_HOST}' \
  --port 5432 \
  --user '${POSTGRES_SOURCE_USER}' \
  --password '${POSTGRES_SOURCE_PASSWORD}'
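The --extraction-pattern values above are ordinary Python regular expressions with named groups. How a source database name is decomposed can be sketched directly (illustrative only; the database name acme_prod is invented, and this is not the tool's internal code):

```python
import re

# db-per-tenant pattern: customer id and environment are encoded in the database name
pattern = re.compile(r'(?P<customer_id>\w+)_(?P<env>\w+)')

match = pattern.fullmatch('acme_prod')
assert match is not None
print(match.groupdict())  # {'customer_id': 'acme', 'env': 'prod'}
```

Because \w also matches underscores, the greedy first group backtracks to the last underscore, which is why multi-word customer ids like customer_a_prod still split correctly.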

4. Configure Environment Variables

# Copy example and edit with your credentials
cp .env.example .env
nano .env  # or use your preferred editor

Example .env:

# Source Database (MSSQL)
MSSQL_HOST=mssql
MSSQL_PORT=1433
MSSQL_USER=sa
MSSQL_PASSWORD=YourPassword123!

# Target Database (PostgreSQL)
POSTGRES_TARGET_HOST=postgres-target
POSTGRES_TARGET_PORT=5432
POSTGRES_TARGET_USER=postgres
POSTGRES_TARGET_PASSWORD=postgres
POSTGRES_TARGET_DB=cdc_target

5. Start All Services

# Exit container temporarily
exit

# Start databases and dev container
docker compose up -d

# Re-enter dev container
docker compose exec dev fish

6. Create Service and Add Tables

# Create service
cdc manage-service --create my-service --server-group my-group

# Add tables to track
cdc manage-service --service my-service --add-table Users --primary-key id
cdc manage-service --service my-service --add-table Orders --primary-key order_id

# Inspect available tables (optional)
cdc manage-service --service my-service --inspect --schema dbo

7. Update Server Group (Populate Databases)

# Inspect source database and populate server-groups.yaml
cdc manage-server-group --update
# ✅ Auto-discovers databases
# ✅ Maps databases to environments (dev/stage/prod)
# ✅ Populates table counts and statistics
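The environment mapping step can be pictured as grouping discovered database names by their env capture group. A minimal sketch, with invented database names and the db-per-tenant pattern from earlier (not the generator's actual implementation):

```python
import re
from collections import defaultdict

pattern = re.compile(r'(?P<customer_id>\w+)_(?P<env>\w+)')

# Hypothetical database names discovered on the source server
databases = ['acme_dev', 'acme_prod', 'globex_dev', 'globex_stage']

# Group customer ids by the environment captured from each name
by_env = defaultdict(list)
for name in databases:
    match = pattern.fullmatch(name)
    if match:
        by_env[match['env']].append(match['customer_id'])

print(dict(by_env))  # {'dev': ['acme', 'globex'], 'prod': ['acme'], 'stage': ['globex']}
```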

8. Generate CDC Pipelines

# Generate pipelines for development environment
cdc generate --service my-service --environment dev

# Check generated files
ls generated/pipelines/
ls generated/schemas/

9. Deploy Pipelines

Generated pipeline files in generated/pipelines/ are ready to deploy to your Redpanda Connect infrastructure.


📋 Complete Command Reference

Project Initialization

cdc init                      # Initialize new CDC project with dev container

Service Management

# Create service
cdc manage-service --create <name> --server-group <group-name>

# Add tables
cdc manage-service --service <name> --add-table <TableName> --primary-key <column>

# Remove tables
cdc manage-service --service <name> --remove-table <TableName>

# Inspect database schema
cdc manage-service --service <name> --inspect --schema <schema-name>

Server Group Management

# Create server group (auto-updates docker-compose.yml)
cdc manage-server-group --create <name> \
  --pattern <db-per-tenant|db-shared> \
  --source-type <mssql|postgresql> \
  --extraction-pattern '<regex>' \
  [--environment-aware]  # Required for db-shared

# Update from database inspection
cdc manage-server-group --update

# Show server group info
cdc manage-server-group --info

# List all server groups
cdc manage-server-group --list

Pipeline Generation

# Generate for specific service
cdc generate --service <name> --environment <dev|stage|prod>

# Generate for all services
cdc generate --all --environment <env>

Validation

# Validate all configurations
cdc validate


๐Ÿ—๏ธ Architecture Patterns

db-per-tenant (One database per customer)

Use case: Each customer has a dedicated source database.

Example: SaaS application with isolated customer databases (customer_a_prod, customer_b_prod, etc.)

Pipeline generation: Creates one source + sink pipeline per customer database.

Setup:

cdc manage-server-group --create my-group \
  --pattern db-per-tenant \
  --source-type mssql \
  --extraction-pattern '(?P<customer_id>\w+)_(?P<env>\w+)'

See: examples/db-per-tenant/

db-shared (Single database, multi-tenant)

Use case: All customers share one database, differentiated by customer_id column or schema.

Example: Multi-tenant application with customer isolation via tenant_id field

Pipeline generation: Creates one source + sink pipeline for all customers, with customer filtering.

Setup:

cdc manage-server-group --create my-group \
  --pattern db-shared \
  --source-type postgresql \
  --extraction-pattern '(?P<customer_id>\w+)' \
  --environment-aware

See: examples/db-shared/
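With db-shared, a single pipeline serves every tenant, so per-customer isolation happens inside the pipeline rather than per database. One way to express such filtering in Redpanda Connect is a Bloblang mapping processor; the fragment below is an illustrative sketch (the customer_id field and CUSTOMER_ID variable are assumptions, not something the generator emits verbatim):

```yaml
pipeline:
  processors:
    - mapping: |
        # Drop rows that belong to a different tenant than the one
        # this pipeline instance serves (tenant id from the environment).
        root = if this.customer_id != env("CUSTOMER_ID") { deleted() }
```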

๐Ÿณ Docker Container Workflow

cdc-pipeline-generator/
├── cdc_generator/          # Core library
│   ├── core/               # Pipeline generation logic
│   ├── helpers/            # Utility functions
│   ├── validators/         # Configuration validation
│   └── cli/                # Command-line interface
└── examples/               # Reference implementations
    ├── db-per-tenant/      # Multi-database pattern
    └── db-shared/          # Single-database pattern

๐Ÿณ Docker Container Workflow

The recommended way to use this tool is inside the auto-generated dev container:

Why Use the Container?

✅ Isolated environment - No conflicts with host Python/packages
✅ All dependencies pre-installed - Python 3.11, Fish shell, database clients
✅ Database services included - MSSQL/PostgreSQL auto-configured
✅ Consistent across team - Same environment for everyone

Container Commands

# Start all services (databases + dev container)
docker compose up -d

# Enter dev container
docker compose exec dev fish

# Stop all services
docker compose down

# Rebuild container (after updating generator version)
docker compose up -d --build

# View logs
docker compose logs -f dev
docker compose logs -f mssql
docker compose logs -f postgres-target

Working Inside Container

Once inside (docker compose exec dev fish), you have:

  • ✅ cdc command available
  • ✅ Access to source and target databases
  • ✅ Fish shell with auto-completions
  • ✅ Git configured (via volume mount)
  • ✅ SSH keys available (via volume mount)

All your project files are mounted at /workspace, so changes are reflected immediately.


๐Ÿ“ Project Structure


๐Ÿ“ Project Structure

After running cdc init, your project will have:

my-cdc-project/
├── docker-compose.yml           # Dev container + database services
├── Dockerfile.dev               # Container image definition
├── .env.example                 # Environment variables template
├── .env                         # Your credentials (git-ignored)
├── .gitignore                   # Git ignore rules
├── server-groups.yaml           # Server group config (generated by cdc)
├── README.md                    # Quick start guide
├── 2-services/                  # Service definitions (generated by cdc)
│   └── my-service.yaml
├── 2-customers/                 # Customer configs (for db-per-tenant)
├── 3-pipeline-templates/        # Custom pipeline templates (optional)
└── generated/                   # Generated output (git-ignored)
    ├── pipelines/               # Redpanda Connect pipeline YAML
    ├── schemas/                 # PostgreSQL schemas
    └── table-definitions/       # Table metadata

🔧 Advanced Usage

Using as Python Library

from cdc_generator.core.pipeline_generator import generate_pipelines

# Generate pipelines programmatically
generate_pipelines(
    service='my-service',
    environment='dev',
    output_dir='./generated/pipelines'
)

Custom Pipeline Templates

Place custom Jinja2 templates in 3-pipeline-templates/:

# 3-pipeline-templates/source-pipeline.yaml
input:
  mssql_cdc:
    dsn: "{{ dsn }}"
    tables: {{ tables | tojson }}
    # Your custom configuration
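Custom templates are rendered with standard Jinja2 semantics, so constructs like the tojson filter behave exactly as in plain Jinja2. A quick way to preview what such a template expands to (values invented for illustration; assumes the jinja2 package is installed):

```python
from jinja2 import Template

# Mirrors the structure of the custom source-pipeline template above
template = Template(
    'input:\n'
    '  mssql_cdc:\n'
    '    dsn: "{{ dsn }}"\n'
    '    tables: {{ tables | tojson }}\n'
)

# tojson serializes the Python list into a JSON array in the rendered YAML
print(template.render(dsn='sqlserver://sa:***@mssql:1433', tables=['Users', 'Orders']))
```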

Environment-Specific Configuration

Use environment variables in server-groups.yaml:

server:
  host: ${MSSQL_HOST}        # Replaced at runtime
  port: ${MSSQL_PORT}
  user: ${MSSQL_USER}
  password: ${MSSQL_PASSWORD}
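The ${VAR} placeholders follow ordinary shell-style expansion against the process environment. The behaviour can be sketched in Python with os.path.expandvars (illustrative only; not necessarily the substitution mechanism the generator or Redpanda Connect uses internally):

```python
import os

# Values that would normally come from your .env file
os.environ['MSSQL_HOST'] = 'mssql'
os.environ['MSSQL_PORT'] = '1433'

snippet = 'host: ${MSSQL_HOST}\nport: ${MSSQL_PORT}'

# Each ${VAR} is replaced with the corresponding environment variable
print(os.path.expandvars(snippet))
```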

๐Ÿค Contributing

For Library Contributors

If you want to contribute to the cdc-pipeline-generator library itself:

# Clone repository
git clone https://github.com/Relaxe111/cdc-pipeline-generator.git
cd cdc-pipeline-generator

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black .
ruff check .

For Users

If you're using the library in your project, just install from PyPI as shown in Installation.

