A production-ready Change Data Capture (CDC) pipeline that streams database changes from PostgreSQL to Kafka in real time using Debezium. This project demonstrates modern data engineering practices, including event streaming, database replication, and real-time data processing.
```
PostgreSQL (WAL) → Debezium Connect → Kafka → [Your Consumer Applications]
                                        ↓
                                 Schema Registry
```
This pipeline captures row-level changes (INSERT, UPDATE, DELETE) from PostgreSQL and publishes them as events to Kafka topics, enabling real-time data integration and event-driven architectures.
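Each change event is wrapped in a Debezium envelope that carries the row state before and after the change. A trimmed sketch of what an INSERT event's value might look like (field values are illustrative):

```json
{
  "before": null,
  "after": { "id": 1, "name": "John Doe", "email": "john@example.com" },
  "source": { "connector": "postgresql", "db": "exampledb", "schema": "public", "table": "customers" },
  "op": "c",
  "ts_ms": 1700000000000
}
```

`op` is `c` for create (INSERT), `u` for update, `d` for delete, and `r` for snapshot reads; `before` is populated for updates and deletes, subject to the table's `REPLICA IDENTITY` setting.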
- Change Data Capture: Real-time streaming of database changes using Debezium
- Event Streaming: Apache Kafka for reliable, scalable message streaming
- Schema Management: Confluent Schema Registry for schema evolution
- Monitoring UI: Kafdrop for Kafka topic visualization and Debezium UI for connector management
- Logical Replication: PostgreSQL configured with `wal_level=logical` to expose row-level changes
- Docker Compose: Fully containerized setup for easy deployment
- Docker Engine 20.10+
- Docker Compose 2.0+
- 4GB+ RAM allocated to Docker
- Ports available: 5432, 8080, 8081, 8083, 9000, 9092
| Component | Technology | Purpose |
|---|---|---|
| Database | PostgreSQL 13 | Source database with logical replication |
| CDC | Debezium 2.1 | Change data capture connector |
| Streaming | Apache Kafka | Distributed event streaming platform |
| Coordination | Apache Zookeeper | Kafka cluster coordination |
| Schema Registry | Confluent Schema Registry | Schema versioning and validation |
| Monitoring | Kafdrop | Kafka topic browser and monitoring |
| UI | Debezium UI | Connector management interface |
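The services above are wired together in `docker-compose.yml`. A minimal sketch of the PostgreSQL service, assuming the stock `postgres:13` image with logical replication enabled via a command-line flag (credentials match the connector config used below; this is illustrative, not the project's actual file):

```yaml
services:
  postgres:
    image: postgres:13
    command: ["postgres", "-c", "wal_level=logical"]
    environment:
      POSTGRES_USER: docker
      POSTGRES_PASSWORD: docker
      POSTGRES_DB: exampledb
    ports:
      - "5432:5432"
    networks:
      - cdc-network
```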
```bash
git clone https://github.com/yourusername/cdc-kafka-pipeline.git
cd cdc-kafka-pipeline
```

Start all services:

```bash
docker-compose up -d
```

Wait for all services to be healthy (~30-60 seconds):

```bash
docker-compose ps
```

Create a PostgreSQL CDC connector:
```bash
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" \
  localhost:8083/connectors/ -d '{
    "name": "postgres-connector",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "database.hostname": "postgres",
      "database.port": "5432",
      "database.user": "docker",
      "database.password": "docker",
      "database.dbname": "exampledb",
      "topic.prefix": "dbserver1",
      "table.include.list": "public.*",
      "plugin.name": "pgoutput"
    }
  }'
```

Note: Debezium 2.x uses `topic.prefix` in place of the older `database.server.name` setting; topics are named `<topic.prefix>.<schema>.<table>`.

Create a test table and insert data:
```bash
docker exec -it postgres psql -U docker -d exampledb
```

```sql
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP DEFAULT NOW()
);

INSERT INTO customers (name, email) VALUES
    ('John Doe', 'john@example.com'),
    ('Jane Smith', 'jane@example.com');
```

Open Kafdrop at http://localhost:9000 to see the CDC events in the `dbserver1.public.customers` topic.
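Downstream consumers typically branch on the event's `op` field. A minimal sketch in Python (the `summarize_event` helper and the sample payload are illustrative, not part of this repo):

```python
import json

def summarize_event(value: bytes) -> str:
    """Turn a raw Debezium change event into a one-line summary."""
    event = json.loads(value)
    # Debezium envelopes label the operation: c=create, u=update, d=delete, r=snapshot read
    op_names = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT"}
    op = op_names.get(event["op"], event["op"])
    # Deletes carry the row in "before"; everything else carries it in "after"
    row = event["after"] if event["after"] is not None else event["before"]
    return f"{op} customers id={row['id']}"

# Illustrative payload shaped like a Debezium INSERT event
sample = json.dumps({
    "before": None,
    "after": {"id": 1, "name": "John Doe", "email": "john@example.com"},
    "op": "c",
}).encode()

print(summarize_event(sample))  # INSERT customers id=1
```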
| Service | URL | Purpose |
|---|---|---|
| Debezium UI | http://localhost:8080 | Manage CDC connectors |
| Kafdrop | http://localhost:9000 | Browse Kafka topics and messages |
| Schema Registry | http://localhost:8081 | View registered schemas |
| Debezium Connect API | http://localhost:8083 | Connector REST API |
The database is configured with:

- `wal_level=logical` - Enables logical replication
- Default user: `docker` / password: `docker`
- Database: `exampledb`

Kafka is configured with:

- Single-broker setup (suitable for development)
- Replication factor: 1
- Advertised listener: `kafka:9092`
All services communicate through the cdc-network bridge network.
```
.
├── docker-compose.yml    # Infrastructure definition
├── README.md             # This file
├── connectors/           # Debezium connector configs (optional)
└── consumers/            # Example consumer applications (optional)
```
Check connector status:

```bash
curl http://localhost:8083/connectors/postgres-connector/status
```

List Kafka topics:

```bash
docker exec -it kafka kafka-topics --list --bootstrap-server localhost:9092
```

Consume CDC events from the command line:

```bash
docker exec -it kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic dbserver1.public.customers \
  --from-beginning
```

Stop all services:

```bash
docker-compose down
```

Stop all services and remove volumes:

```bash
docker-compose down -v
```

This pipeline supports use cases such as:

- Real-time data replication
- Event-driven microservices
- Data lake ingestion
- Cache invalidation
- Audit logging
- Real-time analytics
- Data synchronization across systems
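For several of these use cases, consumers route events by topic name, which Debezium builds as `<server-name>.<schema>.<table>`. A small routing sketch (the function name is illustrative):

```python
def parse_cdc_topic(topic: str) -> dict:
    """Split a Debezium topic name into its server/schema/table parts."""
    # Debezium names topics <server-name>.<schema>.<table>
    server, schema, table = topic.split(".", 2)
    return {"server": server, "schema": schema, "table": table}

print(parse_cdc_topic("dbserver1.public.customers"))
# {'server': 'dbserver1', 'schema': 'public', 'table': 'customers'}
```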
Check that PostgreSQL is configured for logical replication:

```bash
docker exec postgres psql -U docker -d exampledb -c "SHOW wal_level;"
```

Verify Kafka is running and accessible:

```bash
docker logs kafka
```

If events are not appearing:

- Check connector status in Debezium UI
- Verify the table matches `table.include.list`
- Ensure the PostgreSQL user has replication permissions
To extend this project:
- Add Consumer Applications: Build services that consume CDC events
- Data Transformations: Use Kafka Streams for real-time processing
- Multiple Databases: Add MySQL, MongoDB, or other source connectors
- Sink Connectors: Stream data to Elasticsearch, S3, or data warehouses
- Monitoring: Add Prometheus and Grafana for metrics
- Production Hardening: Configure authentication, SSL, and multi-broker setup
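As an example of the sink-connector direction, registering a sink follows the same REST pattern as the source connector above. A hedged sketch for an Elasticsearch sink (the `io.confluent.connect.elasticsearch.ElasticsearchSinkConnector` class ships with Confluent's connector plugin, which is not included in this stack and would need to be installed first; the `elasticsearch` hostname is illustrative):

```json
{
  "name": "elastic-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "dbserver1.public.customers",
    "connection.url": "http://elasticsearch:9200",
    "key.ignore": "true"
  }
}
```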
This project is open source and available under the MIT License.
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
Built with ❤️ as a Data Engineering Portfolio Project