Reverse ETL and Event-Driven Patterns
Reverse ETL: Data Activation
Classic ETL (forward): operational systems → DWH. Reverse ETL: DWH → operational systems, so the data modeled in the warehouse is put back to work in the tools that business teams use every day.
Use Cases
Reverse ETL Examples:
1. Customer Segments → Salesforce
DWH: SELECT user_id FROM users WHERE ltv > 1000 AND last_purchase >= CURRENT_DATE - INTERVAL '30 days'
→ Sync to Salesforce as "High Value Active" segment
→ Sales team targets them
2. Lead Scoring → HubSpot
DWH: ML model computes lead_score per user
→ Sync scores to HubSpot contact properties
→ Marketing prioritizes high-score leads
3. Product Recommendations → Email Platform
DWH: recommendation model outputs per-user product list
→ Sync to SendGrid/Braze as personalization data
→ Personalized email campaigns
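A minimal sketch of the sync loop behind example 1, assuming an in-memory SQLite database as a stand-in for the DWH, a hypothetical `days_since_purchase` column in place of the date filter, and a hypothetical `FakeCRM` class in place of the Salesforce API. It shows one common design: the segment is computed in the warehouse, and only the difference from the previous run is pushed to the destination.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the DWH; days_since_purchase is a
# hypothetical column replacing the "last purchase < 30 days ago" date filter.
dwh = sqlite3.connect(":memory:")
dwh.executescript("""
CREATE TABLE users (user_id TEXT PRIMARY KEY, ltv REAL, days_since_purchase INTEGER);
INSERT INTO users VALUES
    ('u1', 1500, 10), ('u2', 200, 5), ('u3', 3000, 45), ('u4', 1200, 2);
""")

SEGMENT_SQL = "SELECT user_id FROM users WHERE ltv > 1000 AND days_since_purchase < 30"


def compute_segment(conn: sqlite3.Connection) -> set[str]:
    """Run the segment query inside the DWH and return the member IDs."""
    return {row[0] for row in conn.execute(SEGMENT_SQL)}


class FakeCRM:
    """Hypothetical destination; a real sync would call the CRM's API here."""

    def __init__(self) -> None:
        self.segment: set[str] = set()
        self.api_calls = 0

    def add_members(self, ids: set[str]) -> None:
        self.segment |= ids
        self.api_calls += 1

    def remove_members(self, ids: set[str]) -> None:
        self.segment -= ids
        self.api_calls += 1


def sync(conn: sqlite3.Connection, crm: FakeCRM, previous: set[str]) -> set[str]:
    """Send only the difference against the previous run, not the whole segment."""
    current = compute_segment(conn)
    to_add, to_remove = current - previous, previous - current
    if to_add:
        crm.add_members(to_add)
    if to_remove:
        crm.remove_members(to_remove)
    return current  # persisted as the baseline for the next run


crm = FakeCRM()
state = sync(dwh, crm, set())  # first run adds u1 and u4
dwh.execute("UPDATE users SET days_since_purchase = 40 WHERE user_id = 'u4'")
state = sync(dwh, crm, state)  # u4 left the segment, so it is removed downstream
```

Persisting the last synced result and pushing only deltas keeps the number of destination API calls proportional to what changed rather than to the size of the segment.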
Tools
| Tool | Type | Approach |
|---|---|---|
| Census | Reverse ETL | DWH → 150+ destinations |
| Hightouch | Reverse ETL | DWH → operational tools |
| dbt + Census | Combined | Transform + activate |
| Airbyte | Bidirectional | Extract + reverse ETL |
Event-Driven Architecture
Pattern: Event as First-Class Citizen
Traditional: request-response
Service A → HTTP POST → Service B → response
Event-driven: publish-subscribe
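Service A → publish event → Event Bus (Kafka) → subscribers:
Service B subscribes
Service C subscribes
Data Pipeline subscribes too (!): the pipeline is just one more consumer and needs no dedicated integration with Service A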
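The fan-out above can be illustrated with a toy in-memory event log. This is an assumption-laden sketch, not Kafka's API: the hypothetical `EventLog` class mimics a topic as an append-only list, and each subscriber keeps its own read offset, the way Kafka consumer groups do. Because consumers only move their own offsets, adding the data pipeline as a subscriber does not change anything for Services B and C.

```python
from collections import defaultdict


class EventLog:
    """Toy stand-in for a Kafka topic: an append-only list of events.

    Each subscriber (like a Kafka consumer group) tracks its own offset,
    so subscribers read independently and never remove events for others.
    """

    def __init__(self) -> None:
        self.events: list[dict] = []
        self.offsets: dict[str, int] = defaultdict(int)

    def publish(self, event: dict) -> None:
        self.events.append(event)

    def poll(self, subscriber: str) -> list[dict]:
        start = self.offsets[subscriber]
        batch = self.events[start:]
        self.offsets[subscriber] = len(self.events)  # "commit" the new offset
        return batch


bus = EventLog()
# Service A publishes without knowing who will consume the events.
bus.publish({"type": "order_created", "order_id": 1, "amount": 40})
bus.publish({"type": "order_created", "order_id": 2, "amount": 60})

service_b = bus.poll("service-b")
service_c = bus.poll("service-c")
pipeline = bus.poll("data-pipeline")  # the data pipeline is just another subscriber

bus.publish({"type": "order_created", "order_id": 3, "amount": 25})
pipeline += bus.poll("data-pipeline")  # picks up only the new event
```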
Event-Driven Data Pipeline
Service A, Service B, Service C → Kafka (Event Bus) → Flink (real-time) + Spark (batch) → Lakehouse
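A minimal sketch of why one event stream can feed both paths of this pipeline, assuming plain Python in place of Flink and Spark. The hypothetical `StreamingCounter` plays the real-time role and updates state per event; `batch_counts` plays the batch role and recomputes from the events stored in a list that stands in for the lakehouse. Both read the same events, so their results should agree.

```python
from collections import Counter

events = [
    {"type": "page_view", "user": "u1"},
    {"type": "purchase", "user": "u1"},
    {"type": "page_view", "user": "u2"},
    {"type": "page_view", "user": "u1"},
]


class StreamingCounter:
    """Real-time path (the Flink role in the diagram): update state on every event."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, event: dict) -> None:
        self.counts[event["type"]] += 1


def batch_counts(stored_events: list[dict]) -> Counter:
    """Batch path (the Spark role in the diagram): recompute from full stored history."""
    return Counter(e["type"] for e in stored_events)


lakehouse: list[dict] = []  # stand-in for the raw event table in the lakehouse
stream = StreamingCounter()
for e in events:            # each event is read from the bus once...
    stream.on_event(e)      # ...updates the low-latency view
    lakehouse.append(e)     # ...and is landed in storage for batch jobs

print(dict(stream.counts), dict(batch_counts(lakehouse)))
```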
Key Patterns
Outbox Pattern: guaranteed event delivery whenever the DB changes
Outbox Pattern:
1. Service writes to DB: INSERT INTO orders (...)
2. In the SAME transaction: INSERT INTO outbox (event_type, payload)
3. CDC (Debezium) reads the outbox table → publishes to Kafka
4. Guarantee: if the DB write succeeded, the event WILL be published (at least once, so consumers must tolerate duplicates)
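Without Outbox:
Service writes to DB → then publishes to Kafka
Problem: DB write succeeded, Kafka publish failed → inconsistency. Reversing the order fails the other way: the event announces a write that was later rolled back.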
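A minimal sketch of the outbox steps, assuming SQLite via Python's `sqlite3` as the service database and a plain list (`kafka_stub`, hypothetical) in place of a Kafka producer. One swap to note plainly: the `relay` function polls the outbox table, whereas Debezium reads the database's change log instead of polling; the transactional part (order row and outbox row committed together) is the same.

```python
import json
import sqlite3
from typing import Callable

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
""")


def create_order(conn: sqlite3.Connection, order_id: int, amount: float,
                 crash_before_outbox: bool = False) -> None:
    # Business row and event row share ONE transaction: `with conn` commits
    # both on success and rolls both back if anything raises.
    with conn:
        conn.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount))
        if crash_before_outbox:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("order_created", json.dumps({"order_id": order_id, "amount": amount})),
        )


def relay(conn: sqlite3.Connection, publish: Callable[[str, dict], None]) -> int:
    """Polling relay (simplification of CDC). Publishes first, then marks the row;
    a crash in between re-publishes the row later, hence at-least-once delivery."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


kafka_stub: list[tuple[str, dict]] = []  # hypothetical stand-in for a Kafka producer
create_order(db, 1, 99.0)
try:
    create_order(db, 2, 50.0, crash_before_outbox=True)  # order 2 is rolled back too
except RuntimeError:
    pass
sent = relay(db, lambda t, p: kafka_stub.append((t, p)))
```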
Transactional Outbox: implementation
Outbox Pattern in Debezium: theory
Event Sourcing on Kafka: deep dive
Event Sourcing: store events, not state
Event Sourcing:
Not: users.balance = 100 (current state)
But: event_log: [deposit(50), deposit(100), withdraw(50)]
Current state = replay all events
Benefit: full audit trail, time travel, replay
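A minimal sketch of the replay idea using the same event log as above, assuming a hypothetical `Event` dataclass and an in-memory list instead of a Kafka topic. Current state is a left fold of a pure `apply` function over the log, and time travel is simply replaying a prefix of it. Business rules such as "no overdraft" would be checked before an event is appended; during replay, events are treated as facts that already happened.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str    # "deposit" or "withdraw"
    amount: int


def apply(balance: int, event: Event) -> int:
    """Pure state transition; no validation here, events are past facts."""
    if event.kind == "deposit":
        return balance + event.amount
    if event.kind == "withdraw":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events: list[Event], upto: Optional[int] = None) -> int:
    """Rebuild state from scratch; `upto` replays only the first N events (time travel)."""
    return reduce(apply, events[:upto], 0)


event_log = [Event("deposit", 50), Event("deposit", 100), Event("withdraw", 50)]
current = replay(event_log)            # 100, matching users.balance in the state model
after_two = replay(event_log, upto=2)  # 150, the balance as of the second event
```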