Real-time Analytics & Metrics
Когда Warehouse не хватает
Warehouse (BigQuery, Snowflake) оптимизирован для batch-аналитики: запросы за секунды-минуты на исторических данных. Но real-time dashboards, metrics API, и sub-second queries требуют другого подхода.
Warehouse query:
SELECT region, SUM(revenue) FROM orders
WHERE date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY region
→ 3-15 seconds [OK] для BI отчёта
→ [NO] для API endpoint (SLA: 200ms)
→ [NO] для live dashboard (обновление каждые 5 сек)
OLAP Engines: ClickHouse vs Druid vs Pinot
| Feature | ClickHouse | Apache Druid | Apache Pinot |
|---|---|---|---|
| Creator | Yandex | Metamarkets | |
| Query latency | 10-100ms | 50-500ms | 10-100ms |
| Ingestion | Batch + Kafka | Batch + Kafka | Batch + Kafka |
| Storage | Column-oriented, MergeTree | Segment-based | Segment-based |
| SQL support | Full SQL | SQL-like (Druid SQL) | SQL (multi-stage) |
| Joins | Supported | Limited | Limited |
| Scaling | Sharding + replication | Horizontal | Horizontal |
| Operations | Simpler (single binary) | Complex (many services) | Complex |
| Best for | General OLAP, logs | Time-series, events | User-facing metrics |
Decision guide:
Need JOINs + complex SQL → ClickHouse
Time-series events, high concurrency → Druid
User-facing analytics (SaaS metrics) → Pinot
Already have Warehouse → Materialized Views first
Time-series modeling в ClickHouse
DataFusion для real-time на lakehouse
Materialized Views
Materialized View = pre-computed query result, stored as table
-- ClickHouse Materialized View
CREATE MATERIALIZED VIEW revenue_by_region
ENGINE = SummingMergeTree()
ORDER BY (region, date)
AS SELECT
region,
toDate(created_at) AS date,
sum(amount) AS daily_revenue,
count() AS order_count
FROM orders
GROUP BY region, date
-- Query: 2ms instead of 2 seconds
SELECT region, sum(daily_revenue)
FROM revenue_by_region
WHERE date >= today() - 30
GROUP BY region
Pre-aggregation Patterns
Level 0: Raw events (TB scale)
orders: order_id, user_id, amount, created_at
→ Full flexibility, slow queries
Level 1: Minute aggregates
orders_1m: minute, region, count, sum_amount
→ 1440 rows/day per region
Level 2: Hourly aggregates
orders_1h: hour, region, count, sum_amount
→ 24 rows/day per region
Level 3: Daily aggregates
orders_1d: date, region, count, sum_amount
→ 1 row/day per region
Trade-off: each level loses granularity but gains speed
Query routing: last 1h → Level 1, last 7d → Level 2, last 1y → Level 3
Metrics Platform Design
Architecture:
Sources → Kafka → OLAP Engine → Metrics API → Dashboards
→ Alerting
→ Embedding
Components:
1. Semantic Layer (Cube.js / MetricFlow):
Define metrics ONCE, query everywhere
metric: revenue = SUM(amount) WHERE status = 'completed'
2. Caching Layer:
Redis cache for hot queries (TTL = 60s)
Pre-warming: compute top-50 dashboards every minute
3. Query Federation:
Last 1 hour → OLAP (ClickHouse)
Last 1 year → Warehouse (BigQuery)
Transparent routing for users
Warehouse vs OLAP — не вместо, а вместе. Warehouse для batch analytics и ad-hoc queries (аналитики). OLAP для real-time dashboards и metrics API (продукт, операции). Данные из одного lakehouse, served в оба.
Anti-pattern: OLAP для всего. ClickHouse/Druid не заменяют warehouse. Они не подходят для: ad-hoc JOINs по 20 таблицам, ML feature computation, complex transformations. Используйте OLAP для pre-defined metrics с sub-second SLA.