Real-time Analytics & Metrics

Когда Warehouse не хватает

Warehouse (BigQuery, Snowflake) оптимизирован для batch-аналитики: запросы за секунды-минуты на исторических данных. Но real-time dashboards, metrics API, и sub-second queries требуют другого подхода.

Warehouse query:
  SELECT region, SUM(revenue) FROM orders
  WHERE date BETWEEN '2024-01-01' AND '2024-12-31'
  GROUP BY region
  → 3-15 seconds [OK] для BI отчёта
  → [NO] для API endpoint (SLA: 200ms)
  → [NO] для live dashboard (обновление каждые 5 сек)

OLAP Engines: ClickHouse vs Druid vs Pinot

OLAP Engine Comparison: ClickHouse vs Druid vs Pinot

Metric

ClickHouse

Apache Druid

Apache Pinot

Query Latency

10-100ms

50-500ms

10-100ms

SQL Support

Full SQL + JOINs

SQL-like, limited JOINs

SQL (multi-stage)

Operations

Simple (single binary)

Complex (many services)

Complex (multiple roles)

Feature	ClickHouse	Apache Druid	Apache Pinot
Creator	Yandex	Metamarkets	LinkedIn
Query latency	10-100ms	50-500ms	10-100ms
Ingestion	Batch + Kafka	Batch + Kafka	Batch + Kafka
Storage	Column-oriented, MergeTree	Segment-based	Segment-based
SQL support	Full SQL	SQL-like (Druid SQL)	SQL (multi-stage)
Joins	Supported	Limited	Limited
Scaling	Sharding + replication	Horizontal	Horizontal
Operations	Simpler (single binary)	Complex (many services)	Complex
Best for	General OLAP, logs	Time-series, events	User-facing metrics

Decision guide:
  Need JOINs + complex SQL → ClickHouse
  Time-series events, high concurrency → Druid
  User-facing analytics (SaaS metrics) → Pinot
  Already have Warehouse → Materialized Views first

Time-series modeling в ClickHouse DataFusion для real-time на lakehouse

Materialized Views

Materialized View = pre-computed query result, stored as table

-- ClickHouse Materialized View
CREATE MATERIALIZED VIEW revenue_by_region
ENGINE = SummingMergeTree()
ORDER BY (region, date)
AS SELECT
  region,
  toDate(created_at) AS date,
  sum(amount) AS daily_revenue,
  count() AS order_count
FROM orders
GROUP BY region, date

-- Query: 2ms instead of 2 seconds
SELECT region, sum(daily_revenue)
FROM revenue_by_region
WHERE date >= today() - 30
GROUP BY region

Pre-aggregation Patterns

Level 0: Raw events (TB scale)
  orders: order_id, user_id, amount, created_at
  → Full flexibility, slow queries

Level 1: Minute aggregates
  orders_1m: minute, region, count, sum_amount
  → 1440 rows/day per region

Level 2: Hourly aggregates
  orders_1h: hour, region, count, sum_amount
  → 24 rows/day per region

Level 3: Daily aggregates
  orders_1d: date, region, count, sum_amount
  → 1 row/day per region

Trade-off: each level loses granularity but gains speed
Query routing: last 1h → Level 1, last 7d → Level 2, last 1y → Level 3

Metrics Platform Design

Architecture:

  Sources → Kafka → OLAP Engine → Metrics API → Dashboards
                                              → Alerting
                                              → Embedding

Components:
  1. Semantic Layer (Cube.js / MetricFlow):
     Define metrics ONCE, query everywhere
     metric: revenue = SUM(amount) WHERE status = 'completed'
     
  2. Caching Layer:
     Redis cache for hot queries (TTL = 60s)
     Pre-warming: compute top-50 dashboards every minute
     
  3. Query Federation:
     Last 1 hour → OLAP (ClickHouse)
     Last 1 year → Warehouse (BigQuery)
     Transparent routing for users

TIP

Warehouse vs OLAP — не вместо, а вместе. Warehouse для batch analytics и ad-hoc queries (аналитики). OLAP для real-time dashboards и metrics API (продукт, операции). Данные из одного lakehouse, served в оба.

WARNING

Anti-pattern: OLAP для всего. ClickHouse/Druid не заменяют warehouse. Они не подходят для: ad-hoc JOINs по 20 таблицам, ML feature computation, complex transformations. Используйте OLAP для pre-defined metrics с sub-second SLA.