Kafka Architecture for System Design
Kafka as a Central Nervous System
Kafka is more than a message broker. It is a distributed commit log: an append-only journal that stores events and lets many consumers read them independently.
Distributed commit log: the fundamental model of Kafka
KRaft controller: architecture without ZooKeeper
Partitions: the unit of parallelism
Ordering Guarantees
Kafka ordering:
[OK] WITHIN partition: messages ordered (FIFO)
[NO] ACROSS partitions: no ordering guarantee
Implication for design:
Messages that must be ordered → same partition key
Order events for user_123:
key = "user_123" → hash("user_123") % 3 = partition 1 (illustrative; the mapping is stable only while the partition count stays the same)
All events for user_123 → partition 1 → ordered [OK]
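The key-to-partition mapping above can be sketched in a few lines. This is a minimal illustration, not Kafka's partitioner: Kafka's default partitioner hashes the serialized key with murmur2, while the sketch substitutes `zlib.crc32` because it is in the standard library. The function name `partition_for` and the sample events are assumptions made for the example.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index (stand-in hash, not murmur2)."""
    # Any deterministic hash gives the property that matters here:
    # the same key always lands in the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("user_123", "order_created"),
    ("user_456", "order_created"),
    ("user_123", "order_paid"),
    ("user_123", "order_shipped"),
]

# Group events by partition, preserving arrival order inside each partition.
partitions: dict[int, list[str]] = {}
for key, event in events:
    partitions.setdefault(partition_for(key, 3), []).append(f"{key}:{event}")

for index in sorted(partitions):
    print(index, partitions[index])
```

Whatever partition `user_123` hashes to, its three events appear there in the order they were produced, which is the only ordering Kafka guarantees.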
Partition Count Trade-offs
| Few partitions (3-10) | Many partitions (100+) |
|---|---|
| Lower overhead | Higher parallelism |
| Easier to manage | More consumer instances possible |
| Less rebalancing time | Better throughput |
| When: low volume, ordering critical | When: high volume, many consumers |
Rule of thumb:
partition_count ≥ max_consumer_instances (for full parallelism)
partition_count ≈ target_throughput / throughput_per_partition
Example: 100 MB/s needed, 10 MB/s per partition → 10+ partitions
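The rule of thumb can be written as a small helper. This is a rough sizing sketch under the assumptions stated in the text (a known target throughput and a measured per-partition throughput); the function name and parameters are hypothetical.

```python
import math


def estimate_partitions(target_mb_s: float, per_partition_mb_s: float,
                        max_consumers: int) -> int:
    """Return the larger of the throughput-based and consumer-based minimums."""
    # Partitions needed to carry the target throughput.
    by_throughput = math.ceil(target_mb_s / per_partition_mb_s)
    # Each consumer in a group needs at least one partition to stay busy.
    return max(by_throughput, max_consumers)


print(estimate_partitions(100, 10, max_consumers=6))   # throughput dominates
print(estimate_partitions(100, 10, max_consumers=20))  # consumer count dominates
```

Treat the result as a lower bound: it does not account for headroom or future growth.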
Consumer Groups
Consumer Group mechanics:
Topic: orders (6 partitions)
Consumer Group "analytics": 3 consumers
Assignment:
Consumer 1: [P0, P1]
Consumer 2: [P2, P3]
Consumer 3: [P4, P5]
Each partition → exactly ONE consumer in group
Each consumer → 1+ partitions
Scale up: add Consumer 4 → rebalance → two consumers own 2 partitions, two own 1 (1.5 on average)
Scale beyond 6: 7th consumer is IDLE (more consumers than partitions = waste)
Max consumers = partition count
A group cannot have more active consumers than partitions. If you need 20 parallel consumers, you need at least 20 partitions. Plan the partition count with future scaling in mind.
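The assignment rules can be checked with a toy model. The sketch below spreads partitions over consumers in contiguous ranges, loosely in the spirit of a range-style assignor; it is not the broker's or client's actual assignment code, and the `assign` function and consumer names are hypothetical.

```python
def assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Spread partitions over consumers in contiguous ranges."""
    base, extra = divmod(num_partitions, len(consumers))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition.
        count = base + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


for size in (3, 4, 7):
    members = [f"consumer-{n}" for n in range(1, size + 1)]
    print(size, assign(6, members))
```

With 7 consumers and 6 partitions, the last consumer receives an empty list: it joins the group but has nothing to read.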
Delivery Semantics
| Semantic | Description | Use Case |
|---|---|---|
| At-most-once | Message may be lost, never duplicated | Metrics, logs (loss OK) |
| At-least-once | Message never lost, may be duplicated | Default for most pipelines |
| Exactly-once | Message delivered exactly once | Financial, billing (costly to implement) |
Exactly-once in Kafka:
Producer: enable.idempotence=true + transactional.id
Consumer: isolation.level=read_committed
Processing: idempotent writes (upsert, partition overwrite)
Trade-off: higher latency, lower throughput
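A minimal sketch of the pieces listed above. The configuration keys are the Kafka client settings named in the text, written as plain dictionaries rather than passed to a real client. The `transactional.id` value, the `apply_payment` function, and the event fields are hypothetical. The second half shows why an upsert keyed by a stable event id makes at-least-once redelivery harmless.

```python
# Settings from the text, as plain dicts (no Kafka client is created here).
producer_config = {
    "enable.idempotence": True,
    "transactional.id": "billing-writer-1",  # hypothetical id
}
consumer_config = {
    "isolation.level": "read_committed",  # skip records from aborted transactions
}


def apply_payment(ledger: dict[str, int], event: dict) -> None:
    """Upsert keyed by payment id: replaying the same event changes nothing."""
    ledger[event["payment_id"]] = event["amount_cents"]


ledger: dict[str, int] = {}
delivered = [
    {"payment_id": "p-1", "amount_cents": 500},
    {"payment_id": "p-2", "amount_cents": 700},
    {"payment_id": "p-1", "amount_cents": 500},  # duplicate redelivery
]
for event in delivered:
    apply_payment(ledger, event)

total = sum(ledger.values())
print(ledger, total)
```

Had the sink appended rows instead of upserting, the duplicate would have inflated the total, which is the failure mode the idempotent write guards against.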
Kafka 4.0 (March 2025): what changed architecturally
Kafka 4.0 (released March 2025) is a major release with three architecture-level changes that alter the deployment diagram and the range of use cases:
1. KRaft-only (ZooKeeper removed):
Before 4.0: Kafka Cluster = Brokers + ZooKeeper ensemble (3-5 nodes)
After 4.0: KRaft only (a Raft-based metadata quorum run by Kafka's own controller nodes)
Implications:
- One fewer component in the deployment topology
- The ZooKeeper-bound ceiling on partitions per cluster (on the order of hundreds of thousands) no longer applies
- Java 17 is required for brokers (Java 11 used to be enough)
- Apache Kafka 4.0 does not support ZooKeeper mode at all
- Migration: 3.x ZooKeeper cluster → upgrade to the bridge release (3.9) → migrate to KRaft → upgrade to 4.0
2. KIP-848: new consumer rebalance protocol:
Old protocol: stop-the-world, all consumers pause during a rebalance
New (server-side): incremental, the broker manages assignment
When a new consumer joins:
- Old: all 100 consumers rebalance → around 30 s of downtime
- New: only the partitions that need to move are reassigned → seconds
Large consumer groups (100+ instances) benefit the most.
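The difference between the two styles can be illustrated by counting how many partitions are disturbed when one consumer joins. This is a toy model only: the real protocols involve the group coordinator, heartbeats, and member epochs, none of which appear here. Both function names are hypothetical. On the client side, the new protocol is selected with the consumer setting `group.protocol=consumer`.

```python
def eager_rebalance(current: dict[str, list[int]],
                    new_consumer: str) -> tuple[dict[str, list[int]], int]:
    """Stop-the-world style: everyone gives up everything, then reassign."""
    partitions = sorted(p for owned in current.values() for p in owned)
    members = list(current) + [new_consumer]
    revoked = len(partitions)  # every partition pauses during the rebalance
    new: dict[str, list[int]] = {m: [] for m in members}
    for i, p in enumerate(partitions):
        new[members[i % len(members)]].append(p)
    return new, revoked


def incremental_rebalance(current: dict[str, list[int]],
                          new_consumer: str) -> tuple[dict[str, list[int]], int]:
    """Incremental style: move only enough partitions to even out the load."""
    new = {m: list(owned) for m, owned in current.items()}
    new[new_consumer] = []
    target = sum(len(o) for o in new.values()) // len(new)  # minimum fair share
    moved = 0
    while len(new[new_consumer]) < target:
        donor = max(new, key=lambda m: len(new[m]))  # most loaded member
        new[new_consumer].append(new[donor].pop())
        moved += 1
    return new, moved


group = {"c1": [0, 1], "c2": [2, 3], "c3": [4, 5]}
_, revoked = eager_rebalance(group, "c4")
after, moved = incremental_rebalance(group, "c4")
print(f"eager: {revoked} partitions paused, incremental: {moved} moved")
```

In this model, consumers that keep their partitions never stop processing, which is where the savings for large groups come from.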
3. KIP-932 Share Groups (queue semantics, early access in 4.0, GA in 4.2):
Before: Kafka = pub-sub log, each partition → one consumer in a group
After: share groups = queue semantics on top of Kafka topics
- Cooperative consumption: several consumers read the same partition
- Individual message ack/redelivery (as in RabbitMQ/SQS)
- Unordered processing (ordering is traded away for queue patterns)
Kafka now competes with RabbitMQ/SQS on queue use cases without
a separate broker (one system, two workload patterns).
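The queue semantics described above can be mimicked with an in-memory toy. This is not the KIP-932 implementation or client API: real share groups add acquisition locks with timeouts and delivery counts on the broker side. The class `SharePartition` and its methods are hypothetical names chosen for the sketch.

```python
from collections import deque


class SharePartition:
    """Toy model of one partition read by several share-group consumers."""

    def __init__(self, records: list[str]):
        self.available = deque(enumerate(records))  # (offset, value) pairs
        self.in_flight: dict[int, tuple[str, str]] = {}  # offset -> (consumer, value)

    def acquire(self, consumer: str):
        """Hand the next available record to whichever consumer asks."""
        if not self.available:
            return None
        offset, value = self.available.popleft()
        self.in_flight[offset] = (consumer, value)
        return offset, value

    def ack(self, offset: int) -> None:
        """Mark a single record as processed; it is never delivered again."""
        del self.in_flight[offset]

    def release(self, offset: int) -> None:
        """Return a single record for redelivery, e.g. after a failure."""
        _, value = self.in_flight.pop(offset)
        self.available.appendleft((offset, value))


part = SharePartition(["job-a", "job-b", "job-c"])
first = part.acquire("consumer-1")   # two consumers share one partition
second = part.acquire("consumer-2")
part.release(first[0])               # consumer-1 failed, record goes back
part.ack(second[0])                  # consumer-2 finished its record
retry = part.acquire("consumer-2")   # the released record is redelivered
print(first, second, retry)
```

Because `job-b` completes before `job-a` is retried, processing order no longer follows offset order, which is the trade-off named in the list above.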
KIP-932 Share Groups: implementation details
Log segments: the physical storage layer
ZooKeeper is gone. If your runbook, architecture diagram, or CI/CD pipeline still mentions a ZK ensemble, it has been out of date since 4.0. With KRaft, the cluster maintains its own metadata quorum through Raft. This simplifies operations but requires a new mental model: controllers can share nodes with brokers (combined mode) or run on their own nodes (dedicated controllers), and the choice depends on cluster size.
Design Decisions
1. Topic Design
Topic naming: {domain}.{entity}.{version}
orders.created.v1
payments.processed.v2
users.updated.v1
One topic per event type (not per service)
[OK] orders.created, orders.updated, orders.cancelled
[NO] order-service-events (mixed event types)
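A naming convention is easier to keep when it is checked automatically. The sketch below builds and validates names of the form `{domain}.{entity}.{version}`. The exact character rules in the regular expression (lowercase letters, digits, underscores) are an assumption, since the text only gives examples, and both function names are hypothetical.

```python
import re

# Assumed rule: lowercase segments separated by dots, ending in v<number>.
TOPIC_PATTERN = re.compile(r"[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*\.v[1-9][0-9]*")


def is_valid_topic(name: str) -> bool:
    """Check a topic name against the {domain}.{entity}.{version} convention."""
    return TOPIC_PATTERN.fullmatch(name) is not None


def topic_name(domain: str, entity: str, version: int) -> str:
    """Build a topic name and reject anything that breaks the convention."""
    name = f"{domain}.{entity}.v{version}"
    if not is_valid_topic(name):
        raise ValueError(f"topic name does not follow the convention: {name}")
    return name


print(topic_name("orders", "created", 1))
print(is_valid_topic("payments.processed.v2"))
print(is_valid_topic("order-service-events"))
```

A check like this could run in CI against topic definitions so that mixed-type topics such as `order-service-events` are caught before they are created.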
2. Key Selection
Key determines partition assignment:
key = user_id → all events for user in same partition (ordered)
key = order_id → all events for order in same partition
key = null → spread across partitions (round-robin, or sticky per batch in the default producer since Kafka 2.4; no ordering)
Choose key based on required ordering:
Need user event ordering? → key = user_id
Need order lifecycle ordering? → key = order_id
No ordering needed? → key = null (best distribution)
3. Retention
Retention strategies:
Time-based: retention.ms = 604800000 (7 days)
Size-based: retention.bytes = 107374182400 (100 GiB, enforced per partition)
Compacted: cleanup.policy = compact keeps the latest value per key (no time limit, older values for a key are removed)
Bronze layer: long retention (30-90 days) — source of truth
Intermediate: short retention (1-7 days) — processing
Compacted: CDC topics (latest state per entity)
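The raw numbers in retention settings are easy to get wrong, so it can help to derive them. The sketch below computes `retention.ms` and `retention.bytes` from human-readable units and lays out one possible set of per-topic settings for the three layers. The topic names and the specific retention choices within the stated ranges are hypothetical.

```python
def days_to_ms(days: int) -> int:
    """Convert days to milliseconds for retention.ms."""
    return days * 24 * 60 * 60 * 1000


def gib_to_bytes(gib: int) -> int:
    """Convert GiB to bytes for retention.bytes (applies per partition)."""
    return gib * 1024 ** 3


# Hypothetical topics, one per layer described above.
topic_configs = {
    "orders.created.v1": {            # bronze: long retention, source of truth
        "cleanup.policy": "delete",
        "retention.ms": str(days_to_ms(30)),
    },
    "orders.enriched.v1": {           # intermediate: short retention
        "cleanup.policy": "delete",
        "retention.ms": str(days_to_ms(7)),
        "retention.bytes": str(gib_to_bytes(100)),
    },
    "customers.cdc.v1": {             # compacted: latest state per key
        "cleanup.policy": "compact",
    },
}

for topic, config in topic_configs.items():
    print(topic, config)
```

The values are strings because topic configuration is typically passed as string key-value pairs.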