Migration pattern: v1 -> v2 without breaking consumers
In the previous lesson we covered the syntax of versions. This lesson covers the operational side of a migration: how to coordinate with consumers, track migration progress, enforce deprecation, and handle laggards.
A good migration is a process, not tooling. dbt provides the mechanism (versions, latest_version, deprecation_date), but coordination is team work. Every step should be communicated and tracked.
The full migration lifecycle
Step 1: Create v2
Day 0. The producer creates v2 in parallel with v1.
- name: fct_orders
  latest_version: 1  # v1 is still the default, which keeps consumers stable
  versions:
    - v: 1
      defined_in: fct_orders_v1
      config:
        contract:
          enforced: true
      columns:
        - name: order_id
          data_type: bigint
        - name: amount
          data_type: numeric(12, 2)
        # ... rest of v1 schema
    - v: 2
      defined_in: fct_orders_v2
      config:
        contract:
          enforced: true
      columns:
        - name: order_id
          data_type: bigint
        - name: order_total  # renamed
          data_type: numeric(12, 2)
        # ... rest of v2 schema
SQL files:
- fct_orders_v1.sql: unchanged
- fct_orders_v2.sql: new, contains the renames
dbt run:
- Both tables are created: fct_orders_v1 and fct_orders_v2
- Consumers can migrate, but are not required to
- ref('fct_orders') -> v=1 (latest_version=1)
Cost considerations: two tables mean roughly 2x storage and 2x compute on every run. This cost is temporary: v1 is removed once the migration is complete.
For large tables (terabytes), you can use the alias model pattern (see below).
Alternative: alias model pattern
If duplicate storage is not acceptable, v2 can be defined as an alias on top of v1:
-- models/marts/fct_orders_v2.sql
SELECT
    order_id,
    amount AS order_total  -- just renamed in the view
FROM {{ ref('fct_orders', v=1) }}
With materialized: view:
- v: 2
  config:
    materialized: view  # view, not table
  defined_in: fct_orders_v2
fct_orders_v2 becomes a view on fct_orders_v1. Storage is not duplicated, but v2 inherits v1's freshness.
Tradeoff:
- Pro: no storage cost
- Pro: same data (no consistency issues)
- Con: v2 cannot carry additional computations (only renames / type casts)
- Con: performance overhead from the view rewrite at query time
- Con: lineage shows v2 <- v1 <- upstream (an additional layer)
Use this for simple renames on large tables. For structural changes, build a full v2.
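When several columns are renamed, the alias view's SELECT can be generated from a rename mapping instead of being written by hand. The following is a minimal sketch; the helper name render_alias_view and its parameters are hypothetical and not part of dbt.

```python
def render_alias_view(model, source_version, renames, passthrough):
    """Render the SELECT for a thin alias view over an existing model version.

    renames maps old column name -> new column name;
    passthrough lists the columns exposed unchanged.
    """
    select_items = list(passthrough)
    select_items += [f"{old} AS {new}" for old, new in renames.items()]
    columns_sql = ",\n    ".join(select_items)
    # Doubled braces in the f-string produce the literal Jinja delimiters.
    return (
        "SELECT\n"
        f"    {columns_sql}\n"
        f"FROM {{{{ ref('{model}', v={source_version}) }}}}"
    )


sql = render_alias_view("fct_orders", 1, {"amount": "order_total"}, ["order_id"])
print(sql)
```

The output can be pasted into fct_orders_v2.sql. Swapping the direction of the mapping gives a compatibility view that exposes the old names on top of v2.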
Step 2: Notify consumers
Day 0. Communication is critical.
Channels:
- #data-channel in Slack: main broadcast
- Email to consumers with known addresses (via exposure owners)
- PR template for exposures: alert on upstream version change
- Release notes in the release log
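The list of exposure owners to contact can be pulled from target/manifest.json rather than maintained by hand. This sketch assumes exposures carry owner.name, owner.email and depends_on.nodes, and that model unique_ids look like model.<project>.<name>[.v<version>]. The sample manifest fragment and the email address are made up for illustration.

```python
def affected_exposures(manifest, model_name):
    """Return (exposure, owner name, owner email) for exposures that depend
    on any version of model_name."""
    affected = []
    for exposure in manifest.get("exposures", {}).values():
        deps = exposure.get("depends_on", {}).get("nodes", [])
        # Compare the model-name segment of each unique_id.
        if any(d.startswith("model.") and d.split(".")[2:3] == [model_name] for d in deps):
            owner = exposure.get("owner", {})
            affected.append((exposure["name"], owner.get("name"), owner.get("email")))
    return sorted(affected, key=lambda row: row[0])


# Hypothetical manifest fragment, used only to exercise the function.
sample_manifest = {
    "exposures": {
        "exposure.shop.monthly_revenue_dashboard": {
            "name": "monthly_revenue_dashboard",
            "owner": {"name": "Alice", "email": "alice@example.com"},
            "depends_on": {"nodes": ["model.shop.fct_orders.v1"]},
        },
        "exposure.shop.customer_overview": {
            "name": "customer_overview",
            "owner": {"name": "Dana", "email": None},
            "depends_on": {"nodes": ["model.shop.dim_customers"]},
        },
    }
}

recipients = affected_exposures(sample_manifest, "fct_orders")
print(recipients)
```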
Notification template:
# Migration: fct_orders v1 -> v2
## What's changing
- `fct_orders.amount` renamed to `order_total`
- (other changes, if any)
## Affected consumers
- monthly_revenue_dashboard (Alice)
- churn_prediction_v3 (Bob)
- weekly_marketing_report (Charlie)
- ... (full list)
## Migration steps
1. **Update SQL references**:
```sql
-- Before
SELECT amount FROM {{ ref('fct_orders') }}
-- After
SELECT order_total FROM {{ ref('fct_orders', v=2) }}
```
2. **Test locally**: `dbt run --select <your_model>`.
3. **Open a PR** and request review, mentioning #data-team.
## Timeline
- Day 0 (now): v2 available, v1 still default
- Day 60 (2026-07-15): v1 deprecated (warnings issued)
- Day 90 (2026-08-15): v1 removed (builds fail)
## Need help?
- Slack: #data-help
- Office hours: Wednesdays 14:00 UTC
- Doc: link to wiki
Delivery:
- Not a one-time notification: repeat it in #data-channel weekly
- Talk to laggards individually, in 1:1 conversations
- Make migration easy: provide examples and office hours
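The Day 0 / Day 60 / Day 90 milestones used in the notification can be derived from a single start date, which keeps the announced dates consistent across messages. This is a minimal sketch. The function name and the start date below are hypothetical. The 60 and 90 day offsets follow the timeline in this lesson.

```python
from datetime import date, timedelta


def migration_timeline(day_zero, deprecate_after_days=60, remove_after_days=90):
    """Compute calendar dates for the migration milestones."""
    return {
        "v2_available": day_zero,
        "v1_deprecated": day_zero + timedelta(days=deprecate_after_days),
        "v1_removed": day_zero + timedelta(days=remove_after_days),
    }


timeline = migration_timeline(date(2026, 1, 1))  # hypothetical Day 0
for milestone, when in timeline.items():
    print(f"{milestone}: {when.isoformat()}")
```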
Step 3: Track migration
Days 0-60. The producer monitors progress.
Simplest tracking: grep
```bash
# List all references to v1 (file name and line number)
grep -rn "ref('fct_orders', v=1)" models/
# List all references to v2
grep -rn "ref('fct_orders', v=2)" models/
# List unversioned references (they resolve to latest_version=1)
grep -rn "ref('fct_orders')" models/
```
Via manifest.json:
import json
with open('target/manifest.json') as f:
    manifest = json.load(f)
# Find consumers of fct_orders
v1_consumers = []
v2_consumers = []
unversioned_consumers = []
for node in manifest['nodes'].values():
    deps = node.get('depends_on', {}).get('nodes', [])
    # depends_on holds unique_ids such as model.<project>.fct_orders.v1,
    # so compare the model-name segment instead of the whole string
    if any(d.startswith('model.') and d.split('.')[2:3] == ['fct_orders'] for d in deps):
        # classify by the ref() call found in raw_code
        raw_code = node.get('raw_code') or ''
        if "ref('fct_orders', v=1)" in raw_code:
            v1_consumers.append(node['name'])
        elif "ref('fct_orders', v=2)" in raw_code:
            v2_consumers.append(node['name'])
        else:
            unversioned_consumers.append(node['name'])
print(f"v1 explicit: {len(v1_consumers)}")
print(f"v2 explicit: {len(v2_consumers)}")
print(f"Unversioned: {len(unversioned_consumers)}")
print(f"v1 consumers: {v1_consumers}")
Weekly cadence:
- Check the counts every week
- If the v1 count is not decreasing, there is a bottleneck somewhere
- Reach out personally to the owners of v1 consumers
Dashboard for visibility: create a simple internal dashboard that tracks migration progress:
- Number of v1 refs
- Number of v2 refs
- Number of unversioned refs (these switch when latest_version changes)
- Target: v1 count -> 0 by deprecation_date
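The dashboard numbers can be computed from the structured refs recorded in the manifest, which avoids string matching on SQL. This sketch assumes that each entry in a node's refs list is a dict with name, package and version keys. That shape depends on the dbt version, so check it against your own manifest.json first. The sample data is hypothetical.

```python
from collections import Counter


def count_ref_versions(manifest, model_name):
    """Count refs to model_name, grouped by the version requested in ref()."""
    counts = Counter()
    for node in manifest["nodes"].values():
        for ref in node.get("refs", []):
            # Assumed shape: {"name": ..., "package": ..., "version": ...}
            if isinstance(ref, dict) and ref.get("name") == model_name:
                version = ref.get("version")
                counts["unversioned" if version is None else f"v{version}"] += 1
    return counts


# Hypothetical manifest fragment, used only to exercise the function.
sample_manifest = {
    "nodes": {
        "model.shop.revenue": {"refs": [{"name": "fct_orders", "package": None, "version": 1}]},
        "model.shop.churn": {"refs": [{"name": "fct_orders", "package": None, "version": 2}]},
        "model.shop.marketing": {"refs": [{"name": "fct_orders", "package": None, "version": None}]},
        "model.shop.other": {"refs": [{"name": "dim_customers", "package": None, "version": None}]},
    }
}

counts = count_ref_versions(sample_manifest, "fct_orders")
total = sum(counts.values())
print(dict(counts), f"v2 share: {counts['v2'] / total:.0%}")
```

Storing one such snapshot per week is enough to plot the trend and spot a stalled migration.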
Step 4: Set latest_version + deprecation_date
Day 60. Most consumers have migrated. Switch the default to v2.
- name: fct_orders
  latest_version: 2  # the default is now v2
  versions:
    - v: 1
      deprecation_date: '2026-08-15'  # 30 days remaining
      ...
    - v: 2
      ...
Effects:
- ref('fct_orders') (without a version) now resolves to v2
- Unversioned consumers switch automatically (a breaking change for any that meant to stay on v1)
- Explicit ref('fct_orders', v=1) still works, but warnings are emitted
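The effects listed above follow from one rule: an unversioned ref resolves to latest_version, and a pinned ref resolves to the version it names. The sketch below is a simplified model of that rule for reasoning about the switch. It is not dbt's actual resolver, and the function name is hypothetical.

```python
def resolve_version(available, latest_version, requested=None):
    """Return the version a ref() points at under the simplified rule."""
    version = latest_version if requested is None else requested
    if version not in available:
        raise LookupError(f"version {version} is not defined")
    return version


# Unversioned ref before and after the latest_version switch.
before_switch = resolve_version({1, 2}, latest_version=1)
after_switch = resolve_version({1, 2}, latest_version=2)
# A consumer pinned to v1 is unaffected by the switch.
pinned = resolve_version({1, 2}, latest_version=2, requested=1)
# Once v1 is removed, the pinned ref can no longer be resolved.
try:
    resolve_version({2}, latest_version=2, requested=1)
    removed_error = None
except LookupError as exc:
    removed_error = str(exc)

print(before_switch, after_switch, pinned, removed_error)
```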
Communication:
[Migration Update] fct_orders: latest_version -> 2
Today we switched the default version of fct_orders from v1 to v2.
If you used ref('fct_orders') without version — your model now points to v2.
This means:
- [x] If you've already updated SQL (amount -> order_total) — works as expected
- [ ] If you still use 'amount' but ref without version — build fails
Quick fix:
1. Update SQL to use order_total
2. OR pin to v=1 temporarily: ref('fct_orders', v=1)
v1 reaches its deprecation_date on 2026-08-15 and will be removed then. Please migrate before that date.
Step 5: Remove v1
Day 90 (deprecation_date has passed). Cleanup.
# Remove v1 from the YAML
- name: fct_orders
  latest_version: 2
  versions:
    - v: 2
      defined_in: fct_orders_v2
      ...
Delete the fct_orders_v1.sql file.
After the next dbt run:
- The fct_orders_v1 table stays in the warehouse (dbt does not auto-drop orphaned tables)
- Clean it up manually in the warehouse, or with a project-defined macro such as dbt run-operation drop_orphaned_tables (this macro is not built into dbt; you have to write it yourself)
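Candidates for cleanup can be found by comparing the relations that exist in the warehouse with the relations the current manifest still builds. This is a minimal sketch: the list of warehouse relations is assumed to come from a query against the warehouse's information schema, and the sample data is hypothetical. It only lists candidates and drops nothing.

```python
def find_orphaned_relations(warehouse_relations, manifest):
    """Return (schema, name) pairs present in the warehouse but not built by dbt."""
    managed = set()
    for node in manifest["nodes"].values():
        if node.get("resource_type") in {"model", "seed", "snapshot"}:
            # For versioned models the alias carries the _v<N> suffix.
            relation = node.get("alias") or node["name"]
            managed.add((node["schema"].lower(), relation.lower()))
    return sorted(
        (schema, name)
        for schema, name in warehouse_relations
        if (schema.lower(), name.lower()) not in managed
    )


# Hypothetical state after v1 was removed from the project.
sample_manifest = {
    "nodes": {
        "model.shop.fct_orders.v2": {
            "resource_type": "model",
            "schema": "marts",
            "name": "fct_orders",
            "alias": "fct_orders_v2",
        }
    }
}
warehouse_relations = [("marts", "fct_orders_v1"), ("marts", "fct_orders_v2")]

orphans = find_orphaned_relations(warehouse_relations, sample_manifest)
print(orphans)
```

Any table in the same schema that dbt never managed will also show up here, so review the list before dropping anything.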
Last call for laggards:
Any consumer that still uses ref('fct_orders', v=1) now fails at parse time with an error along these lines:
Compilation error: 'fct_orders' has no version 1
This breaks the laggards, but they ignored 90 days of warnings. Acceptable.
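To know in advance who will hit that error, a check can be run against the manifest before the removal PR is merged. This sketch assumes versioned models have unique_ids of the form model.<project>.<name>.v<version>. The function name and the sample manifest are hypothetical.

```python
def dependents_of_version(manifest, model_name, version):
    """List nodes whose depends_on includes the given version of model_name."""
    suffix = f".{model_name}.v{version}"
    return sorted(
        node["name"]
        for node in manifest["nodes"].values()
        if any(
            dep.startswith("model.") and dep.endswith(suffix)
            for dep in node.get("depends_on", {}).get("nodes", [])
        )
    )


# Hypothetical manifest fragment, used only to exercise the function.
sample_manifest = {
    "nodes": {
        "model.shop.weekly_marketing_report": {
            "name": "weekly_marketing_report",
            "depends_on": {"nodes": ["model.shop.fct_orders.v1"]},
        },
        "model.shop.monthly_revenue": {
            "name": "monthly_revenue",
            "depends_on": {"nodes": ["model.shop.fct_orders.v2"]},
        },
    }
}

blockers = dependents_of_version(sample_manifest, "fct_orders", 1)
print(blockers)
```

An empty list means v1 can be removed without breaking any model in the project. Consumers outside the dbt project still need to be checked separately.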
Step 6: Optional cleanup
If only v2 remains, you can convert it back to an unversioned model:
- name: fct_orders
  # no versions block
  columns:
    - name: order_id
      data_type: bigint
    - name: order_total
      data_type: numeric(12, 2)
SQL: rename fct_orders_v2.sql -> fct_orders.sql.
Consumers that use ref('fct_orders', v=2) need to update to ref('fct_orders'). This is another mini-migration, but with a smaller scope.
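The mechanical part of that mini-migration, dropping the version pin from ref() calls, can be scripted. The regex below is a sketch covering the single-argument form with v= or version=. It does not handle two-argument package refs, and the function name is hypothetical.

```python
import re


def unpin_ref(sql, model_name, version):
    """Rewrite ref('<model>', v=<version>) to ref('<model>') in a SQL string."""
    pattern = re.compile(
        r"ref\(\s*(['\"])" + re.escape(model_name) + r"\1"   # quoted model name
        + r"\s*,\s*(?:v|version)\s*=\s*['\"]?"               # v= or version=
        + re.escape(str(version)) + r"['\"]?\s*\)"           # the pinned version
    )
    return pattern.sub(f"ref('{model_name}')", sql)


pinned_sql = "SELECT order_total FROM {{ ref('fct_orders', v=2) }}"
unpinned_sql = unpin_ref(pinned_sql, "fct_orders", 2)
print(unpinned_sql)

# Refs pinned to a different version are left untouched.
other_sql = "SELECT amount FROM {{ ref('fct_orders', v=1) }}"
untouched_sql = unpin_ref(other_sql, "fct_orders", 2)
```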
Alternative: keep v2 forever:
- name: fct_orders
  latest_version: 2
  versions:
    - v: 2
      ...
Both ref('fct_orders') (resolves to v=2) and ref('fct_orders', v=2) work, so both kinds of consumers are accommodated. This is a stable long-term state if no further migrations are needed.
Edge cases in migration
1. Multiple breaking changes at once
The producer wants to:
- Rename amount -> order_total
- Add a new column, currency
- Drop legacy_status
All in a single v2:
- v: 2
  columns:
    - name: order_id
    - name: order_total  # renamed
    - name: currency  # new
    # legacy_status: dropped
The communication should clearly list all changes. It is a bigger migration, but it needs only one window, which is better than three separate migrations.
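The change list for the announcement can be derived from the two column sets, which guards against forgetting a change when several are bundled. This is a minimal sketch. The rename mapping has to be supplied by the producer, because a rename cannot be told apart from a drop plus an add. The function name is hypothetical.

```python
def describe_changes(old_columns, new_columns, renames):
    """List human-readable schema changes between two versions."""
    old, new = set(old_columns), set(new_columns)
    changes = [f"renamed {o} -> {n}" for o, n in sorted(renames.items())]
    # Columns that are targets or sources of a rename are not added/dropped.
    added = new - old - set(renames.values())
    dropped = old - new - set(renames)
    changes += [f"added {c}" for c in sorted(added)]
    changes += [f"dropped {c}" for c in sorted(dropped)]
    return changes


changes = describe_changes(
    old_columns=["order_id", "amount", "legacy_status"],
    new_columns=["order_id", "order_total", "currency"],
    renames={"amount": "order_total"},
)
for change in changes:
    print(f"- {change}")
```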
2. Multiple v2 candidates
Two teams propose conflicting changes:
- Team A: rename amount -> total
- Team B: change amount type bigint -> numeric
Two competing definitions of v2 cannot coexist, so coordination is needed:
- Either merge both changes into a single v2
- Or sequence them: v2 (Team A) -> v3 (Team B), each with its own migration window
3. Critical bug in v1
v1 has a bug (wrong formula). Should v2 fix it, or should v1 be updated?
Update v1 directly (internal fix):
- Pro: all consumers (v1 and unversioned) get the correct number
- Pro: no migration burden
- Con: changes v1 semantics; consumers might depend on the (incorrect) behavior
Create v2 with the fix:
- Pro: v1 stays stable for consumers that depend on it
- Con: consumers on v1 keep the wrong number for as long as they stay on v1
- Con: more work for everyone
Convention: bug fixes get a direct update. Schema changes and formula changes (business meaning) get a new version.
4. Consumer never migrates (legacy systems)
Some legacy systems can't update their SQL. Strategies:
- Permanent v1 alias: keep v1 forever and mark it maturity: low (for example under meta). The cost is duplicate storage.
- Migrate at the warehouse level: create a view fct_orders_v1_compat that matches the old schema, and point the consumer at it.
- Hard deadline: communicate, then enforce. If the consumer breaks, accept it as a cost.
Production reality: some legacy systems force long-term parallel versions. Plan accordingly.
Communication template
Initial notification (Day 0):
[Migration Announce] fct_orders v1 -> v2 (rename amount -> order_total)
When: starting today
Why: align with finance team naming standards
How long: 90 days migration window
Affected: 12 dashboards, 3 ML models (full list: <wiki>)
Action items:
1. Review your model: does it ref fct_orders?
2. If yes: plan migration (update ref + replace column name)
3. Office hours: Wednesdays 14:00 UTC for help
Timeline:
- Day 0 (today): v2 available
- Day 60: v1 deprecation_date set
- Day 90: v1 removed
Owner: Alice (alice@..., Slack #data-help)
Mid-migration push (Day 30):
[Migration Reminder] fct_orders v1 -> v2 (50% complete)
Status: 6 of 12 dashboards migrated. 6 remaining.
Remaining:
- weekly_marketing_report (Charlie) — ETA?
- churn_prediction_v3 (Bob) — ETA?
- ... full list
Please confirm timeline for migration. Deprecation in 30 days.
Pre-deprecation (Day 50):
[Migration Final Call] fct_orders v1 -> v2 (10 days to deprecation)
v1 will issue warnings on 2026-07-15 (next week).
v1 will be REMOVED on 2026-08-15 (40 days).
Remaining laggards: <list>
Hard deadline. Please migrate ASAP.
Post-deprecation (Day 90):
[Migration Complete] fct_orders v1 removed
v1 has been removed from the dbt project.
v2 is now the default: ref('fct_orders') -> v2.
If your build failed, pin to v=2 explicitly or fix your references.
Thanks for cooperation!
Migration antipatterns
- No communication: v2 was created and consumers were expected to find it on their own. Result: 90 days pass, v1 is still in use, and deprecation becomes a breaking change.
- No tracking: v2 was created, but nobody followed migration progress. Result: a false sense of completion and surprise breakage.
- Too short a timeline: 7 days for 50 consumers. Consumers have no time to coordinate with their own teams and customers. Result: a forced rush and broken consumers.
- Too long a timeline: 12 months. Result: the migration is forgotten, v1 and v2 coexist for years, and the cost piles up.
- No deprecation_date: warnings are never visible, so consumers ignore the migration until v1 is removed. Result: surprises.
- Removing v1 in the same PR that creates v2: this defeats the purpose of versions. These should be separate PRs with separate windows.
- Hidden breaking changes in a bug fix: described as an "updated formula", but the return type actually changed. Without a new version, consumers break silently. Communicate semantic changes.
- No fallback for laggards: a legacy system can't migrate and there is no plan. Result: the forced removal breaks the legacy system.
Try it yourself
- Simulate a migration in your own dbt project:
  - Take a model with one column
  - Create v2 with the column renamed
  - Update one consumer to ref(v=2)
  - Leave one consumer on v1
- Track:
  grep -rn "ref('your_model', v=1)" models/
  grep -rn "ref('your_model', v=2)" models/
- Test deprecation:
  - Set deprecation_date on v1 to a past date
  - dbt run: you should see warnings
- Test removal:
  - Remove v1 from the YAML
  - dbt parse: the consumer on v1 should fail compilation
- Restore the original state. The goal is to practice the full cycle on a test project.
Key takeaways
- Migration lifecycle (90 days is typical): create v2 -> notify -> track -> set latest_version + deprecation_date -> remove v1 -> optional cleanup.
- Two tables in the warehouse: v1 and v2 run in parallel. The cost is a temporary 2x in storage and compute. The alternative is a view alias for simple renames (saves storage and build-time compute, at the price of query-time overhead).
- Notification template: what's changing, who is affected, migration steps, timeline, contacts. Repeat over time. Offer office hours for support.
- Tracking: grep or manifest.json to count v=1 vs v=2 refs. Weekly cadence. Push laggards individually.
- latest_version switch (day 60): the default changes from v1 to v2. Unversioned consumers switch automatically. Last warning.
- Hard deadline: the deprecation_date is enforced by the team, and v1 is removed despite laggards (after sufficient communication).
- Edge cases: multiple breaking changes (bundle), conflicting v2 candidates (coordinate), bug fix (direct update vs new version), legacy laggards (permanent v1 or a migration view).
- Antipatterns: no communication, no tracking, too short or too long a timeline, no deprecation_date, hidden breaking changes, no fallback.