Lesson 11.04 · 24 min
Intermediate
Tags: Migration, Versions, Communication, Deprecation, Phased rollout

Migration pattern v1 -> v2 without breaking consumers

The previous lesson covered the syntax of versions. This lesson covers the operational side of a migration: how to coordinate with consumers, how to track migration progress, how to enforce deprecation, and how to handle laggards.

A good migration is a process, not tooling. dbt provides the mechanism (versions, latest_version, deprecation_date), but coordination is teamwork. Every step should be communicated and tracked.

Data Mesh Contracts: a governance framework for breaking changes

Full migration lifecycle

Producer: v2 alive -> Notify -> Track migration -> Deprecate v1 -> Remove v1 -> Cleanup

  1. Create v2 (day 0)
  2. Notify consumers (day 0)
  3. Track migration (day 0-60)
  4. Set deprecation_date (day 60)
  5. Remove v1 (day 90)
  6. Cleanup (optional)
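The day offsets in this lifecycle can be turned into calendar dates for the announcement. Below is a minimal sketch, assuming the 0/60/90-day offsets used in this lesson; the function name `migration_timeline`, the milestone labels, and the example start date are illustrative, not part of dbt.

```python
from datetime import date, timedelta

# Day offsets taken from the lifecycle above; adjust them to your own window.
MILESTONES = {
    "v2 available, consumers notified": 0,
    "latest_version switched, deprecation_date set on v1": 60,
    "v1 removed": 90,
}


def migration_timeline(start: date) -> dict[str, date]:
    """Map each milestone to a calendar date, counting from day 0."""
    return {
        name: start + timedelta(days=offset)
        for name, offset in MILESTONES.items()
    }


if __name__ == "__main__":
    # Hypothetical day 0, used only for illustration.
    for name, when in migration_timeline(date(2026, 1, 1)).items():
        print(f"{when.isoformat()}  {name}")
```

Computing the dates once and pasting them into every notification keeps the announcement, the reminders, and the deprecation_date in the YAML consistent with each other.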

Step 1: Create v2

Day 0. The producer creates v2 in parallel with v1.

- name: fct_orders
  latest_version: 1   # v1 still default — keeps consumers stable

  versions:
    - v: 1
      defined_in: fct_orders_v1
      config:
        contract:
          enforced: true
      columns:
        - name: order_id
          data_type: bigint
        - name: amount
          data_type: numeric(12, 2)
        # ... rest of v1 schema

    - v: 2
      defined_in: fct_orders_v2
      config:
        contract:
          enforced: true
      columns:
        - name: order_id
          data_type: bigint
        - name: order_total   # renamed
          data_type: numeric(12, 2)
        # ... rest of v2 schema

SQL files:

  • fct_orders_v1.sql — unchanged
  • fct_orders_v2.sql — new, contains renames

dbt run:

  • Both tables are created: fct_orders_v1, fct_orders_v2
  • Consumers can migrate but are not required to
  • ref('fct_orders') -> v=1 (latest_version=1)

Cost considerations: two tables mean 2x storage and 2x compute on every run. The cost is temporary: once the migration is done, v1 is removed.

For large tables (terabytes), you can use the alias model pattern (see below).


Alternative: alias model pattern

If duplicate storage is unacceptable, you can make v2 an alias on top of v1:

-- models/marts/fct_orders_v2.sql
SELECT
    order_id,
    amount AS order_total   -- just renamed in view
FROM {{ ref('fct_orders', v=1) }}

With materialized: view:

- v: 2
  config:
    materialized: view   # view, not table
  defined_in: fct_orders_v2

fct_orders_v2 becomes a view on fct_orders_v1. No storage is duplicated, but v2 inherits v1's freshness.

Tradeoff:

  • Pro: no storage cost
  • Pro: same data (no consistency issues)
  • Con: v2 is limited to renames and type casts, with no additional computations
  • Con: query-time overhead, because the view is evaluated on every read
  • Con: lineage shows v2 ← v1 ← upstream (an additional layer)

Use this for simple renames on large tables. For structural changes, build a full v2.
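For a wide table, writing the alias SELECT by hand is error-prone. The sketch below generates the view SQL from a column list and a rename mapping. It assumes the only changes are renames; the helper name `build_alias_sql` and its parameters are hypothetical, not a dbt feature.

```python
def build_alias_sql(
    model: str,
    source_version: int,
    columns: list[str],
    renames: dict[str, str],
) -> str:
    """Render a SELECT that exposes the old version's columns under new names."""
    select_items = []
    for col in columns:
        new_name = renames.get(col)
        # Renamed columns get an alias; all others pass through unchanged.
        select_items.append(f"    {col} AS {new_name}" if new_name else f"    {col}")
    select_list = ",\n".join(select_items)
    # Doubled braces in the f-string produce the literal {{ ... }} of a ref() call.
    return (
        f"SELECT\n{select_list}\n"
        f"FROM {{{{ ref('{model}', v={source_version}) }}}}"
    )


if __name__ == "__main__":
    print(build_alias_sql(
        "fct_orders", 1, ["order_id", "amount"], {"amount": "order_total"}
    ))
```

The generated text can be saved as fct_orders_v2.sql. Anything beyond a rename (new columns, changed logic) is outside what this sketch covers and points toward a full v2 table.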


Step 2: Notify consumers

Day 0. Communication is critical.

Channels:

  1. #data-channel in Slack: the main broadcast
  2. Email to consumers with known addresses (via exposure owners)
  3. PR template for exposures: alerts on an upstream version change
  4. Release notes in the release log

Notification template:

# Migration: fct_orders v1 -> v2

## What's changing
- `fct_orders.amount` renamed to `order_total`
- (other changes, if any)

## Affected consumers
- monthly_revenue_dashboard (Alice)
- churn_prediction_v3 (Bob)
- weekly_marketing_report (Charlie)
- ... (full list)

## Migration steps
1. **Update SQL references**:
   ```sql
   -- Before
   SELECT amount FROM {{ ref('fct_orders') }}

   -- After
   SELECT order_total FROM {{ ref('fct_orders', v=2) }}
   ```
2. **Test locally**: `dbt run --select <your_model>`.
3. **Open a PR** and request review, mentioning #data-team.

## Timeline
- Day 0 (now): v2 available, v1 still default
- Day 60 (2026-07-15): v1 deprecated (warnings issued)
- Day 90 (2026-08-15): v1 removed (builds fail)

## Need help?
- Slack: #data-help
- Office hours: Wednesdays 14:00 UTC
- Doc: link to wiki

Delivery:

  • Not a one-time notification: repeat it in #data-channel weekly
  • Have individual 1:1 conversations with laggards
  • Make migration easy: provide examples and office hours


Step 3: Track migration

Day 0-60. The producer monitors progress.

Simplest tracking: grep

# All references to v1 (with file name and line number)
grep -rn "ref('fct_orders', v=1)" models/

# All references to v2
grep -rn "ref('fct_orders', v=2)" models/

# References without a version (they resolve to latest_version=1)
grep -rn "ref('fct_orders')" models/

Via manifest.json:

import json

with open('target/manifest.json') as f:
    manifest = json.load(f)

# Classify consumers of fct_orders by the version they reference
v1_consumers = []
v2_consumers = []
unversioned_consumers = []

for node in manifest['nodes'].values():
    # depends_on.nodes holds unique IDs such as model.<project>.fct_orders.v1,
    # so a bare 'fct_orders' never matches there. Use the parsed refs instead:
    # dbt 1.5+ stores each ref() call as {"name", "package", "version"}.
    for ref in node.get('refs', []):
        if ref.get('name') != 'fct_orders':
            continue
        version = ref.get('version')
        if version is None:
            unversioned_consumers.append(node['name'])
        elif str(version) == '1':
            v1_consumers.append(node['name'])
        elif str(version) == '2':
            v2_consumers.append(node['name'])

print(f"v1 explicit: {len(v1_consumers)}")
print(f"v2 explicit: {len(v2_consumers)}")
print(f"Unversioned: {len(unversioned_consumers)}")
print(f"v1 consumers: {v1_consumers}")
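Exact-string matching on SQL source misses equivalent spellings such as ref("fct_orders", version=1): dbt accepts both the v= and version= keyword, and either quote style. The sketch below is a source-level scanner that tolerates those variants. The helper names `ref_pattern` and `classify_refs` are hypothetical, and the two-argument package form ref('pkg', 'model') is deliberately not handled.

```python
import re
from collections import Counter


def ref_pattern(model: str) -> re.Pattern:
    """Match ref('<model>') with an optional v=/version= argument."""
    return re.compile(
        r"ref\(\s*['\"]" + re.escape(model) + r"['\"]\s*"
        r"(?:,\s*(?:v|version)\s*=\s*['\"]?(?P<version>\w+)['\"]?\s*)?\)"
    )


def classify_refs(sql: str, model: str) -> Counter:
    """Count references per version; unversioned refs are counted under None."""
    return Counter(m.group("version") for m in ref_pattern(model).finditer(sql))


if __name__ == "__main__":
    sample = """
    select * from {{ ref('fct_orders', v=1) }}
    join {{ ref("fct_orders", version=2) }} using (order_id)
    join {{ ref('fct_orders') }} using (order_id)
    """
    print(classify_refs(sample, "fct_orders"))
```

Versions come back as strings ("1", "2"), and None marks an unversioned ref. Running this over every file under models/ gives the same three counters as the manifest approach without requiring a successful dbt parse.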

Weekly cadence:

  • Check the counts every week
  • If the v1 count is not decreasing, there is a bottleneck somewhere
  • Reach out personally to the owners of v1 consumers

Dashboard for visibility: build a simple internal dashboard that tracks migration progress:

  • Number of v1 refs
  • Number of v2 refs
  • Number of unversioned refs (they switch when latest_version changes)
  • Target: v1 count -> 0 by deprecation_date
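The "v1 count -> 0 by deprecation_date" target can be checked mechanically from the weekly counts. The sketch below uses a simple linear projection between the first and last snapshot. That projection is an assumption made for illustration (real migrations rarely progress linearly), and the names `percent_migrated` and `on_track` are hypothetical.

```python
from datetime import date


def percent_migrated(v1_refs: int, v2_refs: int) -> float:
    """Share of explicitly versioned refs that already point at v2."""
    total = v1_refs + v2_refs
    return 100.0 if total == 0 else 100.0 * v2_refs / total


def on_track(history: list[tuple[date, int]], deadline: date) -> bool:
    """Project the v1 count linearly from the first and last snapshot."""
    (first_day, first_count), (last_day, last_count) = history[0], history[-1]
    if last_count == 0:
        return True
    elapsed = (last_day - first_day).days
    migrated = first_count - last_count
    if elapsed <= 0 or migrated <= 0:
        return False  # stalled or not enough data: the bottleneck case
    days_left = (deadline - last_day).days
    # Integer form of: last_count / (migrated / elapsed) <= days_left
    return last_count * elapsed <= migrated * days_left


if __name__ == "__main__":
    # Hypothetical snapshots: 12 v1 refs on day 0, 6 left after 30 days.
    snapshots = [(date(2026, 1, 1), 12), (date(2026, 1, 31), 6)]
    print(percent_migrated(v1_refs=6, v2_refs=6))
    print(on_track(snapshots, deadline=date(2026, 3, 15)))
```

A False result is the signal to start the personal outreach described in the weekly cadence rather than wait for the next reminder.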

Step 4: Set latest_version + deprecation_date

Day 60. Most consumers migrated. Switch default to v2.

- name: fct_orders
  latest_version: 2   # the default is now v2

  versions:
    - v: 1
      deprecation_date: '2026-08-15'   # 30 days remaining
      ...
    - v: 2
      ...

Effects:

  • ref('fct_orders') (without a version) now resolves to v2
  • Unversioned consumers switch automatically (this is backward-incompatible for any that wanted to stay on v1)
  • An explicit ref('fct_orders', v=1) still works, but emits warnings
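The effect of the switch is easiest to see in a toy model of ref resolution. This is a simplified sketch of the rule described above, not dbt's actual resolver. The function name `resolve_ref` is hypothetical, and it assumes the default `<model>_v<version>` table naming used in this lesson.

```python
from typing import Optional


def resolve_ref(
    model: str,
    versions: list[int],
    latest_version: int,
    v: Optional[int] = None,
) -> str:
    """Return the table a ref() call would point at in this simplified model."""
    # An unversioned ref follows latest_version; a pinned ref ignores it.
    target = latest_version if v is None else v
    if target not in versions:
        raise LookupError(f"{model} has no version {target}")
    return f"{model}_v{target}"


if __name__ == "__main__":
    # Before the switch: unversioned refs read v1.
    print(resolve_ref("fct_orders", [1, 2], latest_version=1))
    # After the switch: the same call reads v2, pinned refs are unaffected.
    print(resolve_ref("fct_orders", [1, 2], latest_version=2))
    print(resolve_ref("fct_orders", [1, 2], latest_version=2, v=1))
```

The unversioned call changes its target without any change in the consumer's code. That is why the latest_version switch needs its own announcement.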

Communication:

[Migration Update] fct_orders: latest_version -> 2

Today we switched the default version of fct_orders from v1 to v2.

If you used ref('fct_orders') without version — your model now points to v2.
This means:
- [x] If you've already updated SQL (amount -> order_total) — works as expected
- [ ] If you still use 'amount' but ref without version — build fails

Quick fix:
1. Update SQL to use order_total
2. OR pin to v=1 temporarily: ref('fct_orders', v=1)

v1 deprecated on 2026-08-15 — please migrate by then.

Step 5: Remove v1

Day 90 (deprecation_date passed). Cleanup.

# Remove v1 from the YAML
- name: fct_orders
  latest_version: 2

  versions:
    - v: 2
      defined_in: fct_orders_v2
      ...

Delete fct_orders_v1.sql file.

After the next dbt run:

  • The fct_orders_v1 table stays in the warehouse (dbt does not auto-drop orphaned tables)
  • Clean it up manually in the warehouse, or with a custom macro such as dbt run-operation drop_orphaned_tables (dbt does not ship such a macro; you would have to write it)

Last call for laggards: any consumer that still uses ref('fct_orders', v=1) now fails at parse time with an error along these lines:

Compilation error: 'fct_orders' has no version 1

This breaks the laggards, but they ignored 90 days of announcements and warnings. Acceptable.
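Before merging the removal PR, it helps to know exactly who will break. The sketch below lists the remaining blockers from a parsed manifest. It assumes the dbt 1.5+ manifest layout in which each node's `refs` entry is a dict with `name` and `version` keys. The function name `removal_blockers` and the sample manifest are hypothetical.

```python
from datetime import date


def removal_blockers(
    manifest: dict,
    model: str,
    version: int,
    deprecation_date: date,
    today: date,
) -> list[str]:
    """List the reasons why removing `model` v`version` would be premature."""
    blockers = []
    if today < deprecation_date:
        blockers.append(
            f"deprecation_date {deprecation_date.isoformat()} has not passed yet"
        )
    for node in manifest.get("nodes", {}).values():
        for ref in node.get("refs", []):
            # Assumption: refs are dicts like {"name": ..., "version": ...}.
            if ref.get("name") == model and str(ref.get("version")) == str(version):
                blockers.append(f"{node['name']} still references {model} v{version}")
    return blockers


if __name__ == "__main__":
    # Hand-written stand-in for target/manifest.json.
    sample = {"nodes": {
        "model.proj.weekly_marketing_report": {
            "name": "weekly_marketing_report",
            "refs": [{"name": "fct_orders", "package": None, "version": 1}],
        },
    }}
    for line in removal_blockers(
        sample, "fct_orders", 1, date(2026, 8, 15), date(2026, 8, 16)
    ):
        print(line)
```

An empty list means the removal is safe. A non-empty list gives the exact names for the final round of personal follow-up, or for a conscious decision to break them.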


Step 6: Optional cleanup

If only v2 remains, you can convert it back to an unversioned model:

- name: fct_orders
  # no versions block
  columns:
    - name: order_id
      data_type: bigint
    - name: order_total
      data_type: numeric(12, 2)

SQL: rename fct_orders_v2.sql -> fct_orders.sql.

Consumers that use ref('fct_orders', v=2) need to update to ref('fct_orders'). This is another mini-migration, but with a smaller scope.

Alternative: keep v2 forever:

- name: fct_orders
  latest_version: 2

  versions:
    - v: 2
      ...

ref('fct_orders') works (it resolves to v=2), and so does ref('fct_orders', v=2), so both kinds of consumers are accommodated. This state can remain stable indefinitely if no more migrations are needed.


Edge cases in migration

1. Multiple breaking changes at once

The producer wants to:

  • Rename amount -> order_total
  • Add new column currency
  • Drop legacy_status

All in a single v2:

- v: 2
  columns:
    - name: order_id
    - name: order_total   # renamed
    - name: currency      # new
    # legacy_status: dropped

Communication should clearly list all changes. Bigger migration but one window. Better than 3 separate migrations.
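The change list for the announcement can be derived from the two column lists instead of being written from memory. A minimal sketch, assuming renames are supplied explicitly (a rename cannot be inferred reliably from names alone); the helper name `diff_columns` is hypothetical.

```python
def diff_columns(
    old: list[str],
    new: list[str],
    renames: dict[str, str],
) -> dict[str, list[str]]:
    """Split the v1 -> v2 differences into renamed, added, and dropped columns."""
    # Keep only renames that really connect an old column to a new one.
    valid = {src: dst for src, dst in renames.items() if src in old and dst in new}
    renamed = [f"{src} -> {dst}" for src, dst in valid.items()]
    added = [c for c in new if c not in old and c not in valid.values()]
    dropped = [c for c in old if c not in new and c not in valid]
    return {"renamed": renamed, "added": added, "dropped": dropped}


if __name__ == "__main__":
    changes = diff_columns(
        old=["order_id", "amount", "legacy_status"],
        new=["order_id", "order_total", "currency"],
        renames={"amount": "order_total"},
    )
    for kind, items in changes.items():
        print(f"{kind}: {', '.join(items) or 'none'}")
```

Each of the three groups maps to one bullet in the "What's changing" section of the notification, so a bundled v2 is announced as a complete list.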

2. Multiple v2 candidates

Two teams propose conflicting changes:

  • Team A: rename amount -> total
  • Team B: change amount type bigint -> numeric

Both can’t be in v2 at once. Coordination needed:

  • Either merge both into one v2 (with both changes)
  • Or sequence: v2 (Team A) -> v3 (Team B), with each migration window

3. Critical bug in v1

v1 has a bug (wrong formula). Should v2 fix it, or update v1?

Update v1 directly (internal fix):

  • Pro: all consumers (v1 and unversioned) get the correct number
  • Pro: no migration burden
  • Con: changes v1 semantics, and consumers might depend on the (incorrect) behavior

Create v2 with the fix:

  • Pro: v1 stays stable for consumers depending on it
  • Con: consumers on v1 keep the wrong number for as long as they stay on v1
  • Con: more work for everyone

Convention: bug fixes — direct update. Schema changes / formula changes (business meaning) — new version.

4. Consumer not migrating ever (legacy systems)

Some legacy system can’t update SQL. Strategies:

  • Permanent v1 alias: keep v1 forever, mark maturity: low. Cost of duplicate storage.
  • Migrate at warehouse level: create view fct_orders_v1_compat matching old schema. Apply to consumer.
  • Hard deadline: communicate, then enforce. If the consumer breaks, accept it as a cost.

Production reality: some legacy systems force long-term parallel versions. Plan accordingly.


Communication template

Initial notification (Day 0):

[Migration Announce] fct_orders v1 -> v2 (rename amount -> order_total)

When: starting today
Why: align with finance team naming standards
How long: 90 days migration window
Affected: 12 dashboards, 3 ML models (full list: <wiki>)

Action items:
1. Review your model: does it ref fct_orders?
2. If yes: plan migration (update ref + replace column name)
3. Office hours Wednesdays 14:00 UTC for help

Timeline:
- Day 0 (today): v2 available
- Day 60: v1 deprecation_date set
- Day 90: v1 removed

Owner: Alice (alice@..., Slack #data-help)

Mid-migration push (Day 30):

[Migration Reminder] fct_orders v1 -> v2 (50% complete)

Status: 6 of 12 dashboards migrated. 6 remaining.

Remaining:
- weekly_marketing_report (Charlie) — ETA?
- churn_prediction_v3 (Bob) — ETA?
- ... full list

Please confirm timeline for migration. Deprecation in 30 days.

Pre-deprecation (Day 50):

[Migration Final Call] fct_orders v1 -> v2 (10 days to deprecation)

v1 will issue warnings on 2026-07-15 (in 10 days).
v1 will be REMOVED on 2026-08-15 (in 40 days).

Remaining laggards: <list>

Hard deadline. Please migrate ASAP.

Post-deprecation (Day 90):

[Migration Complete] fct_orders v1 removed

v1 has been removed from the dbt project.
v2 is now the default: ref('fct_orders') -> v2.

If your build failed, pin to v=2 explicitly or fix your references.

Thanks for cooperation!
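Reminders are easier to send weekly when they are generated from the tracking data rather than typed by hand. The sketch below renders a message in the spirit of the Day 30 reminder. The function name `render_reminder` and its parameters are hypothetical, and the wording is only one possible adaptation of the template.

```python
def render_reminder(
    model: str,
    migrated: list[str],
    remaining: dict[str, str],
    days_left: int,
) -> str:
    """Build a reminder message; `remaining` maps consumer name -> owner."""
    total = len(migrated) + len(remaining)
    pct = round(100 * len(migrated) / total) if total else 100
    lines = [
        f"[Migration Reminder] {model} v1 -> v2 ({pct}% complete)",
        "",
        f"Status: {len(migrated)} of {total} consumers migrated. "
        f"{len(remaining)} remaining.",
        "",
        "Remaining:",
    ]
    lines += [f"- {name} ({owner}): ETA?" for name, owner in sorted(remaining.items())]
    lines += ["", f"Please confirm your migration timeline. Deprecation in {days_left} days."]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_reminder(
        "fct_orders",
        migrated=["monthly_revenue_dashboard"],
        remaining={"weekly_marketing_report": "Charlie", "churn_prediction_v3": "Bob"},
        days_left=30,
    ))
```

The `migrated` and `remaining` inputs can come straight from the consumer lists produced during tracking, which keeps the numbers in the message honest.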

Migration antipatterns

  1. No communication: v2 was created and consumers were expected to find it on their own. Result: 90 days pass, v1 is still in use, and the deprecation becomes a breaking change.

  2. No tracking: v2 was created and nobody followed the migration progress. Result: a false sense of completion, surprise breakage.

  3. Too short a timeline: 7 days for 50 consumers. Consumers have no time to coordinate with their own teams and customers. Result: forced rush, broken consumers.

  4. Too long a timeline: 12 months. Result: the migration is forgotten, v1 and v2 coexist for years, and the cost piles up.

  5. No deprecation_date: warnings are never visible. Consumers ignore the migration until v1 is removed. Result: surprises.

  6. Removing v1 in the same PR that creates v2: this defeats the purpose of versions. These should be separate PRs in separate windows.

  7. Hidden breaking changes in a bug fix: an "updated formula" that actually changed the return type. There is no new version, so consumers break silently. Communicate semantic changes.

  8. No fallback for laggards: a legacy system can't migrate and there is no plan. Result: forced removal breaks the legacy system.


Try it yourself

  1. Simulate a migration in your own dbt project:

    • Take a model with one column
    • Create v2 with renamed column
    • Update one consumer to ref(v=2)
    • Leave one consumer on v1
  2. Track:

    grep -rn "ref('your_model', v=1)" models/
    grep -rn "ref('your_model', v=2)" models/
  3. Test deprecation:

    • Set deprecation_date to past date in v1
    • dbt run — should see warnings
  4. Test removal:

    • Remove v1 from YAML
    • dbt parse: the consumer on v1 should fail compilation
  5. Restore the original state. This is practice on a test cycle.


Key takeaways

  1. Migration lifecycle (90 days typical): create v2 -> notify -> track -> set latest_version + deprecation_date -> remove v1 -> optional cleanup.
  2. Two tables in the warehouse: v1 and v2 in parallel. The cost is a temporary 2x in storage. Alternative: a view alias for simple renames (saves storage, no extra compute).
  3. Notification template: what's changing, who is affected, migration steps, timeline, contacts. Repeat over time. Office hours for support.
  4. Tracking: grep / manifest.json to count v=1 vs v=2 refs. Weekly cadence. Push laggards individually.
  5. latest_version switch (day 60): the default changes from v1 to v2. Unversioned consumers switch automatically. Last warning.
  6. Hard deadline: deprecation_date is enforced, and v1 is removed despite laggards (after sufficient communication).
  7. Edge cases: multiple breaking changes (bundle), conflicting v2 candidates (coordinate), bug fix (direct update vs version), legacy laggards (permanent v1 or migration view).
  8. Antipatterns: no communication, no tracking, too short/long timeline, no deprecation_date, hidden breaking changes, no fallback.
Knowledge check
The team created v2 for fct_orders, but after 60 days the migration has not moved: all consumers are still on v1. What should have been done differently?
Answer
Causes and solutions:

1. No communication beyond Day 0:

  • One-time announcement in Slack, easily missed
  • No reminders
  • No tracking dashboard visible to leadership

Solution: weekly reminders in #data-channel, 1:1s with laggards, a monthly status report.

2. No tracking:

  • The producer created v2 and forgot about it
  • No metric for "how many consumers have migrated?"
  • Surprise at Day 60

Solution: a weekly grep count of ref('fct_orders', v=1). A dashboard in Looker / Tableau shows progress.

3. Migration painful / unclear:

  • No clear example in the notification
  • Consumers don't know where to start
  • Difficult to test locally

Solution:

  • Provide an example PR for one consumer (template)
  • Weekly office hours for questions
  • Migration wiki / FAQ
  • Pair with consumer engineers on the first migration

4. No deadline enforcement:

  • Soft "please migrate" language
  • No deprecation_date set yet
  • No consequences for not migrating

Solution: set deprecation_date in the YAML at Day 30 (not Day 60). Warnings are emitted on every dbt run from Day 30, so consumers feel the pressure.

5. latest_version still pointing to v1:

  • Unversioned consumers (ref('fct_orders')) still get v1
  • They don't realize v2 exists

Solution: at Day 30 (after the initial migrations), switch latest_version: 2. Unversioned consumers switch automatically, which forces attention from those still relying on v1.

6. No incentive for consumer teams:

  • v1 works perfectly, no urgency
  • Migration overhead with no immediate benefit

Solution: communicate the benefits of v2 (new column, better naming, fix). Tie the migration to the consumer's own goals.

7. Producer not following up:

  • Created v2, moved on to the next task
  • No ownership of migration progress

Solution: the producer is responsible until v1 is removed. This is product management: a feature isn't done until it is rolled out.

Reset action plan:

  1. Week 1: identify all v1 consumers, list them with names and emails
  2. Week 2: 1:1 conversations with each, agree on a migration date
  3. Weeks 3-4: pair-program migrations with volunteers, document lessons
  4. Week 5: set deprecation_date, switch latest_version
  5. Weeks 6-8: warnings, push laggards
  6. Week 9: enforce removal

90 days is realistic with active management. 60 days of passive waiting fails.
Knowledge check
A legacy system reads fct_orders.amount, its owner has left the company, and there is no ETA for migration. Can we simply remove v1 after deprecation_date?
Answer
This is a judgment call between a clean codebase and breaking production.

Options:

Option 1: Hard removal

  • After deprecation_date passes, remove v1 regardless
  • The legacy system breaks
  • Consequence: someone has to fix the legacy system or accept the breakage

Pros: clean codebase, no perpetual technical debt
Cons: a production system breaks, with customer / internal user impact

Option 2: Permanent v1 alias

  • Keep v1 in dbt forever
  • Mark maturity: low and add a description: 'Legacy compat, only for X system'
  • Cost: 2x storage for old data

Pros: the legacy system works
Cons: technical debt forever, sets a precedent

Option 3: Warehouse-level alias

  • Remove v1 from dbt
  • Create a view at the warehouse level (manually, or through dbt as a separate model):

-- models/marts/_compat/fct_orders_legacy.sql
SELECT
    order_id,
    order_total AS amount   -- legacy name mapping
FROM {{ ref('fct_orders') }}

The legacy system points to fct_orders_legacy (or to a view created directly in the warehouse without dbt). Clean separation.

Pros: the legacy system works, no technical debt in the main models
Cons: an extra view, performance overhead

Option 4: Coordinate with stakeholders

  • Find a new owner for the legacy system
  • Demand migration with a hard deadline
  • If there is no commitment, escalate to leadership

Pros: forces action
Cons: political, slow

Recommended approach (production-grade):

  1. Try Option 4 first (coordination). Allow 2-4 weeks to find an owner.
  2. If that fails, use Option 3 (warehouse alias) for legacy compatibility. The dbt repo stays clean.
  3. Avoid Option 2 (permanent v1 in dbt): it sets a bad precedent.
  4. Avoid Option 1 (hard removal) if the legacy system is business-critical.

Process for the future:

  • An exposure for every consumer, including legacy systems. The owner field is mandatory.
  • Quarterly review of owners: when someone leaves, transfer ownership before offboarding.
  • No anonymous consumers: every reader of warehouse data has an owner.

This is an organizational problem: orphaned dependencies. The solution is institutional ownership, not a technical workaround.

Bottom line: the data team can remove v1, but only after exhausting the communication options and with management backing for accepting the fallout. Without those, be a good neighbor and provide a warehouse alias.
