Lesson 12.04 · 24 min
Advanced
versions · migration · deprecation · communication

Versioned cross-project models: v1 -> v2 migration

When a public model in a Mesh has to change in a breaking way (renaming a column, changing a data type, removing a field), you cannot simply change it in place. Consumers that depend on the current schema break. The solution is versioned models: the old v1 and the new v2 coexist, and consumers migrate at their own pace.

This lesson covers version migration in practice: setup, communication, monitoring, and gotchas.

Migration pattern v1 -> v2 without breaking consumers (dbt II)

Why versions

Without versions, Finance changes a column in fct_revenue and the Marketing dashboard breaks five minutes later. That is a bad experience.

With versions:

  • Finance creates v2 with the changed schema.
  • v1 keeps working (old SQL, old table).
  • Marketing sees a deprecation warning and migrates when convenient.
  • After 3 months, Finance removes v1.

That is a smooth migration instead of a hard breakage.

Setting up a versioned model

In dbt, versioning is done through declarations in schema.yml plus two physical SQL files:

# finance_dbt/models/schema.yml
models:
  - name: fct_revenue
    access: public
    latest_version: 2
    config:
      contract:
        enforced: true
    
    versions:
      - v: 1
        deprecation_date: '2026-12-31'
        defined_in: fct_revenue_v1
        columns:
          - name: date
            data_type: date
          - name: revenue
            data_type: decimal(10,2)  # old precision
      
      - v: 2
        defined_in: fct_revenue_v2
        columns:
          - name: date
            data_type: date
          - name: revenue_usd      # renamed from 'revenue'
            data_type: decimal(18,2)  # changed precision
          - name: product_id        # new column
            data_type: string

The physical files:

-- models/fct_revenue_v1.sql
{{ config(materialized='table') }}
select
    date,
    revenue
from {{ ref('int_revenue') }}
-- models/fct_revenue_v2.sql
{{ config(materialized='table') }}
select
    date,
    revenue as revenue_usd,
    product_id
from {{ ref('int_revenue_v2') }}

dbt build creates two physical tables:

  • analytics.finance.fct_revenue_v1
  • analytics.finance.fct_revenue_v2

latest_version: 2 means that ref('finance', 'fct_revenue') without an explicit version resolves to v2.

Consumer’s view: ref with v

In downstream projects:

-- Marketing: pin to v1
select * from {{ ref('finance', 'fct_revenue', v=1) }}

-- Product: use latest (v2)
select * from {{ ref('finance', 'fct_revenue') }}

-- Another: pin to v2 explicitly
select * from {{ ref('finance', 'fct_revenue', v=2) }}

ref(..., v=1) resolves to fct_revenue_v1. ref(...) without a version resolves to fct_revenue_v2 (the latest_version).

Each consuming model picks its own version. Marketing can stay on v1 while it migrates; Product moves to v2 right away.
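The resolution rule can be modeled in a few lines. This is a simplified sketch of the behavior described above, not dbt's implementation: the hypothetical `resolve_ref` helper takes a schema.yml entry as a plain dict and returns the relation name a ref points to. It assumes dbt's default naming of `<model>_v<N>` for version N and ignores alias overrides.

```python
from typing import Optional

# Hypothetical, simplified model of how a versioned ref picks a relation.
# Illustrates the rule only; this is not how dbt implements it.

def resolve_ref(model: dict, v: Optional[int] = None) -> str:
    """Return the relation name that ref(model, v=...) points to."""
    # No v means "use latest_version"
    version = model["latest_version"] if v is None else v
    for entry in model["versions"]:
        if entry["v"] == version:
            # Default relation name for version N is <model>_v<N>
            # (an alias config could override it; not modeled here)
            return f"{model['name']}_v{version}"
    # A removed or unknown version behaves like a compilation error
    raise ValueError(f"'{model['name']}' v={version} not found")


fct_revenue = {
    "name": "fct_revenue",
    "latest_version": 2,
    "versions": [{"v": 1}, {"v": 2}],
}

print(resolve_ref(fct_revenue, v=1))  # fct_revenue_v1
print(resolve_ref(fct_revenue))       # fct_revenue_v2
```

Bumping `latest_version` to 3 would silently move every unpinned consumer, which is why consumers that care about stability pin `v` explicitly.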

Deprecation date

In schema.yml:

versions:
  - v: 1
    deprecation_date: '2026-12-31'

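After the deprecation_date:

  • At compile time, dbt prints a warning for consumers that still use v1.
  • Runs still work (the data in the table stays).
  • It is a soft signal that it is time to migrate.

The warning looks roughly like this (illustrative; the exact wording depends on the dbt version):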
[WARNING] Model 'fct_revenue' v=1 in 'finance' is deprecated. 
         Deprecation date: 2026-12-31 (passed 45 days ago).
         Please migrate to v=2.
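The date arithmetic behind such a warning is simple. A minimal sketch, assuming a hypothetical `deprecation_warning` helper that receives the declared deprecation_date as an ISO string plus the date of the run; dbt's real check and message text differ:

```python
from datetime import date
from typing import Optional

def deprecation_warning(model: str, version: int, deprecation_date: str,
                        today: date) -> Optional[str]:
    """Return a warning string once the deprecation date has passed, else None."""
    deadline = date.fromisoformat(deprecation_date)
    days_passed = (today - deadline).days
    if days_passed <= 0:
        return None  # deadline not reached: this sketch emits nothing
    return (f"Model '{model}' v={version} is deprecated. "
            f"Deprecation date: {deadline} (passed {days_passed} days ago).")

print(deprecation_warning("fct_revenue", 1, "2026-12-31", date(2027, 2, 14)))
# Model 'fct_revenue' v=1 is deprecated. Deprecation date: 2026-12-31 (passed 45 days ago).
```

The warning is advisory only: the function returns text and never blocks, matching the "soft signal" behavior described above.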

Some time later (typically 3-6 months after the deprecation_date), Finance removes v1:

  • Deletes the fct_revenue_v1.sql file.
  • Deletes the v: 1 declaration from schema.yml.
  • Drops the physical table in the warehouse.

After that, ref('finance', 'fct_revenue', v=1) is a compilation error.

Migration workflow: Producer’s side

Step by step for the Finance team:

Step 1: Create v2

-- models/fct_revenue_v2.sql
{{ config(materialized='table') }}
select
    date,
    revenue as revenue_usd,
    product_id
from {{ ref('int_revenue') }}

Step 2: Update schema.yml

- name: fct_revenue
  access: public
  latest_version: 2  # changed from 1
  
  versions:
    - v: 1
      deprecation_date: '2026-12-31'  # set the v1 deprecation date
      defined_in: fct_revenue_v1
    - v: 2
      defined_in: fct_revenue_v2

Step 3: Build

dbt build --select fct_revenue

dbt builds both versions, creating the physical tables fct_revenue_v1 and fct_revenue_v2.

Step 4: Communicate with consumers

  • Slack message in #data-platform-changes: "fct_revenue v2 available. v1 deprecation 2026-12-31."
  • Email the list of consumers (taken from the manifests, see below).
  • Update the docs in the team wiki.

Step 5: Monitor adoption

A script that parses the cross-project manifests to analyze usage:

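# scripts/version_usage.py
# Assumes each consumer project's manifest.json has been collected
# under manifests/<project>/manifest.json.
import json
import re
from pathlib import Path

def find_version_consumers(target_model='fct_revenue', target_version=1):
    # Physical relation name of the version, e.g. fct_revenue_v1.
    # \b keeps fct_revenue_v1 from also matching fct_revenue_v10.
    pattern = re.compile(rf"\b{re.escape(target_model)}_v{target_version}\b")
    consumers = []
    for manifest_file in Path('manifests/').glob('*/manifest.json'):
        with open(manifest_file) as f:
            manifest = json.load(f)

        for node in manifest['nodes'].values():
            # compiled_code is only present for nodes that were compiled
            compiled = node.get('compiled_code') or ''
            if pattern.search(compiled):
                consumers.append({
                    'project': node['package_name'],
                    'model': node['name'],
                })
    return consumers

print(find_version_consumers())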

This gives the list of consumers still on v1, so communication can be targeted.
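To track adoption of every version at once (for example, to feed a migration dashboard), the same manifests can be grouped by version. This sketch assumes two things worth verifying against your dbt version: that a consumer node's `depends_on.nodes` lists the upstream's unique ID, and that versioned models get unique IDs of the form `model.<project>.<model>.v<N>`. Manifests are passed in as already-loaded dicts so the logic is easy to test; `version_adoption` is a hypothetical name.

```python
from collections import defaultdict

def version_adoption(manifests, producer="finance", model="fct_revenue"):
    """Map each version number to the sorted list of consumer projects using it."""
    prefix = f"model.{producer}.{model}.v"  # assumed unique-ID convention
    usage = defaultdict(set)
    for manifest in manifests:
        for node in manifest["nodes"].values():
            if node.get("package_name") == producer:
                continue  # skip the producer's own nodes
            for upstream in node.get("depends_on", {}).get("nodes", []):
                suffix = upstream[len(prefix):]
                if upstream.startswith(prefix) and suffix.isdigit():
                    usage[int(suffix)].add(node["package_name"])
    return {v: sorted(projects) for v, projects in sorted(usage.items())}


marketing = {"nodes": {"model.marketing.rev_dashboard": {
    "name": "rev_dashboard", "package_name": "marketing",
    "depends_on": {"nodes": ["model.finance.fct_revenue.v1"]}}}}
product = {"nodes": {"model.product.rev_by_product": {
    "name": "rev_by_product", "package_name": "product",
    "depends_on": {"nodes": ["model.finance.fct_revenue.v2"]}}}}

print(version_adoption([marketing, product]))
# {1: ['marketing'], 2: ['product']}
```

Matching on dependency IDs rather than compiled SQL avoids false positives from comments or similarly named tables, at the cost of relying on the manifest's ID format.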

Step 6: Final removal

Once v1 usage reaches zero:

  1. Remove fct_revenue_v1.sql.
  2. Remove v: 1 from schema.yml.
  3. Drop analytics.finance.fct_revenue_v1 in the warehouse.
  4. Communicate: "v1 has been removed".
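The "only when usage is zero" rule can be enforced in code rather than by memory. A minimal sketch with a hypothetical `drop_statement` helper: it takes a consumer list shaped like the Step 5 script's output (dicts with `project` and `model`) and refuses to emit the DROP while anything still depends on the version. The `drop table if exists` syntax is an assumption that holds for most warehouses; check yours.

```python
def drop_statement(relation: str, consumers: list) -> str:
    """Return the DROP for a retired version, or raise if it still has consumers."""
    if consumers:
        names = ", ".join(f"{c['project']}.{c['model']}" for c in consumers)
        raise RuntimeError(f"Refusing to drop {relation}; still used by: {names}")
    # IF EXISTS keeps the step safe to re-run
    return f"drop table if exists {relation}"

print(drop_statement("analytics.finance.fct_revenue_v1", []))
# drop table if exists analytics.finance.fct_revenue_v1
```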

Migration workflow: Consumer’s side

For the Marketing team:

Step 1: Stay on v1 (initially)

After Finance releases v2, Marketing keeps running on v1:

select * from {{ ref('finance', 'fct_revenue', v=1) }}

No changes are needed; everything keeps working.

Step 2: Plan migration

Marketing reviews the v2 schema:

  • revenue -> revenue_usd (a rename, so the SELECT needs updating).
  • New column product_id (optional; use it or not).

Step 3: Create migration PR

-- Before:
select
    date,
    revenue
from {{ ref('finance', 'fct_revenue', v=1) }}

-- After:
select
    date,
    revenue_usd as revenue,  -- alias back to the old name for backward compatibility
    product_id
from {{ ref('finance', 'fct_revenue', v=2) }}

Test, deploy, merge. Marketing is now on v2.
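The aliasing trick generalizes: if a consumer keeps a map from old column names to new ones, the select list against v2 can be generated so that downstream models still see the v1 names. A sketch with a hypothetical `v2_select_list` helper; the rename map `{'revenue': 'revenue_usd'}` comes from the v2 schema above, and adopting `product_id` is the consumer's choice.

```python
def v2_select_list(v1_columns, renames, extra=()):
    """Build a SELECT list against v2 that preserves the v1 column names.

    renames maps old v1 names to their new v2 names; extra lists new v2
    columns the consumer chooses to adopt.
    """
    items = []
    for col in v1_columns:
        new = renames.get(col, col)
        # Alias renamed columns back to their old name for backward compatibility
        items.append(f"{new} as {col}" if new != col else col)
    items.extend(extra)
    return ",\n    ".join(items)

print("select\n    " + v2_select_list(["date", "revenue"],
                                       {"revenue": "revenue_usd"},
                                       extra=["product_id"]))
# select
#     date,
#     revenue_usd as revenue,
#     product_id
```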

Step 4: Monitor

After migrating, watch for issues. v2 has a new column, product_id; could it have data quality issues? Tests can catch them.

What counts as a breaking change

Not every change needs a version bump. Some changes are safe:

Non-breaking (additive):

  • Adding a new column. Existing SELECTs keep working.
  • Adding a new optional (nullable) column. Same.
  • Adding a new not-null column with a default value.

Breaking:

  • Renaming a column.
  • Removing a column.
  • Changing a column's data_type in an incompatible way (decimal -> text).
  • Changing a constraint (not_null -> nullable, for queries that expect not_null).

Subtle (sometimes breaking):

  • Changing nullability (nullable -> not_null): existing data with nulls may break.
  • Adding a constraint (uniqueness): may break if the data has duplicates.

For additive changes, don't bump the version. Just add the column to the existing v1 entry in schema.yml.

For breaking changes, bump the version.
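The rules above can be turned into a rough pre-merge check. A sketch, assuming each schema is a plain dict of column name -> data_type read from schema.yml; `classify_change` is a hypothetical helper. Because it only sees names and types, it cannot detect the nullability and constraint cases listed under "Subtle", and it conservatively treats every type change as breaking.

```python
def classify_change(old: dict, new: dict) -> tuple:
    """Return ('breaking' | 'additive' | 'none', reasons)."""
    reasons = []
    for col, old_type in old.items():
        if col not in new:
            # A rename shows up as a removal plus an addition
            reasons.append(f"removed or renamed column: {col}")
        elif new[col] != old_type:
            # Conservative: even a widening like decimal(10,2) -> decimal(18,2)
            # is flagged, although it may be safe in practice
            reasons.append(f"type change on {col}: {old_type} -> {new[col]}")
    if reasons:
        return "breaking", reasons
    added = [col for col in new if col not in old]
    if added:
        return "additive", [f"new column: {col}" for col in added]
    return "none", []


v1 = {"date": "date", "revenue": "decimal(10,2)"}
v2 = {"date": "date", "revenue_usd": "decimal(18,2)", "product_id": "string"}
print(classify_change(v1, v2))
# ('breaking', ['removed or renamed column: revenue'])
print(classify_change(v1, {**v1, "is_subscription": "boolean"}))
# ('additive', ['new column: is_subscription'])
```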

Versions vs branching

Which approach is better: versions or git branching?

Aspect | Versions (in dbt) | Git branching
--- | --- | ---
Concurrent v1/v2 | Yes: two physical tables | No: only one in prod
Consumer choice | Each consumer picks v | None: all consumers get whatever is in prod
Atomic releases | No (gradual migration) | Yes (all at once)
Storage cost | 2x (v1 + v2 in the warehouse) | 1x
Complexity | Medium | Low

Versions mean gradual migration: slower but safer. Branching is atomic: faster but riskier (everything breaks at the same time).

In a Mesh, versions are the usual choice for keeping public APIs stable.

Subtle gotcha: shared upstream

Sometimes v1 and v2 share an upstream. For example:

-- fct_revenue_v1.sql
select date, revenue from {{ ref('int_revenue') }}

-- fct_revenue_v2.sql
select date, revenue as revenue_usd from {{ ref('int_revenue') }}  -- same upstream

During the build, int_revenue is built once and both versions read from it. That's fine.

But if v2 needs a different upstream (for example, a new int_revenue_v2):

-- fct_revenue_v2.sql
select date, revenue_usd, product_id from {{ ref('int_revenue_v2') }}

...and int_revenue_v2 is itself public and versioned, it quickly gets confusing. Best practice: apply versioning only at the edges (the public models you expose) and don't version internal models.

Failure modes

1. Breaking change without a version bump

Finance changes the SQL so that revenue comes out as text instead of decimal(10,2), without a version bump and without updating the declared contract. With the contract enforced:

Contract validation failed.

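dbt build fails. That's good: Finance cannot break the contract silently. Note that dbt's contract check compares base data types but not size, precision, or scale, so a change such as decimal(10,2) -> decimal(18,2) would not trip it on its own.

Without a contract, downstream breaks silently.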

2. Consumer stuck on v1 after deprecation

The deprecation date has passed and v1 is still in use. Eventually Finance drops v1:

Compilation error: 'fct_revenue' v=1 not found.

The consumer's build fails. This is a late failure.

Fix: monitor v1 usage before dropping it and make sure every consumer has migrated.

3. v1 и v2 diverge silently

After the split into v1 and v2, a bug fix lands in int_revenue (the upstream of both). The new logic is right for v2 but semantically breaks v1: v1 now returns slightly different numbers.

Fix: keep v1 frozen, with separate upstream models pinned to v1's original logic. Don't share upstream models after a version split.

4. Version drift in a large ecosystem

Finance runs v1, v2, and v3 in parallel. Marketing is on v2; Product is on v3. A new team creates a project: which version should it use? The default is the latest (v3), but maybe the team should be on v2 for consistency with other projects.

Fix: documentation plus a recommendation: "New projects use latest_version. Existing projects stay until they are ready to migrate."

5. v3 overshadows the v1 deprecation

Finance has v1 (deprecated soon) and v2 (current). Some time later it adds v3. Now v1, v2, and v3 are all in prod. Marketing is still on v1 because Finance forgot to push the migration. The v1 deprecation date hits, v1 is dropped, and Marketing breaks.

Fix: discipline. Don't add v3 until v1's consumers are gone (or until v1 is fully removed).

Best practices

  1. Reserve versioning for breaking changes: additive changes don't need a version.
  2. Plan the deprecation timeline upfront: when you bump the version, set the deprecation date 6-12 months out.
  3. Monitor usage: keep a script that tracks v1 consumers.
  4. Communicate proactively: Slack, email, docs.
  5. Don't accumulate versions: keep at most 2 versions in prod at a time.
  6. Document version differences: the schema.yml description should explain what changed.
  7. Test both versions: v1 and v2 should both have tests.
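Practices 2 and 5 are easy to check in CI. A sketch, assuming the `models` list from schema.yml has already been loaded into Python (for example with a YAML parser); `lint_versions` is a hypothetical helper, and the limit of 2 mirrors the recommendation above. When `latest_version` is absent, the sketch falls back to the highest version number.

```python
def lint_versions(models: list, max_versions: int = 2) -> list:
    """Return human-readable problems found in versioned model declarations."""
    problems = []
    for model in models:
        versions = model.get("versions", [])
        if not versions:
            continue  # unversioned model: nothing to check
        if len(versions) > max_versions:
            problems.append(f"{model['name']}: {len(versions)} versions "
                            f"(max {max_versions})")
        latest = model.get("latest_version", max(v["v"] for v in versions))
        for entry in versions:
            # Every non-latest version should carry a planned deprecation date
            if entry["v"] != latest and "deprecation_date" not in entry:
                problems.append(f"{model['name']} v{entry['v']}: no deprecation_date")
    return problems


models = [{"name": "fct_revenue", "latest_version": 3,
           "versions": [{"v": 1, "deprecation_date": "2026-12-31"},
                        {"v": 2}, {"v": 3}]}]
print(lint_versions(models))
# ['fct_revenue: 3 versions (max 2)', 'fct_revenue v2: no deprecation_date']
```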

Real-world example timeline

An e-commerce company with a public fct_orders model:

Day 0: Add is_subscription flag (additive). Don’t version. Update v1 schema with new column.

Day 60: Need to rename total_usd -> gross_revenue_usd + add net_revenue_usd (breaking). Plan v2.

Day 70: Deploy v2. Communicate to 8 consumer teams. Set deprecation_date to Day 270 (about 6 months out).

Day 80-200: 6 of 8 teams migrate. Easy migrations, a few hours of work each.

Day 200: Slow movers: 2 teams still on v1. Schedule 1:1s and help them migrate.

Day 250: All on v2. Deprecation date approaching.

Day 270: Deprecation date passes. Compile warnings appear.

Day 280: No more usage of v1 (verified via manifest scan).

Day 300: Remove v1 file, schema.yml entry, drop table. Communicate.

Total cycle: about 6-10 months for a full migration. This is a normal pace for an ecosystem with multiple consumers.

Knowledge check
Finance is about to deprecate v1 of fct_revenue. Of 8 consumer projects, 7 have migrated to v2, but Marketing is stuck (a large project with no spare resources). The deprecation date is 30 days away. What are Finance's options?
Answer
This is a classic Mesh conflict: the producer wants to move on, and the consumer can't keep up. There are several options.

(1) Extend the deprecation date
  • Finance moves deprecation_date out by 90 days (for Marketing).
  • Pro: Marketing gets time, and communication is maintained.
  • Con: Finance carries v1 longer (storage cost, maintenance overhead).
  • Decision: if v1 is cheap to maintain (just a table, no active development), extending is a nice gesture. Extend.

(2) Help Marketing migrate
  • A Finance engineer pairs with a Marketing engineer on the PR.
  • Pro: direct ownership over the outcome; builds relationships.
  • Con: costs Finance time.
  • Best when: Marketing is simply overwhelmed, not technically blocked. 1-2 days of Finance time can save 1-2 weeks of Marketing time.

(3) Provide migration tooling
  • Finance writes a script that rewrites v1 refs to v2 (basic regex substitution):

```bash
# GNU sed; on macOS/BSD use sed -i '' instead of sed -i.
# find recurses into subfolders, which models/**/*.sql does not do without globstar.
find models -name '*.sql' -exec sed -i "s/ref('finance', 'fct_revenue', v=1)/ref('finance', 'fct_revenue', v=2)/g" {} +
```

  • Plus manual review (to handle renamed columns).
  • Pro: scales; the same script works for any consumer.
  • Con: doesn't cover every case (e.g. references to renamed columns inside the SQL).

(4) Keep v1 as "legacy"
  • Don't drop v1 after the deprecation date; mark it deprecated permanently.
  • Accept the storage cost as the "tax" for not enforcing migration.
  • Pro: no consumer breakage.
  • Con: ongoing v1 maintenance. If the upstream changes and v1 breaks, who fixes it?

(5) Hard cutoff (NOT recommended unless agreed)
  • Drop v1 on the deprecation date regardless.
  • Marketing breaks and needs an emergency PR.
  • Pro: producer-side cleanliness.
  • Con: builds resentment, breaks trust, has business impact.
  • Only acceptable when the consumer team explicitly said "we'll be ready" and wasn't.

Recommended approach (combined):

(a) Communication: Slack plus a meeting with the Marketing lead. Understand the specific blocker: "What do you need to migrate?"

(b) If the blocker is real (e.g. a complex schema migration):
  • Extend the deprecation by 60-90 days, with explicit re-communication.
  • Offer a Finance engineer for pairing.
  • Set a new hard date (e.g. Day 360 instead of Day 300).

(c) If the blocker is priorities (lack of focus):
  • Escalate to leadership: a conversation between the Marketing data lead and the Finance data lead. "We need to mark this Priority 1 for next sprint."
  • Sometimes this is what it takes to get buy-in.

(d) Avoid the situation in the future:
  • Lessons-learned doc: "If we had committed to a progress check 90 days before the deprecation_date, we would have caught this earlier."
  • Standard process: a 60-day pre-deprecation check across all consumers.
  • Build a "migration dashboard" that shows leadership which teams are on which version.

(e) Technical safety net: even if the drop date arrives and Marketing isn't ready, on the last day you can "freeze" v1: keep the table in the warehouse but remove it from schema.yml. Refs to v1 then fail to compile, but the data still exists, so tables Marketing has already built and direct queries against the physical table keep working; any new dbt run requires the migration.

(f) Long term: consider a co-versioning policy, a Mesh-wide rule that deprecation periods must be 6+ months, with quarterly check-ins. This builds in resilience to "stuck consumer" scenarios.

Conclusion: hard deadlines almost never work in a Mesh. A real Mesh runs on collaborative deprecation: communication, help, and flexibility. Treat it like API deprecation in an external service: the producer signals, the consumer migrates, and both commit to the timeline. The producer's job is not to force migration but to enable it. If Marketing can't migrate, that is also a sign the producer didn't provide enough resources, tools, or time.

Summary

  • Versions: the mechanism for breaking changes to public models.
  • latest_version + versions: declarations in schema.yml.
  • Two physical files (fct_revenue_v1.sql, fct_revenue_v2.sql) and two tables in the warehouse.
  • ref('project', 'model', v=N): the consumer pins to a version.
  • deprecation_date: a soft signal; dbt warns at compile time.
  • Migration: the producer creates v2 and sets a deprecation date for v1; consumers migrate at their own pace.
  • Non-breaking changes (additive columns): don't version.
  • Communication is critical: Slack, docs, manifest tracking.
  • Avoid version accumulation: at most 2 versions in prod.
  • Failure modes: contract violation without a bump, stuck consumers, version drift, shared-upstream divergence.
