Exposures as a contract with downstream consumers

In dbt I we covered the basics of exposures: a YAML entry with type, url, owner, and depends_on, visible in the lineage graph. At the intermediate level, exposures become a contract: the data producers (the data team) and the consuming teams (BI, ML, engineering) agree on which models are read, by whom, and how those models are allowed to change.

This lesson covers production use of exposures: how to organize impact analysis, how to keep exposures up to date as things change, how to use exposures for selective runs in CI, and common patterns for different types of consumers.

Exposure as a contract

When the model fct_orders has an exposure monthly_revenue_dashboard, that is an agreement:

- Producer (data team): "We maintain fct_orders with these columns. On a breaking change we notify the dashboard owner."
- Consumer (analytics team): "We use these columns of fct_orders in the dashboard. When we deprecate the dashboard, we remove the exposure from the YAML."

Without formalized exposures, this agreement lives only in people's heads. Someone leaves the team, the dependency is forgotten, and a breaking change reaches the production dashboard without anyone knowing.

With exposures:

- When fct_orders changes, dbt list --select fct_orders+ --resource-type exposure lists all downstream exposures (fct_orders+1 narrows this to direct dependents only).
- The exposure's owner is recorded, so notifications can be automated.
- CI gate: dbt build --select state:modified+ builds the downstream models and runs their tests.

Full exposure syntax

# models/_exposures.yml
version: 2

exposures:
  - name: monthly_revenue_dashboard
    label: "Monthly Revenue Dashboard"  # human-friendly name for the UI
    type: dashboard
    maturity: high  # high | medium | low
    url: https://tableau.company.com/dashboards/monthly-revenue
    description: |
      Executive dashboard with monthly revenue by segment, country, and channel.
      Used by the CFO / VP Sales for the quarterly business review.

      **Refresh**: daily at 06:30 UTC, after dbt run.
      **Critical**: yes, SLA 99.5% uptime.

    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
      - ref('customer_segments')

    owner:
      name: Alice Johnson
      email: "alice.johnson@..."

    meta:
      slack_channel: "#analytics-revenue-dashboard"
      tableau_workbook_id: "wb_xyz_123"
      sla: "99.5% uptime, daily refresh by 07:00 UTC"

Fields:

| Field | Meaning |
|-------|---------|
| `name` | Unique name of the exposure within the project. |
| `label` | Human-friendly name for the UI (optional). |
| `type` | `dashboard` / `notebook` / `analysis` / `ml` / `application`. |
| `maturity` | `high` / `medium` / `low`: priority for impact analysis. |
| `url` | Link to the actual resource. |
| `description` | Production tone (see module 09/01). |
| `depends_on` | List of refs / sources. |
| `owner` | `{name, email}`: required. |
| `meta` | Custom key-value pairs: Slack channel, internal IDs, SLA. |

Exposure types: when to use which

| Type | When to use | Examples |
|------|-------------|----------|
| `dashboard` | BI dashboard: Tableau, Looker, Mode, Metabase, Superset. | "Executive Revenue Dashboard" |
| `notebook` | Jupyter / Hex / Databricks notebook for exploration. | "Q4 Cohort Analysis Notebook" |
| `analysis` | One-off analysis in Notion / Google Doc. | "Customer Churn Deep Dive 2026 Q1" |
| `ml` | ML model in production or an ML feature store. | "Churn Prediction v3" |
| `application` | Any application / service / API that reads the warehouse. | "Internal Admin Tool", "Customer Email Pipeline" |

The type is a label for the UI and for people reading the lineage. dbt processes every exposure type the same way, and node selection (dbt list --select "exposure:*" --resource-type exposure) works identically for all of them. The type helps during impact analysis: "these 5 exposures are dashboards, 2 are ML models, 1 is an application, so they have different SLAs and different deprecation processes."

Maturity: high vs medium vs low

maturity reflects the priority of an exposure:

- high: a critical dashboard or production ML model. Strict SLA. A breaking change requires a migration plan and an announcement.
- medium: important but not business-critical. Breaking change -> 1 week notice.
- low: exploration / one-off / experimental. Breaking change -> 1 day notice, or a silent change (covered by CI tests).

Impact analysis usually starts with high: dbt list --resource-type exposure --output json --output-keys "name maturity" | grep '"maturity": "high"'.

The dbt docs site shows each exposure's maturity on its page. This is a nudge for consumers and producers about what matters more.
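The grouping by maturity described above can also be done directly on the manifest. The following is a minimal sketch, assuming the exposures have already been loaded from target/manifest.json; the `SAMPLE_EXPOSURES` dictionary is a hypothetical, trimmed-down stand-in used only to keep the example self-contained.

```python
from collections import defaultdict

# Hypothetical, trimmed-down stand-in for the "exposures" section of
# target/manifest.json (real entries carry many more keys).
SAMPLE_EXPOSURES = {
    "exposure.my_project.monthly_revenue_dashboard": {
        "name": "monthly_revenue_dashboard", "maturity": "high"},
    "exposure.my_project.weekly_marketing_report": {
        "name": "weekly_marketing_report", "maturity": "medium"},
    "exposure.my_project.cohort_analysis_q1": {
        "name": "cohort_analysis_q1", "maturity": None},
}


def group_by_maturity(exposures):
    """Return {maturity: sorted exposure names}; a missing maturity becomes 'unset'."""
    groups = defaultdict(list)
    for exp in exposures.values():
        groups[exp.get("maturity") or "unset"].append(exp["name"])
    return {level: sorted(names) for level, names in groups.items()}


groups = group_by_maturity(SAMPLE_EXPOSURES)
for level in ("high", "medium", "low", "unset"):
    print(level, groups.get(level, []))
```

Reporting the 'unset' bucket separately makes exposures without a maturity visible instead of silently dropping them from the review.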

depends_on: a precise list

depends_on:
  - ref('fct_orders')
  - ref('dim_customers')
  - ref('customer_segments')
  - source('finance_app', 'gl_entries')  # sources are allowed too

What matters:

- Precise, not "everything": list only what the dashboard / ML model actually reads. If the consumer does not use dim_customers, leave it out of depends_on (otherwise impact analysis produces more false positives).
- Model granularity, not segments: depends_on works at the level of a model, not a filtered subset of it.
- Source as an exposure dependency: you can reference a source, not only a ref. Useful for dashboards that read source tables directly (though going through a mart is usually better).
- Versioned ref: if the model has versions (module 10), the exposure can pin a specific version:
```
depends_on:
  - ref('fct_orders', v=2)
```

Using exposures in CI and scheduled runs

The most production-relevant use is selective builds per consumer.

Per-consumer scheduled run

Instead of one dbt build of everything once a day, you can schedule:

# 06:00 UTC: revenue dashboard (high priority, early refresh)
dbt build --select +exposure:monthly_revenue_dashboard

# 07:00 UTC: ML model (medium priority)
dbt build --select +exposure:churn_prediction_v3

# 08:00 UTC: internal notebooks (low priority)
dbt build --select +exposure:cohort_analysis_q1

+exposure:name selects the exposure together with every upstream model it depends on. That is the minimal subset needed to refresh the dashboard.
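To make the `+exposure:name` selection concrete, here is a minimal sketch of the same upstream walk done by hand over the manifest's `parent_map`. The sample manifest and the node names in it (stg_orders, stg_customers, fct_returns, the shop source) are hypothetical, and the sketch ignores tests, which dbt build would also pick up.

```python
# Hypothetical, trimmed-down manifest: only "parent_map" is needed here.
SAMPLE_MANIFEST = {
    "parent_map": {
        "exposure.my_project.monthly_revenue_dashboard": [
            "model.my_project.fct_orders",
            "model.my_project.dim_customers",
        ],
        "model.my_project.fct_orders": ["model.my_project.stg_orders"],
        "model.my_project.dim_customers": ["model.my_project.stg_customers"],
        "model.my_project.stg_orders": ["source.my_project.shop.orders"],
        "model.my_project.stg_customers": [],
        "model.my_project.fct_returns": ["model.my_project.stg_orders"],
        "source.my_project.shop.orders": [],
    }
}


def upstream_of(manifest, node_id):
    """Collect every transitive parent of a node (the '+' prefix in a selector)."""
    parent_map = manifest["parent_map"]
    seen = set()
    stack = list(parent_map.get(node_id, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parent_map.get(current, []))
    return seen


upstream = upstream_of(
    SAMPLE_MANIFEST, "exposure.my_project.monthly_revenue_dashboard"
)
print(sorted(upstream))
```

In this sample, fct_returns is never visited because the dashboard does not depend on it, which is exactly the saving a per-consumer run is after.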

Advantages:

- Critical consumers refresh first (priority).
- Unused models are not recomputed.
- If one run fails, the other consumers are not blocked.

Slim CI with exposures

In CI you can build only the changed dependencies of exposures:

# CI: build and test only nodes that are both affected by the change and upstream of an exposure
dbt build --select "state:modified+,+exposure:*" --state <path_to_prod_artifacts>

This is useful when the main goal of CI is to avoid breaking downstream consumers. You can even filter by maturity:

# Upstream of all exposures (this command alone does not filter by maturity)
dbt build --select "+exposure:*" --indirect-selection=cautious
# to keep only high-maturity exposures, pick their names in a Python script: maturity == 'high'
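The Python filtering step mentioned above could look roughly like this. It is a sketch under stated assumptions: the manifest has been loaded from target/manifest.json (replaced here by a hypothetical in-memory sample), and the function name `selector_for_maturity` is made up for illustration.

```python
# Hypothetical, trimmed-down manifest: only the fields used below are present.
SAMPLE_MANIFEST = {
    "exposures": {
        "exposure.my_project.monthly_revenue_dashboard": {
            "name": "monthly_revenue_dashboard", "maturity": "high"},
        "exposure.my_project.churn_prediction_v3": {
            "name": "churn_prediction_v3", "maturity": "high"},
        "exposure.my_project.cohort_analysis_q1": {
            "name": "cohort_analysis_q1", "maturity": "low"},
    }
}


def selector_for_maturity(manifest, level="high"):
    """Build a space-separated (union) selector for exposures of one maturity."""
    names = sorted(
        exp["name"]
        for exp in manifest["exposures"].values()
        if exp.get("maturity") == level
    )
    return " ".join(f"+exposure:{name}" for name in names)


selector = selector_for_maturity(SAMPLE_MANIFEST)
print(selector)
```

The printed string can be passed to dbt build --select. Space-separated selectors are combined as a union, so the result covers the upstream models of every high-maturity exposure.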

Impact analysis: what to review when upstream changes

Related topics: Data Lineage and Impact Analysis (governance program); Data Contracts (formal specification of downstream dependencies).

[Diagram: impact analysis when a model changes. Change -> list of impacted exposures -> owner notification -> safe merge]

A concrete example. The team wants to rename fct_orders.order_amount to fct_orders.order_total. Steps:

# 1. Find impacted exposures
dbt list --select fct_orders+1 --resource-type exposure
# Output:
#   exposure:my_project.monthly_revenue_dashboard
#   exposure:my_project.churn_prediction_v3
#   exposure:my_project.weekly_marketing_report

# 2. Look up owners (dbt list emits one JSON object per line, so jq needs no array iteration)
dbt list --select fct_orders+1 --resource-type exposure --output json --output-keys "name owner maturity" | jq -c '{name, owner, maturity}'
# Output (abridged):
# {"name":"monthly_revenue_dashboard","owner":{"email":"alice@..."},"maturity":"high"}
# {"name":"churn_prediction_v3","owner":{"email":"bob@..."},"maturity":"high"}
# {"name":"weekly_marketing_report","owner":{"email":"charlie@..."},"maturity":"medium"}

# 3. Contact Alice, Bob, and Charlie before the merge
# 4. After confirmation: merge + announcement in #data-channel

Without exposures, producers do not know who reads fct_orders. The column gets renamed, and the next day Alice calls: "the dashboard is broken." This is the organizational blind spot that exposures remove.
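The same lookup can be scripted against the manifest, which is handy when the result feeds an automated notification. Below is a minimal sketch, assuming a manifest loaded from target/manifest.json; the sample data, the intermediate model revenue_by_channel, the catalog_admin_tool exposure, and the example.com addresses are all hypothetical.

```python
MATURITY_ORDER = {"high": 0, "medium": 1, "low": 2}

# Hypothetical, trimmed-down manifest with only the keys used below.
SAMPLE_MANIFEST = {
    "child_map": {
        "model.my_project.fct_orders": [
            "exposure.my_project.monthly_revenue_dashboard",
            "model.my_project.revenue_by_channel",
        ],
        "model.my_project.revenue_by_channel": [
            "exposure.my_project.weekly_marketing_report",
        ],
        "model.my_project.dim_products": [
            "exposure.my_project.catalog_admin_tool",
        ],
    },
    "exposures": {
        "exposure.my_project.monthly_revenue_dashboard": {
            "name": "monthly_revenue_dashboard", "maturity": "high",
            "owner": {"name": "Alice", "email": "alice@example.com"}},
        "exposure.my_project.weekly_marketing_report": {
            "name": "weekly_marketing_report", "maturity": "medium",
            "owner": {"name": "Charlie", "email": "charlie@example.com"}},
        "exposure.my_project.catalog_admin_tool": {
            "name": "catalog_admin_tool", "maturity": "low",
            "owner": {"name": "Dana", "email": "dana@example.com"}},
    },
}


def impacted_exposures(manifest, node_id):
    """Return exposures downstream of node_id (transitively), high maturity first."""
    child_map = manifest["child_map"]
    seen = set()
    stack = list(child_map.get(node_id, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(child_map.get(current, []))
    hits = [manifest["exposures"][n] for n in seen if n in manifest["exposures"]]
    return sorted(
        hits,
        key=lambda e: (MATURITY_ORDER.get(e.get("maturity"), 3), e["name"]),
    )


impacted = impacted_exposures(SAMPLE_MANIFEST, "model.my_project.fct_orders")
for exp in impacted:
    print(exp["maturity"], exp["name"], exp["owner"]["email"])
```

Walking `child_map` transitively also catches exposures that read a model built on top of fct_orders, which a `+1` selection would miss.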

Keeping exposures alive

Same problem as with descriptions: exposures go stale. The dashboard is deleted or the ML model is archived, but the exposure stays. Remedies:

1. Quarterly review

Once a quarter, run:

# List all exposures with owner and maturity (dbt does not record when an exposure was last updated; use git history for that)
dbt list --resource-type exposure --output json --output-keys "name owner maturity" | jq -c '{name, owner, maturity}'

For each one:

- Does the owner still work on the team?
- Does the URL still work (no 404)?
- Is the dashboard / ML model still in use?

Not in use -> delete the exposure. In use but the owner has left -> reassign.

2. URL health check

CI gate:

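# scripts/check_exposures_urls.py
"""Fail CI when an exposure URL is dead or unreachable."""
import json
import sys

import requests

with open('target/manifest.json') as f:
    manifest = json.load(f)

errors = []
for exp in manifest.get('exposures', {}).values():
    url = exp.get('url')
    if not url:
        continue  # exposures without a URL are not checked here
    try:
        # HEAD avoids downloading the page body; redirects are followed
        r = requests.head(url, timeout=10, allow_redirects=True)
        if r.status_code >= 400:
            errors.append(f"{exp['name']}: {url} returned {r.status_code}")
    except requests.RequestException as e:
        errors.append(f"{exp['name']}: {url} unreachable - {e}")

if errors:
    print('\n'.join(errors))
    sys.exit(1)  # a non-zero exit code fails the CI job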

When the check goes red, ask the owner: did the dashboard move, or is this a dead exposure?

3. Owner-as-active check

Pre-merge hook:

# Check that owner.email in every exposure belongs to a current employee
python scripts/check_owners_active.py --exposures models/_exposures.yml --hr-api ...

This requires integration with HR / Active Directory. Not every team has one, but it is useful in large organizations.
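The core of such an owner check might look like the sketch below. It makes no claim about what check_owners_active.py actually contains: the HR lookup is replaced by a plain set of active emails, and the sample exposures and example.com addresses are hypothetical.

```python
# Hypothetical exposures, trimmed to the fields the check needs.
SAMPLE_EXPOSURES = {
    "exposure.my_project.monthly_revenue_dashboard": {
        "name": "monthly_revenue_dashboard",
        "owner": {"name": "Alice Johnson", "email": "Alice.Johnson@example.com"}},
    "exposure.my_project.churn_prediction_v3": {
        "name": "churn_prediction_v3",
        "owner": {"name": "Bob Lee", "email": "bob.lee@example.com"}},
    "exposure.my_project.legacy_report": {
        "name": "legacy_report",
        "owner": {"name": "data team"}},
}

# Assumption: in a real setup this set would come from an HR / directory API.
ACTIVE_EMAILS = {"alice.johnson@example.com"}


def orphaned_exposures(exposures, active_emails):
    """Return names of exposures whose owner email is missing or not active."""
    active = {email.lower() for email in active_emails}
    orphaned = []
    for exp in exposures.values():
        email = (exp.get("owner") or {}).get("email")
        if not email or email.lower() not in active:
            orphaned.append(exp["name"])
    return sorted(orphaned)


orphaned = orphaned_exposures(SAMPLE_EXPOSURES, ACTIVE_EMAILS)
print(orphaned)
```

Comparing emails case-insensitively avoids false alarms when the YAML and the directory disagree on capitalization.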

Anti-patterns

- Exposure without an owner: "owner: data team". Not actionable on a breaking change, because nobody specific is responsible. Always give name + email.
- Exposure with a stale URL: the dashboard moved to a new URL and the exposure still points at the old one. Analysts get confused. The CI gate (URL health check) should catch this.
- Exposure for a temporary / one-off analysis: someone creates an exposure for a personal Notion doc, leaves the team, and the exposure hangs around forever. Use type: analysis + maturity: low if you must, but it is better not to create exposures for ephemeral things.
- Too many exposures without differentiated maturity: 200 exposures, all high or with no maturity. Impact analysis becomes meaningless because everything is "critical". Differentiating maturity is mandatory.
- Exposure with a broad depends_on: "depends on all models" makes it useless for impact analysis. List exactly what is actually read.
- No CI gate for new exposures: a PR with a new exposure lacking description / owner / URL merges silently. CI should check completeness.
- Exposures as a graveyard of old dashboards: removed Tableau workbooks are never deleted from exposures.yml. A quarterly review is necessary.
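A completeness gate for the "no CI gate" anti-pattern can be a few lines over the manifest. The sketch below assumes the exposures were loaded from target/manifest.json; the sample data, the quick_report exposure, and the exact set of required fields are illustrative choices, not a dbt rule.

```python
# Fields this sketch treats as mandatory (a team convention, not enforced by dbt).
REQUIRED_FIELDS = ("description", "url", "maturity")

# Hypothetical exposures: one complete, one incomplete.
SAMPLE_EXPOSURES = {
    "exposure.my_project.monthly_revenue_dashboard": {
        "name": "monthly_revenue_dashboard",
        "description": "Executive dashboard with monthly revenue.",
        "url": "https://tableau.company.com/dashboards/monthly-revenue",
        "maturity": "high",
        "owner": {"name": "Alice Johnson", "email": "alice.johnson@example.com"},
        "depends_on": {"nodes": ["model.my_project.fct_orders"]},
    },
    "exposure.my_project.quick_report": {
        "name": "quick_report",
        "description": "",
        "url": None,
        "maturity": "low",
        "owner": {"name": "data team"},
        "depends_on": {"nodes": []},
    },
}


def completeness_errors(exposures):
    """Return one message per missing field, owner detail, or empty depends_on."""
    errors = []
    for exp in exposures.values():
        name = exp["name"]
        for field in REQUIRED_FIELDS:
            if not exp.get(field):
                errors.append(f"{name}: missing {field}")
        owner = exp.get("owner") or {}
        if not (owner.get("name") and owner.get("email")):
            errors.append(f"{name}: owner needs both name and email")
        if not (exp.get("depends_on") or {}).get("nodes"):
            errors.append(f"{name}: empty depends_on")
    return errors


errors = completeness_errors(SAMPLE_EXPOSURES)
print("\n".join(errors))
```

In a CI job the script would exit with a non-zero code whenever the error list is not empty, the same way the URL health check does.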

Production-grade organization of exposures

A large project has 50-200 exposures. A single _exposures.yml grows to 5000+ lines. Best practice is to split per team or per domain:

models/
  _exposures/
    _finance__exposures.yml         # exposures owned by the finance team
    _marketing__exposures.yml       # exposures owned by the marketing team
    _ml__exposures.yml              # ML team exposures
    _eng__exposures.yml             # application exposures

Each team maintains its own YAML file. The only constraint is the granularity of ownership.

dbt automatically picks up every YAML file under models/. No extra configuration is needed.

Dashboard exposure example

A realistic production dashboard exposure:

- name: monthly_executive_revenue_dashboard
  label: "Monthly Executive Revenue Dashboard"
  type: dashboard
  maturity: high
  url: https://tableau.company.com/dashboards/exec-revenue
  description: |
    **Audience**: CFO, VP Sales, VP Marketing (quarterly business review).

    **Shows**:
    - Monthly revenue by segment (VIP / Premium / Standard)
    - Top 10 products by revenue
    - Geographical breakdown (US / EU / APAC)
    - YoY trend by quarter

    **Refresh**: daily at 06:30 UTC, after dbt run.

    **SLA**: 99.5% availability, data freshness < 24h.

    **Critical**: yes — used in board meetings.

  depends_on:
    - ref('fct_orders')
    - ref('dim_customers')
    - ref('dim_products')
    - ref('customer_segments')
    - ref('geo_dim')

  owner:
    name: Alice Johnson
    email: "alice.johnson@..."

  meta:
    slack_channel: "#analytics-revenue-dashboard"
    tableau_workbook_id: "wb_exec_revenue_v3"
    page_owner: "alice.johnson@..."
    sla: "99.5% uptime, daily refresh by 07:00 UTC"
    business_metric: "revenue"
    tier: "p0"

ML exposure example

- name: churn_prediction_model_v3
  label: "Churn Prediction Model v3"
  type: ml
  maturity: high
  url: https://mlflow.company.com/models/churn_prediction/v3
  description: |
    **Use case**: churn prediction (90-day inactivity) for proactive outreach.

    **Inputs**: customer features (recency, frequency, monetary, support tickets).
    **Output**: churn probability 0-1 + reason codes.

    **Trained**: 2026-03-15. **Retrained**: monthly.

    **Production**: serving via MLflow + Kubernetes.

  depends_on:
    - ref('customer_metrics')
    - ref('customer_orders_summary')
    - ref('support_tickets_summary')

  owner:
    name: Bob Lee
    email: "bob.lee@..."

  meta:
    ml_team: "Customer Success ML"
    mlflow_model_id: "churn_v3"
    sla: "monthly retrain by 1st"

Try it yourself

In your dbt project:

- Identify downstream consumers: which dashboards / ML models / applications read your marts?
- Create exposures: 5-10 exposures for the main consumers. Include all required fields (type, owner, url, depends_on, description).
- Differentiate maturity: critical = high, important = medium, exploratory = low.

Try impact analysis:

dbt list --select <one_critical_mart>+1 --resource-type exposure

Try per-consumer build:

dbt build --select +exposure:<critical_dashboard>

CI integration: add a check over target/manifest.json that every new exposure has an owner and a URL.

Key takeaways

- An exposure is a contract between data producer and consumer. Without exposures there is no formal way to do impact analysis, and breaking changes reach production unnoticed.
- A complete exposure: name, type, maturity, url, description (production tone), depends_on (a precise list), owner (name + email), meta (Slack channel, IDs, SLA).
- Types (dashboard, notebook, analysis, ml, application) are labels that help with impact analysis and selective runs.
- Maturity (high/medium/low) sets the priority for impact analysis. Focus on high first.
- Production usage: per-consumer scheduled runs (dbt build --select +exposure:dashboard_x), Slim CI with an exposure filter, impact analysis on upstream changes.
- Keeping exposures alive: quarterly review, URL health check (CI), owner-as-active validation.
- Anti-patterns: no owner, dead URL, no maturity differentiation, broad depends_on, exposures for ephemeral work, no CI gate.
- Large project: split into _exposures/_<team>__exposures.yml. Per-team ownership.

Knowledge check

The team plans to rename the column `fct_orders.order_amount -> fct_orders.order_total`. They ask: 'how do we find out who uses it?' Describe the impact analysis steps with exposures.

Answer

Without exposures there is no formal way to find out. With exposures, five steps:

**1. Find impacted exposures**:

```bash
dbt list --select fct_orders+ --resource-type exposure --output json --output-keys "name owner maturity depends_on" | jq '{name, owner, maturity, depends_on}'
```

`+` after fct_orders selects all downstream nodes (transitively). `--resource-type exposure` keeps only exposures, not models.

**2. Sort by maturity**:
- high (critical) first
- medium (important) next
- low (exploratory) last

Each group needs a different communication timeline.

**3. Notify owners**:

For each exposure:

```
To: alice.johnson@..., bob.lee@..., charlie@...
Subject: [Breaking change] fct_orders.order_amount -> order_total

The data team plans to rename fct_orders.order_amount -> fct_orders.order_total on 2026-06-01.

Affected exposures:
- monthly_executive_revenue_dashboard (Alice, maturity: high)
- churn_prediction_v3 (Bob, maturity: high)
- weekly_marketing_report (Charlie, maturity: medium)

Please:
1. Confirm whether you use order_amount
2. If yes, list the places where it is used (SQL / Tableau worksheet / Python script)
3. Plan a migration window before 2026-05-25

If you do not use it, confirm and we will close the thread.
```

**4. Implementation plan**:

- Phase 1 (2026-05-15): add the new column `order_total` as an alias of `order_amount` in fct_orders.
- Phase 2 (2026-05-20): notify consumers that the new column is available.
- Phase 3 (2026-06-01): drop `order_amount` after confirmation that everyone has migrated.

This is a **safe migration**, not a breaking one. The alternative (module 10) is model versions.

**5. Post-merge announcement**:

Post in #data-channel: 'fct_orders.order_amount renamed to order_total. Deprecated column removed.' plus a link to the new docs.

This is standard practice in production data engineering: **no surprises** for downstream.

Knowledge check

The team discovered that the project has 200 exposures and, after a quarterly review, half of them are dead (URL returns 404, owner left the company, dashboard renamed). What is the cleanup strategy?

Answer

200 exposures × 50% dead = 100 dead exposures cluttering impact analysis and dbt docs. Cleanup strategy:

**Step 1: Audit URL health**:

A CI script checks each URL:

```python
import requests

# manifest is the parsed target/manifest.json
for exp in manifest['exposures'].values():
    try:
        r = requests.head(exp['url'], timeout=10, allow_redirects=True)
        if r.status_code >= 400:
            print(f"DEAD: {exp['name']} - {r.status_code}")
    except requests.RequestException:
        print(f"UNREACHABLE: {exp['name']}")
```

This produces the list of dead URLs.

**Step 2: Audit owner emails**:

Integration with HR / Active Directory:

```python
# is_active_employee is a helper backed by the HR / directory integration
for exp in manifest['exposures'].values():
    if not is_active_employee(exp['owner']['email']):
        print(f"ORPHANED: {exp['name']} - owner {exp['owner']['email']} not active")
```

**Step 3: Categorize**:

| Category | Action |
|----------|--------|
| URL 404 + owner left | DELETE (definitely dead) |
| URL 404, owner active | ASK owner: did it move or is it dead? |
| URL ok, owner left | REASSIGN: find a new owner or DELETE |
| URL ok, owner active, no recent updates | KEEP, mark low maturity |

**Step 4: Cleanup PR**:

- Remove confirmed dead exposures from the YAML (50 of them)
- Reassign orphaned ones (20 of them) after asking
- Update URLs where the dashboard moved
- Reduce maturity for exposures that are not actively used

**Step 5: Process improvements** (prevent recurrence):

1. **Quarterly review**: add it to the team calendar. One person runs the audit and hands out the actions.
2. **CI gate**: URL health check on every PR, so dead URLs are caught early.
3. **Owner reassignment workflow**: when an employee leaves, HR notifies the data team, and exposures with that owner are reassigned before offboarding.
4. **Exposure expiration**: an optional `meta.review_by: '2027-01-01'`; after that date CI marks the exposure as stale unless it was updated.

**Quantitative goal**: out of 200 exposures, fewer than 5% should be dead at any moment. A quarterly cleanup makes this sustainable.

Payoff: clean exposures mean accurate impact analysis. Analysts and engineers get an actionable list, not noise.
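The decision table from Step 3 translates almost directly into code. The sketch below is a simplification under stated assumptions: it uses only the two audit signals (URL health and owner status), leaves out the "no recent updates" condition, and the `AUDIT` results and the legacy_report and old_sales_notebook names are hypothetical.

```python
def categorize(url_ok: bool, owner_active: bool) -> str:
    """Map the two audit signals to a cleanup action."""
    if not url_ok and not owner_active:
        return "DELETE"      # dead URL and no owner: definitely dead
    if not url_ok:
        return "ASK_OWNER"   # the owner can say whether the resource moved
    if not owner_active:
        return "REASSIGN"    # the resource is alive but needs a new owner
    return "KEEP"


# Hypothetical audit results: exposure name -> (url_ok, owner_active).
AUDIT = {
    "monthly_revenue_dashboard": (True, True),
    "weekly_marketing_report": (False, True),
    "legacy_report": (True, False),
    "old_sales_notebook": (False, False),
}

actions = {name: categorize(*flags) for name, flags in AUDIT.items()}
for name, action in actions.items():
    print(f"{action:10} {name}")
```

Keeping the rule in one small function makes the cleanup PR easier to review, because every deletion can be traced back to the two signals that triggered it.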
