Lesson 11.05 · 22 min · Intermediate
Tags: CI gates, Contract enforcement, Schema validation, Metadata-only, Production checks

Contracts in CI: gates, what gets missed, and metadata-only schema limits

An enforced contract is a hard gate during dbt run. But a contract can be used earlier, in CI on a pull request, before the code reaches main. This gives a faster feedback loop: the developer sees a contract violation after a 30-second CI run, not after a full scheduled run.

This lesson is about operationalizing contracts: which checks to run in CI, which scenarios contracts catch, and what they miss. The last part, the limits, is critical: contracts create a false sense of security if you don't understand what they do not check.

Airflow: CI for DAG tests follows a similar multi-level scheme.

Contract as a CI gate

Basic CI workflow:

# .github/workflows/dbt-ci.yml
name: dbt CI
on: [pull_request]

jobs:
  contract-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install dbt-core==1.10.21 dbt-duckdb==1.10.1
      - run: dbt deps

      # Phase 1: Parse (static check)
      - run: dbt parse

      # Phase 2: Compile (SQL compilation against the ci target, no prod connection)
      - run: dbt compile --target ci

      # Phase 3: Run on an ephemeral DuckDB; this is where contracts are enforced
      # Note: state:modified+ also needs --state pointing at a production manifest
      # (see the Slim CI section)
      - run: dbt run --select state:modified+ --target ci

      # Phase 4: Tests (data quality after materialization)
      - run: dbt test --select state:modified+ --target ci

What each phase catches:

| Phase | Catches |
| --- | --- |
| Parse | YAML errors, syntax errors, missing refs |
| Compile | Jinja errors, undefined macros, missing sources |
| Run | Contract violations, SQL errors, materialization failures |
| Tests | Data quality issues (unique, not_null, accepted_values, custom) |

Contract violations are caught at the Run phase: dbt attempts to materialize the model and fails on a mismatch. This requires an actual DuckDB target in CI; parse alone is not enough.
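To reproduce the same cheapest-first ordering locally, a minimal sketch might look like the following. The names `PHASES`, `run_command`, and `first_failing_phase` are hypothetical, the runner is injectable so the logic can be exercised without dbt installed, and the `--select`/`--state` flags are left out for brevity.

```python
import subprocess
from typing import Callable, Optional, Sequence

# The four phases from the workflow, ordered from cheapest to most expensive.
PHASES = [
    ("parse", ["dbt", "parse"]),
    ("compile", ["dbt", "compile", "--target", "ci"]),
    ("run", ["dbt", "run", "--target", "ci"]),
    ("test", ["dbt", "test", "--target", "ci"]),
]


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def first_failing_phase(
    runner: Callable[[Sequence[str]], int] = run_command,
) -> Optional[str]:
    """Run the phases in order and stop at the first failure.

    Returns the name of the failing phase, or None when all phases pass.
    """
    for name, cmd in PHASES:
        if runner(cmd) != 0:
            return name
    return None
```

Calling `first_failing_phase()` inside a dbt project would shell out to dbt; a contract violation would surface as `"run"`, because that is the first phase that materializes anything.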


CI gate types

Gate 1: Contract presence on marts

Check that every mart model has a contract:

# scripts/check_contracts.py
import json, sys

with open('target/manifest.json') as f:
    manifest = json.load(f)

errors = []
for node_id, node in manifest['nodes'].items():
    if node['resource_type'] != 'model':
        continue

    is_mart = any(p == 'marts' for p in node['fqn'])
    if not is_mart:
        continue

    contract = node.get('contract', {})
    if not contract.get('enforced'):
        errors.append(f"Mart model '{node['name']}' missing enforced contract")

if errors:
    print('\n'.join(errors))
    sys.exit(1)

In CI:

- run: dbt parse
- run: python scripts/check_contracts.py

A PR that adds a mart model without a contract fails.
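The gate can be tried without a dbt project by feeding it a hand-written manifest. The sketch below wraps the same check in a function; `marts_without_contract` and the `shop` project with its three models are hypothetical, and the manifest is trimmed to the only keys the check reads.

```python
def marts_without_contract(manifest: dict) -> list:
    """Return the names of mart models whose contract is not enforced."""
    missing = []
    for node in manifest["nodes"].values():
        if node["resource_type"] != "model" or "marts" not in node["fqn"]:
            continue
        if not node.get("contract", {}).get("enforced"):
            missing.append(node["name"])
    return sorted(missing)


# Hypothetical, heavily trimmed manifest: two marts and one staging model.
sample_manifest = {
    "nodes": {
        "model.shop.fct_orders": {
            "resource_type": "model",
            "name": "fct_orders",
            "fqn": ["shop", "marts", "fct_orders"],
            "contract": {"enforced": True},
        },
        "model.shop.dim_customers": {
            "resource_type": "model",
            "name": "dim_customers",
            "fqn": ["shop", "marts", "dim_customers"],
            "contract": {"enforced": False},
        },
        "model.shop.stg_orders": {
            "resource_type": "model",
            "name": "stg_orders",
            "fqn": ["shop", "staging", "stg_orders"],
            "contract": {"enforced": False},
        },
    }
}

print(marts_without_contract(sample_manifest))  # ['dim_customers']
```

The staging model is ignored even though its contract is off, because only nodes with `marts` in their `fqn` are in scope.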

Gate 2: data_type presence per column

Every declared column must have a data_type:

for node in manifest['nodes'].values():
    if node['resource_type'] == 'model':
        contract = node.get('contract', {})
        if contract.get('enforced'):
            for col_name, col in node.get('columns', {}).items():
                if not col.get('data_type'):
                    errors.append(f"{node['name']}.{col_name} missing data_type")

A contract without data_type is incomplete. CI catches it.

Gate 3: No bare types

numeric without precision or varchar without length invites drift:

import re

BARE_TYPES = re.compile(r'^(numeric|number|varchar|int|float)$', re.IGNORECASE)

for node in manifest['nodes'].values():
    for col_name, col in node.get('columns', {}).items():
        data_type = col.get('data_type') or ''  # data_type can be null in the manifest
        if BARE_TYPES.match(data_type):
            errors.append(f"{node['name']}.{col_name}: bare type '{data_type}'")
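To see which declarations the pattern accepts and rejects, here is a small standalone sketch. The helper `is_bare` and the sample type strings are illustrative; the regular expression is the one from the gate.

```python
import re

BARE_TYPES = re.compile(r'^(numeric|number|varchar|int|float)$', re.IGNORECASE)


def is_bare(data_type) -> bool:
    """True when the type is declared without precision or length."""
    return bool(BARE_TYPES.match((data_type or "").strip()))


samples = ["numeric", "NUMERIC(12, 2)", "varchar", "varchar(255)", "bigint", None]
for sample in samples:
    print(repr(sample), is_bare(sample))
# 'numeric' True
# 'NUMERIC(12, 2)' False
# 'varchar' True
# 'varchar(255)' False
# 'bigint' False
# None False
```

Because the pattern is anchored with `^` and `$`, any parenthesized precision or length makes the type pass, and `bigint` is not mistaken for a bare `int`.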

Gate 4: Critical constraints on PK / FK / business-critical columns

import re

# Columns treated as business-critical; all of them must be NOT NULL
CRITICAL_COLUMN_PATTERNS = ['_id$', '^id$', '_pk$', 'email', 'customer_id']
CRITICAL_RE = re.compile('|'.join(CRITICAL_COLUMN_PATTERNS))
# Assumption: only `id` and `*_pk` name the model's own primary key;
# other `*_id` columns are usually foreign keys and must not require PRIMARY KEY
PK_RE = re.compile(r'^id$|_pk$')

for node in manifest['nodes'].values():
    for col_name, col in node.get('columns', {}).items():
        constraint_types = {c['type'] for c in col.get('constraints', [])}

        if PK_RE.search(col_name) and 'primary_key' not in constraint_types:
            errors.append(f"{node['name']}.{col_name}: missing PRIMARY KEY")
        if CRITICAL_RE.search(col_name) and 'not_null' not in constraint_types:
            errors.append(f"{node['name']}.{col_name}: missing NOT NULL")

Gate 5: Contracts + data tests parity

For each constraint, expect corresponding data test:

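# In manifest.json, data tests are separate nodes rather than a key on the column.
# Index generic test names by (model unique_id, column name); the field names
# below follow recent manifest schema versions.
tests_by_column = {}
for test in manifest['nodes'].values():
    if test['resource_type'] == 'test' and test.get('test_metadata'):
        key = (test.get('attached_node'), test.get('column_name'))
        tests_by_column.setdefault(key, set()).add(test['test_metadata']['name'])

for node_id, node in manifest['nodes'].items():
    for col_name, col in node.get('columns', {}).items():
        constraint_types = {c['type'] for c in col.get('constraints', [])}
        test_names = tests_by_column.get((node_id, col_name), set())

        # A PK constraint should be backed by unique + not_null tests
        if 'primary_key' in constraint_types:
            if 'unique' not in test_names:
                errors.append(f"{node['name']}.{col_name}: PK but no unique test")
            if 'not_null' not in test_names:
                errors.append(f"{node['name']}.{col_name}: PK but no not_null test")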

This enforces the rule: constraints declare, tests enforce.

Gate 6: Versions consistency

For models with versions:

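from collections import defaultdict

# In manifest.json every model version is its own node carrying `version`
# and `latest_version`; there is no `versions` list, so group by model name
versions_by_model = defaultdict(list)
for node in manifest['nodes'].values():
    if node['resource_type'] == 'model' and node.get('version') is not None:
        versions_by_model[node['name']].append(node)

for name, nodes in versions_by_model.items():
    declared = {str(n['version']) for n in nodes}
    latest = nodes[0].get('latest_version')
    if latest is None:
        errors.append(f"{name}: has versions but no latest_version set")
    elif str(latest) not in declared:
        errors.append(f"{name}: latest_version {latest} is not a declared version")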

Gate 7: Deprecated versions not removed yet

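Track compliance with deprecation_date:

from datetime import date, datetime

warnings = []
for node in manifest['nodes'].values():
    # Each model version is its own manifest node with its own deprecation_date
    dep_date = node.get('deprecation_date')
    if node['resource_type'] == 'model' and dep_date:
        # The value may be a full ISO timestamp, so parse it as a datetime
        if datetime.fromisoformat(dep_date).date() < date.today():
            label = f"{node['name']}.v{node['version']}" if node.get('version') is not None else node['name']
            warnings.append(f"{label}: deprecation_date passed, should be removed")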

This is a warning, not an error; it depends on team policy.
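The date comparison itself is easy to get subtly wrong, so here is a small sketch of it in isolation. The helper `is_past_deprecation` is hypothetical, and `today` is passed in explicitly so the example gives the same result on any day; it assumes the date is stored as an ISO 8601 string, with or without a time part.

```python
from datetime import date, datetime


def is_past_deprecation(deprecation_date: str, today: date) -> bool:
    """True when the ISO-formatted deprecation date is strictly before `today`."""
    return datetime.fromisoformat(deprecation_date).date() < today


today = date(2025, 6, 1)  # fixed date so the example is reproducible

print(is_past_deprecation("2025-01-31", today))                 # True
print(is_past_deprecation("2025-12-31 00:00:00+00:00", today))  # False
print(is_past_deprecation("2025-06-01", today))                 # False
```

The last case shows the strict comparison: a version is reported only from the day after its deprecation date.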


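What contracts catch: details

Contracts catch type drift, an extra column, a missing column, and a precision change. They do not catch wrong data, wrong logic, a semantic change, or a performance regression.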

Critical: contracts vs data tests vs unit tests

| Aspect | Contracts | Data tests | Unit tests |
| --- | --- | --- | --- |
| What it checks | Schema (columns, types, structural constraints) | Data quality (values in warehouse) | Logic correctness on mock input |
| When it runs | Run-time (build) | Run-time (after build) | Build-time / CI (mocked inputs, no production data) |
| Enforcement | Build fails on mismatch | Test fails if data is invalid | Test fails if logic is wrong |
| Strength | Strong cross-warehouse (DDL applied) | Strong (actual data query) | Strong (deterministic) |
| Weakness | Constraints largely metadata-only on Snowflake/BigQuery | Slow (warehouse query) | Doesn't catch real data drift |
| Production use | Marts + public APIs | All models (basic), critical (extensive) | Critical models (revenue, churn, attribution) |

All three layers are complementary. A production-grade model has a contract, data tests, and unit tests.


Specific limits in production

1. Snowflake / BigQuery: constraints metadata-only

What contracts declare:

constraints:
  - type: not_null
  - type: check
    expression: "revenue >= 0"

What the warehouse does:

  • Snowflake: records PRIMARY KEY, FOREIGN KEY, and UNIQUE in DDL but does NOT enforce them; a CHECK such as (revenue >= 0) is not applied; NOT NULL is the exception and is enforced
  • BigQuery: similar; key constraints are metadata only

What this does NOT catch:

  • Insert negative revenue where CHECK declared
  • Insert duplicate where PK declared
  • Insert orphan FK where FOREIGN KEY declared

Real enforcement comes from data tests:

data_tests:
  - not_null
  - dbt_utils.expression_is_true:
      expression: ">= 0"
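A declared expression can also be turned into an ad hoc violation query. The sketch below is an illustration only: `violation_query` is a hypothetical helper, SQLite from the standard library stands in for the warehouse, and the `customer_metrics` rows are made up.

```python
import sqlite3


def violation_query(table: str, expression: str) -> str:
    """Build a query returning the rows that violate a declared CHECK expression."""
    return f"SELECT * FROM {table} WHERE NOT ({expression})"


# SQLite stands in for the warehouse; the table has no CHECK constraint,
# which mirrors a warehouse that does not enforce one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_metrics (customer_id INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO customer_metrics VALUES (?, ?)",
    [(1, 120.0), (2, -5.0), (3, 0.0)],
)

sql = violation_query("customer_metrics", "revenue >= 0")
violations = conn.execute(sql).fetchall()
print(violations)  # [(2, -5.0)]
```

Rows where the expression evaluates to NULL are not returned, which matches how a SQL CHECK treats NULL; a separate not_null test is still needed for those.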

2. DuckDB: full enforcement locally, partial cloud

What works locally:

  • All constraints enforced
  • FK works

What doesn't work:

  • MotherDuck FK: not supported
  • :memory: database: constraints are lost when the process ends

3. Postgres: full enforcement, but cost

Что works:

  • All constraints enforced (PK, FK, CHECK, NOT NULL, UNIQUE)

Cost:

  • FK check on every INSERT slows bulk loads
  • For dbt's incremental loads: acceptable
  • For huge initial backfills: can become a bottleneck

4. Data quality is not the same as schema integrity

A contract says 'the schema is what we declared'. It does not say 'the data is correct'.

Example:

- name: revenue
  data_type: numeric(12, 2)
  constraints:
    - type: not_null

Contract pass:

  • Column is numeric(12, 2) [x]
  • All values NOT NULL [x]

Data still broken:

  • All revenue values = 0 (formula bug)
  • Or all revenue values = $1 (currency error)
  • Or all revenue values from yesterday (freshness issue)

Contract: ok. Data quality: terrible. Need data tests + freshness checks + business validation.
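The gap can be shown in a few lines. This is a toy sketch, not dbt code: the rows are made up, `passes_contract_like_checks` imitates what the contract guarantees (a decimal column with no NULLs), and `passes_business_check` imitates a simple data test.

```python
from decimal import Decimal

# Made-up output of a model with a formula bug: every revenue value is zero.
rows = [{"order_id": i, "revenue": Decimal("0.00")} for i in range(1, 4)]


def passes_contract_like_checks(rows) -> bool:
    """Imitate the contract: revenue is a decimal and never NULL."""
    return all(isinstance(r["revenue"], Decimal) for r in rows)


def passes_business_check(rows) -> bool:
    """Imitate a data test: revenue cannot be zero on every single row."""
    return any(r["revenue"] != 0 for r in rows)


print(passes_contract_like_checks(rows))  # True
print(passes_business_check(rows))        # False
```

Both checks look at the same column, but only the second one says anything about whether the values make sense.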

5. Schema changes without notification

A contract states 'this is the current schema'. It does not notify consumers when the schema changes.

Example:

  • Day 1: data_type: numeric(12, 2), and consumers rely on it
  • Day 2: a PR changes it to data_type: numeric(18, 4) (more precision)
  • Day 2: the PR merges
  • Consumers don't see the change

Solution: model versions (previous lesson) for breaking changes, or a notification process for non-breaking ones (added column, increased precision).


CI optimization: only modified

Don't run contract checks against every model on every PR; that is slow. Use state:modified+:

- name: Contract checks (modified only)
  run: |
    dbt run --select state:modified+ --defer --state ./prod-manifest --target ci
    python scripts/check_contracts.py --modified-only

state:modified+ selects only the changed models plus their downstream dependents, which makes CI faster.

This requires a state comparison: the production manifest versus the PR's manifest. See module 13 (CI/CD GitHub).
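Conceptually, the selector compares two manifests and then walks the dependency graph downstream. The sketch below is a simplified illustration of that idea, not dbt's implementation: dbt's real comparison considers more than a checksum, and the `shop` models and string checksums here are made up.

```python
def modified_plus(prod_nodes: dict, pr_nodes: dict, child_map: dict) -> set:
    """Return nodes that are new or changed in the PR, plus everything downstream."""
    modified = {
        uid
        for uid, node in pr_nodes.items()
        if uid not in prod_nodes or prod_nodes[uid]["checksum"] != node["checksum"]
    }
    selected, stack = set(modified), list(modified)
    while stack:  # walk children until no new nodes are found
        for child in child_map.get(stack.pop(), []):
            if child not in selected:
                selected.add(child)
                stack.append(child)
    return selected


# Hypothetical project: only stg_orders changed in the PR.
prod_nodes = {
    "model.shop.stg_orders": {"checksum": "aaa"},
    "model.shop.fct_orders": {"checksum": "bbb"},
    "model.shop.dim_customers": {"checksum": "ccc"},
}
pr_nodes = {**prod_nodes, "model.shop.stg_orders": {"checksum": "a2a"}}
child_map = {
    "model.shop.stg_orders": ["model.shop.fct_orders"],
    "model.shop.fct_orders": [],
    "model.shop.dim_customers": [],
}

print(sorted(modified_plus(prod_nodes, pr_nodes, child_map)))
# ['model.shop.fct_orders', 'model.shop.stg_orders']
```

`dim_customers` is left out of the build because it neither changed nor depends on anything that did.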


Slim CI with contracts

Full workflow:

# .github/workflows/dbt-ci.yml
on: pull_request

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      # Checkout, Python setup, and dbt install as in the basic workflow
      # Download production manifest for state:modified+
      - name: Download prod manifest
        run: aws s3 cp s3://my-bucket/dbt-manifest/manifest.json ./prod-manifest/

      # Parse
      - name: Parse
        run: dbt parse

      # Static checks
      - name: Custom contract checks
        run: python scripts/check_contracts.py

      # Slim CI build with contract enforcement
      - name: Build modified
        run: dbt build --select state:modified+ --defer --state ./prod-manifest --target ci

      # Compile diff report
      - name: Schema diff report
        run: python scripts/schema_diff.py --before prod --after ci

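dbt build runs models and their tests together in DAG order (it also covers seeds and snapshots). With contracts enforced, a mismatch fails the build.

The schema diff report is an optional extra: it shows the diff between the prod schema and the PR schema for review.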
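The lesson does not show scripts/schema_diff.py, so the following is only a guess at its core: a hypothetical `schema_diff` function that compares two column-to-type mappings, such as the `columns` entries of the same model in the prod and PR manifests. The sample columns are made up.

```python
def schema_diff(before: dict, after: dict) -> list:
    """Compare {column: data_type} mappings and describe each difference."""
    lines = []
    for col in sorted(before.keys() | after.keys()):
        old, new = before.get(col), after.get(col)
        if old is None:
            lines.append(f"+ {col}: {new}")            # column added in the PR
        elif new is None:
            lines.append(f"- {col}: {old}")            # column removed in the PR
        elif old != new:
            lines.append(f"~ {col}: {old} -> {new}")   # type changed
    return lines


# Made-up schemas for one model, before and after a PR.
prod_schema = {"order_id": "bigint", "revenue": "numeric(12, 2)", "legacy_flag": "boolean"}
pr_schema = {"order_id": "bigint", "revenue": "numeric(18, 4)", "currency": "varchar(3)"}

for line in schema_diff(prod_schema, pr_schema):
    print(line)
# + currency: varchar(3)
# - legacy_flag: boolean
# ~ revenue: numeric(12, 2) -> numeric(18, 4)
```

Posting this output as a PR comment gives reviewers the kind of visibility that the contract alone does not provide.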


Cost of contracts in production

Per-run overhead:

  • Constraint DDL is applied per relation
  • 50 marts × 20 columns × 3 constraints = 3000 DDL statements
  • On Snowflake: roughly $0.01-0.05 per run in DDL credits
  • Negligible for most teams

Storage:

  • Constraints are stored in metadata: minimal
  • Negligible

Compute:

  • Enforcement (where it applies): a Postgres FK check can slow inserts
  • For dbt's batch loads: acceptable

Development:

  • Time to write YAML declarations: 5-15 min per model first time
  • Time to maintain: 1-2 min per PR touching schema
  • ROI: orders of magnitude saved at downstream

Reverting contract

It happens: a contract turns out to be too restrictive and needs to be disabled temporarily.

- name: fct_orders
  config:
    contract:
      enforced: false   # toggle off

A single-line change. dbt run passes without enforcement.

When to revert:

  • Production fire — need flexibility immediately
  • Migration in progress (between phases)
  • Discovered schema drift, need to align before re-enabling

After fix:

  • Re-enable, ensure CI passes
  • Document why was disabled (incident log)

CI gate: catch a disabled contract on a critical mart:

CRITICAL_MARTS = ['fct_orders', 'customer_metrics', 'revenue_daily']

for node in manifest['nodes'].values():
    if node['name'] in CRITICAL_MARTS:
        if not node.get('contract', {}).get('enforced'):
            errors.append(f"CRITICAL: {node['name']} has contract.enforced=false")

This is a strict policy: critical marts must have contracts. Toggling one off requires PR approval.


Try it yourself

  1. Enable a contract on one mart model.

  2. Test breaking changes:

    • Change a type in SQL (numeric -> float): dbt run should fail
    • Add an extra column in SQL: fail
    • Remove a column from YAML: fail

  3. Write a CI gate script:

    • Parse manifest.json
    • Check: all marts have an enforced contract
    • Check: all columns have data_type
    • Check: no bare types (numeric, varchar)
    • Run it in pre-commit or CI

  4. Test the gate:

    • Add a mart without a contract: the script should fail
    • Add a column without data_type: the script should fail

  5. Combine:

    • Contract + data tests (not_null + unique) + unit test (logic) on one model
    • All three layers active

Key takeaways

  1. Contract as a CI gate: dbt run with contract enforcement means the build fails when SQL and YAML disagree. Faster feedback than a scheduled prod run.
  2. CI gate types: presence on marts, data_type per column, no bare types, critical constraints, contract + test parity, version consistency, deprecation enforcement.
  3. What it catches: type drift, extra column, missing column, precision change. What it doesn't catch: wrong data, wrong logic, semantic change, performance regression.
  4. Metadata-only constraints: on Snowflake/BigQuery, PK/FK/UNIQUE are declarative and not enforced, CHECK is not applied, and NOT NULL is the exception that is enforced. DuckDB locally: enforced. Postgres: enforced.
  5. Complement with data tests: constraint metadata + data tests = actual enforcement. Belt and suspenders.
  6. CI optimization: state:modified+ builds only the changed models. Slim CI.
  7. Cost: minimal compute and storage. Mostly development time for YAML declarations. The ROI for production stability is large.
  8. Reverting: the enforced: false toggle is for emergencies. A CI gate should catch a reverted contract on critical marts.
Knowledge check
The team added a contract in YAML. dbt run passes, but production dashboards show wrong numbers. What did they miss?
Answer
A contract catches **schema**, not **data quality** or **logic**. Four categories it does not catch:

**1. Wrong data (data quality issue)**:

```yaml
constraints:
  - type: not_null
  - type: check
    expression: "revenue >= 0"
```

On Snowflake/BigQuery the CHECK is **declarative, not enforced**, so negative values pass through INSERT. NOT NULL is the one constraint these warehouses do enforce.

**Solution**: data tests:
```yaml
data_tests:
  - not_null
  - dbt_utils.expression_is_true:
      expression: ">= 0"
```

**2. Wrong logic (formula bug)**:

The SQL formula is wrong: `SUM(amount * discount)` instead of `SUM(amount * (1 - discount))`. The schema is the same: column `revenue` is numeric(12,2). Contracts pass.

The dashboard shows revenue that is badly wrong (10 instead of 90 for the row in the test below). Catastrophic.

**Solution**: unit tests:
```yaml
unit_tests:
  - name: revenue_formula
    model: fct_orders  # model under test; the name is assumed here
    given:
      - input: ref('orders')
        rows: [{amount: 100, discount: 0.1}]
    expect:
      rows: [{revenue: 90}]  # not 10!
```

**3. Semantic change**:

Day 1: `order_total` in USD
Day 2: Producer changes to EUR (without notification)

The schema is the same: `numeric(12, 2)`. Contracts pass. Dashboards that aggregated USD now sum EUR, which gives misleading numbers.

**Solution**: model versions for breaking changes (lesson 03-04):
```yaml
versions:
  - v: 1
    columns: [{name: order_total, ...}]      # USD
  - v: 2
    columns: [{name: order_total_eur, ...}]  # explicit naming
```

Or add a currency column:
```sql
SELECT order_total, 'USD' AS currency FROM ...
```

Force consumers to think about units.

**4. Performance regression**:

Producer optimizes SQL and accidentally adds a bad JOIN. The build runs 10x slower. Contracts pass because the schema is the same.

**Solution**: operational metrics:
- Monitor `dbt run` duration
- Alert on regression
- Performance tests (separate concern)

**Summary**: contract = schema integrity. It doesn't catch:
- [X] Wrong values (data tests catch)
- [X] Wrong formulas (unit tests catch)
- [X] Wrong semantics (versions force visibility)
- [X] Performance (operational monitoring)

**Production-grade model**: all 4 layers:
1. Contract (schema)
2. Data tests (quality)
3. Unit tests (logic)
4. Operational metrics (performance, freshness)

Each layer addresses a different problem. None replaces another.
Knowledge check
A senior engineer asks: 'if constraints on Snowflake are metadata-only, why declare them in the contract at all?' Defend the decision.
Answer
Constraints in a contract have **5 use cases** even when they are metadata-only:

**1. Documentation / Self-documentation**:

Contract YAML is **schema-as-code**, a readable form for downstream consumers. Without a contract, readers look at the SQL and guess about PK, FK, and nullability.

With a contract:
```yaml
- name: customer_id
  data_type: bigint
  constraints:
    - type: primary_key
    - type: not_null
```

The `dbt docs` UI shows constraints visually. Consumers see 'this is a PK' without grepping SQL. Data catalogs (DataHub, Atlan) read warehouse metadata + the dbt manifest and display constraints.

**2. Optimizer hints**:

Snowflake / BigQuery optimizers **can use** declared constraints for query planning:

- **PK declared**: optimizer may eliminate a redundant DISTINCT because it assumes uniqueness
- **FK declared**: optimizer may skip joins / use specialized join strategies
- **CHECK declared** (on warehouses that support it): optimizer may eliminate predicates if statically satisfied

Reality: usage depends on the warehouse and version. Snowflake honors PK for some optimizations. BigQuery has partial support. But it's a free performance boost when applicable.

**3. Schema compatibility cross-warehouse**:

A project might migrate Snowflake -> DuckDB (or vice versa). Contracts are written once and apply across warehouses. Snowflake (metadata-only) -> DuckDB (enforced):
- Same YAML schema declarations
- Different enforcement levels
- Same documentation

Without contracts, the schema is implicit in SQL casts. Migration is painful.

**4. Future warehouse upgrades**:

Snowflake / BigQuery may add real enforcement in future versions. Your contracts are already declared, so you get enforcement for free when the warehouse upgrades.

**5. Internal consistency check**:

Even though Snowflake doesn't enforce CHECK at INSERT, you can periodically run:

```sql
SELECT * FROM customer_metrics
WHERE NOT (revenue >= 0) -- the CHECK expression
```

If the result is non-empty, there is a data quality issue. Constraints provide expressions to check (more discoverable than scattered data tests).

**Real-world weight** (rough estimate):

- **#1 (Documentation)**: 70% of value. Self-documenting schema saves countless hours.
- **#2 (Optimizer)**: 10-20% (warehouse-dependent).
- **#3-4 (Portability/Future)**: 5-10%.
- **#5 (Periodic check)**: 5-10% (can be done manually, but explicit is better).

**Counter-argument**: when NOT to declare:

- For an exhaustive constraint list (every column NOT NULL, every relationship FK): overhead grows while payoff diminishes.
- For internal staging: documentation is less important.
- For one-off analyses: throwaway.

**Production stance**:
- Marts (consumer-facing): full contracts including constraints
- Public APIs: comprehensive contracts
- Staging/intermediate: type contracts only, skip constraints

**Bottom line**: constraints are **declarative documentation + optimization hint**, even when the warehouse doesn't enforce them at INSERT level. For actual data integrity, use the data tests `unique`, `not_null`, `relationships`, `expression_is_true`. Both layers complement each other.
