Learning Platform
Глоссарий Troubleshooting
Урок 12.05 · 22 мин
Средний
MaterializationConditional configMulti-environmentPerformance

Conditional materialization: table в prod, view в dev

В предыдущих уроках мы прошли target.name, env_var, generate_schema_name, grants. Этот урок — про materialization через target conditional: как использовать view в dev (быстрый rebuild), table в prod (query performance), incremental только в prod (избежать complexity в dev).

Хорошее multi-env materialization config экономит часы developer time (faster iteration в dev) и optimizes production query performance. Это last but critical piece of multi-env puzzle.

dbt-iii: анатомия materialization — как работает table vs view vs incremental

Базовый pattern: table prod, view dev

# dbt_project.yml
models:
  my_project:
    +materialized: "{{ 'table' if target.name == 'prod' else 'view' }}"

Что happens:

  • Dev / CI (target.name != 'prod'): all models materialized as views
  • Prod: all materialized as tables

Or per-layer:

models:
  my_project:
    staging:
      +materialized: view   # views везде (light layer)

    intermediate:
      +materialized: "{{ 'ephemeral' if target.name != 'prod' else 'view' }}"
      # CTE inlining в dev, real views в prod

    marts:
      +materialized: "{{ 'table' if target.name == 'prod' else 'view' }}"

Why this pattern

Materialization tradeoffs

Math example:

Model takes 10 minutes для full rebuild.

Dev (view):

  • dbt run --select my_model = compile + create view = ~5 seconds
  • Tradeoff: each ref query reruns the SQL — slow for analytical queries
  • BUT — dev usually uses sample data -> small impact

Prod (table):

  • dbt run --select my_model = full rebuild = 10 minutes
  • Read queries fast (just SELECT from table)
  • Daily scheduled run amortizes cost

Prod (incremental):

  • Daily run = INSERT new rows = 1 minute (10x faster than full)
  • Initial backfill = 10 minutes (one-time)

Conditional incremental только в prod

Incremental adds complexity:

  • is_incremental() macro logic
  • Need to handle --full-refresh properly
  • Test setup more involved
  • In dev: backfilling each run = ужас

Solution — incremental только в prod, table в dev:

- name: large_fact_table
  config:
    materialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"
    unique_key: order_id
    incremental_strategy: 'merge'
SELECT
    order_id,
    customer_id,
    order_total
FROM {{ ref('stg_orders') }}

{% if is_incremental() and target.name == 'prod' %}
  WHERE order_date >= (SELECT MAX(order_date) FROM {{ this }}) - INTERVAL '3 day'
{% endif %}

In dev: target.name=‘dev’, materialized=‘table’, is_incremental() evaluates to false (table is new on every run). Full SQL runs. Fast for small dev data.

In prod: materialized=‘incremental’, is_incremental() true after first run. Only new orders processed. Fast.


Conditional unique_key / partition / cluster

Sometimes config differs:

- name: orders_partitioned
  config:
    materialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"
    incremental_strategy: 'merge'
    unique_key: order_id

    # Snowflake / BigQuery: partition by date в prod (faster queries)
    cluster_by: "{{ ['order_date', 'customer_id'] if target.name == 'prod' else none }}"

In dev — no clustering (overhead не оправдан на маленьких данных). In prod — cluster для query performance.


Ephemeral materialization

materialized: ephemeral — model not physically materialized. Just CTE inlined в downstream models.

- name: int_user_demographics
  config:
    materialized: "{{ 'ephemeral' if target.name != 'prod' else 'view' }}"

Dev: ephemeral — no DDL, no physical object. SQL copy-pasted in downstream as CTE. Faster iteration.

Prod: view — real DDL. Downstream uses standard FROM int_user_demographics. Standard production behavior.

Use case: intermediate models in dev pipelines где не нужны permanent objects. Saves cleanup, faster runs.

Limits:

  • Ephemeral models cannot have data tests (no table to query)
  • Cannot have grants
  • Cannot ref from non-dbt code

—target overrides

Sometimes need different behavior. Use --target flag:

# Default: dev config (views)
dbt run

# Specific target: prod config (tables)
dbt run --target prod

# Override default vars
dbt run --target prod --vars '{full_refresh: true}'

In dbt_project.yml conditional logic auto-applies based on target.

For one-off overrides — set env_var и use в conditional:

+materialized: >
  {{
    'table' if env_var('FORCE_TABLE', 'false') | as_bool
    else 'table' if target.name == 'prod'
    else 'view'
  }}
# Force tables in dev (rare)
FORCE_TABLE=true dbt run --target dev

Per-model conditional materialization

Some models need different patterns:

1. Large fact table — always incremental в prod:

- name: large_fact_table
  config:
    materialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"
    unique_key: 'id'
    incremental_strategy: 'merge'

2. Snapshot — always table (snapshots have their own materialization, but tables similar):

- name: customer_snapshot
  config:
    materialized: table   # always — snapshots need persistence

3. Aggregates — always view (small, cheap to recompute):

- name: daily_revenue_summary
  config:
    materialized: view   # small summary, view is fine even prod

4. ML feature tables — always table:

- name: customer_features
  config:
    materialized: table   # ML pipelines need stable storage

Mix and match based on model purpose, not blanket policy.


Полный example

# dbt_project.yml
name: my_project
version: '1.0.0'

models:
  my_project:
    # Default: table prod, view dev
    +materialized: "{{ 'table' if target.name == 'prod' else 'view' }}"

    staging:
      +schema: staging
      +materialized: view   # always view (light layer)

    intermediate:
      +schema: intermediate
      +materialized: "{{ 'ephemeral' if target.name == 'dev' else 'view' }}"
      # ephemeral в dev для speed, view в prod для observability

    marts:
      +schema: marts
      +materialized: "{{ 'table' if target.name == 'prod' else 'view' }}"

      # Specific overrides
      fct_orders:
        +materialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"
        +unique_key: 'order_id'
        +incremental_strategy: 'merge'

      daily_revenue:
        +materialized: view   # always view (small aggregate)

Use:

dbt run --target dev      # views, ephemeral intermediate
dbt run --target ci       # views (CI = dev pattern)
dbt run --target prod     # tables, incremental for fct_orders, view for daily_revenue

Performance impact

Real numbers (example):

Dev cycle (developer iterating on model):

  • View pattern: 5-30 sec per dbt run (just compile + DDL)
  • Table pattern: 5-20 min per dbt run (full materialize)
  • Savings: 80-95% per iteration

For developer making 20-50 changes per day, view pattern saves hours.

Production runs:

  • Tables: stable, fast reads, predictable cost
  • Incremental: 10x faster runs for huge tables (TB scale)
  • Views в prod: slower reads (DDL re-evaluated each query), but no storage cost

BI tool query performance (consuming models):

  • Table: sub-second to 10s typically
  • View: 10s to minutes (depends on complexity)
  • For dashboards refreshed often, tables crucial

Anti-patterns

1. Same materialization everywhere

+materialized: table

In dev — slow iteration. In prod — sometimes view enough (small aggregates).

Use target-conditional.

2. Incremental in dev

- name: orders
  config:
    materialized: incremental
    unique_key: order_id

Without target conditional. In dev: pain — incremental logic complexity for no benefit (small data).

Use:

materialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"

3. View materialization для широкой mart

- name: revenue_dashboard_mart
  config:
    materialized: view   # always

BI tool query — every refresh recomputes view (joins, aggregates). Slow dashboards.

Use table или incremental в prod.

4. Tables for trivial transformations

- name: stg_users
  config:
    materialized: table   # SELECT * FROM raw with renames

No benefit. View suffices. Save storage.

5. Hardcoded test config

- name: my_model
  config:
    materialized: view
    full_refresh: false

full_refresh: false — never accept —full-refresh. If model becomes incremental в future, refresh impossible. Be careful.

6. Conditional based on user input

materialized: "{{ 'table' if var('use_tables') else 'view' }}"

Confusing — config depends on runtime vars. Hard to reason. Use target.name (stable).

7. Different materialization in CI vs prod

materialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"

CI uses table — tests pass. Prod uses incremental — может expose incremental-specific bugs not seen in CI.

Better: CI mimics prod materialization:

materialized: "{{ 'incremental' if target.name in ('prod', 'ci') else 'table' }}"

Or have separate CI target that mirrors prod config.


CI gate для materialization

NOTE

Скрипт ниже — иллюстрация, как target-aware materialization-config можно проверить через manifest. Полная механика CI gates, prod manifest, state:modified+ — в модуле 13-ci-cd-github.

# scripts/check_materialization.py
import json

with open('target/manifest.json') as f:
    manifest = json.load(f)

errors = []
for node in manifest['nodes'].values():
    if node['resource_type'] != 'model':
        continue

    fqn = node['fqn']
    materialized = node['config'].get('materialized', '')

    # Marts should not be views in prod (assumes building prod manifest)
    is_mart = any(p == 'marts' for p in fqn)
    if is_mart and materialized == 'view':
        # check if conditional
        # In manifest, it's resolved value — so 'view' means prod produces views
        errors.append(f'Mart {node[\"name\"]} is view in this target — performance concern')

Hint: build manifest for --target prod specifically.


Попробуй сам

  1. Set conditional materialization:

    models:
      my_project:
        +materialized: "`{{ 'table' if target.name == 'prod' else 'view' }}`"
  2. Run with each target:

    dbt run --target dev      # views
    dbt run --target prod     # tables
  3. Verify materialization:

    SHOW TABLES IN dbt_alice_dev_marts;   -- views
    SHOW TABLES IN marts;                  -- tables
  4. Try incremental conditional:

    - name: my_fact
      config:
        materialized: "`{{ 'incremental' if target.name == 'prod' else 'table' }}`"
        unique_key: 'id'

    Test:

    dbt run --target dev --select my_fact   # table
    dbt run --target prod --select my_fact   # incremental
    dbt run --target prod --select my_fact   # incremental (only new rows)
  5. Time the runs — observe speedup в dev with views.


Ключевые выводы

  1. Conditional materialization через target.name: table в prod (performance), view в dev (fast iteration), incremental только prod (TB scale optimization).
  2. dbt_project.yml configs: materialized: "{{ 'table' if target.name == 'prod' else 'view' }}". Per-folder, per-model.
  3. Ephemeral в dev для intermediate models — CTE inlining, нет DDL, fastest iteration. View или table в prod.
  4. Incremental only в prod — simplifies dev (no is_incremental logic), preserves prod performance.
  5. —target flag для override defaults. Combined with env_vars для one-off needs.
  6. Per-model variation — some always view (small aggregates), always table (ML features), always incremental (huge facts). Mix based on purpose.
  7. CI should mirror prod — same materialization to expose prod-specific bugs (incremental edge cases) before deploy.
  8. Anti-patterns: same materialization everywhere, incremental in dev (pain), views для wide marts (slow dashboards), tables для trivial staging (waste).
Проверка знанийKnowledge check
Production scenario: developer iterating на fct_orders model (5GB). Each dbt run takes 10 minutes. Developer frustrated. Что implement?
ОтветAnswer
**Conditional materialization** — view в dev, table в prod, **plus sample data в dev**.\n\n**Problem**:\n\n- 5GB fact table\n- Full rebuild = 10 min\n- Developer iterates 20-50 times per day\n- 20 × 10 min = 200 min daily wasted\n\n**Solution stack**:\n\n**Layer 1 — Conditional materialization** (immediate fix):\n\n```yaml\n- name: fct_orders\n config:\n materialized: "{{ 'incremental' if target.name == 'prod' else 'view' }}"\n```\n\nDev: view materialization. No physical table creation. `dbt run` = ~5 seconds (just compile + DDL view).\n\n**But query performance в dev?**\n\nWith 5GB source, view query (which re-runs SQL) — slow if developer queries view много раз. Дев use case usually:\n- Quick iteration on SQL logic (view is fine — small queries with LIMIT)\n- Не aggressive querying\n\n**Layer 2 — Sample data в dev**:\n\n```yaml\n- name: fct_orders\n config:\n materialized: "{{ 'incremental' if target.name == 'prod' else 'view' }}"\n```\n\n```sql\n-- models/marts/fct_orders.sql\nSELECT\n order_id,\n customer_id,\n order_total,\n order_date\nFROM {{ ref('stg_orders') }}\n\n{% if target.name == 'dev' %}\n WHERE order_date не меньше CURRENT_DATE - INTERVAL '30 day'\n LIMIT 100000 -- max 100k rows для dev\n{% endif %}\n```\n\nDev: view над 30-day sample, max 100k rows. Query от view быстрый (small data). Developer iterates fast.\n\nProd: incremental с full data. Daily run processes only new orders.\n\n**Layer 3 — Ephemeral intermediate** (если applicable):\n\nIf int_orders_enriched is intermediate before fct_orders:\n\n```yaml\n- name: int_orders_enriched\n config:\n materialized: "{{ 'ephemeral' if target.name == 'dev' else 'view' }}"\n```\n\nEphemeral = CTE inlining. No DDL для intermediate. Faster pipeline.\n\n**Layer 4 — Smart selection**:\n\nDeveloper не нужно build everything. Only what they're working on:\n\n```bash\n# Build only target + immediate downstream (not entire DAG)\ndbt run --select fct_orders+1 --target dev\n\n# Defer to existing prod artifacts (slim CI pattern — детали в модуле 13-ci-cd-github/03-slim-ci и 04-defer-deep)\ndbt run --select fct_orders --defer --state prod_manifest/ --target dev\n```\n\nDefer reads upstream from prod manifest. No need to materialize upstream в dev. Полную механику разбираем в модуле 13.\n\n**Result for developer**:\n\nBefore: 10 min × 30 iterations = 5 hours daily wasted\nAfter: 5 sec × 30 iterations = 2.5 min daily\n\n**Saved: 4.5 hours daily** ≈ half a working day. ROI massive.\n\n**Production unchanged**:\n- fct_orders incremental — same as before\n- Full data, optimized for query performance\n- Daily run processes only new\n\n**Communication**:\n\n```\n[Performance improvements]\n\nfct_orders теперь view-based в dev:\n- dbt run: 10min -> 5sec \n- Sample: 30-day window + max 100k rows\n- Full data still в prod\n\nTo run with full data in dev: \n REMOVE_DEV_SAMPLE=true dbt run --target dev\n\nTo iterate fast: dbt run --select <model>+1 --defer\n```\n\nProduction-grade developer experience — fast iteration без compromising prod data quality.
Проверка знанийKnowledge check
Senior говорит: 'incremental только в prod — но CI тоже должна тестировать incremental logic'. Как организовать?
ОтветAnswer
Critical point — CI часто mirrors prod materialization для catching incremental-specific bugs. Implementation strategies. _(Полная механика CI jobs/workflows — в модуле 13-ci-cd-github; здесь только target-conditional часть.)_\n\n**Problem**:\n\n```yaml\n# Default: incremental только в prod\nmaterialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"\n```\n\n- Dev: table (simple, fast iteration)\n- CI: table (simple)\n- Prod: incremental\n\nIncremental-specific bugs не surface в CI:\n- `is_incremental()` logic wrong\n- Wrong `unique_key` causing duplicates\n- Missing `merge_exclude_columns` causing data loss\n- `incremental_predicates` filtering too aggressively\n\nProd run = surprise breakage.\n\n**Solution 1 — CI uses incremental too**:\n\n```yaml\nmaterialized: "{{ 'incremental' if target.name in ('prod', 'ci') else 'table' }}"\n```\n\nCI also incremental. Tests run against incremental behavior. Catches:\n- Empty initial run (full refresh)\n- Subsequent runs (is_incremental=true)\n- Backfills, --full-refresh\n\n**Tradeoff**: CI slower (incremental setup), but matches prod.\n\n**Solution 2 — Separate CI-prod target**:\n\n```yaml\n# profiles.yml\nci:\n type: duckdb\n path: ':memory:'\n schema: 'ci'\n\nci-prod-like:\n type: duckdb\n path: ':memory:'\n schema: 'ci_prod'\n```\n\nModel config:\n\n```yaml\nmaterialized: "{{ 'incremental' if target.name in ('prod', 'ci-prod-like') else 'table' }}"\n```\n\nCI runs both:\n```yaml\njobs:\n ci-fast:\n - run: dbt build --target ci # tables — fast for unit tests\n ci-prod-like:\n needs: ci-fast\n - run: dbt build --target ci-prod-like # incrementals — for production validation\n```\n\nTwo CI passes — fast first, prod-like second. Combines speed + safety.\n\n**Solution 3 — Sample incremental backfill в dev**:\n\n```yaml\nmaterialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"\n```\n\nDev table, but developer can also test incremental flow:\n\n```bash\n# Test full refresh\ndbt run --target dev --full-refresh\n\n# Test incremental update (manually delete some, then re-run)\ndbt run --target dev # incremental вернее всего trigger if can override\n```\n\nManual workflow для testing. Not automated.\n\n**Solution 4 — Override через env var**:\n\n```yaml\nmaterialized: >\n {{\n 'incremental' if env_var('TEST_INCREMENTAL', 'false') | as_bool\n else 'incremental' if target.name == 'prod'\n else 'table'\n }}\n```\n\nDeveloper or CI can:\n```bash\nTEST_INCREMENTAL=true dbt run --target dev\n```\n\nOpt-in incremental testing.\n\n**Production recommendation**:\n\n**Combined approach** — fast CI + prod-like CI + monitoring:\n\n1. **Fast CI**: `target=ci`, table materialization. Unit tests, data tests. Catches most bugs. ~5 min.\n\n2. **Prod-like CI**: `target=ci-prod-like`, incremental materialization. Validates incremental logic. ~15 min.\n\n3. **Pre-deploy validation**: Run prod-like before merge to main. Block deploy if prod-like fails.\n\n4. **Production monitoring**: Anomaly detection on row counts. If incremental drops rows unexpectedly, alert.\n\n5. **Manual test on staging**: Before promoting к prod, deploy на staging, observe several daily runs. Caches incremental-specific issues at scale.\n\n**Cost**:\n- Fast CI: 5 min (covers most)\n- Prod-like CI: 15 min (incremental-specific)\n- Combined: 20 min per PR\n- ROI: catching one bad incremental deploy = days/weeks of investigation + data recovery\n\n**Worth it** для production-grade data engineering. Without prod-like CI testing — incremental bugs surface in production регулярно.\n\nThis is **shift-left** principle — catch issues at PR, не at production runtime.

Проверьте понимание

Результат: 0 из 0
Прикладной
Вопрос 1 из 5. Developer iterating на 5GB fct_orders. dbt run = 10 минут. Жалуется на slow iteration. Что implement?

Закончили урок?

Отметьте его как пройденный, чтобы отслеживать свой прогресс

Войдите чтобы оценить урок

Прогресс модуля
0 из 5