Conditional materialization: table в prod, view в dev

В предыдущих уроках мы прошли target.name, env_var, generate_schema_name, grants. Этот урок — про materialization через target conditional: как использовать view в dev (быстрый rebuild), table в prod (query performance), incremental только в prod (избежать complexity в dev).

Хорошее multi-env materialization config экономит часы developer time (faster iteration в dev) и optimizes production query performance. Это last but critical piece of multi-env puzzle.

dbt-iii: анатомия materialization — как работает table vs view vs incremental

Базовый pattern: table prod, view dev

# dbt_project.yml
models:
  my_project:
    +materialized: "{{ 'table' if target.name == 'prod' else 'view' }}"

Что happens:

Dev / CI (target.name != 'prod'): all models materialized as views
Prod: all materialized as tables

Or per-layer:

models:
  my_project:
    staging:
      +materialized: view   # views везде (light layer)

    intermediate:
      +materialized: "{{ 'ephemeral' if target.name != 'prod' else 'view' }}"
      # CTE inlining в dev, real views в prod

    marts:
      +materialized: "{{ 'table' if target.name == 'prod' else 'view' }}"

Why this pattern

Materialization tradeoffs

Math example:

Model takes 10 minutes для full rebuild.

Dev (view):

dbt run --select my_model = compile + create view = ~5 seconds
Tradeoff: each ref query reruns the SQL — slow for analytical queries
BUT — dev usually uses sample data -> small impact

Prod (table):

dbt run --select my_model = full rebuild = 10 minutes
Read queries fast (just SELECT from table)
Daily scheduled run amortizes cost

Prod (incremental):

Daily run = INSERT new rows = 1 minute (10x faster than full)
Initial backfill = 10 minutes (one-time)

Conditional incremental только в prod

Incremental adds complexity:

is_incremental() macro logic
Need to handle --full-refresh properly
Test setup more involved
In dev: backfilling each run = ужас

Solution — incremental только в prod, table в dev:

- name: large_fact_table
  config:
    materialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"
    unique_key: order_id
    incremental_strategy: 'merge'

SELECT
    order_id,
    customer_id,
    order_total
FROM {{ ref('stg_orders') }}

{% if is_incremental() and target.name == 'prod' %}
  WHERE order_date >= (SELECT MAX(order_date) FROM {{ this }}) - INTERVAL '3 day'
{% endif %}

In dev: target.name=‘dev’, materialized=‘table’, is_incremental() evaluates to false (table is new on every run). Full SQL runs. Fast for small dev data.

In prod: materialized=‘incremental’, is_incremental() true after first run. Only new orders processed. Fast.

Conditional unique_key / partition / cluster

Sometimes config differs:

- name: orders_partitioned
  config:
    materialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"
    incremental_strategy: 'merge'
    unique_key: order_id

    # Snowflake / BigQuery: partition by date в prod (faster queries)
    cluster_by: "{{ ['order_date', 'customer_id'] if target.name == 'prod' else none }}"

In dev — no clustering (overhead не оправдан на маленьких данных). In prod — cluster для query performance.

Ephemeral materialization

materialized: ephemeral — model not physically materialized. Just CTE inlined в downstream models.

- name: int_user_demographics
  config:
    materialized: "{{ 'ephemeral' if target.name != 'prod' else 'view' }}"

Dev: ephemeral — no DDL, no physical object. SQL copy-pasted in downstream as CTE. Faster iteration.

Prod: view — real DDL. Downstream uses standard FROM int_user_demographics. Standard production behavior.

Use case: intermediate models in dev pipelines где не нужны permanent objects. Saves cleanup, faster runs.

Limits:

Ephemeral models cannot have data tests (no table to query)
Cannot have grants
Cannot ref from non-dbt code

—target overrides

Sometimes need different behavior. Use --target flag:

# Default: dev config (views)
dbt run

# Specific target: prod config (tables)
dbt run --target prod

# Override default vars
dbt run --target prod --vars '{full_refresh: true}'

In dbt_project.yml conditional logic auto-applies based on target.

For one-off overrides — set env_var и use в conditional:

+materialized: >
  {{
    'table' if env_var('FORCE_TABLE', 'false') | as_bool
    else 'table' if target.name == 'prod'
    else 'view'
  }}

# Force tables in dev (rare)
FORCE_TABLE=true dbt run --target dev

Per-model conditional materialization

Some models need different patterns:

1. Large fact table — always incremental в prod:

- name: large_fact_table
  config:
    materialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"
    unique_key: 'id'
    incremental_strategy: 'merge'

2. Snapshot — always table (snapshots have their own materialization, but tables similar):

- name: customer_snapshot
  config:
    materialized: table   # always — snapshots need persistence

3. Aggregates — always view (small, cheap to recompute):

- name: daily_revenue_summary
  config:
    materialized: view   # small summary, view is fine even prod

4. ML feature tables — always table:

- name: customer_features
  config:
    materialized: table   # ML pipelines need stable storage

Mix and match based on model purpose, not blanket policy.

Полный example

# dbt_project.yml
name: my_project
version: '1.0.0'

models:
  my_project:
    # Default: table prod, view dev
    +materialized: "{{ 'table' if target.name == 'prod' else 'view' }}"

    staging:
      +schema: staging
      +materialized: view   # always view (light layer)

    intermediate:
      +schema: intermediate
      +materialized: "{{ 'ephemeral' if target.name == 'dev' else 'view' }}"
      # ephemeral в dev для speed, view в prod для observability

    marts:
      +schema: marts
      +materialized: "{{ 'table' if target.name == 'prod' else 'view' }}"

      # Specific overrides
      fct_orders:
        +materialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"
        +unique_key: 'order_id'
        +incremental_strategy: 'merge'

      daily_revenue:
        +materialized: view   # always view (small aggregate)

Use:

dbt run --target dev      # views, ephemeral intermediate
dbt run --target ci       # views (CI = dev pattern)
dbt run --target prod     # tables, incremental for fct_orders, view for daily_revenue

Performance impact

Real numbers (example):

Dev cycle (developer iterating on model):

View pattern: 5-30 sec per dbt run (just compile + DDL)
Table pattern: 5-20 min per dbt run (full materialize)
Savings: 80-95% per iteration

For developer making 20-50 changes per day, view pattern saves hours.

Production runs:

Tables: stable, fast reads, predictable cost
Incremental: 10x faster runs for huge tables (TB scale)
Views в prod: slower reads (DDL re-evaluated each query), but no storage cost

BI tool query performance (consuming models):

Table: sub-second to 10s typically
View: 10s to minutes (depends on complexity)
For dashboards refreshed often, tables crucial

Anti-patterns

1. Same materialization everywhere

+materialized: table

In dev — slow iteration. In prod — sometimes view enough (small aggregates).

Use target-conditional.

2. Incremental in dev

- name: orders
  config:
    materialized: incremental
    unique_key: order_id

Without target conditional. In dev: pain — incremental logic complexity for no benefit (small data).

Use:

materialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"

3. View materialization для широкой mart

- name: revenue_dashboard_mart
  config:
    materialized: view   # always

BI tool query — every refresh recomputes view (joins, aggregates). Slow dashboards.

Use table или incremental в prod.

4. Tables for trivial transformations

- name: stg_users
  config:
    materialized: table   # SELECT * FROM raw with renames

No benefit. View suffices. Save storage.

5. Hardcoded test config

- name: my_model
  config:
    materialized: view
    full_refresh: false

full_refresh: false — never accept —full-refresh. If model becomes incremental в future, refresh impossible. Be careful.

6. Conditional based on user input

materialized: "{{ 'table' if var('use_tables') else 'view' }}"

Confusing — config depends on runtime vars. Hard to reason. Use target.name (stable).

7. Different materialization in CI vs prod

materialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"

CI uses table — tests pass. Prod uses incremental — может expose incremental-specific bugs not seen in CI.

Better: CI mimics prod materialization:

materialized: "{{ 'incremental' if target.name in ('prod', 'ci') else 'table' }}"

Or have separate CI target that mirrors prod config.

CI gate для materialization

CI gates подробно в модуле 13

Скрипт ниже — иллюстрация, как target-aware materialization-config можно проверить через manifest. Полная механика CI gates, prod manifest, state:modified+ — в модуле 13-ci-cd-github.

# scripts/check_materialization.py
import json

with open('target/manifest.json') as f:
    manifest = json.load(f)

errors = []
for node in manifest['nodes'].values():
    if node['resource_type'] != 'model':
        continue

    fqn = node['fqn']
    materialized = node['config'].get('materialized', '')

    # Marts should not be views in prod (assumes building prod manifest)
    is_mart = any(p == 'marts' for p in fqn)
    if is_mart and materialized == 'view':
        # check if conditional
        # In manifest, it's resolved value — so 'view' means prod produces views
        errors.append(f'Mart {node[\"name\"]} is view in this target — performance concern')

Hint: build manifest for --target prod specifically.

Попробуй сам

Set conditional materialization:

models:
  my_project:
    +materialized: "`{{ 'table' if target.name == 'prod' else 'view' }}`"

Run with each target:

dbt run --target dev      # views
dbt run --target prod     # tables

Verify materialization:

SHOW TABLES IN dbt_alice_dev_marts;   -- views
SHOW TABLES IN marts;                  -- tables

Try incremental conditional:

- name: my_fact
  config:
    materialized: "`{{ 'incremental' if target.name == 'prod' else 'table' }}`"
    unique_key: 'id'

Test:

dbt run --target dev --select my_fact   # table
dbt run --target prod --select my_fact   # incremental
dbt run --target prod --select my_fact   # incremental (only new rows)

Time the runs — observe speedup в dev with views.

Ключевые выводы

Conditional materialization через target.name: table в prod (performance), view в dev (fast iteration), incremental только prod (TB scale optimization).
dbt_project.yml configs: materialized: "{{ 'table' if target.name == 'prod' else 'view' }}". Per-folder, per-model.
Ephemeral в dev для intermediate models — CTE inlining, нет DDL, fastest iteration. View или table в prod.
Incremental only в prod — simplifies dev (no is_incremental logic), preserves prod performance.
—target flag для override defaults. Combined with env_vars для one-off needs.
Per-model variation — some always view (small aggregates), always table (ML features), always incremental (huge facts). Mix based on purpose.
CI should mirror prod — same materialization to expose prod-specific bugs (incremental edge cases) before deploy.
Anti-patterns: same materialization everywhere, incremental in dev (pain), views для wide marts (slow dashboards), tables для trivial staging (waste).

Проверка знанийKnowledge check

Production scenario: developer iterating на fct_orders model (5GB). Each dbt run takes 10 minutes. Developer frustrated. Что implement?

ОтветAnswer

**Conditional materialization** — view в dev, table в prod, **plus sample data в dev**.\n\n**Problem**:\n\n- 5GB fact table\n- Full rebuild = 10 min\n- Developer iterates 20-50 times per day\n- 20 × 10 min = 200 min daily wasted\n\n**Solution stack**:\n\n**Layer 1 — Conditional materialization** (immediate fix):\n\n```yaml\n- name: fct_orders\n config:\n materialized: "{{ 'incremental' if target.name == 'prod' else 'view' }}"\n```\n\nDev: view materialization. No physical table creation. `dbt run` = ~5 seconds (just compile + DDL view).\n\n**But query performance в dev?**\n\nWith 5GB source, view query (which re-runs SQL) — slow if developer queries view много раз. Дев use case usually:\n- Quick iteration on SQL logic (view is fine — small queries with LIMIT)\n- Не aggressive querying\n\n**Layer 2 — Sample data в dev**:\n\n```yaml\n- name: fct_orders\n config:\n materialized: "{{ 'incremental' if target.name == 'prod' else 'view' }}"\n```\n\n```sql\n-- models/marts/fct_orders.sql\nSELECT\n order_id,\n customer_id,\n order_total,\n order_date\nFROM {{ ref('stg_orders') }}\n\n{% if target.name == 'dev' %}\n WHERE order_date не меньше CURRENT_DATE - INTERVAL '30 day'\n LIMIT 100000 -- max 100k rows для dev\n{% endif %}\n```\n\nDev: view над 30-day sample, max 100k rows. Query от view быстрый (small data). Developer iterates fast.\n\nProd: incremental с full data. Daily run processes only new orders.\n\n**Layer 3 — Ephemeral intermediate** (если applicable):\n\nIf int_orders_enriched is intermediate before fct_orders:\n\n```yaml\n- name: int_orders_enriched\n config:\n materialized: "{{ 'ephemeral' if target.name == 'dev' else 'view' }}"\n```\n\nEphemeral = CTE inlining. No DDL для intermediate. Faster pipeline.\n\n**Layer 4 — Smart selection**:\n\nDeveloper не нужно build everything. Only what they're working on:\n\n```bash\n# Build only target + immediate downstream (not entire DAG)\ndbt run --select fct_orders+1 --target dev\n\n# Defer to existing prod artifacts (slim CI pattern — детали в модуле 13-ci-cd-github/03-slim-ci и 04-defer-deep)\ndbt run --select fct_orders --defer --state prod_manifest/ --target dev\n```\n\nDefer reads upstream from prod manifest. No need to materialize upstream в dev. Полную механику разбираем в модуле 13.\n\n**Result for developer**:\n\nBefore: 10 min × 30 iterations = 5 hours daily wasted\nAfter: 5 sec × 30 iterations = 2.5 min daily\n\n**Saved: 4.5 hours daily** ≈ half a working day. ROI massive.\n\n**Production unchanged**:\n- fct_orders incremental — same as before\n- Full data, optimized for query performance\n- Daily run processes only new\n\n**Communication**:\n\n```\n[Performance improvements]\n\nfct_orders теперь view-based в dev:\n- dbt run: 10min -> 5sec \n- Sample: 30-day window + max 100k rows\n- Full data still в prod\n\nTo run with full data in dev: \n REMOVE_DEV_SAMPLE=true dbt run --target dev\n\nTo iterate fast: dbt run --select <model>+1 --defer\n```\n\nProduction-grade developer experience — fast iteration без compromising prod data quality.

Проверка знанийKnowledge check

Senior говорит: 'incremental только в prod — но CI тоже должна тестировать incremental logic'. Как организовать?

ОтветAnswer

Critical point — CI часто mirrors prod materialization для catching incremental-specific bugs. Implementation strategies. _(Полная механика CI jobs/workflows — в модуле 13-ci-cd-github; здесь только target-conditional часть.)_\n\n**Problem**:\n\n```yaml\n# Default: incremental только в prod\nmaterialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"\n```\n\n- Dev: table (simple, fast iteration)\n- CI: table (simple)\n- Prod: incremental\n\nIncremental-specific bugs не surface в CI:\n- `is_incremental()` logic wrong\n- Wrong `unique_key` causing duplicates\n- Missing `merge_exclude_columns` causing data loss\n- `incremental_predicates` filtering too aggressively\n\nProd run = surprise breakage.\n\n**Solution 1 — CI uses incremental too**:\n\n```yaml\nmaterialized: "{{ 'incremental' if target.name in ('prod', 'ci') else 'table' }}"\n```\n\nCI also incremental. Tests run against incremental behavior. Catches:\n- Empty initial run (full refresh)\n- Subsequent runs (is_incremental=true)\n- Backfills, --full-refresh\n\n**Tradeoff**: CI slower (incremental setup), but matches prod.\n\n**Solution 2 — Separate CI-prod target**:\n\n```yaml\n# profiles.yml\nci:\n type: duckdb\n path: ':memory:'\n schema: 'ci'\n\nci-prod-like:\n type: duckdb\n path: ':memory:'\n schema: 'ci_prod'\n```\n\nModel config:\n\n```yaml\nmaterialized: "{{ 'incremental' if target.name in ('prod', 'ci-prod-like') else 'table' }}"\n```\n\nCI runs both:\n```yaml\njobs:\n ci-fast:\n - run: dbt build --target ci # tables — fast for unit tests\n ci-prod-like:\n needs: ci-fast\n - run: dbt build --target ci-prod-like # incrementals — for production validation\n```\n\nTwo CI passes — fast first, prod-like second. Combines speed + safety.\n\n**Solution 3 — Sample incremental backfill в dev**:\n\n```yaml\nmaterialized: "{{ 'incremental' if target.name == 'prod' else 'table' }}"\n```\n\nDev table, but developer can also test incremental flow:\n\n```bash\n# Test full refresh\ndbt run --target dev --full-refresh\n\n# Test incremental update (manually delete some, then re-run)\ndbt run --target dev # incremental вернее всего trigger if can override\n```\n\nManual workflow для testing. Not automated.\n\n**Solution 4 — Override через env var**:\n\n```yaml\nmaterialized: >\n {{\n 'incremental' if env_var('TEST_INCREMENTAL', 'false') | as_bool\n else 'incremental' if target.name == 'prod'\n else 'table'\n }}\n```\n\nDeveloper or CI can:\n```bash\nTEST_INCREMENTAL=true dbt run --target dev\n```\n\nOpt-in incremental testing.\n\n**Production recommendation**:\n\n**Combined approach** — fast CI + prod-like CI + monitoring:\n\n1. **Fast CI**: `target=ci`, table materialization. Unit tests, data tests. Catches most bugs. ~5 min.\n\n2. **Prod-like CI**: `target=ci-prod-like`, incremental materialization. Validates incremental logic. ~15 min.\n\n3. **Pre-deploy validation**: Run prod-like before merge to main. Block deploy if prod-like fails.\n\n4. **Production monitoring**: Anomaly detection on row counts. If incremental drops rows unexpectedly, alert.\n\n5. **Manual test on staging**: Before promoting к prod, deploy на staging, observe several daily runs. Caches incremental-specific issues at scale.\n\n**Cost**:\n- Fast CI: 5 min (covers most)\n- Prod-like CI: 15 min (incremental-specific)\n- Combined: 20 min per PR\n- ROI: catching one bad incremental deploy = days/weeks of investigation + data recovery\n\n**Worth it** для production-grade data engineering. Without prod-like CI testing — incremental bugs surface in production регулярно.\n\nThis is **shift-left** principle — catch issues at PR, не at production runtime.

Conditional materialization: table в prod, view в dev

Базовый pattern: table prod, view dev

Why this pattern

Conditional incremental только в prod

Conditional unique_key / partition / cluster

Ephemeral materialization

—target overrides

Per-model conditional materialization

Полный example

Performance impact

Anti-patterns

1. Same materialization everywhere

2. Incremental in dev

3. View materialization для широкой mart

4. Tables for trivial transformations

5. Hardcoded test config

6. Conditional based on user input

7. Different materialization in CI vs prod

CI gate для materialization

Попробуй сам

Ключевые выводы

Закончили урок?