Learning Platform
Глоссарий Troubleshooting
Урок 12.03 · 24 мин
Средний
generate_schema_nameSchema isolationMulti-developerProduction patternsMacro override

generate_schema_name override: per-developer и clean prod schemas

В большом проекте с многими developers — каждый запускает dbt run локально. Если все пишут в один shared dev schema, они наступают на ноги друг другу: Alice runs customer_metrics, Bob runs параллельно — конфликт, таблица перезаписывается.

Решение — per-developer schemas в dev: dbt_alice_dev, dbt_bob_dev. Каждый разработчик имеет свой namespace. В prod — clean schemas без префиксов: marts, staging.

dbt позволяет настроить это через override generate_schema_name macro. Этот урок — как переопределить, какие patterns common, и как наводить порядок в multi-developer environment.

dbt-iii: generate_schema_name — полный разбор механизма

Default behavior без override

dbt по умолчанию имеет:

schema = target.schema           # если YAML +schema не set
schema = target.schema + '_' + custom_schema   # если +schema: 'custom'

Например, target.schema = 'main':

  • Model без +schema -> main
  • Model с +schema: marts -> main_marts

Это append pattern. На prod часто это нежелательно — хочется marts чисто.

И для dev — все developers пишут в main_marts, конфликты.


Override syntax

В macros/get_custom_schema.sql (имя файла любое):

{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}

    {%- if target.name == 'prod' -%}
        {%- if custom_schema_name is none -%}
            {{ default_schema }}
        {%- else -%}
            {{ custom_schema_name | trim }}
        {%- endif -%}

    {%- elif target.name == 'dev' -%}
        {%- if custom_schema_name is none -%}
            {{ default_schema }}
        {%- else -%}
            {{ default_schema }}_{{ custom_schema_name | trim }}
        {%- endif -%}

    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}

Что делает:

  • В prod: +schema: marts -> schema = marts (clean, no prefix)
  • В dev: +schema: marts -> schema = dbt_alice_dev_marts (with prefix)
  • В other targets (ci): same dev-like behavior

dbt автоматически использует этот override при resolve schema names.


Per-developer schema через env_var

# profiles.yml
my_project:
  outputs:
    dev:
      type: duckdb
      path: './dev.duckdb'
      schema: "dbt_{{ env_var('USER', 'unknown') }}_dev"   # uses OS USER
      threads: 4

Alice locally:

$USER = alice
target.schema = 'dbt_alice_dev'

Bob locally:

$USER = bob
target.schema = 'dbt_bob_dev'

Через generate_schema_name override -> models go в dbt_alice_dev_marts, dbt_bob_dev_marts, etc. Isolated namespaces — no conflicts.


Полный пример: production patterns

Profile

# profiles.yml
my_project:
  target: dev

  outputs:
    dev:
      type: duckdb
      path: './dev_{{ env_var(\"USER\", \"unknown\") }}.duckdb'
      schema: "dbt_{{ env_var('USER', 'unknown') }}_dev"
      threads: 4

    ci:
      type: duckdb
      path: ':memory:'
      schema: 'ci'
      threads: 4

    prod:
      type: snowflake
      account: "{{ env_var('SF_ACCOUNT') }}"
      user: "{{ env_var('SF_USER') }}"
      password: "{{ env_var('SF_PASSWORD') }}"
      database: 'ANALYTICS_PROD'
      schema: 'transformed'   # base schema в prod
      warehouse: 'PROD_WH'
      threads: 16

dbt_project.yml

name: my_project

models:
  my_project:
    +schema: ''   # default empty

    staging:
      +schema: staging

    intermediate:
      +schema: intermediate

    marts:
      +schema: marts

    marts:
      finance:
        +schema: marts_finance
      marketing:
        +schema: marts_marketing

Override macro

-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}

    {%- if target.name == 'prod' -%}
        {# Production: custom_schema_name перекрывает default. Clean names. #}
        {%- if custom_schema_name is none -%}
            {{ default_schema }}
        {%- else -%}
            {{ custom_schema_name | trim }}
        {%- endif -%}

    {%- else -%}
        {# Dev / CI: default + custom = dbt_alice_dev_marts #}
        {%- if custom_schema_name is none -%}
            {{ default_schema }}
        {%- else -%}
            {{ default_schema }}_{{ custom_schema_name | trim }}
        {%- endif -%}
    {%- endif -%}
{%- endmacro %}

Результат

Production (target.schema='transformed', target.name='prod'):

Model layerSchema in warehouse
staging.usersstaging
intermediate.user_metricsintermediate
marts.fct_ordersmarts
marts.finance.revenuemarts_finance
marts.marketing.campaignsmarts_marketing

Clean — marts, не transformed_marts.

Dev for Alice (target.schema='dbt_alice_dev', target.name='dev'):

ModelSchema
staging.usersdbt_alice_dev_staging
marts.fct_ordersdbt_alice_dev_marts
marts.finance.revenuedbt_alice_dev_marts_finance

Isolated — никаких конфликтов с Bob.

CI (target.schema='ci', target.name='ci'):

ModelSchema
staging.usersci_staging
marts.fct_ordersci_marts

Isolated CI environment.


Common patterns

Pattern 1: per-developer + per-feature branch

Для team с feature branches:

schema: "dbt_{{ env_var('USER', 'unknown') }}_{{ env_var('GIT_BRANCH', 'main') | replace('/', '_') }}"

Alice on feature/new-revenue branch:

$USER = alice
$GIT_BRANCH = feature/new-revenue
target.schema = dbt_alice_feature_new_revenue

Каждая feature branch isolated. Может test без conflicts.

Pattern 2: PR-based isolation (для team с frequent PRs)

schema: "dbt_pr_{{ env_var('PR_NUMBER', 'unknown') }}"

PR #123: schema = dbt_pr_123. После merge — cleanup.

Pattern 3: time-based isolation (для experimentation)

schema: "dbt_{{ env_var('USER') }}_{{ env_var('EXPERIMENT_ID', run_started_at.timestamp() | int) }}"

Each run — different schema. Useful for A/B testing logic.

Pattern 4: shared dev schema (small team)

schema: "dbt_dev"   # shared everyone

For 2-3 developers, simple. Risk: conflicts при simultaneous runs. Mitigate через —select to prevent overlap.


Cleanup dev schemas

Per-developer schemas accumulate junk over time. Periodic cleanup:

-- macros/cleanup_dev_schemas.sql
{% macro cleanup_dev_schemas() %}
    {% if target.name != 'dev' %}
        {{ exceptions.raise_compiler_error('cleanup_dev_schemas can only run in dev') }}
    {% endif %}

    {% set schemas_query %}
        SELECT schema_name FROM information_schema.schemata
        WHERE schema_name LIKE 'dbt_%_dev%'
    {% endset %}

    {% set results = run_query(schemas_query) %}
    {% if execute %}
        {% for schema in results.columns[0].values() %}
            {{ log('Dropping ' ~ schema, info=True) }}
            {% do run_query('DROP SCHEMA IF EXISTS ' ~ schema ~ ' CASCADE') %}
        {% endfor %}
    {% endif %}
{% endmacro %}

Run:

dbt run-operation cleanup_dev_schemas

Cleanup all dev schemas. Use carefully — может wipe own data.

Better: scheduled cleanup of OLD dev schemas (untouched >30 days):

{% macro cleanup_old_dev_schemas() %}
    {% set old_schemas_query %}
        SELECT s.schema_name
        FROM information_schema.schemata s
        LEFT JOIN (
            SELECT table_schema, MAX(last_modified) AS last_used
            FROM information_schema.tables
            GROUP BY table_schema
        ) t ON t.table_schema = s.schema_name
        WHERE s.schema_name LIKE 'dbt_%_dev%'
          AND t.last_used < CURRENT_DATE - INTERVAL '30 day'
    {% endset %}

    {# ... drop logic #}
{% endmacro %}

Scheduled run раз в неделю — keeps warehouse clean.


Grants и schemas

Per-developer schemas needs grants. Через generate_schema_name + grants config:

# dbt_project.yml
models:
  my_project:
    marts:
      +grants:
        select: "{{ ['analyst_role'] if target.name == 'prod' else [] }}"

In dev — no grants (private schema). In prod — analysts can SELECT.

This pairs with grants config (next lesson).


Anti-patterns

1. Hardcoded dev schemas в models

- name: customer_metrics
  config:
    schema: dbt_alice_dev_marts   # specific to Alice!

Hardcoded на developer. Catastrophic.

Use +schema: marts + generate_schema_name override.

2. No isolation в dev

dev:
  schema: dev_marts   # everyone shares

Race conditions. Conflicts. Lost work.

Per-developer (env_var USER) — solution.

3. Production with developer prefixes

prod:
  schema: dbt_prod_alice_marts   # leaked developer name to production!

Production schemas — clean. No personal info.

Override macro handles this — dev appends, prod doesn’t.

4. Forgot to override

Without override, dbt uses default behavior:

  • target.schema = 'main'
  • +schema: marts -> main_marts

Want marts in prod. Override macro is the only way.

5. Different developers, different naming

Alice’s profile: schema: alice_dev Bob’s profile: schema: dbt_bob_dev

Inconsistent. Code review confusing. Standardize via env_var or team profile.

6. CI uses developer pattern

ci:
  schema: "dbt_{{ env_var('USER', 'ci') }}_ci"

CI doesn’t have USER. Fallback to ‘ci’ works, but inconsistent с production pattern.

Cleaner:

ci:
  schema: 'ci'

Plain. No override needed for CI namespace.


CI: testing schema generation

# scripts/check_schema_generation.py
import json

with open('target/manifest.json') as f:
    manifest = json.load(f)

# All models should have correct schemas based on env
for node in manifest['nodes'].values():
    if node['resource_type'] != 'model':
        continue
    schema = node['schema']

    # Production check: schema should NOT contain 'dev', 'ci', or developer names
    if 'dev' in schema or 'ci' in schema:
        print(f'ERROR: {node[\"name\"]} schema {schema} contains env name')

Useful in pre-prod deploy validation — caught schemas before pushing to prod.


Production scenario: 10 developers, 100 models

Setup:

  • 10 developers, each works locally
  • 100 models in repo
  • Production schema: marts, staging, etc.

Without per-dev schemas:

  • All 10 developers write к dev_marts
  • 5 of them run customer_metrics simultaneously -> race conditions
  • 1 of them runs —full-refresh -> wipes others’ work
  • Endless conflicts

With per-dev schemas:

  • Alice: dbt_alice_dev_marts
  • Bob: dbt_bob_dev_marts
  • … etc
  • Each isolated. No conflicts. Parallel work.

Cost: 10x storage (10 schemas × 100 models). Acceptable for DuckDB (file storage) или Snowflake (compute pays). For DataPond / lake — separate concern.

After PR merge -> CI runs against clean ci_marts. Production runs against marts. Clean separation.


Попробуй сам

  1. Set up per-dev schema:

    # profiles.yml
    dev:
      schema: "dbt_`{{ env_var('USER', 'unknown') }}`_dev"
  2. Add override macro:

    -- macros/generate_schema_name.sql
    `{% macro generate_schema_name(custom_schema_name, node) %}`
        `{% if target.name == 'prod' and custom_schema_name %}`
            `{{ custom_schema_name }}`
        `{% elif custom_schema_name %}`
            `{{ target.schema }}`_`{{ custom_schema_name }}`
        `{% else %}`
            `{{ target.schema }}`
        `{% endif %}`
    `{% endmacro %}`
  3. Set per-model schemas:

    models:
      my_project:
        marts:
          +schema: marts
        staging:
          +schema: staging
  4. Test:

    dbt run --target dev
    # Schema: `dbt_<USER>_dev_marts`
    
    dbt run --target prod
    # Schema: marts
  5. Multi-developer test:

    • Simulate 2 developers (USER=alice and USER=bob)
    • dbt run each — verify separate schemas
    • No conflicts
  6. CI: ensure CI uses ci schema explicitly.


Ключевые выводы

  1. generate_schema_name macro determines schema names per model. Default behavior — append +schema to target.schema. Override for cleaner naming.
  2. Per-developer schemas в dev — isolate developers, no conflicts. Pattern: dbt_{USER}_dev.
  3. Clean prod schemasmarts, not transformed_marts. Override macro returns custom_schema_name directly in prod.
  4. Standard override: dev appends prefix, prod uses bare custom_schema_name. Other targets fallback to dev-like behavior.
  5. Patterns: per-feature-branch (multi-feature dev), PR-based (CI per PR), time-based (experimentation), shared dev (small team).
  6. Cleanup: periodic cleanup old dev schemas (untouched >30 days). Schedule в scheduled runs.
  7. Anti-patterns: hardcoded dev schemas в models, no isolation (shared dev), production with developer prefixes, inconsistent naming across developers.
  8. CI gate: validate prod schemas don’t contain ‘dev’/‘ci’/developer names. Catch errors pre-deploy.
Проверка знанийKnowledge check
Team из 10 developers. Без per-dev schemas: 'каждый раз кто-то перезаписывает мою customer_metrics, я теряю свою работу'. Что нужно implement?
ОтветAnswer
Classic multi-developer conflict. Solution — **per-developer schemas через generate_schema_name override**.\n\n**Phase 1 — Setup profiles.yml**:\n\n```yaml\n# profiles.yml\nmy_project:\n target: dev\n outputs:\n dev:\n type: duckdb\n path: "./dev_{{ env_var('USER', 'unknown') }}.duckdb"\n schema: "dbt_{{ env_var('USER', 'unknown') }}_dev"\n threads: 4\n```\n\nAlice locally: \$USER=alice -> target.schema=`dbt_alice_dev`\nBob locally: \$USER=bob -> target.schema=`dbt_bob_dev`\n\n**Phase 2 — Add generate_schema_name override**:\n\n```sql\n-- macros/generate_schema_name.sql\n{% macro generate_schema_name(custom_schema_name, node) %}\n {% set default_schema = target.schema %}\n \n {% if target.name == 'prod' %}\n {# Production: clean names #}\n {% if custom_schema_name is none %}\n {{ default_schema }}\n {% else %}\n {{ custom_schema_name | trim }}\n {% endif %}\n {% else %}\n {# Dev / CI: append #}\n {% if custom_schema_name is none %}\n {{ default_schema }}\n {% else %}\n {{ default_schema }}_{{ custom_schema_name | trim }}\n {% endif %}\n {% endif %}\n{% endmacro %}\n```\n\n**Phase 3 — Set per-model schemas**:\n\n```yaml\n# dbt_project.yml\nmodels:\n my_project:\n staging:\n +schema: staging\n intermediate:\n +schema: intermediate\n marts:\n +schema: marts\n```\n\n**Result в dev**:\n\nAlice runs:\n- staging.users -> `dbt_alice_dev_staging.users`\n- marts.customer_metrics -> `dbt_alice_dev_marts.customer_metrics`\n\nBob runs параллельно:\n- staging.users -> `dbt_bob_dev_staging.users`\n- marts.customer_metrics -> `dbt_bob_dev_marts.customer_metrics`\n\n**No conflict** — separate namespaces.\n\n**Result в prod**:\n\n- staging.users -> `staging.users`\n- marts.customer_metrics -> `marts.customer_metrics`\n\nClean names, no dev/developer prefixes.\n\n**Phase 4 — Communication**:\n\n```\nTo: team\nSubject: Per-developer schemas enabled\n\nChanges:\n1. Set DBT_PROFILES_DIR в .env\n2. Set USER env var (usually OS USER, auto-detect)\n3. dbt run теперь creates schema dbt_<your-username>_dev_*\n4. Production unchanged — `marts`, `staging` as before\n\nWhy:\n- Each developer has own namespace\n- No conflicts when 10 of you run customer_metrics simultaneously\n- Easier debugging — you see your data in dbt_<you>_dev_marts.customer_metrics\n\nMigration:\n- Today: drop your old shared dev tables (or leave, will be unused)\n- Tomorrow: dbt run --target dev -> creates your isolated schemas\n\nQuestions: #data-help\n```\n\n**Phase 5 — Cleanup (periodic)**:\n\nMonthly run: drop dev schemas untouched >30 days. Macro:\n\n```sql\n{% macro cleanup_old_dev_schemas() %}\n {% set query %}\n SELECT schema_name FROM information_schema.schemata\n WHERE schema_name LIKE 'dbt_%_dev%'\n AND ... last_used < CURRENT_DATE - INTERVAL '30 day'\n {% endset %}\n {# drop loop #}\n{% endmacro %}\n```\n\nКеер warehouse clean.\n\n**Cost analysis**:\n\n- 10 developers × ~50 models materialized = 500 tables in dev (vs 50 без isolation)\n- For DuckDB (file storage): negligible cost per developer\n- For Snowflake: compute pays per query — storage cheap\n- For BigQuery: storage costs scale with usage — acceptable\n\n**Worth it** — saved engineering time × 10 developers = hours per week. Pay once for setup.\n\nЭто standard для multi-developer dbt projects.
Проверка знанийKnowledge check
После enable per-dev schemas: developer Alice run `dbt run --target prod` (accidentally) — production schemas become `dbt_alice_dev_marts`. Catastrophic. Как prevent?
ОтветAnswer
Production safety — multiple safeguards.\n\n**Root cause**: `generate_schema_name` override uses `target.name` to differentiate. If Alice runs `--target prod`, target.name=prod, target.schema=alice's value (from profiles.yml maybe not set for prod). Schema becomes weird hybrid.\n\n**Or worse**: prod profile uses USER env var:\n\n```yaml\nprod:\n schema: "{{ env_var('PROD_SCHEMA', 'marts') }}"\n```\n\nИф PROD_SCHEMA not set -> default 'marts' — OK. But if accidentally inherits dev pattern -> `dbt_alice_dev_marts` in production. Bad.\n\n**Prevention strategies**:\n\n**1. Hardcoded prod schema в profiles**:\n\n```yaml\nprod:\n type: snowflake\n schema: 'transformed' # hardcoded, not env_var\n```\n\nNo way для override через USER env. Safe baseline.\n\n**2. Production profile separate**:\n\nProduction credentials и connection info live в **CI-only profiles.yml** или **CI secrets**, не accessible from developer machines.\n\n```yaml\n# profiles.yml в repo — dev и ci targets\ndev:\n schema: "dbt_{{ env_var('USER') }}_dev"\nci:\n schema: 'ci'\n\n# prod target ONLY in CI/CD pipeline через DBT_PROFILES_DIR\n# Developer machine: no prod target defined\n```\n\nDeveloper can't `--target prod` because target not defined.\n\n**3. CI/CD only prod runs**:\n\nProd runs **only из CI/CD pipeline** with restricted access. Developer cannot run locally:\n\n```yaml\n# .github/workflows/deploy.yml\non:\n push:\n branches: [main] # only main -> triggers prod\n workflow_dispatch: # only authorized users\n\njobs:\n prod-deploy:\n runs-on: ubuntu-latest\n if: github.actor in fromJson('["alice-admin", "bob-admin"]')\n steps:\n - run: dbt build --target prod\n env:\n SF_PASSWORD: ${{ secrets.PROD_PASSWORD }}\n```\n\nNo developer triggers prod from local machine.\n\n**4. CI gate на schema names**:\n\nPython script validates prod target produces correct schemas:\n\n```python\n# scripts/check_prod_schemas.py\nimport json\n\nwith open('target/manifest.json') as f:\n manifest = json.load(f)\n\nFORBIDDEN_PATTERNS = ['dev', 'ci', 'staging'] # adjust by env\n\nerrors = []\nfor node in manifest['nodes'].values():\n if node['resource_type'] != 'model':\n continue\n schema = node['schema']\n for pattern in FORBIDDEN_PATTERNS:\n if pattern in schema:\n errors.append(f'{node["name"]}: schema {schema} contains {pattern}')\n\nif errors:\n print('\\n'.join(errors))\n print('Production schemas should NOT contain dev/ci/staging.')\n exit(1)\n```\n\nIn CI:\n```yaml\n- run: dbt compile --target prod\n- run: python scripts/check_prod_schemas.py\n```\n\nIf any schema contains forbidden pattern -> CI fails. Catches misconfiguration before deploy.\n\n**5. Macro safety check**:\n\n```sql\n-- macros/generate_schema_name.sql\n{% macro generate_schema_name(custom_schema_name, node) %}\n {% set default_schema = target.schema %}\n \n {# Safety check: prod schema must not contain 'dev' #}\n {% if target.name == 'prod' and 'dev' in default_schema %}\n {{ exceptions.raise_compiler_error(\n "Production target has 'dev' in schema name: " ~ default_schema ~ \n ". Check profiles.yml."\n ) }}\n {% endif %}\n \n {# Normal logic #}\n ...\n{% endmacro %}\n```\n\nDeveloper accidentally tries `--target prod` с misconfigured profile -> macro fails immediately:\n\n```\nCompilation error: Production target has 'dev' in schema name: dbt_alice_dev. Check profiles.yml.\n```\n\nFail loudly, prevent damage.\n\n**6. Permissions гate**:\n\nIf accidentally executed despite safeguards — Snowflake/BigQuery role-based access:\n- Developer credentials: NO permission to CREATE / DROP в production schemas\n- CI credentials: HAVE permission\n\nEven if dbt issues DDL, warehouse rejects due to permissions.\n\n**Best practice combo**: hardcoded prod profile + CI-only access + macro safety + CI gate + permissions. **Defense in depth**.\n\nEach layer independently catches mistake. Multiple safeguards = more reliable.

Проверьте понимание

Результат: 0 из 0
Прикладной
Вопрос 1 из 5. Team из 10 developers, no per-dev schemas: 'каждый раз кто-то перезаписывает мою customer_metrics, я теряю работу'. Что implement?

Закончили урок?

Отметьте его как пройденный, чтобы отслеживать свой прогресс

Войдите чтобы оценить урок

Прогресс модуля
0 из 5