generate_schema_name override: per-developer и clean prod schemas
В большом проекте с многими developers — каждый запускает dbt run локально. Если все пишут в один shared dev schema, они наступают на ноги друг другу: Alice runs customer_metrics, Bob runs параллельно — конфликт, таблица перезаписывается.
Решение — per-developer schemas в dev: dbt_alice_dev, dbt_bob_dev. Каждый разработчик имеет свой namespace. В prod — clean schemas без префиксов: marts, staging.
dbt позволяет настроить это через override generate_schema_name macro. Этот урок — как переопределить, какие patterns common, и как наводить порядок в multi-developer environment.
Default behavior без override
dbt по умолчанию имеет:
schema = target.schema # если YAML +schema не set
schema = target.schema + '_' + custom_schema # если +schema: 'custom'
Например, target.schema = 'main':
- Model без +schema ->
main - Model с
+schema: marts->main_marts
Это append pattern. На prod часто это нежелательно — хочется marts чисто.
И для dev — все developers пишут в main_marts, конфликты.
Override syntax
В macros/get_custom_schema.sql (имя файла любое):
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- set default_schema = target.schema -%}
{%- if target.name == 'prod' -%}
{%- if custom_schema_name is none -%}
{{ default_schema }}
{%- else -%}
{{ custom_schema_name | trim }}
{%- endif -%}
{%- elif target.name == 'dev' -%}
{%- if custom_schema_name is none -%}
{{ default_schema }}
{%- else -%}
{{ default_schema }}_{{ custom_schema_name | trim }}
{%- endif -%}
{%- else -%}
{{ default_schema }}_{{ custom_schema_name | trim }}
{%- endif -%}
{%- endmacro %}
Что делает:
- В prod:
+schema: marts-> schema =marts(clean, no prefix) - В dev:
+schema: marts-> schema =dbt_alice_dev_marts(with prefix) - В other targets (ci): same dev-like behavior
dbt автоматически использует этот override при resolve schema names.
Per-developer schema через env_var
# profiles.yml
my_project:
outputs:
dev:
type: duckdb
path: './dev.duckdb'
schema: "dbt_{{ env_var('USER', 'unknown') }}_dev" # uses OS USER
threads: 4
Alice locally:
$USER = alice
target.schema = 'dbt_alice_dev'
Bob locally:
$USER = bob
target.schema = 'dbt_bob_dev'
Через generate_schema_name override -> models go в dbt_alice_dev_marts, dbt_bob_dev_marts, etc. Isolated namespaces — no conflicts.
Полный пример: production patterns
Profile
# profiles.yml
my_project:
target: dev
outputs:
dev:
type: duckdb
path: './dev_{{ env_var(\"USER\", \"unknown\") }}.duckdb'
schema: "dbt_{{ env_var('USER', 'unknown') }}_dev"
threads: 4
ci:
type: duckdb
path: ':memory:'
schema: 'ci'
threads: 4
prod:
type: snowflake
account: "{{ env_var('SF_ACCOUNT') }}"
user: "{{ env_var('SF_USER') }}"
password: "{{ env_var('SF_PASSWORD') }}"
database: 'ANALYTICS_PROD'
schema: 'transformed' # base schema в prod
warehouse: 'PROD_WH'
threads: 16
dbt_project.yml
name: my_project
models:
my_project:
+schema: '' # default empty
staging:
+schema: staging
intermediate:
+schema: intermediate
marts:
+schema: marts
marts:
finance:
+schema: marts_finance
marketing:
+schema: marts_marketing
Override macro
-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- set default_schema = target.schema -%}
{%- if target.name == 'prod' -%}
{# Production: custom_schema_name перекрывает default. Clean names. #}
{%- if custom_schema_name is none -%}
{{ default_schema }}
{%- else -%}
{{ custom_schema_name | trim }}
{%- endif -%}
{%- else -%}
{# Dev / CI: default + custom = dbt_alice_dev_marts #}
{%- if custom_schema_name is none -%}
{{ default_schema }}
{%- else -%}
{{ default_schema }}_{{ custom_schema_name | trim }}
{%- endif -%}
{%- endif -%}
{%- endmacro %}
Результат
Production (target.schema='transformed', target.name='prod'):
| Model layer | Schema in warehouse |
|---|---|
| staging.users | staging |
| intermediate.user_metrics | intermediate |
| marts.fct_orders | marts |
| marts.finance.revenue | marts_finance |
| marts.marketing.campaigns | marts_marketing |
Clean — marts, не transformed_marts.
Dev for Alice (target.schema='dbt_alice_dev', target.name='dev'):
| Model | Schema |
|---|---|
| staging.users | dbt_alice_dev_staging |
| marts.fct_orders | dbt_alice_dev_marts |
| marts.finance.revenue | dbt_alice_dev_marts_finance |
Isolated — никаких конфликтов с Bob.
CI (target.schema='ci', target.name='ci'):
| Model | Schema |
|---|---|
| staging.users | ci_staging |
| marts.fct_orders | ci_marts |
Isolated CI environment.
Common patterns
Pattern 1: per-developer + per-feature branch
Для team с feature branches:
schema: "dbt_{{ env_var('USER', 'unknown') }}_{{ env_var('GIT_BRANCH', 'main') | replace('/', '_') }}"
Alice on feature/new-revenue branch:
$USER = alice
$GIT_BRANCH = feature/new-revenue
target.schema = dbt_alice_feature_new_revenue
Каждая feature branch isolated. Может test без conflicts.
Pattern 2: PR-based isolation (для team с frequent PRs)
schema: "dbt_pr_{{ env_var('PR_NUMBER', 'unknown') }}"
PR #123: schema = dbt_pr_123. После merge — cleanup.
Pattern 3: time-based isolation (для experimentation)
schema: "dbt_{{ env_var('USER') }}_{{ env_var('EXPERIMENT_ID', run_started_at.timestamp() | int) }}"
Each run — different schema. Useful for A/B testing logic.
Pattern 4: shared dev schema (small team)
schema: "dbt_dev" # shared everyone
For 2-3 developers, simple. Risk: conflicts при simultaneous runs. Mitigate через —select to prevent overlap.
Cleanup dev schemas
Per-developer schemas accumulate junk over time. Periodic cleanup:
-- macros/cleanup_dev_schemas.sql
{% macro cleanup_dev_schemas() %}
{% if target.name != 'dev' %}
{{ exceptions.raise_compiler_error('cleanup_dev_schemas can only run in dev') }}
{% endif %}
{% set schemas_query %}
SELECT schema_name FROM information_schema.schemata
WHERE schema_name LIKE 'dbt_%_dev%'
{% endset %}
{% set results = run_query(schemas_query) %}
{% if execute %}
{% for schema in results.columns[0].values() %}
{{ log('Dropping ' ~ schema, info=True) }}
{% do run_query('DROP SCHEMA IF EXISTS ' ~ schema ~ ' CASCADE') %}
{% endfor %}
{% endif %}
{% endmacro %}
Run:
dbt run-operation cleanup_dev_schemas
Cleanup all dev schemas. Use carefully — может wipe own data.
Better: scheduled cleanup of OLD dev schemas (untouched >30 days):
{% macro cleanup_old_dev_schemas() %}
{% set old_schemas_query %}
SELECT s.schema_name
FROM information_schema.schemata s
LEFT JOIN (
SELECT table_schema, MAX(last_modified) AS last_used
FROM information_schema.tables
GROUP BY table_schema
) t ON t.table_schema = s.schema_name
WHERE s.schema_name LIKE 'dbt_%_dev%'
AND t.last_used < CURRENT_DATE - INTERVAL '30 day'
{% endset %}
{# ... drop logic #}
{% endmacro %}
Scheduled run раз в неделю — keeps warehouse clean.
Grants и schemas
Per-developer schemas needs grants. Через generate_schema_name + grants config:
# dbt_project.yml
models:
my_project:
marts:
+grants:
select: "{{ ['analyst_role'] if target.name == 'prod' else [] }}"
In dev — no grants (private schema). In prod — analysts can SELECT.
This pairs with grants config (next lesson).
Anti-patterns
1. Hardcoded dev schemas в models
- name: customer_metrics
config:
schema: dbt_alice_dev_marts # specific to Alice!
Hardcoded на developer. Catastrophic.
Use +schema: marts + generate_schema_name override.
2. No isolation в dev
dev:
schema: dev_marts # everyone shares
Race conditions. Conflicts. Lost work.
Per-developer (env_var USER) — solution.
3. Production with developer prefixes
prod:
schema: dbt_prod_alice_marts # leaked developer name to production!
Production schemas — clean. No personal info.
Override macro handles this — dev appends, prod doesn’t.
4. Forgot to override
Without override, dbt uses default behavior:
target.schema = 'main'+schema: marts->main_marts
Want marts in prod. Override macro is the only way.
5. Different developers, different naming
Alice’s profile: schema: alice_dev
Bob’s profile: schema: dbt_bob_dev
Inconsistent. Code review confusing. Standardize via env_var or team profile.
6. CI uses developer pattern
ci:
schema: "dbt_{{ env_var('USER', 'ci') }}_ci"
CI doesn’t have USER. Fallback to ‘ci’ works, but inconsistent с production pattern.
Cleaner:
ci:
schema: 'ci'
Plain. No override needed for CI namespace.
CI: testing schema generation
# scripts/check_schema_generation.py
import json
with open('target/manifest.json') as f:
manifest = json.load(f)
# All models should have correct schemas based on env
for node in manifest['nodes'].values():
if node['resource_type'] != 'model':
continue
schema = node['schema']
# Production check: schema should NOT contain 'dev', 'ci', or developer names
if 'dev' in schema or 'ci' in schema:
print(f'ERROR: {node[\"name\"]} schema {schema} contains env name')
Useful in pre-prod deploy validation — caught schemas before pushing to prod.
Production scenario: 10 developers, 100 models
Setup:
- 10 developers, each works locally
- 100 models in repo
- Production schema:
marts,staging, etc.
Without per-dev schemas:
- All 10 developers write к
dev_marts - 5 of them run customer_metrics simultaneously -> race conditions
- 1 of them runs —full-refresh -> wipes others’ work
- Endless conflicts
With per-dev schemas:
- Alice:
dbt_alice_dev_marts - Bob:
dbt_bob_dev_marts - … etc
- Each isolated. No conflicts. Parallel work.
Cost: 10x storage (10 schemas × 100 models). Acceptable for DuckDB (file storage) или Snowflake (compute pays). For DataPond / lake — separate concern.
After PR merge -> CI runs against clean ci_marts. Production runs against marts. Clean separation.
Попробуй сам
-
Set up per-dev schema:
# profiles.yml dev: schema: "dbt_`{{ env_var('USER', 'unknown') }}`_dev" -
Add override macro:
-- macros/generate_schema_name.sql `{% macro generate_schema_name(custom_schema_name, node) %}` `{% if target.name == 'prod' and custom_schema_name %}` `{{ custom_schema_name }}` `{% elif custom_schema_name %}` `{{ target.schema }}`_`{{ custom_schema_name }}` `{% else %}` `{{ target.schema }}` `{% endif %}` `{% endmacro %}` -
Set per-model schemas:
models: my_project: marts: +schema: marts staging: +schema: staging -
Test:
dbt run --target dev # Schema: `dbt_<USER>_dev_marts` dbt run --target prod # Schema: marts -
Multi-developer test:
- Simulate 2 developers (
USER=aliceandUSER=bob) dbt runeach — verify separate schemas- No conflicts
- Simulate 2 developers (
-
CI: ensure CI uses
cischema explicitly.
Ключевые выводы
- generate_schema_name macro determines schema names per model. Default behavior — append
+schematotarget.schema. Override for cleaner naming. - Per-developer schemas в dev — isolate developers, no conflicts. Pattern:
dbt_{USER}_dev. - Clean prod schemas —
marts, nottransformed_marts. Override macro returnscustom_schema_namedirectly in prod. - Standard override: dev appends prefix, prod uses bare custom_schema_name. Other targets fallback to dev-like behavior.
- Patterns: per-feature-branch (multi-feature dev), PR-based (CI per PR), time-based (experimentation), shared dev (small team).
- Cleanup: periodic cleanup old dev schemas (untouched >30 days). Schedule в scheduled runs.
- Anti-patterns: hardcoded dev schemas в models, no isolation (shared dev), production with developer prefixes, inconsistent naming across developers.
- CI gate: validate prod schemas don’t contain ‘dev’/‘ci’/developer names. Catch errors pre-deploy.