dbt_project_evaluator: правила и exemptions
dbt_project_evaluator — package, который применяет 30+ best practices правил к вашему проекту. Это automated code review для dbt: DAG structure, naming, testing coverage, performance anti-patterns.
На middle-уровне это инструмент continuous quality: запускается в CI, catches regressions до merge. В этом уроке — все категории правил, как читать failures, как настраивать exemptions для legacy кода.
Data Governance: evaluator как governance-инструментSetup
# packages.yml
packages:
- package: dbt-labs/dbt_project_evaluator
version: [">=0.13.0", "<1.0.0"]
dbt deps
dbt build --select package:dbt_project_evaluator
Этот build создаёт models с результатами проверок:
fct_marts_or_intermediate_dependent_on_source— marts shouldn’t ref sources directly.fct_unused_sources— sources, не используемые ни одной model.fct_undocumented_models— модели без description.- …и десятки других.
Каждая model — list of violations (rows). Если model empty — все ok. Если есть rows — violations.
Категории правил
1. DAG / Modeling
Правила структуры dependency graph.
fct_direct_join_to_source— модель не staging делает ref/source без staging слоя.fct_multiple_sources_joined— staging model joins multiple sources (должна быть один source).fct_root_models— models без upstream (могут быть orphan).fct_rejoining_of_upstream_concepts— A -> B и A -> C, потом B/C joined в D (redundant DAG path).fct_model_fanout— model имеет 5+ downstream (high blast radius).fct_chained_views_dependencies— view -> view -> view (cascading recompute).
-- Run проверки:
dbt build --select package:dbt_project_evaluator --target dev
-- Show violations:
SELECT * FROM dbt_project_evaluator.fct_direct_join_to_source;
SELECT * FROM dbt_project_evaluator.fct_multiple_sources_joined;
2. Testing
Правила coverage тестов.
fct_test_coverage— percentage models с tests. Critical для compliance.fct_missing_primary_key_tests— модели без unique / primary key test.fct_documentation_coverage— model description coverage.
SELECT * FROM dbt_project_evaluator.fct_test_coverage
WHERE percent_coverage < 0.8; -- модели с < 80% column coverage
3. Documentation
fct_undocumented_models— models без description.fct_undocumented_source_tables— sources без description.fct_columns_without_descriptions— columns без descriptions.
4. Structure / Naming
fct_model_naming_conventions— checks fct_, dim_, stg_, int_ prefixes.fct_source_directories— sources не в правильной директории.fct_staging_directories— staging не в models/staging.fct_model_directories— fct/dim не в models/marts.
5. Performance
fct_exposure_parents_materializations— exposure parents не table materialized (view = slow).fct_chained_views_dependencies— view chains.
Reading failures
После build, проверки доступны как regular models. Можно queries:
-- Все undocumented models:
SELECT resource_name, resource_type
FROM dbt_project_evaluator.fct_undocumented_models;
-- Models с rejoining (redundant DAG):
SELECT
model_name,
rejoined_concepts
FROM dbt_project_evaluator.fct_rejoining_of_upstream_concepts;
-- Models без PK test:
SELECT model_name
FROM dbt_project_evaluator.fct_missing_primary_key_tests;
Each model documented в dbt_project_evaluator package docs — what it checks, why important.
Summary view
-- Summary всех violations:
SELECT
'undocumented_models' AS check_name,
COUNT(*) AS violations
FROM dbt_project_evaluator.fct_undocumented_models
UNION ALL
SELECT 'missing_pk_tests', COUNT(*) FROM dbt_project_evaluator.fct_missing_primary_key_tests
UNION ALL
SELECT 'direct_join_to_source', COUNT(*) FROM dbt_project_evaluator.fct_direct_join_to_source
ORDER BY violations DESC
CI integration
В CI запускайте после dbt build:
# .github/workflows/dbt-quality.yml
- name: Run project evaluator
run: dbt build --select package:dbt_project_evaluator
- name: Check critical violations
run: |
dbt show --select fct_undocumented_models --inline "
SELECT COUNT(*) FROM dbt_project_evaluator.fct_undocumented_models
" | awk 'NR==4 && $1 > 0 {exit 1}'
Это fails CI если есть undocumented models. Кастомизируйте per project — какие violations block vs warn.
Альтернатива: использовать dbt tests, которые fail при violation:
# models/_evaluator_tests.yml
models:
- name: fct_undocumented_models
data_tests:
- dbt_utils.expression_is_true:
expression: "COUNT(*) = 0"
config:
severity: error
warn_if: ">0"
error_if: ">10"
CI runs dbt test — failures блокируют merge.
Exemptions для legacy
Реальный сценарий: проект 2 года, evaluator finds 200 violations. Из них 50 real issues, 150 — legacy, которые fix-ить не приоритет.
Решение — exemptions config в dbt_project.yml:
vars:
dbt_project_evaluator:
# Список моделей, освобождённых от specific checks
# Exempt из undocumented check
undocumented_models_exclude:
- 'fct_legacy_orders'
- 'dim_legacy_customers'
# Exempt из missing_pk_tests
missing_primary_key_tests_exclude:
- 'fct_audit_log' # audit log doesn't need PK
# Exempt из direct_join_to_source
direct_join_to_source_exclude:
- 'fct_legacy_*' # wildcards возможны
Documented exemptions:
# dbt_project.yml
vars:
dbt_project_evaluator:
undocumented_models_exclude:
# Legacy models pre-2024. Documentation backfill — Q3 2026.
- 'fct_legacy_orders'
- 'dim_legacy_customers'
# Internal helper — no business meaning, no docs needed.
- 'int_debug_temp'
Comments объясняют why. Иначе future devs не понимают legacy carve-outs.
Custom checks
Можно добавить custom evaluator checks. Создайте model в models/data_quality/:
-- models/data_quality/fct_models_without_grants.sql
{{ config(
materialized='table',
tags=['evaluator', 'data_quality']
) }}
SELECT name AS model_name
FROM {{ ref('all_models') }} -- evaluator helper model
WHERE configurations NOT LIKE '%grants%'
AND model_type = 'table' -- only tables, не views
Этот check finds tables без +grants config. После build:
SELECT * FROM analytics.fct_models_without_grants;
И в CI:
data_tests:
- dbt_utils.expression_is_true:
expression: "(SELECT COUNT(*) FROM analytics.fct_models_without_grants) = 0"
Extending evaluator с project-specific rules.
Production rollout
Большой проект, evaluator добавляется впервые. 200 violations.
Phase 1 — Assessment (Week 1):
dbt build --select package:dbt_project_evaluator
Document each violation type, count:
SELECT * FROM analytics.fct_evaluator_summary;
Categorize:
- Critical (block CI): missing PK tests на fact tables, undocumented marts.
- High priority (fix in 1-2 sprints): documentation gaps, direct source joins.
- Low priority (backlog): naming conventions, rejoining concepts.
- Won’t fix: legacy exemptions.
Phase 2 — Setup (Week 2):
# dbt_project.yml
vars:
dbt_project_evaluator:
# Initial exemptions: всё что не critical
undocumented_models_exclude:
- 'fct_legacy_*'
missing_primary_key_tests_exclude:
- 'audit_*'
CI tests с warn для critical, info для остальное.
Phase 3 — Fix critical (Weeks 3-4):
Address все critical violations. Update CI to error на these.
Phase 4 — Backlog work (ongoing):
Каждый sprint — fix 5-10 high priority. Reduce exemptions list. Remove legacy exemptions when fixed.
Phase 5 — New code prevention:
CI configured to block new violations. Existing exemptions remain. Goal: net new code 100% compliant.
После года: exemptions list — short, well-documented. Project quality measurably better.
Common violations patterns
”direct_join_to_source”
-- BAD: fct_orders.sql
SELECT * FROM {{ source('app', 'orders') }} -- source без staging слоя
Fix: создать staging слой:
-- models/staging/stg_app__orders.sql — new
SELECT * FROM {{ source('app', 'orders') }}
-- models/marts/fct_orders.sql — updated
SELECT * FROM {{ ref('stg_app__orders') }}
Bénéfit: testing, documentation, изоляция source schema changes.
”missing_primary_key_tests”
# BEFORE: fct_orders.yml без PK test
models:
- name: fct_orders
columns:
- name: order_id
Fix:
models:
- name: fct_orders
columns:
- name: order_id
data_tests:
- unique
- not_null
Single column PK. For composite:
data_tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns: [order_id, line_item_id]
“rejoining_of_upstream_concepts”
stg_customers ─┬──> dim_customers
└──> int_customer_stats ──> fct_customer_metrics
stg_customers дважды consumed: once для dim, once через int_customer_stats который join’ится с dim. Redundancy.
Fix: dim_customers должен быть единственный consumer stg_customers downstream of dim:
stg_customers ──> dim_customers ──> int_customer_stats ──> fct_customer_metrics
Linear chain. Cleaner DAG.
”chained_views_dependencies”
stg_orders (view) ──> int_enriched (view) ──> fct_orders (view)
3-level view chain. Каждый SELECT в downstream — recompute всех views. Performance death.
Fix: materialize middle layer as table:
-- models/intermediate/int_enriched.sql
{{ config(materialized='table') }} -- было view
SELECT ...
Compute happens once на dbt run. Downstream queries fast.
DuckDB specifics
dbt_project_evaluator работает на DuckDB. Особенности:
- Performance: evaluator builds 30+ models. На большом проекте — 5-15 minutes.
- Memory: anali graf проекта — нужно memory_limit setting.
- Schemas: evaluator создаёт schema dbt_project_evaluator. На DuckDB single-database — этим занят отдельный namespace.
Альтернативы
dbt-checkpoint (модуль 12) — pre-commit hooks для quality checks. Эквивалентно evaluator, но runs on PR vs after build.
sqlfluff (модуль 12) — SQL linting, не dbt-specific.
evaluator + sqlfluff + dbt-checkpoint — full quality stack. evaluator для DAG / structural, sqlfluff для SQL syntax, dbt-checkpoint для simple convention checks.
Попробуй сам
В labs:
- Install dbt_project_evaluator, run
dbt build --select package:dbt_project_evaluator. - Identify violations через queries:
SELECT * FROM dbt_project_evaluator.fct_undocumented_models; SELECT * FROM dbt_project_evaluator.fct_missing_primary_key_tests; - Fix one violation per category:
- Add description to undocumented model.
- Add PK test to model.
- Add staging layer для model с direct source join.
- Setup exemptions для legacy models:
vars: dbt_project_evaluator: undocumented_models_exclude: ['legacy_old_model'] - CI integration: add
dbt teststep that fails on critical violations.
Ключевые выводы
- dbt_project_evaluator — automated dbt code review. 30+ best practices правил.
- Категории: DAG modeling, testing, documentation, structure/naming, performance.
- Output — models с violations. Empty = ok, rows = issues.
- CI integration: dbt test с expression_is_true, или CLI checks output.
- Exemptions в dbt_project.yml для legacy. Document why в comments.
- Custom checks через models в проекте — extends evaluator с project-specific rules.
- Rollout phases: assess -> setup -> fix critical -> backlog -> prevent new.
- Combination с sqlfluff + dbt-checkpoint для full quality stack.