Production-grade custom generic tests
dbt-core поставляется с 4 базовыми тестами: unique, not_null, accepted_values, relationships. Packages dbt-utils и dbt-expectations добавляют ещё 60+. Но рано или поздно встречается business-specific требование, которое нет в готовых.
Например: ‘для customers — если tier=‘premium’, то revenue не меньше $10K за последний год’. Это не generic, но повторяется в нескольких моделях. Custom generic test — ваш ответ.
В этом уроке — как писать production-grade custom generic tests: signature, config, validation, error handling, тестирование самих тестов.
Базовый синтаксис
dbt-i: первый custom generic test Data quality tests в governance-контекстеCustom generic test — это macro в tests/generic/ (или macros/ с naming convention). Имя — test_<name>.
-- tests/generic/test_no_nulls_after.sql
{% test no_nulls_after(model, column_name, after_date) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} IS NULL
AND order_date >= '{{ after_date }}'
{% endtest %}
Usage:
columns:
- name: customer_email
data_tests:
- no_nulls_after:
after_date: '2024-01-01'
Test passes if query returns 0 rows. Если есть rows — test fails (and rows are the failures).
Signature и arguments
Stand parameters:
model— текущая модель (приходит автоматически).column_name— имя колонки (если test column-level).
Custom arguments — what makes test configurable.
Single argument:
{% test value_between(model, column_name, min_value, max_value) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} < {{ min_value }} OR {{ column_name }} > {{ max_value }}
{% endtest %}
- value_between:
min_value: 0
max_value: 1000
Multiple arguments:
{% test combination_unique_excluding(model, combination_of_columns, exclude_status) %}
SELECT {{ combination_of_columns | join(', ') }}
FROM {{ model }}
WHERE status != '{{ exclude_status }}'
GROUP BY {{ combination_of_columns | join(', ') }}
HAVING COUNT(*) > 1
{% endtest %}
data_tests:
- combination_unique_excluding:
combination_of_columns: [order_id, customer_id]
exclude_status: 'cancelled'
Lists и dicts as arguments — supported.
Table-level vs column-level
Column-level test (use column_name):
{% test no_future_dates(model, column_name) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} > CURRENT_DATE
{% endtest %}
columns:
- name: order_date
data_tests:
- no_future_dates
Table-level test (no column_name):
{% test rows_count_matches(model, compare_model) %}
WITH a AS (SELECT COUNT(*) AS cnt FROM {{ model }}),
b AS (SELECT COUNT(*) AS cnt FROM {{ compare_model }})
SELECT 'mismatch' WHERE (SELECT cnt FROM a) != (SELECT cnt FROM b)
{% endtest %}
models:
- name: fct_orders
data_tests:
- rows_count_matches:
compare_model: ref('staging_orders')
Same macro structure. dbt infers level by where test is applied.
Configuration в tests
Custom tests support standard test configs: severity, where, error_if, warn_if, store_failures.
- value_between:
min_value: 0
max_value: 1000
config:
severity: warn
where: "order_date >= current_date - 30"
store_failures: true
dbt автоматически applies configs. Macro author не handles configs explicitly.
Validation в custom tests
Best practice: validate arguments at start of test.
{% test value_between(model, column_name, min_value, max_value) %}
{% if min_value is none or max_value is none %}
{{ exceptions.raise_compiler_error('value_between requires min_value and max_value') }}
{% endif %}
{% if min_value > max_value %}
{{ exceptions.raise_compiler_error('value_between: min_value must be <= max_value') }}
{% endif %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} < {{ min_value }} OR {{ column_name }} > {{ max_value }}
{% endtest %}
exceptions.raise_compiler_error stops dbt compile с явным message. Better than silent corruption.
Tests для tests (meta-testing)
Хорошие practices — test the test. Singular tests checking compile output:
-- tests/test_value_between_compiles.sql
{% set compiled = test_value_between(ref('fct_orders'), 'amount', 0, 1000) %}
{% if 'amount < 0 OR amount > 1000' not in compiled %}
{{ exceptions.raise_compiler_error('value_between did not generate expected SQL') }}
{% endif %}
SELECT 1 WHERE 1=0
Verifies test generates correct SQL.
Also: integration test — apply test to known data, verify result.
Tagging и selectors
Tests можно tag:
- value_between:
min_value: 0
max_value: 1000
config:
tags: ['critical', 'financial']
Then selective testing:
dbt test --select tag:critical
Useful для CI: fast critical tests на PR, full suite weekly.
Naming conventions
Naming patterns:
| Pattern | Examples |
|---|---|
test_<assertion> (lowercase) | test_no_nulls, test_value_between |
<package>_<assertion> | dbt_utils.unique_combination |
| Project-prefixed | myproject_business_rule_X |
Convention в команде: pick one и стандартизируйтесь. Mixed naming creates confusion.
В files: tests/generic/<name>.sql. One test per file.
Anti-patterns
1. Inverted semantic
-- BAD: returns valid rows (semantic inverted)
{% test confusing(model, column_name) %}
SELECT * FROM {{ model }} WHERE {{ column_name }} > 0
{% endtest %}
Convention: test returns rows on failure. Empty result = pass.
Fix:
-- GOOD: returns failures (where column non-positive)
{% test positive_values(model, column_name) %}
SELECT * FROM {{ model }} WHERE {{ column_name }} <= 0
{% endtest %}
2. Hard-coded values в test logic
{% test recent_data(model) %}
SELECT 1 WHERE NOT EXISTS (
SELECT 1 FROM {{ model }}
WHERE date_column >= current_date - 7
)
{% endtest %}
Days hardcoded. Different models могут have different freshness expectations. Should be parameter:
{% test recent_data(model, date_column, days) %}
SELECT 1 WHERE NOT EXISTS (
SELECT 1 FROM {{ model }}
WHERE {{ date_column }} >= current_date - {{ days }}
)
{% endtest %}
3. Tests без validation
{% test value_between(model, column_name, min_value, max_value) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} < {{ min_value }} OR {{ column_name }} > {{ max_value }}
{% endtest %}
If user calls без min_value / max_value — silent NULL in WHERE -> test always passes. Misleading.
Add validation как shown above.
4. Single mega-test
-- BAD: one test with 10 assertions
{% test mega_check(model) %}
SELECT * FROM {{ model }} WHERE
amount < 0 OR amount > 10000 OR
date > CURRENT_DATE OR customer_id IS NULL
{% endtest %}
If test fails — you don’t know which condition. Hard to debug.
Better: separate tests, each для one assertion. Composability.
Common production tests
Real examples из middle-projects:
‘No future dates’
{% test no_future_dates(model, column_name) %}
SELECT *
FROM {{ model }}
WHERE {{ column_name }} > CURRENT_DATE + INTERVAL '1 day'
{% endtest %}
Dates beyond tomorrow — likely bugs.
‘Sum reconciliation’
{% test sum_reconciliation(model, sum_column, source_model) %}
WITH a AS (SELECT SUM({{ sum_column }}) AS total FROM {{ model }}),
b AS (SELECT SUM({{ sum_column }}) AS total FROM {{ source_model }})
SELECT 'mismatch'
WHERE ABS((SELECT total FROM a) - (SELECT total FROM b)) > 0.01
{% endtest %}
Verify aggregation хранит total.
‘Business rule: high tier customers’
{% test high_tier_revenue_threshold(model, customer_id_column, amount_column, period_days, min_revenue) %}
WITH high_tier AS (
SELECT {{ customer_id_column }}
FROM {{ model }}
WHERE tier = 'premium'
),
actual_revenue AS (
SELECT customer_id, SUM({{ amount_column }}) AS revenue
FROM {{ ref('fct_orders') }}
WHERE order_date >= CURRENT_DATE - INTERVAL '{{ period_days }} days'
GROUP BY customer_id
)
SELECT h.{{ customer_id_column }}
FROM high_tier h
LEFT JOIN actual_revenue a ON h.{{ customer_id_column }} = a.customer_id
WHERE COALESCE(a.revenue, 0) < {{ min_revenue }}
{% endtest %}
Business rule encoded as test.
DuckDB specifics
DuckDB tests:
- Empty result = success. Standard.
- Type inference: DuckDB infers types from query results. Sometimes test SQL needs explicit cast.
- NULL handling: NULL != NULL. WHERE col = NULL doesn’t work. Use IS NULL.
- EXCEPT semantics: DuckDB EXCEPT strict (precision, types matter). For tolerance — wrap в ROUND или ABS.
- Performance: tests run sequentially by default. dbt 1.5+ supports parallel test execution.
Попробуй сам
В labs:
- Создайте
test_no_future_datesс column_name argument. Apply к order_date в fct_orders. - Добавьте validation: проверяйте что column_name не пустой.
- Создайте complex test
test_sum_reconciliation— sum в model == sum в source ± tolerance. - Singular test для verifying test_no_future_dates compiles correctly.
- Integration test: создайте test_fixture model с known future dates, apply test, verify failures correctly identified.
Ключевые выводы
- Custom generic tests в
tests/generic/илиmacros/. Naming:test_<assertion>. - Signature:
(model, column_name, ...custom_args). Column_name только для column-level tests. - Test fails when query returns rows. Empty result = pass.
- Validation arguments в начале macro через
exceptions.raise_compiler_error. - Configs (severity, where, store_failures) handled automatically by dbt.
- Tag tests для CI selectors (
dbt test --select tag:critical). - Meta-test: singular tests verifying custom test compile output + integration tests on fixtures.
- Naming convention в команде:
test_<assertion>илиmyproject_<assertion>. - Avoid: hardcoded values, mega-tests, inverted semantic.