Defining contract: columns, data_type, constraints (PK/FK/CHECK/NOT NULL)
В прошлом уроке мы разобрали концепт contract. Этот урок — про синтаксис: точно как объявлять colonki, типы, constraints, и что supports каждый warehouse.
Production-grade contract = full declaration: каждая колонка с типом, PK/FK/CHECK/NOT NULL constraints где meaningful. Это схема как код для downstream consumers.
SQL DDL: constraints — PK/FK/CHECK/NOT NULL на уровне базыМинимальный contract: только columns + types
- name: dim_customers
config:
contract:
enforced: true
columns:
- name: customer_id
data_type: bigint
- name: first_name
data_type: varchar(100)
- name: last_name
data_type: varchar(100)
- name: email
data_type: varchar(255)
- name: created_at
data_type: timestamp
SQL должен материализовать колонки точно с этими типами. Casts в SQL обязательны:
SELECT
id::bigint AS customer_id,
first_name::varchar(100),
last_name::varchar(100),
email::varchar(255),
created_at::timestamp
FROM {{ ref('stg_customers') }}
Это base level — schema enforced, без constraints.
data_type: какие типы supports
dbt принимает SQL data types, специфичные для каждого warehouse. Common типы:
| Тип семантика | DuckDB | Snowflake | BigQuery | Postgres |
|---|---|---|---|---|
| Integer 64-bit | bigint | bigint / number(19,0) | int64 | bigint |
| Integer 32-bit | integer | integer / number(10,0) | int64 (no 32-bit) | integer |
| Decimal (точный) | numeric(p, s) | number(p, s) | numeric / bignumeric | numeric(p, s) |
| Float double | double | float | float64 | double precision |
| Variable text | varchar(N) | varchar(N) | string | varchar(N) |
| Boolean | boolean | boolean | bool | boolean |
| Date | date | date | date | date |
| Timestamp | timestamp | timestamp_ntz | timestamp | timestamp |
| Timestamp + TZ | timestamptz | timestamp_tz | timestamp | timestamp with time zone |
| Array | bigint[] | array | array<int64> | bigint[] |
| JSON / Struct | json / struct(...) | variant / object | json / struct<> | jsonb |
Tip: на DuckDB основные типы pretty standard SQL. На Snowflake/BigQuery — некоторые отличия (например, BigQuery only int64, no smaller integers).
Точная precision для decimal
Для финансов обязательно указывать precision/scale:
- name: order_total
data_type: numeric(12, 2)
# 12 digits total, 2 after decimal point
# max value: 9_999_999_999.99
numeric без precision — драйфует. Compatibility issues между warehouses + потеря точности при aggregation.
При нарушении в SQL:
-- SQL
SELECT order_total::numeric(10, 4) AS order_total
Contract enforcement failure:
- Column 'order_total': SQL type is numeric(10, 4), YAML declares numeric(12, 2)
Это caught — alignment строгий.
Constraints: PRIMARY KEY
- name: customer_id
data_type: bigint
constraints:
- type: primary_key
или (multi-column PK):
columns:
- name: order_id
data_type: bigint
- name: line_item_id
data_type: bigint
# на уровне модели — composite key
constraints:
- type: primary_key
columns: [order_id, line_item_id]
Что происходит при materialization:
- DuckDB:
CREATE TABLE ... PRIMARY KEY (customer_id)— enforced при insert - Snowflake:
PRIMARY KEY (...)declared — но NOT enforced (warehouse design) - BigQuery: PK declared в metadata, optimizer hint — не enforced
- Postgres: enforced
Production реальность: на cloud warehouses (Snowflake/BigQuery) PK declarative. Не полагайся на enforcement — дублируй через data test unique для real check.
- name: customer_id
data_type: bigint
constraints:
- type: primary_key # declarative на Snowflake
data_tests:
- unique # actual enforcement через test
- not_null
Это belt and suspenders approach — оба слоя для critical FK.
Constraints: FOREIGN KEY
- name: customer_id
data_type: bigint
constraints:
- type: foreign_key
expression: ref('dim_customers')
references_field: customer_id
(синтаксис может слегка отличаться в разных версиях dbt — проверять docs your version)
Support:
- DuckDB locally: enforced ok
- DuckDB MotherDuck: NOT supported
- Snowflake: declarative, не enforced
- BigQuery: declarative, не enforced
- Postgres: enforced
Production: используй contract FK для documentation/optimizer + data test relationships для actual check:
- name: customer_id
constraints:
- type: foreign_key
expression: ref('dim_customers')
references_field: customer_id
data_tests:
- relationships:
to: ref('dim_customers')
field: customer_id
Constraints: CHECK
Custom validation expressions:
- name: order_total
data_type: numeric(12, 2)
constraints:
- type: check
expression: "order_total >= 0"
- name: status
data_type: varchar(20)
constraints:
- type: check
expression: "status IN ('completed', 'pending', 'cancelled', 'refunded')"
- name: email
data_type: varchar(255)
constraints:
- type: check
expression: "email LIKE '%@%'"
Support:
- DuckDB: enforced
- Postgres: enforced
- Snowflake: declarative, не enforced
- BigQuery: declarative
Для critical CHECK логики используй data test через dbt_utils.expression_is_true:
- name: order_total
data_tests:
- dbt_utils.expression_is_true:
expression: ">= 0"
Это работает в любом warehouse как actual check.
Constraints: NOT NULL
- name: order_id
data_type: bigint
constraints:
- type: not_null
Support:
- DuckDB: enforced
- Postgres: enforced
- Snowflake: enforced (одно из немногих enforced constraints в Snowflake)
- BigQuery: enforced through STRUCT mode=‘REQUIRED’
NOT NULL — один из best-enforced constraints across warehouses. Большая часть warehouses фактически рекомендуют это для primary keys и timestamp columns.
Production:
- name: order_id
data_type: bigint
constraints:
- type: not_null
- type: primary_key
data_tests:
- not_null # дополнительная safety
- unique
Constraints: UNIQUE
- name: email
data_type: varchar(255)
constraints:
- type: unique
Support:
- DuckDB: enforced via UNIQUE index
- Postgres: enforced
- Snowflake: declarative
- BigQuery: не supported в DDL
Используй data_tests: [unique] для cross-warehouse.
Full production contract пример
- name: fct_orders
description: "Production fact for orders. PII-clean, used by 15 dashboards."
config:
contract:
enforced: true
materialized: table
columns:
# Primary key
- name: order_id
data_type: bigint
description: "Unique order ID"
constraints:
- type: not_null
- type: primary_key
data_tests: [unique, not_null]
# Foreign key
- name: customer_id
data_type: bigint
description: "FK on dim_customers"
constraints:
- type: not_null
- type: foreign_key
expression: ref('dim_customers')
references_field: customer_id
data_tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
# Date
- name: order_date
data_type: date
description: "Date of order (UTC)"
constraints:
- type: not_null
- type: check
expression: "order_date >= DATE '2000-01-01' AND order_date <= CURRENT_DATE + INTERVAL '1 year'"
data_tests: [not_null]
# Financial decimal
- name: order_total
data_type: numeric(12, 2)
description: "Order total in USD"
constraints:
- type: not_null
- type: check
expression: "order_total >= 0"
data_tests:
- not_null
- dbt_utils.expression_is_true:
expression: ">= 0"
# Categorical
- name: status
data_type: varchar(20)
description: "One of: completed, pending, cancelled, refunded"
constraints:
- type: not_null
- type: check
expression: "status IN ('completed', 'pending', 'cancelled', 'refunded')"
data_tests:
- not_null
- accepted_values:
values: ['completed', 'pending', 'cancelled', 'refunded']
# Optional text
- name: notes
data_type: varchar(500)
description: "Customer notes (optional)"
# no constraints, can be NULL
-- models/marts/fct_orders.sql
SELECT
o.order_id::bigint,
o.customer_id::bigint,
o.order_date::date,
o.order_total::numeric(12, 2),
o.status::varchar(20),
o.notes::varchar(500)
FROM {{ ref('stg_orders') }} o
WHERE o.order_id IS NOT NULL
AND o.customer_id IS NOT NULL
Когда dbt run:
- SQL компилируется
- Resulting types vs YAML declared — checked
- DDL constraints applied (where supports)
- Material runs
- Data tests run после material
При нарушении — error message:
Contract enforcement failure for fct_orders:
- Column 'order_id': SQL produces 'integer', YAML declares 'bigint'
- Column 'status': SQL produces 'varchar', YAML declares 'varchar(20)'
- Unexpected column 'customer_email' in SQL not declared in YAML
Producer обязан align.
Constraints на model level vs column level
Иногда constraint касается нескольких колонок — composite PK, composite UNIQUE, complex CHECK:
- name: order_line_items
config:
contract:
enforced: true
constraints: # на уровне модели
- type: primary_key
columns: [order_id, line_number]
- type: unique
columns: [order_id, line_number]
- type: check
expression: "quantity > 0"
name: positive_quantity
columns:
- name: order_id
data_type: bigint
constraints: [{type: not_null}] # column-level
- name: line_number
data_type: integer
constraints: [{type: not_null}]
- name: quantity
data_type: integer
Comp PK / composite UNIQUE объявляются на model level через columns array. CHECK на multiple columns — также model level.
Type alignment gotchas
1. Implicit casts на DuckDB
DuckDB liberal с casting. SELECT 1 AS x создаст x INTEGER, но contract x BIGINT — fails. Always explicit:
SELECT 1::bigint AS x
2. Functions с unclear return types
-- Что возвращает? Зависит от input
SELECT SUM(order_total) AS revenue
FROM orders
Может вернуть BIGINT, NUMERIC, HUGEINT — зависит от input column type. Phantom drift.
Решение — explicit cast:
SELECT SUM(order_total)::numeric(12, 2) AS revenue
3. Date / timestamp granularity
data_type: timestamp # vs timestamp(0)
DuckDB timestamp = microsecond precision. PostgreSQL — same. Если SQL делает now()::timestamp — works. Но если now()::timestamptz — fails contract.
4. Varchar(N) vs varchar
data_type: varchar # без N
vs
data_type: varchar(100)
Some warehouses allow varchar без length, others require. Snowflake — varchar = varchar(16777216) (max). DuckDB — flexible.
Production: always specify length explicitly. Catches truncation bugs.
Migration: добавление contract на existing model
Phased approach (см. урок 01):
# PR 1: declare без enforce
- name: fct_orders
config:
contract:
enforced: false # declarations only
columns:
- name: order_id
data_type: bigint
...
# PR 2: align SQL — explicit casts, remove extras
# (SQL changes)
# PR 3: enable enforce
- name: fct_orders
config:
contract:
enforced: true
columns:
...
Каждый PR — separate review. После PR 3 — pre-existing model under contract.
Антипаттерны
-
Generic numeric без precision:
data_type: numeric— drift между warehouses, precision loss. Alwaysnumeric(p, s). -
No length для varchar:
varcharбез N — truncation bugs могут проходить silently. Alwaysvarchar(N). -
FK через contract вместо relationships test: на Snowflake/BigQuery FK declarative — не enforced. Polagis on data test для real check.
-
CHECK через contract вместо expression test: на Snowflake/BigQuery CHECK declarative. Data test
expression_is_trueworks cross-warehouse. -
Lots of CHECKs: 10 CHECK constraints на колонке — fragile при изменениях. Группируй где возможно.
-
Implicit casts в SQL: полагание на DuckDB liberal casting — breaks при migration to Snowflake. Always explicit
::typefor contract columns. -
No data tests + contract only: contract = schema. Data quality нужны через data tests. Не дублирование, а complementary.
Попробуй сам
-
Выбрать mart-модель с 5-10 колонками.
-
Добавить contract declarations в YAML:
- data_type для каждой
- PK на ID column
- FK на FK columns
- CHECK на critical fields (revenue не меньше 0)
- NOT NULL на required
-
Align SQL — explicit casts.
-
Enable enforce:
contract.enforced: true. -
Test breaks:
- Изменить тип в SQL — должен fail
- Add extra column в SQL — should fail
- Remove column from YAML — should fail
- Negative revenue (если CHECK) — DuckDB blocks (Snowflake — passes silently)
-
Test data tests — отдельный layer, проверяют data quality после material.
Ключевые выводы
- Полный contract = data_type + constraints. Декларация в YAML, alignment в SQL через explicit casts.
- Constraint types:
not_null,primary_key,foreign_key,check,unique. Каждый с разной enforcement по warehouses. - Enforcement matrix: DuckDB locally — most enforced. Snowflake/BigQuery — mostly metadata-only. Postgres — most enforced. Production = constraint + data test для real check.
- DuckDB FK partial: works locally, не on MotherDuck. Snowflake/BigQuery FK — only declarative.
- Always specify precision:
numeric(12, 2),varchar(100), не barenumeric/varchar. - Type alignment: explicit
::typecasts в SQL для каждой declared column. DuckDB liberal casting — fails при migration. - Composite constraints — на model level через
columns: [...]array. - Migration: declare -> align SQL -> enable enforce. Phased для safety.