Relation API: Relation.create() и quoting policy

Relation — это представление таблицы / view в warehouse на уровне Python. Когда dbt-core говорит «модель A зависит от модели B», в коде это Relation-объект для A и Relation-объект для B. Quoting, case-sensitivity, schema isolation — всё это политики, привязанные к Relation.

__slots__ и @dataclass: память и удобство

В этом уроке — полная анатомия Relation API: какие поля имеет, как создавать через Relation.create(), как настраивать quoting policy под warehouse.

Что такое Relation

Relation — это 3-part identifier: (database, schema, identifier). Каждая таблица или view имеет такое имя. Например:

analytics.marts.fct_orders
my_warehouse.public.users
prod_db.snapshots.customer_snapshot

В коде dbt:

from dbt.adapters.base import BaseRelation

rel = BaseRelation.create(
    database='analytics',
    schema='marts',
    identifier='fct_orders',
    type='table',
)

print(rel)            # analytics.marts.fct_orders
print(rel.database)   # analytics
print(rel.schema)     # marts
print(rel.identifier) # fct_orders
print(rel.type)       # table

При rendering в SQL — Relation становится правильно-quoted identifier:

print(f'SELECT * FROM {rel}')
# SELECT * FROM "analytics"."marts"."fct_orders"
# (или analytics.marts.fct_orders — depends на quoting policy)

Структура BaseRelation

# dbt-adapters/dbt/adapters/base/relation.py (упрощённо)
@dataclass(frozen=True)
class BaseRelation:
    path: Path                          # Path(database, schema, identifier)
    type: Optional[RelationType] = None # 'table', 'view', 'cte', 'external'
    
    quote_character: str = '"'
    include_policy: Policy = field(default_factory=Policy)
    quote_policy: Policy = field(default_factory=Policy)
    dbt_created: bool = False
    metadata: Optional[Dict] = None

Полей:

path — namedtuple (database, schema, identifier).
type — RelationType: table, view, cte, external, materialized_view.
quote_character — символ для quoting. Default " (ANSI). MySQL использует backtick `.
include_policy — какие parts включать в rendered name (database/schema/identifier).
quote_policy — какие parts quoting.
dbt_created — boolean. True если dbt создал этот relation (в текущем run или раньше).

include_policy: что включать в name

Policy — namedtuple of booleans для каждого component:

@dataclass
class Policy:
    database: bool = True
    schema: bool = True
    identifier: bool = True

include_policy определяет, какие parts появляются в rendered Relation:

rel = BaseRelation.create(
    database='analytics',
    schema='marts',
    identifier='fct_orders',
    include_policy=Policy(database=True, schema=True, identifier=True),
)
print(rel)
# analytics.marts.fct_orders


rel = BaseRelation.create(
    database='analytics',
    schema='marts',
    identifier='fct_orders',
    include_policy=Policy(database=False, schema=True, identifier=True),
)
print(rel)
# marts.fct_orders  ← database не включён


rel = BaseRelation.create(
    database='analytics',
    schema='marts',
    identifier='fct_orders',
    include_policy=Policy(database=False, schema=False, identifier=True),
)
print(rel)
# fct_orders  ← только identifier

Use case: некоторые warehouses не используют database tier. Например, MySQL только schema (DB) + table. BigQuery наоборот — DB (project) + dataset + table, все три обязательны.

Defaults differ between adapters:

# dbt-bigquery (project.dataset.table)
class BigQueryRelation(BaseRelation):
    include_policy: Policy = field(
        default_factory=lambda: Policy(database=True, schema=True, identifier=True)
    )

# dbt-mysql (no database tier, just schema.table)
class MySQLRelation(BaseRelation):
    include_policy: Policy = field(
        default_factory=lambda: Policy(database=False, schema=True, identifier=True)
    )

quote_policy: что quote

quote_policy определяет, какие parts оборачивать в quotes при rendering:

rel = BaseRelation.create(
    database='analytics',
    schema='marts',
    identifier='fct_orders',
    quote_policy=Policy(database=True, schema=True, identifier=True),
)
print(rel)
# "analytics"."marts"."fct_orders"


rel = BaseRelation.create(
    database='analytics',
    schema='marts',
    identifier='fct_orders',
    quote_policy=Policy(database=False, schema=False, identifier=False),
)
print(rel)
# analytics.marts.fct_orders   ← no quotes


rel = BaseRelation.create(
    database='analytics',
    schema='marts',
    identifier='fct_orders',
    quote_policy=Policy(database=False, schema=False, identifier=True),
)
print(rel)
# analytics.marts."fct_orders"   ← только identifier quoted

Quoting policy и case-sensitivity

Это критическая тема. Каждый warehouse имеет свои правила quoting и case-sensitivity:

Quoting и case-sensitivity по warehouses

Adapter-specific defaults:

# dbt-postgres
class PostgresRelation(BaseRelation):
    quote_policy: Policy = field(default_factory=lambda: Policy(database=True, schema=True, identifier=True))
    # Quote everything для case preservation


# dbt-snowflake
class SnowflakeRelation(BaseRelation):
    quote_policy: Policy = field(default_factory=lambda: Policy(database=False, schema=False, identifier=False))
    # No quoting by default — let Snowflake auto-uppercase


# dbt-bigquery
class BigQueryRelation(BaseRelation):
    quote_character: str = '`'   # BigQuery uses backtick
    quote_policy: Policy = field(default_factory=lambda: Policy(database=True, schema=True, identifier=True))

Snowflake quoting trap:

Самая распространённая ошибка — quoting на Snowflake.

-- Создаём с quoted identifier
CREATE TABLE "MyTable" (...);

-- Теперь queries требуют quotes:
SELECT * FROM MyTable;     -- [X] ERROR (looks for MYTABLE, which doesn't exist)
SELECT * FROM "MyTable";   -- [OK] OK
SELECT * FROM mytable;     -- [X] ERROR

-- В adapter код:
{{ ref('my_table') }} -> "MyTable" (если quoted) или MYTABLE (если unquoted)

Если materialization создаёт quoted, но downstream queries unquoted — silent failure. Lookup для wrong-cased identifier.

Production rule для Snowflake:

В dbt_project.yml:

quoting:
  database: false
  schema: false
  identifier: false

Все unquoted -> Snowflake auto-uppercases. Consistent.

For dbt-snowflake это default:

class SnowflakeRelation(BaseRelation):
    quote_policy: Policy = field(default_factory=lambda: Policy(database=False, schema=False, identifier=False))

Relation.create() — main constructor

rel = BaseRelation.create(
    database='analytics',
    schema='marts',
    identifier='fct_orders',
    type='table',
)

Arguments:

database, schema, identifier — обязательны для full path
type — 'table', 'view', 'cte', 'external', etc.
quote_policy — optional, defaults к adapter’s
include_policy — optional, defaults к adapter’s
metadata — optional dict с warehouse-specific metadata

Returns: immutable BaseRelation instance.

Used everywhere в adapter code:

# В list_relations_without_caching macro
{% for row in result %}
  {%- set relation = api.Relation.create(
      database=schema_relation.database,
      schema=schema_relation.schema,
      identifier=row.name,
      type=row.type,
  ) -%}
  {% do relations.append(relation) %}
{% endfor %}

api.Relation.create() в Jinja is equivalent of BaseRelation.create() в Python.

incorporate(): обновление полей

Relation immutable — нельзя rel.identifier = 'new_name'. Use .incorporate():

rel = BaseRelation.create(database='analytics', schema='marts', identifier='fct_orders', type='view')

# Incorporate (returns new instance)
new_rel = rel.incorporate(path={'identifier': 'fct_orders_v2'})
print(new_rel)  # analytics.marts.fct_orders_v2

# Change type
new_rel = rel.incorporate(type='table')
print(new_rel.type)  # table

# Combined
new_rel = rel.incorporate(
    path={'identifier': 'fct_orders_backup'},
    type='table',
)

В materialization:

{%- set target_relation = this.incorporate(type='view') -%}
{# this is current node's Relation, type defaulted to identifier's actual type #}
{# .incorporate(type='view') creates new Relation with type='view' #}

Comparison: equals и identity

Relations comparable:

rel_a = BaseRelation.create(database='db', schema='sch', identifier='t1', type='table')
rel_b = BaseRelation.create(database='db', schema='sch', identifier='t1', type='view')

# Equal by path (ignoring type)
print(rel_a == rel_b)  # False (different type)
print(rel_a.path == rel_b.path)  # True (same path)

# Hash для use в sets/dicts
print(hash(rel_a))  # 12345
print(hash(rel_b))  # different (different type)

Used в caching, deduplication.

Idempotent quoting

Idempotent quoting = одинаковый результат при повторных применениях.

rel = BaseRelation.create(
    database='analytics',
    schema='marts',
    identifier='fct_orders',
    quote_policy=Policy(database=True, schema=True, identifier=True),
)

print(rel)
# "analytics"."marts"."fct_orders"

# Render again
print(rel)
# "analytics"."marts"."fct_orders"  ← same, not "\"analytics\""...

# Through f-string
sql = f'SELECT * FROM {rel}'
print(sql)
# SELECT * FROM "analytics"."marts"."fct_orders"

Relation’s __str__ is idempotent — won’t double-quote.

Why important:

В macros вы used Relation many times. If quoting wasn’t idempotent — each rendering double-quotes, breaking SQL.

-- Without idempotent — broken:
CREATE TABLE {{ relation }} AS ...
-- becomes: CREATE TABLE """db""."sch"."t1""" AS ...   (broken!)

-- With idempotent — works:
CREATE TABLE {{ relation }} AS ...
-- becomes: CREATE TABLE "db"."sch"."t1" AS ...

Implementation в BaseRelation:

def __str__(self):
    parts = []
    if self.include_policy.database:
        parts.append(self.quoted_if_required(self.database, 'database'))
    if self.include_policy.schema:
        parts.append(self.quoted_if_required(self.schema, 'schema'))
    if self.include_policy.identifier:
        parts.append(self.quoted_if_required(self.identifier, 'identifier'))
    return '.'.join(parts)

def quoted_if_required(self, name, part):
    if getattr(self.quote_policy, part):
        return f'{self.quote_character}{name}{self.quote_character}'
    return name

Adapter-specific Relation subclass

В каждом adapter — <Adapter>Relation. Override defaults:

# dbt-snowflake/dbt/adapters/snowflake/relation.py
@dataclass(frozen=True, eq=False, repr=False)
class SnowflakeRelation(BaseRelation):
    quote_character: str = '"'
    
    # Snowflake uses no quoting by default (auto-uppercase)
    quote_policy: Policy = field(default_factory=lambda: Policy(
        database=False, schema=False, identifier=False
    ))
    
    # All parts included
    include_policy: Policy = field(default_factory=lambda: Policy(
        database=True, schema=True, identifier=True
    ))
    
    # Snowflake-specific properties
    transient: Optional[bool] = None

transient — Snowflake-specific (transient tables don’t have time travel, cheaper).

dbt-bigquery:

@dataclass(frozen=True, eq=False, repr=False)
class BigQueryRelation(BaseRelation):
    quote_character: str = '`'   # backtick, not double-quote!
    
    quote_policy: Policy = field(default_factory=lambda: Policy(
        database=True, schema=True, identifier=True
    ))
    
    include_policy: Policy = field(default_factory=lambda: Policy(
        database=True, schema=True, identifier=True
    ))

Custom adapter — inherit и override:

# dbt-myadapter/dbt/adapters/myadapter/relation.py
from dbt.adapters.base import BaseRelation, Policy
from dataclasses import dataclass, field

@dataclass(frozen=True, eq=False, repr=False)
class MyAdapterRelation(BaseRelation):
    quote_character: str = '"'
    
    # Adjust based on your warehouse
    quote_policy: Policy = field(default_factory=lambda: Policy(
        database=True, schema=True, identifier=True
    ))

Best practices

Use api.Relation.create() в macros, not direct dict manipulation.
Document quoting policy в README — explain why your warehouse needs specific config.

Configure project-level quoting в dbt_project.yml для consistency:

quoting:
  database: false
  schema: false
  identifier: false

Test quoting cross-warehouse — same Jinja must produce valid SQL на all targets.
Avoid case sensitivity reliance — use lowercase or uppercase consistently.

Попробуй сам

В Python REPL:

from dbt.adapters.base import BaseRelation, Policy

rel = BaseRelation.create(
    database='analytics',
    schema='marts',
    identifier='fct_orders',
    type='table',
)
print(rel)
print(rel.path)
print(rel.type)

Try different quote policies:

rel_quoted = rel.incorporate()  # uses defaults

# Override quoting
from dataclasses import replace
rel_no_quote = replace(rel, quote_policy=Policy(database=False, schema=False, identifier=False))

print(rel_quoted)     # "analytics"."marts"."fct_orders"
print(rel_no_quote)   # analytics.marts.fct_orders

Compare relations:

rel_a = BaseRelation.create(database='db', schema='sch', identifier='t1', type='table')
rel_b = BaseRelation.create(database='db', schema='sch', identifier='t1', type='view')
print(rel_a == rel_b)  # False
print(rel_a.path == rel_b.path)  # True

В dbt project — test reference:

-- models/test_quoting.sql
SELECT * FROM {{ ref('upstream_model') }}

Run dbt compile. Check target/compiled/test_quoting.sql — see actual rendered Relation.

Ключевые выводы

Relation = (database, schema, identifier, type) — 3-part identifier + type.
include_policy — какие parts включать в rendered name. Policy(database=False, schema=True, identifier=True) — для warehouses без DB tier.
quote_policy — какие parts quoting. Snowflake default Policy(False, False, False) для auto-uppercasing.
quote_character — " (most), ` (BigQuery), [] (SQL Server).
Case-sensitivity варьируется: Postgres (lowercase unquoted), Snowflake (uppercase unquoted, locked-case quoted), DuckDB (case-insensitive), BigQuery (case-sensitive).
Snowflake quoting trap — once quoted, locked permanently. Use unquoted in adapter for consistency.
Idempotent quoting — __str__ doesn’t double-quote. Safe to render Relation many times.
api.Relation.create() в Jinja, BaseRelation.create() в Python — canonical constructor.

Проверка знанийKnowledge check

Senior пишет adapter для proprietary warehouse 'CaseStrict' что case-sensitive для всех identifiers. Как настроить Relation class?

ОтветAnswer

Для case-sensitive warehouse — ensure quoting preserves case correctly.\n\n**Configuration**:\n\n```python\n# dbt-casestrict/dbt/adapters/casestrict/relation.py\nfrom dbt.adapters.base import BaseRelation, Policy\nfrom dataclasses import dataclass, field\n\n@dataclass(frozen=True, eq=False, repr=False)\nclass CaseStrictRelation(BaseRelation):\n # Use double-quote (ANSI standard)\n quote_character: str = '"'\n \n # Quote everything to preserve case\n quote_policy: Policy = field(default_factory=lambda: Policy(\n database=True,\n schema=True,\n identifier=True,\n ))\n \n # Include all parts\n include_policy: Policy = field(default_factory=lambda: Policy(\n database=True,\n schema=True,\n identifier=True,\n ))\n```\n\n**Why quote everything**:\n\nIf identifiers case-sensitive AND user writes `my_table` (lowercase) в profiles.yml/SQL:\n\n- **Без quoting**: warehouse may interpret as uppercase, lowercase, или error. Inconsistent.\n- **С quoting** `"my_table"`: explicit case preservation, predictable.\n\n**Set в dbt_project.yml для projects using your adapter**:\n\n```yaml\nquoting:\n database: true\n schema: true\n identifier: true\n```\n\nThis overrides any user customization that might break case sensitivity.\n\n**Test case-sensitivity preservation**:\n\n```python\nrel = CaseStrictRelation.create(\n database='AnalyticsDB',\n schema='Marts',\n identifier='fct_orders',\n type='table',\n)\nprint(rel)\n# "AnalyticsDB"."Marts"."fct_orders" ← case preserved\n```\n\nIf user mixes case in profiles.yml:\n\n```yaml\nprod:\n database: AnalyticsDB # Capital A, D\n schema: marts # lowercase\n```\n\nQueries reference exact case:\n\n```sql\nSELECT * FROM "AnalyticsDB"."marts"."fct_orders"\n```\n\nMatch.\n\n**Without quoting**:\n\n```sql\nSELECT * FROM AnalyticsDB.marts.fct_orders\n```\n\nWarehouse interprets:\n- Could be "AnalyticsDB" or "analyticsdb" depending on rules\n- Inconsistent behavior\n- User confusion\n\n**Case-strict warehouse examples**:\n\n- **BigQuery**: case-sensitive для everything. Uses backtick.\n- **PostgreSQL**: unquoted -> lowercase. Quoted preserves case. Recommend quote_policy(True, True, True) для projects using mixed case.\n- **Some MPP databases**: varies.\n\n**Edge case — Quoted with special chars**:\n\nIf identifier имеет space или special chars:\n\n```sql\n"My Table" -- spaces OK with quotes\n"100 Years" -- numbers at start OK with quotes\n\"\" -- empty identifier (usually disallowed)\n```\n\nQuoting essential для special-char identifiers.\n\n**Production considerations**:\n\n1. **Default to quote everything** — safest для unknown user code\n2. **Document the policy** в README clearly\n3. **Test mixed-case scenarios**:\n - All lowercase\n - All uppercase\n - Mixed (CamelCase)\n - With underscores\n - With numbers\n\n4. **CI tests** verifying quoting:\n \n ```python\n def test_quoting_preserves_case():\n rel = CaseStrictRelation.create(\n database='MyDB',\n schema='MySchema', \n identifier='MyTable',\n type='table',\n )\n assert str(rel) == '"MyDB"."MySchema"."MyTable"'\n ```\n\n5. **Compare с другими adapters' implementations**:\n - dbt-postgres: default quoting policy True for all\n - dbt-snowflake: default False (auto-uppercase)\n - dbt-bigquery: True (case-sensitive)\n\n**Trade-off**:\n\n**Quoting everything** (your choice):\n- [OK] Predictable\n- [OK] Case preserved\n- [X] Slightly more verbose SQL\n- [X] Если user писал `my_table` (no quotes) — they expect lowercase identifiers, but quoting preserves whatever case в config\n\n**Unquoted с case rules**:\n- [OK] Cleaner SQL\n- [X] Confusing если user wants mixed case\n- [X] Hard-coded behavior\n\nFor case-strict warehouse — **quoting everything** is right choice.\n\nЭто **architectural decision** baked into adapter. Senior должен make it carefully based на warehouse semantics.

Проверка знанийKnowledge check

Что значит 'Snowflake quoting trap'? Как избежать?

ОтветAnswer

**Snowflake quoting trap** — известный gotcha (#8 senior gotchas).\n\n**Что происходит**:\n\nSnowflake поведение identifiers:\n\n- **Unquoted** identifier -> auto-uppercased. `my_table` becomes `MY_TABLE`.\n- **Quoted** identifier -> case preserved **permanently**.\n\n**Trap scenario**:\n\n```sql\n-- Step 1: Create with quoted identifier\nCREATE TABLE "my_table" (id INT);\n-- Snowflake stores literally: my_table (lowercase)\n\n-- Step 2: Query without quotes\nSELECT * FROM my_table;\n-- Snowflake looks for MY_TABLE (auto-uppercased) — NOT FOUND\n-- Error: 'object MY_TABLE does not exist'\n\n-- Step 3: Query с quotes\nSELECT * FROM "my_table";\n-- Works.\n```\n\n**Once you quote**, you must always quote.\n\n**dbt context — Trap происходит**:\n\nIf adapter configured for quoting, all relations quoted. Then ad-hoc queries (not from dbt) often don't quote -> fail.\n\n**Worse — Mixed scenario**:\n\nTwo dbt models, один с quoting, другой без. Cross-references break.\n\nExample:\n\n```yaml\n# Model A — quoted\n+quoting:\n identifier: true\n\n# Model B — unquoted\n+quoting:\n identifier: false\n```\n\nModel A creates `"users"` (lowercase). Model B queries `users` -> Snowflake looks for `USERS` (uppercase) -> не find.\n\n**Fix — Consistent unquoted (dbt-snowflake default)**:\n\n```yaml\n# dbt_project.yml\nquoting:\n database: false\n schema: false\n identifier: false\n```\n\nAll dbt-generated SQL unquoted. Snowflake auto-uppercases consistently.\n\n```sql\nCREATE TABLE users (id INT);\n-- Stored as USERS\n\nSELECT * FROM users;\n-- Looks for USERS — found\n\nSELECT * FROM USERS;\n-- Looks for USERS — found (same)\n\nSELECT * FROM Users;\n-- Looks for USERS — found\n```\n\nAll case variants работают, because all stored as uppercase.\n\n**dbt-snowflake's SnowflakeRelation**:\n\n```python\nclass SnowflakeRelation(BaseRelation):\n quote_policy: Policy = field(default_factory=lambda: Policy(\n database=False,\n schema=False,\n identifier=False,\n ))\n```\n\nDefault to no quoting -> avoid trap.\n\n**Edge case — Already-quoted relations**:\n\nIf legacy data has quoted relations (from prior tools), dbt struggles to access:\n\n```sql\n-- Existing table created by another tool:\nCREATE TABLE "MyCustomerTable" (...);\n\n-- dbt unquoted approach:\nSELECT * FROM mycustomertable;\n-- Snowflake looks for MYCUSTOMERTABLE -> not found\n-- (Actual name: "MyCustomerTable" mixed-case)\n```\n\n**Workaround**:\n\nUse `quoting.identifier: true` selectively for these relations:\n\n```yaml\nsources:\n - name: legacy\n +quoting:\n identifier: true\n tables:\n - name: 'MyCustomerTable' # exact case\n```\n\nThen ref correctly:\n\n```sql\nSELECT * FROM {{ source('legacy', 'MyCustomerTable') }}\n-- Renders: SELECT * FROM "MyCustomerTable"\n```\n\nMixed approach — most things unquoted, specific legacy relations quoted.\n\n**Best practice для new projects**:\n\n1. **All unquoted from day one** — no trap possible.\n2. **Standard convention**: all-lowercase identifiers в код.\n3. **Snowflake stores as uppercase** — internal detail, irrelevant.\n4. **Avoid mixed-case identifiers** — confusion.\n\n**For migration projects** with existing quoted relations:\n\n1. **Audit** — find all relations с unusual casing.\n2. **Decide**: rename to standard или keep quoted.\n3. **Document** в README.\n4. **CI gates** — check no new quoted relations introduced без reason.\n\n**Why это так fragile**:\n\nSnowflake's design pre-dates ANSI standards in some aspects. Case sensitivity rules optimized for performance (uppercase internal representation), but exposed as semantically meaningful to users.\n\nUnlike Postgres (always lowercase unquoted) или DuckDB (case-insensitive everywhere), Snowflake's behavior — middle ground that confuses.\n\n**Other warehouses без trap**:\n\n- **Postgres**: unquoted -> lowercase, quoted preserves. Predictable.\n- **BigQuery**: all case-sensitive (with backticks). Predictable.\n- **DuckDB**: case-insensitive everywhere. Predictable.\n- **MySQL**: depends OS, often case-insensitive. Less predictable.\n\nSnowflake's trap unique.\n\n**Senior knowledge**: знать adapter quirks. Snowflake quoting trap — must-know.\n\nЭто **production-grade discipline** — set quoting policy carefully на adapter level.

Relation API: Relation.create() и quoting policy

Что такое Relation

Структура BaseRelation

include_policy: что включать в name

quote_policy: что quote

Quoting policy и case-sensitivity

Relation.create() — main constructor

incorporate(): обновление полей

Comparison: equals и identity

Idempotent quoting

Adapter-specific Relation subclass

Best practices

Попробуй сам

Ключевые выводы

Закончили урок?