Node anatomy: unique_id, paths, refs, depends_on, config, compiled_sql

In the previous lesson we looked at manifest.json from the top level. Now we go deeper. A node (model, test, snapshot, seed, analysis, operation) has dozens of fields, each with its own semantics. A senior engineer should know them by heart, because any tooling effort (observability, optimization, custom integrations) comes down to reading these fields.

The goal of this lesson is to go from node['refs'] to a complete mental model:

- how path differs from original_file_path;
- why unique_id has (at least) three segments;
- what depends_on holds compared with refs;
- when compiled_code is null;
- what checksum is and why it exists.


Identity fields

unique_id

"unique_id": "model.jaffle_shop.customers"

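Format: <resource_type>.<package>.<name>. Some resource types add segments: .v<N> for versioned models, a hash suffix for generic tests, and <source_name>.<table_name> in place of <name> for sources.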

Examples:

model.jaffle_shop.customers: model customers in project jaffle_shop
model.jaffle_shop.fct_orders.v2: versioned model (version 2)
source.jaffle_shop.raw.orders: table orders of source raw in project jaffle_shop
seed.jaffle_shop.country_codes: seed
test.jaffle_shop.not_null_customers_id.abc123: generic test (auto-generated name plus a hash suffix)
test.jaffle_shop.my_custom_test: singular test (a SQL file in tests/); generic and singular tests both use resource_type test
snapshot.jaffle_shop.user_history: snapshot
analysis.jaffle_shop.revenue_overview: analysis
operation.jaffle_shop.jaffle_shop-on-run-end-0: on-run-end hook (hooks are numbered by position, not named)
macro.dbt_utils.surrogate_key: macro from the dbt_utils package
exposure.jaffle_shop.executive_dashboard: exposure
metric.jaffle_shop.revenue: metric (dbt Semantic Layer / MetricFlow)
semantic_model.jaffle_shop.orders: semantic model (MetricFlow)
saved_query.jaffle_shop.weekly_revenue: MetricFlow saved query
group.jaffle_shop.finance: model group
unit_test.jaffle_shop.fct_orders.test_revenue_logic: unit test (the tested model's name is part of the id)
doc.jaffle_shop.customer_id: doc block
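Only the first two segments have a fixed meaning. Code that consumes unique_ids should therefore not assume a fixed number of parts. The sketch below splits on the first two dots and keeps the rest as resource-specific segments. split_unique_id is a hypothetical helper, not a dbt API.

```python
def split_unique_id(unique_id: str) -> dict:
    """Split a unique_id into its fixed head and resource-specific tail.

    Only resource_type and package have a fixed position; the remaining
    segments depend on the resource type, e.g. [source_name, table_name] for
    sources, [name, "v2"] for versioned models, [name, hash] for generic tests.
    """
    resource_type, package, rest = unique_id.split(".", 2)
    return {"resource_type": resource_type, "package": package, "rest": rest.split(".")}


print(split_unique_id("source.jaffle_shop.raw.orders"))
# {'resource_type': 'source', 'package': 'jaffle_shop', 'rest': ['raw', 'orders']}
```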

TIP

unique_id is the only identifier that stays the same across runs (as long as the file is not renamed). Use it as the primary key in observability databases, cache keys and lineage records.

name

"name": "customers"

The bare file name (without .sql). It can collide across packages, which is why unique_id includes the package.

fqn (Fully Qualified Name)

"fqn": ["jaffle_shop", "marts", "customers"]

An array reflecting the hierarchy: [<package>, <folder1>, <folder2>, ..., <name>]. Folders are relative to the resource directory, so models/ itself is not included. fqn drives the selection syntax:

dbt run --select marts            # every model under marts/
dbt run --select marts.customers  # a single model
dbt run --select +marts           # models under marts/ plus all their parents

fqn also determines config inheritance from dbt_project.yml:

models:
  jaffle_shop:
    marts:
      +materialized: table  # applies to everything under marts/

This block applies to every node whose fqn starts with ['jaffle_shop', 'marts']. A more deeply nested block overrides a less specific one.
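The prefix rule can be sketched in a few lines. This models only the path matching for a single scalar key such as +materialized. It ignores how dbt combines keys like tags or meta across levels, and the helper names are hypothetical.

```python
from __future__ import annotations


def config_applies(fqn: list[str], config_path: list[str]) -> bool:
    """True if a dbt_project.yml block at config_path covers a node with this fqn.

    models: -> jaffle_shop: -> marts: gives config_path ['jaffle_shop', 'marts'].
    """
    return fqn[: len(config_path)] == config_path


def most_specific(fqn: list[str], blocks: list[tuple[list[str], str]]) -> str | None:
    """Return the value from the deepest block that covers fqn (deeper wins)."""
    matching = [(path, value) for path, value in blocks if config_applies(fqn, path)]
    if not matching:
        return None
    return max(matching, key=lambda pv: len(pv[0]))[1]


blocks = [(["jaffle_shop"], "view"), (["jaffle_shop", "marts"], "table")]
print(most_specific(["jaffle_shop", "marts", "customers"], blocks))    # table
print(most_specific(["jaffle_shop", "staging", "stg_orders"], blocks))  # view
```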

resource_type

"resource_type": "model"

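One of: model, test, unit_test, snapshot, seed, analysis, operation, sql_operation, source, macro, exposure, metric, semantic_model, saved_query, group, doc. There is no separate data_test resource type: since dbt 1.8 the YAML key is data_tests, but those nodes still have resource_type test.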

package_name

"package_name": "jaffle_shop"

Which package the node comes from (the root project, dbt_utils, audit_helper, or a custom package).

Path fields

dbt tracks three paths:

{
  "path": "marts/customers.sql",
  "original_file_path": "models/marts/customers.sql",
  "root_path": "/Users/alice/jaffle_shop"
}

original_file_path: path relative to the project root (including models/, tests/, macros/, etc.). This is what you see in the file system.
path: path relative to the resource-type directory (no models/ prefix). For generic tests defined in YAML, it is a synthetic .sql file name rather than a real file.
root_path: absolute path of the project root. It is no longer present on nodes in recent manifest versions; use original_file_path instead.

WARNING

Recent dbt versions no longer write root_path on nodes. An absolute path is machine-specific, and with dbt Mesh, projects can live in different directories. Production tools should rely on original_file_path. When they need a file-system path, they should join it to a project root they already know.

Where compiled output goes

"compiled_path": "target/compiled/jaffle_shop/models/marts/customers.sql"

After dbt compile or dbt run, dbt writes the rendered SQL to target/compiled/<package>/<original_file_path>. This mirrors the repo structure inside target/. It is handy for debugging: compare the compiled SQL with the raw SQL.

Also:

"build_path": "target/run/jaffle_shop/models/marts/customers.sql"

target/run/ holds the final DDL/DML (CREATE TABLE AS, MERGE, etc.) that dbt sent to the adapter. This is the compiled SQL wrapped in the materialization template.
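The sketch below derives both locations from a node, assuming forward-slash paths. target_write_path is a hypothetical helper.

- The one-file-per-node case follows the layout described above.
- The generic-test branch reflects our understanding of how dbt-core nests YAML-defined tests under the YAML file's path. Check it against your own target/ directory.

```python
import posixpath


def target_write_path(node: dict, subdirectory: str, target_path: str = "target") -> str:
    """Build target/<subdirectory>/<package>/<file path> for a manifest node.

    subdirectory is "compiled" (compiled_path) or "run" (build_path).
    """
    original = node["original_file_path"]
    if posixpath.basename(node["path"]) == posixpath.basename(original):
        # One node per file (models, snapshots, analyses): mirror the source path
        rel = original
    else:
        # Many nodes per file (generic tests in a YAML file): nest under the YAML path
        rel = posixpath.join(original, node["path"])
    return posixpath.join(target_path, subdirectory, node["package_name"], rel)


model = {"package_name": "jaffle_shop", "path": "marts/customers.sql",
         "original_file_path": "models/marts/customers.sql"}
print(target_write_path(model, "compiled"))
# target/compiled/jaffle_shop/models/marts/customers.sql
```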

Database/schema/alias

{
  "database": "jaffle_shop",
  "schema": "main",
  "alias": "customers"
}

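database: where the object is built (resolved via generate_database_name).
schema: the target schema (resolved via generate_schema_name).
alias: the final table/view name. It defaults to name; override it with {{ config(alias='foo') }} or the alias config in YAML.

The fully qualified relation in the warehouse is {database}.{schema}.{alias}.

In dev this could be jaffle_shop_dev.dbt_alice_dev_marts.customers. With the default generate_schema_name, a model with custom schema marts lands in <target schema>_marts, so a per-developer target schema such as dbt_alice_dev supplies the user prefix.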
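dbt's default generate_schema_name works as follows:

- with no custom schema (+schema) set, it keeps the target schema;
- otherwise, it appends the custom schema with an underscore.

The sketch below is a Python mirror of that default. Projects often override the macro, and the function names are hypothetical.

```python
from typing import Optional


def default_schema_name(target_schema: str, custom_schema: Optional[str]) -> str:
    """Python mirror of dbt's built-in generate_schema_name logic."""
    if custom_schema is None:
        return target_schema
    return f"{target_schema}_{custom_schema.strip()}"


def relation_name(database: str, schema: str, alias: str) -> str:
    """Unquoted {database}.{schema}.{alias}; real adapters may quote each part."""
    return f"{database}.{schema}.{alias}"


schema = default_schema_name("dbt_alice_dev", "marts")
print(relation_name("jaffle_shop_dev", schema, "customers"))
# jaffle_shop_dev.dbt_alice_dev_marts.customers
```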

Refs and sources

refs

{
  "refs": [
    {"name": "stg_customers", "package": null, "version": null},
    {"name": "stg_orders", "package": "ext_jaffle_shop", "version": 2}
  ]
}

The list of everything the model references via {{ ref('...') }}. Each ref is a dictionary:

name: name of the target node
package: package name if the ref names one, for cross-package or cross-project refs (null if the ref did not specify a package)
version: version (for versioned models; added in manifest schema v9)

NOTE

Before schema v9, each ref was a list of strings, either [name] or [package, name], e.g. [["stg_customers"], ["ext_jaffle_shop", "stg_orders"]]. Versioned models required the switch to dictionaries. Tools that support several dbt versions must handle both formats.

sources

{
  "sources": [
    ["raw", "orders"],
    ["raw", "customers"]
  ]
}

The list of everything the model references via {{ source('...', '...') }}. Each source is a [source_name, table_name] list.

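To reconstruct the source unique_id, you can assume the source is declared in the same package as the model. If it is declared elsewhere, take the resolved id from depends_on.nodes instead: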

source_uids = []
for source_name, table_name in node['sources']:
    # Assumption: the source is defined in the model's own package
    source_uids.append(f"source.{node['package_name']}.{source_name}.{table_name}")

metrics

{
  "metrics": [
    ["revenue"],
    ["arr"]
  ]
}

The same for {{ metric('name') }} calls.

depends_on — flattened upstream

{
  "depends_on": {
    "nodes": [
      "model.jaffle_shop.stg_customers",
      "model.jaffle_shop.stg_orders",
      "source.jaffle_shop.raw.region"
    ],
    "macros": [
      "macro.dbt.statement",
      "macro.dbt_utils.surrogate_key"
    ]
  }
}

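depends_on holds the resolved unique_ids of everything upstream (nodes and macros). dbt fills it in at the end of parsing, when it resolves the ref(), source() and metric() calls recorded in refs/sources/metrics. The compiler's Linker then builds the DAG from these edges. depends_on is used for:

Topological sorting to determine run order
Selection: +model means walking parents through depends_on.nodes
Macro tracking: depends_on.macros lists the macros the node calls (this is also how state:modified detects changed macros)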
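To illustrate the first use, a topological sort over depends_on.nodes yields a valid run order. dbt's own scheduler runs independent nodes in parallel threads. This sketch only produces one sequential order, using the standard-library graphlib (Python 3.9+). The set of resource types treated as runnable is an assumption.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

RUNNABLE = {"model", "seed", "snapshot"}  # assumption: tests and analyses left out


def run_order(manifest: dict) -> list:
    """One valid sequential order in which every parent precedes its children."""
    nodes = manifest["nodes"]
    graph = {
        # Keep only parents stored in manifest['nodes'] (drops sources, metrics, ...)
        uid: [p for p in node["depends_on"]["nodes"] if p in nodes]
        for uid, node in nodes.items()
        if node["resource_type"] in RUNNABLE
    }
    return list(TopologicalSorter(graph).static_order())


manifest = {"nodes": {
    "model.p.fct": {"resource_type": "model",
                    "depends_on": {"nodes": ["model.p.stg", "seed.p.codes"]}},
    "model.p.stg": {"resource_type": "model",
                    "depends_on": {"nodes": ["source.p.raw.orders"]}},
    "seed.p.codes": {"resource_type": "seed", "depends_on": {"nodes": []}},
}}
print(run_order(manifest))  # model.p.fct always comes last
```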

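refs vs depends_on

refs records what the author wrote: name, package and version as they appear in the ref() call, possibly unpinned. depends_on.nodes records what dbt resolved that to, as full unique_ids, and it also contains sources and metrics. depends_on.macros has no counterpart in refs at all. Use depends_on for lineage and selection. Use refs only when you need the author's intent, for example whether a version was pinned.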
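The difference shows up when you try to line the two lists up. The sketch below pairs each ref with the resolved ids in depends_on. Note how an unpinned ref to a versioned model lands on a .v<N> id. resolve_refs is a hypothetical helper, and its matching rule is a heuristic rather than dbt's resolver.

```python
def resolve_refs(node: dict) -> dict:
    """Map each ref name to the depends_on unique_id(s) it most likely resolved to.

    Heuristic: match ref()-able types (model/seed/snapshot) by name, by package
    when the ref names one, and by version when it is pinned.
    Keyed by ref name for brevity.
    """
    resolved = {}
    for ref in node.get("refs", []):
        name, package, version = ref["name"], ref.get("package"), ref.get("version")
        matches = []
        for uid in node["depends_on"]["nodes"]:
            parts = uid.split(".")
            if parts[0] not in {"model", "seed", "snapshot"} or parts[2] != name:
                continue
            if package is not None and parts[1] != package:
                continue
            # Pinned refs must hit the exact .v<N> suffix ("v1" and 1 both accepted)
            if version is not None and parts[3:] != [f"v{str(version).lstrip('v')}"]:
                continue
            matches.append(uid)
        resolved[name] = matches
    return resolved


node = {
    "package_name": "jaffle_shop",
    "refs": [{"name": "stg_orders", "package": None, "version": None},
             {"name": "fct_orders", "package": None, "version": None}],
    "depends_on": {"nodes": ["model.jaffle_shop.stg_orders",
                             "model.jaffle_shop.fct_orders.v2",
                             "source.jaffle_shop.raw.orders"]},
}
print(resolve_refs(node))
# {'stg_orders': ['model.jaffle_shop.stg_orders'], 'fct_orders': ['model.jaffle_shop.fct_orders.v2']}
```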

config: materialization and metadata

{
  "config": {
    "enabled": true,
    "materialized": "incremental",
    "unique_key": "order_id",
    "incremental_strategy": "merge",
    "merge_exclude_columns": ["created_at"],
    "on_schema_change": "append_new_columns",
    "schema": null,
    "database": null,
    "alias": null,
    "tags": ["nightly", "marts"],
    "meta": {
      "owner": "analytics",
      "pii": false,
      "cost_per_run": 2.50
    },
    "grants": {
      "select": ["analyst_role"]
    },
    "persist_docs": {
      "relation": true,
      "columns": true
    },
    "contract": {
      "enforced": false
    },
    "access": "protected",
    "group": null,
    "docs": {"show": true, "node_color": "#FF9900"},
    "pre_hook": [],
    "post_hook": [],
    "full_refresh": null,
    "snapshot_meta_column_names": {}
  }
}

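This is the richest object. It holds the resolved config after merging dbt_project.yml, properties YAML and the in-file {{ config() }} block. The most specific source wins, so in-file config() has the highest precedence. Jinja is already rendered for the current target. This is a post-processing representation, not what you typed.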

Key groups:

Materialization: materialized (table/view/incremental/ephemeral/materialized_view/snapshot, or a custom materialization). Microbatch is not a materialization: it is an incremental_strategy.
Incremental: unique_key, incremental_strategy, on_schema_change, merge_exclude_columns, merge_update_columns, plus microbatch settings (event_time, lookback, begin, batch_size).
Identity overrides: schema, database, alias (if null, defaults are used).
Tagging: tags, an array of strings.
Metadata: meta, an arbitrary dictionary for custom metadata (owners, PII flags, cost).
Permissions: grants (adapter-specific).
Documentation: persist_docs, docs.show, docs.node_color.
Governance: contract.enforced, access (public/private/protected), group.
Hooks: pre_hook, post_hook, arrays of SQL statements.
Snapshots: strategy, updated_at, check_cols, target_database, target_schema, etc.
Custom: any user-defined key set via +my_custom: ... in dbt_project.yml.
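Because config is already resolved, a tool can read these groups straight from the JSON without re-implementing dbt's merge logic. The sketch below works over a loaded manifest dict; the function name is hypothetical.

```python
from collections import Counter


def materialization_summary(manifest: dict):
    """Count models per materialization and list incremental settings."""
    models = [n for n in manifest["nodes"].values() if n["resource_type"] == "model"]
    counts = Counter(n["config"]["materialized"] for n in models)
    incremental = {
        n["unique_id"]: {
            # .get: these keys can be null or absent depending on adapter and version
            "strategy": n["config"].get("incremental_strategy"),
            "unique_key": n["config"].get("unique_key"),
        }
        for n in models
        if n["config"]["materialized"] == "incremental"
    }
    return counts, incremental
```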

TIP

config is the resolved config for the current target. Suppose you have a target-specific config (+materialized: "{{ 'table' if target.name == 'prod' else 'view' }}"). A manifest built against the prod target shows 'table', and one built against dev shows 'view'. You never see the raw template.

columns: column-level metadata

{
  "columns": {
    "order_id": {
      "name": "order_id",
      "description": "Order PK",
      "data_type": "BIGINT",
      "constraints": [
        {"type": "not_null"},
        {"type": "primary_key"}
      ],
      "quote": null,
      "tags": ["pk"],
      "meta": {"pii": false},
      "data_tests": []
    },
    "amount": {
      "name": "amount",
      "description": "Order total in USD",
      "data_type": "NUMERIC(10,2)",
      "constraints": [
        {"type": "check", "expression": "amount >= 0"}
      ],
      "quote": null,
      "tags": [],
      "meta": {}
    }
  }
}

Every column declared in schema.yml (or any properties YAML) ends up in the manifest. Fields:

name: column name
description: shown on the docs site; with persist_docs.columns: true it is also written to the warehouse as a column comment
data_type: used by model contracts (dbt checks that the model's output column types match)
constraints: array of constraints (not_null, primary_key, foreign_key, unique, check); whether they are enforced depends on the platform
quote: whether to quote the column name (true/false; null = adapter default)
tags: column-level tags (for selection and data classification)
meta: arbitrary custom metadata

A column that is not in YAML is not in the manifest. dbt-osmosis addresses this by generating YAML from the warehouse schema and inheriting column descriptions from upstream models.
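Only YAML-declared columns reach the manifest. Coverage metrics computed from it therefore describe the YAML, not the warehouse table. The sketch below computes documentation coverage per model. It is a hypothetical helper and returns None for models with no declared columns.

```python
def column_doc_coverage(manifest: dict) -> dict:
    """Share of YAML-declared columns with a non-empty description, per model.

    Columns that exist only in the warehouse are not in the manifest, so they
    are invisible here.
    """
    coverage = {}
    for uid, node in manifest["nodes"].items():
        if node["resource_type"] != "model":
            continue
        columns = node.get("columns", {})
        if not columns:
            coverage[uid] = None  # nothing declared in YAML
            continue
        described = sum(1 for col in columns.values() if col.get("description"))
        coverage[uid] = described / len(columns)
    return coverage
```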

raw_code vs compiled_code

{
  "raw_code": "SELECT customer_id, COUNT(*) AS order_count FROM {{ ref('stg_orders') }} GROUP BY 1",
  "compiled_code": "SELECT customer_id, COUNT(*) AS order_count FROM \"jaffle_shop\".\"main\".\"stg_orders\" GROUP BY 1",
  "compiled": true
}

raw_code: the original SQL/Python from the file. Always populated after parsing.
compiled_code: the rendered Jinja. Null (or missing) after parsing alone; filled in by dbt compile / dbt run.
compiled: boolean flag showing whether the node has been compiled.

For Python models:

{
  "raw_code": "def model(dbt, session): ...",
  "language": "python"
}

The language field distinguishes SQL models from Python models.

WARNING

If your tool reads the manifest before the compile phase (for example, in dbt parse-only workflows), do not rely on compiled_code. Use raw_code, or compile explicitly: run dbt compile, or invoke it programmatically via dbtRunner (dbt 1.5+).

checksum

{
  "checksum": {
    "name": "sha256",
    "checksum": "8f2a1c4e7b9d3f6a..."
  }
}

SHA256 hash of the node's file contents. Used for:

Partial parsing: dbt compares the checksums saved in partial_parse.msgpack with the current file hashes. If they match, the file is not re-parsed.
state:modified: in dbt run --select state:modified+ --state previous_run/, the checksum tells dbt whether a model's body changed (state:modified.body). The full state:modified also compares configs, descriptions and other properties.
Slim CI: the same mechanism applied to a PR-vs-main diff.

NOTE

checksum covers only the model's own content, not its dependencies. If a macro the model uses changes, the model's checksum stays the same. dbt tracks this separately via state:modified.macros.
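To see what the checksum alone can and cannot tell you, here is a rough approximation of state:modified.body between two manifests (e.g. main vs a PR). This is a sketch, not dbt's implementation. The real state:modified also compares configs, descriptions, contracts and upstream macros, which is the gap the note above describes. The function name is hypothetical.

```python
def modified_by_checksum(old_manifest: dict, new_manifest: dict) -> set:
    """Nodes that are new or whose file checksum changed between two manifests."""
    old_nodes = old_manifest["nodes"]
    changed = set()
    for uid, node in new_manifest["nodes"].items():
        old = old_nodes.get(uid)
        if old is None or old["checksum"]["checksum"] != node["checksum"]["checksum"]:
            changed.add(uid)
    return changed
```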

meta, tags, group, docs

group

{"group": "finance"}

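Membership in a model group, used for access control. access is enforced relative to groups:

- a private model can be referenced only by models in the same group;
- a protected model can be referenced by anything in the same project (or a project that installs it as a package);
- a public model can be referenced from anywhere, including other projects.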

docs

{"docs": {"show": true, "node_color": "#FF9900"}}

show: whether the node appears on the docs site (false hides, for example, legacy nodes)
node_color: the node's color in the lineage graph
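The private rule is the easiest one to check from the manifest alone. dbt itself raises an error for such refs when it parses the project, so a successfully parsed manifest should return nothing here. The sketch is mainly a precise statement of the rule. It can still be useful, e.g. when auditing manifests assembled by other tools. The function name is hypothetical. It reads access from the node or, failing that, from config.

```python
def private_access_violations(manifest: dict) -> list:
    """(child, parent) pairs where a model uses a private model from another group."""
    nodes = manifest["nodes"]
    violations = []
    for uid, node in nodes.items():
        if node["resource_type"] != "model":
            continue
        for parent_uid in node["depends_on"]["nodes"]:
            parent = nodes.get(parent_uid)
            if parent is None:
                continue  # sources, metrics, nodes from other projects
            access = parent.get("access") or parent.get("config", {}).get("access")
            if access == "private" and parent.get("group") != node.get("group"):
                violations.append((uid, parent_uid))
    return violations
```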

Test-specific fields

Tests (resource_type 'test', for both generic and singular tests) have additional fields:

{
  "unique_id": "test.jaffle_shop.not_null_customers_id.abc12345",
  "name": "not_null_customers_id",
  "resource_type": "test",
  "test_metadata": {
    "name": "not_null",
    "namespace": null,
    "kwargs": {
      "column_name": "id",
      "model": "{{ get_where_subquery(ref('customers')) }}"
    }
  },
  "column_name": "id",
  "attached_node": "model.jaffle_shop.customers",
  "config": {
    "severity": "error",
    "warn_if": "!= 0",
    "error_if": "!= 0",
    "store_failures": false,
    "limit": null,
    "where": null
  }
}

Key test fields:

test_metadata.name: name of the generic test (not_null, unique, accepted_values, relationships, etc.); singular tests have no test_metadata
test_metadata.kwargs: the test's arguments
attached_node: the model/source/seed/snapshot the generic test is defined on
config.severity: error or warn
config.store_failures: whether to store failing rows in the warehouse
config.limit: maximum number of failing rows returned

This lets tools (Elementary, dbt-checkpoint) work out which tests belong to which models.

Snapshot-specific fields

{
  "unique_id": "snapshot.jaffle_shop.user_history",
  "resource_type": "snapshot",
  "config": {
    "strategy": "check",
    "unique_key": "user_id",
    "check_cols": ["email", "subscription_tier"],
    "updated_at": null,
    "target_database": "jaffle_shop",
    "target_schema": "snapshots",
    "snapshot_meta_column_names": {
      "dbt_valid_from": "valid_from",
      "dbt_valid_to": "valid_to",
      "dbt_scd_id": "scd_id",
      "dbt_updated_at": "updated_at"
    },
    "hard_deletes": "ignore"
  }
}

Snapshots have their own config block:

- strategy;
- check_cols (for the check strategy);
- updated_at (for the timestamp strategy);
- the snapshot meta column names.

Seed-specific fields

{
  "unique_id": "seed.jaffle_shop.country_codes",
  "resource_type": "seed",
  "path": "country_codes.csv",
  "original_file_path": "seeds/country_codes.csv",
  "config": {
    "materialized": "seed",
    "delimiter": ",",
    "quote_columns": null,
    "column_types": {
      "code": "VARCHAR(3)",
      "country": "VARCHAR(100)"
    }
  },
  "root_path": "...",
  "package_name": "jaffle_shop"
}

Note: compiled_code for a seed is normally null. There is no SQL to render; the CSV is loaded separately.

Real-world example: an incremental model

{
  "model.jaffle_shop.fct_orders": {
    "database": "jaffle_shop",
    "schema": "marts",
    "name": "fct_orders",
    "resource_type": "model",
    "package_name": "jaffle_shop",
    "path": "marts/fct_orders.sql",
    "original_file_path": "models/marts/fct_orders.sql",
    "unique_id": "model.jaffle_shop.fct_orders",
    "fqn": ["jaffle_shop", "marts", "fct_orders"],
    "alias": "fct_orders",
    "checksum": {"name": "sha256", "checksum": "9a8f..."},
    "language": "sql",
    "config": {
      "enabled": true,
      "materialized": "incremental",
      "unique_key": "order_id",
      "incremental_strategy": "merge",
      "on_schema_change": "append_new_columns",
      "tags": ["finance", "nightly"],
      "meta": {"owner": "data-team", "cost_attribution": "finance"},
      "grants": {"select": ["analyst_role"]},
      "persist_docs": {"relation": true, "columns": true},
      "contract": {"enforced": true},
      "access": "public",
      "group": "finance"
    },
    "tags": ["finance", "nightly"],
    "description": "Order facts, one row per order",
    "columns": {
      "order_id": {
        "name": "order_id",
        "description": "Order PK",
        "data_type": "BIGINT",
        "constraints": [{"type": "not_null"}, {"type": "primary_key"}]
      },
      "customer_id": {
        "name": "customer_id",
        "description": "FK to dim_customers",
        "data_type": "BIGINT",
        "constraints": [{"type": "not_null"}]
      },
      "amount_usd": {
        "name": "amount_usd",
        "description": "Order total in USD",
        "data_type": "NUMERIC(10,2)",
        "constraints": [{"type": "check", "expression": "amount_usd >= 0"}]
      }
    },
    "meta": {"owner": "data-team"},
    "group": "finance",
    "refs": [
      {"name": "stg_orders", "package": null, "version": null},
      {"name": "stg_payments", "package": null, "version": null}
    ],
    "sources": [["raw", "exchange_rates"]],
    "metrics": [],
    "depends_on": {
      "macros": [
        "macro.dbt.statement",
        "macro.dbt.is_incremental",
        "macro.jaffle_shop.convert_currency"
      ],
      "nodes": [
        "model.jaffle_shop.stg_orders",
        "model.jaffle_shop.stg_payments",
        "source.jaffle_shop.raw.exchange_rates"
      ]
    },
    "compiled_path": "target/compiled/jaffle_shop/models/marts/fct_orders.sql",
    "build_path": "target/run/jaffle_shop/models/marts/fct_orders.sql",
    "compiled": true,
    "compiled_code": "SELECT o.order_id, ... FROM \"jaffle_shop\".\"main\".\"stg_orders\" o ...",
    "raw_code": "SELECT o.order_id, ... FROM {{ ref('stg_orders') }} o ...",
    "extra_ctes_injected": true,
    "extra_ctes": []
  }
}

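This is a (trimmed) node after dbt run in production. It shows:

Incremental with the merge strategy
An enforced contract (data_types + constraints)
Public access, in the finance group
Metadata in use: owner, cost_attribution, persist_docs, grants
depends_on resolved by dbt at parse time
compiled_code holding only the rendered SELECT; the MERGE statement dbt actually ran lives in target/run/ (build_path)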

Python integration: dbt-core's node classes

Internally, dbt-core uses typed dataclasses serialized with mashumaro, not pydantic. You can reuse them in tools:

import json

from dbt.contracts.graph.manifest import Manifest
from dbt.contracts.graph.nodes import ModelNode

with open("target/manifest.json") as f:
    manifest_data = json.load(f)
# Internal API: whether from_dict accepts the artifact JSON depends on the
# dbt-core version, so pin the version your tool is tested against
manifest = Manifest.from_dict(manifest_data)

for unique_id, node in manifest.nodes.items():
    if isinstance(node, ModelNode):
        print(f"{node.unique_id}: {node.config.materialized}")
        print(f"  Refs: {[r.name for r in node.refs]}")
        print(f"  Depends on: {node.depends_on.nodes}")
        print(f"  Tags: {node.tags}")
        if node.config.materialized == "incremental":
            print(f"  Strategy: {node.config.incremental_strategy}")
            print(f"  Unique key: {node.config.unique_key}")

The upside is type safety. The downside is that dbt-core's internal API changes between versions. Production tools use either the raw JSON or a stable Python API.

NOTE

Lesson 04-parsing-manifest-python.mdx covers in more detail how to extract information from the manifest, including edge cases (None vs missing fields, schema migration, performance).

Anti-patterns when working with nodes

1. Assuming refs are strings

# BAD: treats every ref as a plain name string
for ref in node['refs']:
    upstream_name = ref

Since schema v9, refs are dicts; before that, each ref was a list ([name] or [package, name]). Correct:

for ref in node['refs']:
    if isinstance(ref, dict):
        upstream_name = ref['name']   # schema v9+
    elif isinstance(ref, list):
        upstream_name = ref[-1]       # pre-v9: the name is always the last element
    else:
        upstream_name = ref           # defensive: bare string

2. Building ref ids from the current package

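# BAD: breaks for cross-package and cross-project refs
upstream_uid = f"model.{node['package_name']}.{ref['name']}"

Better: use ref['package']:

target_package = ref.get('package') or node['package_name']
upstream_uid = f"model.{target_package}.{ref['name']}"

Even this is a heuristic, for three reasons:

- ref() can point to seeds and snapshots;
- versioned models get a .v<N> suffix;
- a ref without a package can resolve to a model in an installed package.

The authoritative resolved id is in depends_on.nodes.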

3. Looking for sources outside depends_on

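# BAD: rebuilding source dependencies by hand from node['sources']
parents = list(node['depends_on']['nodes'])
for source_name, table_name in node['sources']:
    # duplicates ids already in depends_on, and guesses the package wrong for cross-package sources
    parents.append(f"source.{node['package_name']}.{source_name}.{table_name}")

depends_on.nodes already includes everything upstream (models, sources, snapshots, seeds), so filter it instead:

sources = [p for p in node['depends_on']['nodes'] if p.startswith('source.')]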

4. Parsing compiled_code in a parse-only workflow

# BAD: extracting SELECT columns from compiled_code, which is null here
import sqlparse
parsed = sqlparse.parse(node['compiled_code'])  # fails on None (KeyError if the key is absent)

If the manifest comes from dbt parse (no compile), compiled_code is null or missing. Check first:

sql = node.get('compiled_code')
if sql:
    parsed = sqlparse.parse(sql)
else:
    # Fall back to raw_code (still contains Jinja) or skip the node
    parsed = None

5. Hardcoded paths

# BAD: root_path is gone in recent manifests, and string concatenation is OS-specific
abs_path = node['root_path'] + '/' + node['original_file_path']

Use pathlib with a project root you already know:

import os
from pathlib import Path

project_root = Path(os.environ['DBT_PROJECT_DIR'])
abs_path = project_root / node['original_file_path']

6. Using name instead of unique_id

# BAD: name can repeat across packages
nodes_by_name = {n['name']: n for n in manifest['nodes'].values()}
# later nodes silently overwrite earlier ones on a collision

Key by unique_id instead.

Performance hints

1. Lazy loading

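For a huge manifest, don't load the whole file into memory; stream it with ijson (a third-party library):

import ijson

def find_incremental_models(manifest_path):
    suffix = '.config.materialized'
    with open(manifest_path, 'rb') as f:
        for prefix, event, value in ijson.parse(f):
            # prefix looks like 'nodes.model.jaffle_shop.customers.config.materialized';
            # unique_ids contain dots, so cut off the known head and tail instead of split('.')
            if (prefix.startswith('nodes.') and prefix.endswith(suffix)
                    and event == 'string' and value == 'incremental'):
                yield prefix[len('nodes.'):-len(suffix)]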

2. Index by resource_type

from collections import defaultdict

def index_manifest(manifest):
    by_type = defaultdict(dict)
    for uid, node in manifest['nodes'].items():
        by_type[node['resource_type']][uid] = node
    return by_type

idx = index_manifest(manifest)
models = idx['model']  # O(1) lookup per resource type, no full scan
tests = idx['test']

3. Use child_map for downstream traversal

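If the manifest has parent_map, it has child_map too, so there is no need to rebuild either. To walk further than one hop (breadth-first, with max_depth counted in hops):

from collections import deque

def downstream(unique_id, child_map, max_depth=None):
    visited = {unique_id}
    queue = deque([(unique_id, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # reached the hop limit: don't expand this node
        for child in child_map.get(node, []):
            if child not in visited:
                visited.add(child)
                queue.append((child, depth + 1))
    return visited - {unique_id}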

Key takeaways

unique_id format: <resource_type>.<package>.<name>, plus a version suffix for versioned models and a hash for generic tests. It is the only stable identifier.
fqn: the hierarchy as an array. Used for selection and config inheritance.
Three paths: original_file_path (relative to the project), path (relative to the resource folder), root_path (absolute, gone in recent manifests).
refs: before v9, lists of strings; from v9 on, dicts with name/package/version.
depends_on: resolved upstream unique_ids (nodes + macros), filled in by dbt when it resolves refs/sources at parse time.
config: the post-merge resolved config (dbt_project.yml + properties YAML + in-file config(), rendered for the current target).
columns: column-level metadata: name, description, data_type, constraints, tests.
raw_code is always present after parsing; compiled_code appears only after compile/run.
checksum: SHA256 of the file contents. Used for partial parsing and state:modified.
Tests have test_metadata + attached_node; snapshots have strategy + check_cols/updated_at; seeds have delimiter + column_types.
Python integration via Manifest.from_dict(): type safety, but a fragile internal API.
Performance: stream large manifests, index by resource_type, and handle paths carefully.

Knowledge check

An analyst wants to find all models downstream of a given source ('raw.orders'), up to 2 hops. Describe the algorithm, which top-level keys and node fields to use, and the edge cases.

Answer

**Goal**: starting from `source.jaffle_shop.raw.orders`, find all downstream models within 2 hops.

**Algorithm using child_map** (preferred: O(1) lookup per node):

```python
import json
from collections import deque

with open('target/manifest.json') as f:
    manifest = json.load(f)

def downstream(start_uid, manifest, max_hops=2):
    child_map = manifest['child_map']
    visited = set()
    queue = deque([(start_uid, 0)])
    result = []

    while queue:
        node_uid, hops = queue.popleft()
        if node_uid in visited:
            continue
        visited.add(node_uid)

        # Add to result (skip the start node itself)
        if node_uid != start_uid:
            result.append({'uid': node_uid, 'hops': hops})

        # Walk children while within max_hops
        if hops < max_hops:
            for child_uid in child_map.get(node_uid, []):
                if child_uid not in visited:
                    queue.append((child_uid, hops + 1))

    return result

start = 'source.jaffle_shop.raw.orders'
results = downstream(start, manifest, max_hops=2)

# Keep only models
models_only = [r for r in results if r['uid'].startswith('model.')]

for r in models_only:
    node = manifest['nodes'][r['uid']]
    print(f"{r['hops']} hop(s): {node['name']} ({node['config']['materialized']})")
```

**Edge cases**:

**1. Disabled models**: `child_map` does not include disabled nodes. For an audit, add:

```python
for uid, nodes in manifest.get('disabled', {}).items():
    if not uid.startswith('model.'):
        continue
    node = nodes[0]  # 'disabled' maps each unique_id to a list of nodes
    sources = node.get('sources', [])

    # Would this disabled model read the start source directly?
    if start in {f"source.{node['package_name']}.{s[0]}.{s[1]}" for s in sources}:
        # 1 hop downstream, seen from the disabled side
        results.append({'uid': uid, 'hops': 1, 'disabled': True})
```

**2. Tests as children**: child_map includes tests, and they are downstream too:

```python
for r in results:
    if r['uid'].startswith('test.'):
        # Test attached to a node
        test = manifest['nodes'][r['uid']]
        attached = test.get('attached_node')
        print(f"Test {test['name']} attached to {attached}")
```

**3. Exposures as final consumers**: exposures that depend on downstream models.

```python
downstream_uids = {r['uid'] for r in results}
for exp_uid, exp in manifest['exposures'].items():
    if set(exp['depends_on']['nodes']) & downstream_uids:
        print(f"Exposure {exp['name']} depends on downstream of {start}")
```

**4. Metrics / semantic_models**: MetricFlow nodes can also depend on models. If you need them:

```python
for sm_uid, sm in manifest.get('semantic_models', {}).items():
    # A semantic model is built on a model listed in depends_on.nodes
    # (node_relation describes that model's warehouse relation)
    if set(sm['depends_on']['nodes']) & downstream_uids:
        print(f"Semantic model {sm['name']} builds on a downstream model")
```

**5. Cross-project mesh**: child_map in the parent project's manifest does not show children in dependent projects. You need to load several manifests:

```python
def downstream_mesh(start_uid, root_manifest, dependent_manifests, max_hops=2):
    # Walk the root manifest, then look for child references in the dependents:
    # depends_on in a dependent manifest may contain start_uid
    ...
```

**6. Versioned models**: `model.jaffle_shop.fct_orders.v2` is a separate node from `model.jaffle_shop.fct_orders.v1`, and child_map keeps separate entries for them.

**7. Ephemeral models**: they appear in child_map like any other node. Note, however, that ephemeral models are inlined into downstream SQL rather than materialized.

**Alternative via depends_on** (slower: a full scan of all nodes per hop):

```python
def downstream_via_depends(start_uid, manifest, max_hops=2):
    # Scan all nodes and check depends_on
    direct = [
        uid for uid, n in manifest['nodes'].items()
        if start_uid in n['depends_on']['nodes']
    ]
    # Second hop
    if max_hops >= 2:
        second = []
        for d in direct:
            second.extend(
                uid for uid, n in manifest['nodes'].items()
                if d in n['depends_on']['nodes']
            )
        return direct + second
    return direct
```

This is slower (O(N × depth)) but needs no precomputed child_map. It is useful if you work with a reduced manifest.

**Production usage**:

```python
# Impact analysis: what breaks if raw.orders is unavailable?
impacted = downstream('source.jaffle_shop.raw.orders', manifest, max_hops=10)
impacted_critical = [
    r for r in impacted
    if manifest['nodes'].get(r['uid'], {}).get('config', {}).get('access') == 'public'
]
print(f"Public models affected: {len(impacted_critical)}")
```

**Top-level keys**:
- `child_map`: primary, O(1) lookup
- `nodes`: for filtering and metadata
- `disabled`: for completeness
- `exposures`: for consumer impact
- `semantic_models`: for metric impact

**Node fields**:
- `unique_id` (key)
- `resource_type` (filter)
- `config.materialized`
- `config.access`
- `name`, `schema`, `database` (display)
- `attached_node` (for tests)
- `depends_on.nodes` (alternative path)

**Illustrative output** (the model loop plus the exposure check):

```
1 hop(s): stg_orders (view)
2 hop(s): fct_orders (incremental)
2 hop(s): fct_revenue (table)
Exposure weekly_revenue_dashboard depends on downstream of source.jaffle_shop.raw.orders
```

This powers impact analysis, change management and SLA monitoring.

Knowledge check

A production tool reads the manifest. A node has refs=[{name: 'fct_orders', package: null, version: 'v1'}], while depends_on.nodes contains 'model.jaffle_shop.fct_orders.v2'. The versions don't match: is this a bug? Explain how versioned refs work.

Answer

**Short answer**: if the ref is unpinned (version null), this is normal `latest_version` resolution, not a bug. If the ref really is pinned to v1, depends_on must contain v1, so v2 signals a stale manifest or a tool bug.

**Mechanism**:

If model `fct_orders` is versioned (has v1, v2, v3), a user can ref it as:

```sql
{{ ref('fct_orders') }}        -- unpinned: resolves to latest_version
{{ ref('fct_orders', v=1) }}   -- pinned to v1
{{ ref('fct_orders', v=2) }}   -- pinned to v2
```

In `fct_orders.yml`:

```yaml
models:
  - name: fct_orders
    latest_version: 2  # unpinned refs resolve to v2
    versions:
      - v: 1
        config:
          alias: fct_orders_v1
      - v: 2
        config:
          alias: fct_orders_v2
        defined_in: fct_orders_v2  # a different SQL file
      - v: 3
        config:
          alias: fct_orders_v3
```

**During parsing**:

1. The parser sees `{{ ref('fct_orders') }}` (unpinned).
2. It records the ref entry `{name: 'fct_orders', package: null, version: null}`.
3. The resolver looks it up and finds a versioned model.
4. It resolves to `latest_version: 2`, i.e. `model.jaffle_shop.fct_orders.v2`.
5. depends_on.nodes gets `'model.jaffle_shop.fct_orders.v2'`.

refs stays **as written** (unpinned); depends_on reflects the **resolved** target.

**When the data itself is inconsistent**:

- refs: `{name: 'fct_orders', version: 'v1'}` (PINNED to v1)
- depends_on: `'model.jaffle_shop.fct_orders.v2'`

This is an **inconsistency**: a pinned ref must resolve to v1, yet it resolved to v2. Possible causes:

1. **Deprecation of v1**: a pinned ref to a deprecated version still resolves to that version (dbt only emits a warning). Deprecation alone therefore does not explain v2 in depends_on. If the author has since changed the ref, you are reading an old manifest.

2. **Stale partial parse**: the manifest holds outdated state. Fix: re-parse without partial parsing (`dbt parse --no-partial-parse`).

3. **Manual edit**: someone changed the version definitions in `fct_orders.yml`, but the manifest was not regenerated.

4. **A dbt bug** (rare): some edge case in the resolver.

**Reality check** (your scenario):

refs=`{version: 'v1'}` is pinned, and depends_on=v2. This means one of:

a) Scenario A: the SQL and the manifest have drifted apart (e.g. the ref was edited from v=1 to v=2 and the manifest you are reading is stale). Run `dbt parse` again.

b) Scenario B: the tool reads the manifest incorrectly, e.g. it takes the version from depends_on instead of refs. A tool bug.

c) Scenario C: refs actually has `version: null` (unpinned), and the tool wrongly interprets it as 'v1'. A tool bug.

**Production check**:

```python
def validate_versioned_refs(manifest):
    errors = []
    for uid, node in manifest['nodes'].items():
        if node['resource_type'] != 'model':
            continue

        depends_on_uids = node['depends_on']['nodes']

        for ref in node.get('refs', []):
            target_name = ref['name']
            target_version = ref.get('version')
            target_package = ref.get('package') or node['package_name']

            # Resolved model id is model.<pkg>.<name> or model.<pkg>.<name>.v<N>
            base = f"model.{target_package}.{target_name}"
            candidates = [
                d for d in depends_on_uids
                if d == base or d.startswith(base + '.v')
            ]

            if not candidates:
                # Refs to seeds/snapshots resolve to other resource types: not an error
                if not any(d.split('.')[1:3] == [target_package, target_name]
                           for d in depends_on_uids):
                    errors.append(f"{uid}: ref {target_name} not in depends_on")
                continue

            # A pinned ref must match the exact version ('v1' and 1 both accepted)
            if target_version is not None:
                expected = f"{base}.v{str(target_version).lstrip('v')}"
                if expected not in depends_on_uids:
                    errors.append(
                        f"{uid}: pinned ref({target_name}, v={target_version}) "
                        f"but depends_on has {candidates}"
                    )
    return errors
```

**Mesh-specific**: cross-project versioned refs:

```json
"refs": [
  {"name": "fct_orders", "package": "finance_project", "version": 2}
]
```

depends_on: `'model.finance_project.fct_orders.v2'` (a cross-project unique_id).

**If `finance_project` releases v3 and deprecates v2**, a consuming project that still refs v2 gets a warning, but dbt still executes it. The migration policy is up to each team.

**TL;DR**:

- refs (raw) vs depends_on (resolved) is the normal architecture.
- An unpinned ref resolves to latest_version.
- A pinned ref resolves to that specific version.
- A mismatch between refs.version and depends_on means a bug, a stale manifest, or a tool misreading.

**Production tools** should:
1. Use depends_on for actual dependencies (what dbt actually compiled with).
2. Use refs for intent (what the user wrote).
3. Validate consistency with an explicit check.
4. Handle unpinned refs (version=null) gracefully.
