SL API и MCP server: JDBC, GraphQL, AI agents
В предыдущих уроках мы разобрались с MetricFlow internals — как metric queries компилируются в SQL. Теперь — как получить метрики из вне dbt CLI. Это API surface: JDBC, GraphQL, Python SDK, и новинка 2025-2026 — dbt MCP server для AI agents.
Этот урок:
- Architecture dbt Semantic Layer API.
- JDBC, GraphQL, Python SDK — для каких сценариев.
- Apache 2.0 open source с 2025 — что это значит для self-hosting.
- dbt MCP server — Model Context Protocol toolkit для AI agents (Claude, GPT-4, etc.).
- Production patterns для каждого API.
Semantic Layer API и BI integrations (dbt II)
Architecture overview
┌─────────────────────────────┐
│ dbt Cloud SL OR self-hosted │
│ semantic_manifest.json │
│ MetricFlow engine │
└─────────────────────────────┘
│
┌───────────────┼────────────────┬────────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌────────┐ ┌──────────┐ ┌─────────────┐ ┌────────────┐
│ JDBC │ │ GraphQL │ │ Python SDK │ │ MCP Server │
│ driver │ │ API │ │ client │ │ (2025+) │
└────────┘ └──────────┘ └─────────────┘ └────────────┘
│ │ │ │
▼ ▼ ▼ ▼
BI tools Web dashboards Data scientists AI agents
(Tableau, (Hex, custom (Jupyter, (Claude,
Power BI, web apps) Pandas) GPT-4 via MCP)
Looker)
Один Source of Truth — semantic_manifest.json + MetricFlow engine. Множество API на top.
JDBC API
Пример connection string:
jdbc:dbt-sl://semantic-layer.cloud.getdbt.com:443
?account=my-account
&project_id=12345
&environment_id=67890
&api_token=SL_TOKEN
Или для self-hosted:
jdbc:dbt-sl://sl.mycompany.com:8080
?manifest_path=/var/dbt/target/semantic_manifest.json
Tableau / Power BI просто подключаются через standard JDBC и видят:
- Tables = semantic_models.
- Columns = dimensions.
- Measures = computed columns.
Queries через SQL — но MetricFlow интерпретирует и transforms в правильный warehouse SQL.
GraphQL API
GraphQL — более flexible для custom web apps и modern SDK integrations.
# Query metrics через GraphQL
query GetRevenue($groupBy: [String!]!, $where: [String!]) {
metrics(metrics: ["revenue"]) {
name
description
type
query(
groupBy: $groupBy,
whereFilter: $where,
orderBy: ["-revenue"],
limit: 100
) {
data {
revenue
country
order_month
}
sql # compiled SQL для debugging
compileTime # сколько потратил MetricFlow
executeTime # сколько потратил warehouse
}
}
}
Variables:
{
"groupBy": ["customers__country", "metric_time__month"],
"where": ["order_date >= '2025-01-01'"]
}
Response:
{
"data": {
"metrics": [{
"name": "revenue",
"query": {
"data": [
{"revenue": 1500000, "country": "USA", "order_month": "2025-01-01"},
{"revenue": 800000, "country": "UK", "order_month": "2025-01-01"},
...
],
"sql": "WITH ... SELECT country, SUM(amount) AS revenue FROM ...",
"compileTime": "0.5s",
"executeTime": "1.2s"
}
}]
}
}
GraphQL преимущества:
- Strongly typed schema — IDE auto-completion.
- Single endpoint — GET/POST одинаковый URL.
- Subqueries — get metric + dimensions + compiled SQL в одном request.
- WebSocket subscriptions — real-time updates (experimental).
Use cases:
- Custom web dashboards (Hex, Streamlit, custom React).
- Data discovery UIs.
- Internal admin panels.
Python SDK
Для data scientists и Python-based applications:
from dbt_sl import SemanticLayerClient
# Инициализация
client = SemanticLayerClient(
host="semantic-layer.cloud.getdbt.com",
api_token=os.environ["DBT_SL_TOKEN"],
project_id="12345",
environment_id="67890",
)
# Query метрику
result = client.query(
metrics=["revenue", "order_count"],
group_by=[
"customers__country",
"metric_time__month"
],
where=["order_date >= '2025-01-01'"],
order_by=["-revenue"],
limit=100,
)
# Возвращает pandas DataFrame
df = result.to_pandas()
print(df.head())
# revenue order_count country order_month
# 0 1500000 10000 USA 2025-01-01
# 1 800000 5000 UK 2025-01-01
# ...
# Or as Arrow table (zero-copy для downstream tools)
arrow_table = result.to_arrow()
# Доступ к compiled SQL для debugging
print(result.compiled_sql)
Python SDK builds на top of GraphQL API. Это просто convenient Python wrapper.
Use cases:
- Jupyter notebooks (data scientists).
- ETL pipelines, который consumes metrics.
- Custom Python applications.
Apache 2.0 lib, можно contribute (github.com/dbt-labs/dbt-sl-python).
Apache 2.0 open source (2025+)
Это strategic change dbt Labs:
Pre-2025: dbt Cloud SL = commercial. JDBC, GraphQL, Python SDK — closed source.
2025+: dbt Cloud SL продолжает commercial, но всё что **на top of semantic_manifest.json** — Apache 2.0.
Что Apache 2.0:
[x] MetricFlow engine (github.com/dbt-labs/metricflow)
[x] JDBC driver (github.com/dbt-labs/dbt-sl-jdbc)
[x] GraphQL server (github.com/dbt-labs/dbt-sl-graphql)
[x] Python SDK (github.com/dbt-labs/dbt-sl-python)
[x] MCP server (github.com/dbt-labs/dbt-mcp)
Что НЕ open source:
[ ] dbt Cloud SL hosting infrastructure (managed service)
[ ] dbt Cloud SL access controls, billing, monitoring
Strategic implication: self-hosted SL viable. Setup:
# 1. Run dbt parse -> generates semantic_manifest.json
dbt parse
# 2. Run open source SL services
docker run -d -p 8080:8080 \
-v $(pwd)/target:/target \
dbtlabs/dbt-sl-server:latest
# 3. Connect через JDBC/GraphQL/Python к localhost:8080
Pros self-hosted:
- No SaaS dependency.
- Data sovereignty.
- Cost control.
- Customization.
Cons:
- Need to operate (monitoring, scaling, updates).
- No managed support.
- Authentication / authorization — нужно построить.
Choice — typical buy vs build tradeoff. Apache 2.0 даёт opportunity for self-hosting, не obligation.
dbt MCP server (новинка 2025-2026)
Model Context Protocol (MCP) — standard от Anthropic для AI agents общаться с tools. dbt MCP server делает dbt SL доступным для AI agents (Claude, GPT-4, etc.).
Что MCP делает
AI Agent (Claude) dbt MCP Server dbt SL Engine
│ │ │
│ "show me revenue by │ │
│ country" │ │
├─────────────────────────►│ │
│ │ │
│ │ metric_query() │
│ ├──────────────────────►│
│ │ │
│ │ results │
│ │◄──────────────────────┤
│ │ │
│ formatted response │ │
│◄─────────────────────────┤ │
│ │ │
AI agent делает natural language query. MCP server разбирает intent, делает SL query, returns structured results. AI agent formats для user.
MCP toolsets
dbt MCP server exposes 8 toolsets:
1. list_metrics # узнать какие метрики доступны
2. describe_metric # описание метрики (definition, dimensions)
3. query_metric # выполнить query
4. list_dimensions # доступные dimensions для group_by
5. describe_dimension # описание dimension
6. list_semantic_models # все semantic_models в проекте
7. describe_semantic_model
8. list_saved_queries # pre-computed saved queries
Каждый toolset — это MCP tool definition с JSON schema. AI agent видит:
{
"name": "query_metric",
"description": "Run a metric query на dbt SL",
"parameters": {
"metrics": {"type": "array", "items": {"type": "string"}, "required": true},
"group_by": {"type": "array", "items": {"type": "string"}},
"where": {"type": "array", "items": {"type": "string"}},
"order_by": {"type": "array", "items": {"type": "string"}},
"limit": {"type": "integer"}
}
}
AI agent generates правильный JSON для tool call, MCP server executes, returns structured response.
Production setup
# Install
pip install dbt-mcp
# Start
dbt-mcp serve --port 5000 --target-path ./target
# Configure AI client (e.g., Claude Desktop)
# Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"dbt-sl": {
"command": "dbt-mcp",
"args": ["serve", "--target-path", "/path/to/dbt/target"]
}
}
}
Claude Desktop теперь видит dbt SL как доступный tool. Можно спросить:
“Show me total revenue by country for 2025.”
Claude generates tool call:
{
"name": "query_metric",
"parameters": {
"metrics": ["revenue"],
"group_by": ["customers__country"],
"where": ["order_date >= '2025-01-01'", "order_date < '2026-01-01'"]
}
}
MCP server executes, returns:
{
"data": [...],
"compiled_sql": "...",
"execution_time_ms": 1200
}
Claude formats для user:
“Total revenue by country for 2025:
- USA: $1.5M
- UK: $800K
- …”
Use cases AI + SL
MCP server architecture
dbt-mcp-server:
├── transport layer # JSON-RPC over stdio или HTTP
├── tool implementations # 8 toolsets
├── SL client # connects к dbt SL (через GraphQL внутри)
├── auth & rate limiting # production controls
└── observability # metrics, logs
github.com/dbt-labs/dbt-mcp — Apache 2.0. Можно contribute (новый toolset, лучше handling, etc.).
Choosing the right API
Сценарий API
─────────────────────────────────────────────────
Tableau / Power BI / Looker JDBC
Embedded analytics в web app GraphQL
Jupyter notebook / Python ETL Python SDK
AI assistant (Claude, GPT) MCP server
Data discovery UI GraphQL (для metadata)
Custom SDK (Go, Rust, etc.) GraphQL (HTTP-based)
Real-time dashboards GraphQL + subscriptions
Production patterns
Pattern 1: BI + dbt SL (most common)
BI tool (Tableau) ─JDBC─► dbt Cloud SL ─► Snowflake
↑
semantic_manifest.json
(built daily by dbt)
Single source of truth — semantic_manifest. BI users queries normally через JDBC. dbt SL gives unified definitions.
Pattern 2: Self-hosted SL для data sovereignty
Internal app ─GraphQL─► dbt-sl-server (self-hosted) ─► Postgres / DuckDB
↑
semantic_manifest.json
(generated by dbt parse)
No SaaS dependency. Для financial / healthcare сектора, где data sovereignty critical.
Pattern 3: AI-powered analytics
End user ──► Claude (Anthropic) ─MCP─► dbt MCP server ─SL─► Snowflake
↑
semantic_manifest.json
AI translates natural language -> structured SL queries. Power user-friendly.
Итого
- dbt SL API surface: JDBC (BI tools), GraphQL (web/SDK), Python SDK (data scientists), MCP server (AI agents).
- Apache 2.0 open source с 2025: MetricFlow engine, JDBC driver, GraphQL server, Python SDK, MCP server — all open source. Self-hosting viable.
- dbt Cloud SL — managed offering, продолжает commercial. Open source layer on top — Apache 2.0.
- dbt MCP server — Model Context Protocol implementation. 8 toolsets (list_metrics, query_metric, etc.). Apache 2.0.
- AI integration — Claude/GPT-4/etc. может query metrics через natural language. Это новый interface к analytics.
- Production patterns: BI + dbt SL (most common), self-hosted для data sovereignty, AI-powered для exploratory analytics.
- Risks при AI + SL: misinterpretation, PII exposure, hallucinations, cost runaway, audit compliance. Mitigations: logging, RBAC, cost limits, validate semantic_manifest, system prompts.
- Decision criteria между Cloud / self-hosted: sovereignty, compliance, operational burden, cost, vendor risk, integration needs.
Это завершает модуль 13 — Semantic Layer internals. В модуле 14 переходим к capstone проекту: контрибьют в dbt-core, написать свой adapter, или large-project optimization case.