ETL vs ELT vs Zero-ETL
Эволюция pipeline-моделей
Data pipeline прошёл три поколения:
1. ETL (классический)
2. ELT (современный)
3. Zero-ETL (emerging)
ETL: Transform Before Load
ETL Flow:
Source DB → Extract (read data)
→ Transform (clean, validate, reshape in separate engine)
→ Load (write to DWH in final form)
Пример:
PostgreSQL → Informatica ETL Server → clean nulls, join tables,
apply business rules
→ Teradata DWH
Когда ETL
| Сценарий | Почему ETL |
|---|---|
| PII compliance | Mask/remove PII BEFORE loading to DWH |
| Legacy DWH | On-premise DWH без мощного compute (Teradata, Oracle) |
| Data contracts | Strict schema enforcement before storage |
| Bandwidth constraints | Transform reduces data volume before transfer |
ELT: Load Raw, Transform In-Place
ELT Flow:
Source DB → Extract → Load raw data to DWH/Lake
→ Transform inside DWH (SQL, dbt)
Пример:
PostgreSQL → Fivetran (extract+load raw) → Snowflake (raw tables)
→ dbt (SQL transforms in Snowflake) → analytics tables
Когда ELT
| Сценарий | Почему ELT |
|---|---|
| Cloud DWH | Snowflake/BigQuery have massive compute for transforms |
| Data exploration | Raw data available for ad-hoc analysis |
| Speed | Load fast, transform later — faster data availability |
| Iteration | Change transforms without re-extracting |
dbt: ключевой инструмент ELT
dbt (Data Build Tool):
- SQL-based transformations
- Version controlled (Git)
- Testing built-in (schema tests, data tests)
- Documentation auto-generated
- Lineage tracking
dbt model example:
-- models/marts/dim_customers.sql
SELECT
c.customer_id,
c.name,
c.email,
COUNT(o.order_id) as total_orders,
SUM(o.amount) as lifetime_value
FROM {{ ref('stg_customers') }} c
LEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_id
GROUP BY 1, 2, 3
Zero-ETL: Direct Integration
Zero-ETL:
Source → native connector → DWH (no separate pipeline)
Примеры:
Amazon Aurora → zero-ETL → Redshift (AWS native)
AlloyDB → zero-ETL → BigQuery (GCP native)
Ограничения:
- Vendor lock-in (only within cloud provider)
- Limited to supported source/destination pairs
- Less control over transformation timing
Comparison Matrix
Aspect
ETL
ELT
Zero-ETL
TIP
Современный default: ELT
Для большинства новых проектов ELT — default choice. Fivetran/Airbyte для extract+load, dbt для transform. Raw data preserved for exploration. Zero-ETL — дополнение для specific sources, не замена.