Big Data Engineer
Career path for a big data engineer, from fundamentals to advanced technologies
Required / Recommended / Alternative
CS Foundations
Algorithms & Data Structures [B]
Sorting Algorithms
Graph Algorithms
Hash Tables
Trees & Heaps
Time & Space Complexity
Computer Networks [B]
Operating Systems [B]
Processes & Threads
Memory Management
File Systems
I/O & Scheduling
Linux & Command Line [B]
Bash Basics
Text Processing (grep, awk, sed)
Users & Permissions
Package Management
Distributed Systems Fundamentals [I]
CAP Theorem
Consensus Algorithms (Raft, Paxos)
Replication & Partitioning
Consistency Models
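For the Replication & Partitioning topic above, here is a minimal Python sketch of a consistent-hash ring with virtual nodes. Dynamo-style stores such as Cassandra use this technique to spread keys across nodes. The class name, the default of 100 virtual nodes, and the use of MD5 are illustrative assumptions, not any particular database's implementation. The useful property is that removing a node only remaps the keys that node owned.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._hashes: List[int] = []      # sorted positions on the ring
        self._owner: Dict[int, str] = {}  # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # MD5 is used only for its even spread of values, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Each physical node gets many virtual positions to even out the load.
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            self._owner[h] = node
            bisect.insort(self._hashes, h)

    def remove_node(self, node: str) -> None:
        self._hashes = [h for h in self._hashes if self._owner[h] != node]
        self._owner = {h: n for h, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._hashes:
            raise LookupError("ring has no nodes")
        # The first ring position clockwise from the key's hash owns the key;
        # the modulo wraps keys past the last position back to the start.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owner[self._hashes[idx]]
```

Usage: `ConsistentHashRing(["node-a", "node-b", "node-c"]).get_node("user:42")` returns the owning node name. With naive `hash(key) % n` placement, changing `n` would move most keys instead.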
Databases & Modeling
SQL Fundamentals [B]
JOINs & Set Operations
Window Functions
CTEs & Subqueries
GROUP BY & Aggregation
Query Optimization
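A small sketch of window functions and CTEs working together. It uses Python's built-in sqlite3 module as a stand-in engine; SQLite supports window functions from version 3.25. The orders table and its rows are made up for illustration. The query keeps each customer's latest order alongside a per-customer total.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL, created_at TEXT);
INSERT INTO orders VALUES
  (1, 10, 50.0, '2024-01-01'),
  (2, 10, 75.0, '2024-01-05'),
  (3, 20, 20.0, '2024-01-02'),
  (4, 20, 30.0, '2024-01-03'),
  (5, 30, 99.0, '2024-01-04');
""")

# The CTE ranks each customer's orders by recency and computes a per-customer
# total with a window aggregate; the outer query keeps only rank 1.
LATEST_ORDER_SQL = """
WITH ranked AS (
    SELECT order_id, customer_id, amount,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn,
           SUM(amount)  OVER (PARTITION BY customer_id) AS customer_total
    FROM orders
)
SELECT customer_id, order_id, amount, customer_total
FROM ranked
WHERE rn = 1
ORDER BY customer_id;
"""

rows = conn.execute(LATEST_ORDER_SQL).fetchall()
for row in rows:
    print(row)  # (customer_id, latest order_id, its amount, customer total)
```

Unlike GROUP BY, the window functions keep row-level detail while still aggregating. The same ROW_NUMBER pattern is often used to deduplicate staging data.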
Relational Databases [B]
ACID Transactions
Normalization (1NF-3NF)
PostgreSQL Administration
Stored Procedures & Functions
NoSQL Databases [I]
MongoDB (Document Store)
Apache Cassandra (Wide-Column)
Apache HBase
Redis (Key-Value)
Data Modeling [I]
Dimensional Modeling (Star, Snowflake)
Normalized Models (3NF)
Denormalized Models
ER Diagrams & Schema Design
Database Indexing & Performance [I]
Graph Databases [A]
Time-Series Databases [I]
Warehousing & Architecture
Data Warehouse Concepts [I]
DWH Layers (Raw, Cleansed, Curated)
ETL vs ELT Approaches
Staging Areas
Data Marts
Dimensional Modeling (Kimball & Inmon) [I]
Star Schema
Snowflake Schema
Kimball Methodology
Inmon Methodology
Data Lake Architecture [I]
Schema-on-Read vs Schema-on-Write
Lake Zones (Landing, Raw, Curated)
Object Storage (S3, GCS, ADLS)
Data Catalog & Discovery
Data Lakehouse [I]
Data Mesh [A]
OLAP Systems [I]
ClickHouse
Apache Druid
Columnar Storage
Materialized Views
Slowly Changing Dimensions [I]
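A minimal sketch of SCD Type 2 in plain Python. It assumes a hypothetical dim_customer table keyed by customer_id with a single tracked attribute, city. A changed value is not overwritten. Instead, the current version is closed with valid_to and a new current version is appended, which preserves history. Other SCD types, such as Type 1 overwrite, are not shown.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a customer in a hypothetical dim_customer table."""
    customer_id: int          # natural (business) key
    city: str                 # tracked attribute
    valid_from: date
    valid_to: Optional[date]  # None while the version is current
    is_current: bool


def apply_scd2(dim: List[DimRow], incoming: Dict[int, str], load_date: date) -> List[DimRow]:
    """Apply SCD Type 2 logic for a source snapshot of {customer_id: city}."""
    current = {row.customer_id: row for row in dim if row.is_current}
    result: List[DimRow] = []
    for row in dim:
        new_city = incoming.get(row.customer_id)
        if row.is_current and new_city is not None and new_city != row.city:
            # Changed attribute: close the old version instead of overwriting it.
            result.append(replace(row, valid_to=load_date, is_current=False))
        else:
            result.append(row)
    for customer_id, city in incoming.items():
        old = current.get(customer_id)
        if old is None or old.city != city:
            # New key or changed attribute: open a new current version.
            result.append(DimRow(customer_id, city, load_date, None, True))
    return result
```

Customers missing from the snapshot are left as they are. Handling source deletes is a separate design decision.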
ETL/ELT & Pipelines
ELT vs ETL Concepts [B]
ETL Pattern
ELT Pattern
Modern ELT-First Approach
Idempotency & Retry Logic
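One common way to combine retry logic with idempotency, sketched in Python. retry() re-runs a job on transient errors with exponential backoff and jitter. idempotent_load() upserts rows by primary key, so a retry after a partial write does not duplicate rows. Both function names, and the choice of ConnectionError and TimeoutError as "transient", are illustrative assumptions.

```python
import random
import time
from typing import Callable, Dict, Iterable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 5, base_delay: float = 0.5,
          retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)) -> T:
    """Call fn, retrying transient errors with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Upper bound doubles each attempt: base, 2*base, 4*base, ...
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


def idempotent_load(target: Dict[int, dict], batch: Iterable[dict]) -> None:
    """Upsert rows by primary key, so replaying the same batch cannot create duplicates."""
    for row in batch:
        target[row["id"]] = row
```

Errors outside retry_on, such as a ValueError from bad data, propagate immediately. Retrying them would only repeat the failure.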
Apache Airflow [I]
DAGs & Operators
Scheduling & Triggers
Connections & Hooks
Best Practices & Testing
Batch Processing Patterns [I]
Incremental Loads (Delta)
Full Refresh
SCD Handling in Pipelines
Backfill Strategies
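A sketch of an incremental (delta) load driven by a high-water mark. It also includes a helper that splits a backfill range into independently re-runnable intervals. The updated_at column and both function names are assumptions for illustration. A real pipeline would persist the watermark between runs.

```python
from datetime import date, datetime, timedelta
from typing import Iterator, List, Optional, Tuple


def incremental_extract(source_rows: List[dict],
                        watermark: Optional[datetime]) -> Tuple[List[dict], Optional[datetime]]:
    """Delta load: return rows changed after the stored watermark, plus the new watermark."""
    delta = [r for r in source_rows if watermark is None or r["updated_at"] > watermark]
    # Advance the watermark only to what was actually read; keep it if nothing changed.
    new_watermark = max((r["updated_at"] for r in delta), default=watermark)
    return delta, new_watermark


def backfill_windows(start: date, end: date,
                     step: timedelta = timedelta(days=1)) -> Iterator[Tuple[date, date]]:
    """Split [start, end) into fixed, independently re-runnable intervals."""
    current = start
    while current < end:
        yield current, min(current + step, end)
        current += step
```

The strict `>` comparison can miss rows that are committed late with a timestamp equal to the stored watermark. A common mitigation is to re-read a small overlap and rely on an idempotent upsert downstream.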
Change Data Capture [I]
CDC Fundamentals
Log-Based CDC (WAL/Binlog)
Debezium Architecture
CDC Patterns (Outbox, Event Sourcing)
Stream Processing Fundamentals [I]
Event Time vs Processing Time
Windowing (Tumbling, Sliding, Session)
Watermarks & Late Data
Exactly-Once Semantics
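To make event time, tumbling windows, and watermarks concrete, here is a simplified single-process Python sketch. It assigns events to windows by event time and derives the watermark as the maximum event time seen minus an allowed lateness. A window is emitted once the watermark passes its end, and events for windows already emitted are set aside as late. The function and its parameters are hypothetical. Production engines such as Apache Flink handle this with persistent state and configurable late-data handling.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_window_counts(events: Iterable[Tuple[int, str]], window_size: int,
                           allowed_lateness: int):
    """Count keys per tumbling event-time window, using a simple watermark.

    events arrive in processing-time order as (event_time, key) pairs. The
    watermark is the largest event time seen minus allowed_lateness; a window
    [start, start + window_size) is emitted once the watermark reaches its end,
    and events that belong to an already-emitted window are dropped as late.
    """
    open_windows: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    emitted: List[Tuple[int, Dict[str, int]]] = []
    late: List[Tuple[int, str]] = []
    watermark = float("-inf")
    for event_time, key in events:
        start = event_time - event_time % window_size  # window assigned by event time
        if start + window_size <= watermark:
            late.append((event_time, key))  # its window has already been emitted
            continue
        open_windows[start][key] += 1
        watermark = max(watermark, event_time - allowed_lateness)
        # Emit every open window whose end the watermark has passed, oldest first.
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                emitted.append((s, dict(open_windows.pop(s))))
    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        emitted.append((s, dict(open_windows.pop(s))))
    return emitted, late
```

Raising allowed_lateness accepts more out-of-order events but delays every result. This is the latency versus completeness trade-off that watermarks expose.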
Data Pipeline Monitoring [I]
dbt (Data Build Tool) [I]
Luigi / Prefect [I]
Big Data Processing
Hadoop Ecosystem [I]
HDFS
YARN
MapReduce
Apache Hive
Apache HBase
Apache Spark [I]
Spark Core & RDDs
Spark SQL & DataFrames
DataFrame API
Structured Streaming
MLlib
Apache Kafka [I]
Producers & Consumers
Kafka Connect
Kafka Streams
Schema Registry
Partitioning & Replication
Apache Flink [A]
Apache Beam [A]
Python for Data Engineering [B]
Pandas
NumPy
PySpark
Data Serialization (Avro, Parquet, JSON)
Java / Scala [I]
Data Formats & Serialization [B]
Apache Parquet
Apache Avro
Apache ORC
Protocol Buffers
Cloud Platforms
AWS Data Services [I]
Amazon S3
Amazon Redshift
AWS Glue
Amazon EMR
Amazon Kinesis
Amazon Athena
GCP Data Services [I]
BigQuery
Google Dataflow
Google Dataproc
Google Pub/Sub
Google Cloud Storage
Azure Data Services [I]
Cloud Storage & Object Stores [B]
S3, GCS, ADLS Comparison
MinIO (Self-Hosted)
Partitioning Strategies
Lifecycle Policies & Tiering
Serverless Data Processing [I]
Managed Kafka & Streaming [I]
Cloud Cost Optimization [I]
DevOps for Data
Docker [B]
Dockerfile & Images
Docker Compose
Container Networking
Kubernetes [I]
CI/CD for Data Pipelines [I]
Terraform / IaC [I]
Shell Scripting & Automation [B]
Bash Scripting
Cron & Scheduling
Automation Patterns
CLI Tools (jq, awk, sed)
Monitoring & Observability [I]
Prometheus Metrics
Grafana Dashboards
Centralized Logging (ELK)
Alerting & SLA Monitoring
Git & Version Control [B]
Branching Strategies
Code Review for Data Teams
Git Workflows (GitFlow, Trunk)
Monorepo vs Polyrepo
Data Pipeline Testing [I]
Data Governance
Data Governance Fundamentals [I]
DMBOK Framework
Governance Organization
Data Stewardship
Governance Charter
Data Quality [I]
Quality Dimensions
Data Profiling
Great Expectations
dbt Tests
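A minimal sketch of data profiling and rule-based quality checks in plain Python. It follows the spirit of tools like Great Expectations or dbt tests but does not use either tool's API. The orders columns and the amount range are hypothetical. Each check returns the number of failing rows, so 0 means the check passed.

```python
from typing import Dict, List


def profile_column(rows: List[dict], column: str) -> Dict[str, float]:
    """Basic profile: share of non-null values and share of distinct non-null values."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "completeness": len(non_null) / len(values) if values else 1.0,
        "uniqueness": len(set(non_null)) / len(non_null) if non_null else 1.0,
    }


def count_nulls(rows: List[dict], column: str) -> int:
    return sum(1 for r in rows if r.get(column) is None)


def count_duplicates(rows: List[dict], column: str) -> int:
    """Number of non-null values that repeat an earlier value."""
    seen, dupes = set(), 0
    for r in rows:
        v = r.get(column)
        if v is None:
            continue
        if v in seen:
            dupes += 1
        else:
            seen.add(v)
    return dupes


def count_out_of_range(rows: List[dict], column: str, low: float, high: float) -> int:
    return sum(1 for r in rows if r.get(column) is not None and not low <= r[column] <= high)


def run_suite(rows: List[dict]) -> Dict[str, int]:
    """A hypothetical suite for an orders table; each value is a failing-row count."""
    return {
        "order_id is not null": count_nulls(rows, "order_id"),
        "order_id is unique": count_duplicates(rows, "order_id"),
        "amount is not null": count_nulls(rows, "amount"),
        "amount in [0, 10000]": count_out_of_range(rows, "amount", 0, 10_000),
    }
```

In a pipeline, a non-zero count would typically fail the task or raise an alert before the data reaches downstream consumers.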
Metadata Management [I]
Data Catalogs (DataHub, Atlas)
Business Glossary
Data Lineage
Apache Atlas
Data Privacy & Compliance [I]
GDPR & Data Protection Laws
PII Detection
Data Masking & Anonymization
Consent Management
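A sketch of two masking techniques using only Python's standard library. The first is keyed pseudonymization with HMAC-SHA256, which maps the same input to the same token so joins across tables still work. The second is partial masking of an email address for display. The function names are illustrative. Key handling is an assumption: in practice the secret would come from a secrets manager and be rotated. Under GDPR, pseudonymised data is generally still considered personal data.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Deterministic keyed token: same value and key give the same 64-char hex digest."""
    # A keyed HMAC (unlike a plain hash) resists dictionary attacks on
    # low-entropy values such as emails, as long as the key stays secret.
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain; mask the rest."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "***"  # not a recognizable email: mask entirely
    return f"{local[0]}{'*' * (len(local) - 1)}@{domain}"
```

Pseudonymization keeps data joinable for analytics. Masking is for display, where the original value never needs to be recovered or matched.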
Data Security & Access Control [I]
RBAC (Role-Based Access)
ABAC (Attribute-Based Access)
Column-Level Security
Audit Logging
Data Architecture & Modeling [A]
Data Lifecycle Management [I]
Advanced Topics
Machine Learning for Data Engineers [A]
Real-Time Analytics [A]
Data Mesh Architecture [A]
DataOps & Automation [A]
Rust for Data Systems [A]
Graph Analytics [A]
Data Contracts & Schema Evolution [A]
Career Development [I]