Lesson 07.07 · 90 min
Advanced
Lab · Recovery Plan · SwiftPay Wallet · BIA Mapping · Drill Plan · AWS Backup · Snowflake Time Travel

Introduction

The six M6 lessons covered BIA fundamentals, 4-level mapping, RTO/RPO derivation, existing-BIA reuse, BCP integration, and DRP testing. This lab is the practical synthesis: acting as the SwiftRide CDO Office at T+9M, you build a complete recovery plan for the SwiftPay wallet payout flow, the central Tier-1 scenario throughout the module.

The lab is doc-centric (mandatory). The output is a recovery plan in Markdown / YAML with 5 sections: (1) BIA entry per process; (2) 4-level mapping for each material CDE; (3) recovery patterns matrix; (4) annual drill plan; (5) evidence list. The pre-IPO Big 4 Q3 walk-through assesses the artifact directly.

Inputs

M6.1 BIA entry for PROC-SWP-001

Per M6.1: Tier-1; RTO 1h; RPO 5 min; MTD 4h; MTPD 24h; steep-exponential impact curve with regulatory cliffs.

M6.2 4-level mapping

Per M6.2: processes (4 consuming CDE-SWR-003), applications (3 microservices), schemas (Snowflake + Aurora), infrastructure (Snowflake account + Aurora cluster + Confluent).

M6.3 tier matrix

Per M6.3: Tier-1 data RTO 2h target, RPO 5 min target, sync replication pattern.

M4.5 + M5.9 — registry entry + controls catalog

The CDE-SWR-003 registry entry (M4.5) and the 12-control catalog (M5.9) are embedded as reference; the recovery plan does not replace the controls catalog, it complements it.

Risk register (M2.6/M2.7)

5 risks are identified on CDE-SWR-003; the recovery plan addresses unavailability-type risks (hypothetically R-DE-006: "Snowflake account outage prevents payout calculation").

Lab workflow

M6 Lab workflow: 6 steps to the recovery plan

Sequential synthesis; inputs from M6.1-M6.6; output is the recovery plan YAML + drill calendar + self-check.

1. Process tier: assert the process tier from the existing BIA (M6.4 reuse)
2. Mapping: 4-level mapping per M6.2; identify all CDEs + infrastructure
3. RTO/RPO: derive data RTO/RPO from the M6.3 tier matrix; choose a pattern
4. Recovery patterns: recovery pattern per asset + BCP workarounds per M6.5
5. Drill plan: annual drill calendar per M6.6 + evidence collection plan
6. Self-check: self-check against 5 categories of criteria

Execution steps

Step 1: Assert process tier from the BIA

Use the existing BIA (M6.4 reuse approach). For the SwiftPay payout flow:

process_id: PROC-SWP-001
process_name: "Daily driver payout flow"
business_owner: "CFO Office — Carlos (Finance Lead)"
bcm_owner: "Continuity Team — Sarah (Continuity Lead)"

tier_assertion:
  tier: "Tier-1"
  rationale: "Customer-facing real-time financial flow; >$8B GMV annual flowing through; multi-regulator (DORA + PSD2 + GDPR + IRS 1099); MTPD 24h business survival threshold"
  approval:
    risk_committee_date: "2026-02-12"
    cro_signoff: "Carlos (CRO via delegate)"
    last_review: "2026-04-15"

bia_tolerances:
  rto: "1h"
  rpo: "5min"
  mtd: "4h"
  mtpd: "24h"

impact_curve:
  - {t: "1h", financial_usd: 50_000, regulatory: []}
  - {t: "4h", financial_usd: 800_000, regulatory: ["DORA Art. 19 initial-notification"]}
  - {t: "12h", financial_usd: 4_500_000, regulatory: ["DORA intermediate", "BaFin notified"]}
  - {t: "24h", financial_usd: 12_000_000, regulatory: ["MTPD reached"]}
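
The "no linear-curve assumption" idea can be checked numerically. The sketch below is a minimal illustration, assuming the four impact_curve timepoints above are cumulative USD figures; the function names are hypothetical. It computes the USD-per-hour slope of each segment and tests whether the curve keeps steepening.

```python
# Impact curve from the Step 1 BIA entry: (hours into outage, cumulative USD impact).
IMPACT_CURVE = [(1, 50_000), (4, 800_000), (12, 4_500_000), (24, 12_000_000)]


def marginal_cost_per_hour(curve):
    """Return the USD-per-hour slope of each segment between timepoints."""
    slopes = []
    for (t0, usd0), (t1, usd1) in zip(curve, curve[1:]):
        slopes.append((usd1 - usd0) / (t1 - t0))
    return slopes


def is_steepening(curve):
    """True when every segment is steeper than the one before it."""
    slopes = marginal_cost_per_hour(curve)
    return all(later > earlier for earlier, later in zip(slopes, slopes[1:]))


if __name__ == "__main__":
    slopes = marginal_cost_per_hour(IMPACT_CURVE)
    for (t0, _), (t1, _), slope in zip(IMPACT_CURVE, IMPACT_CURVE[1:], slopes):
        print(f"{t0}h to {t1}h: ${slope:,.0f} per hour")
    print("steepening:", is_steepening(IMPACT_CURVE))
```

A rising slope per segment is what separates this curve from a linear one; it does not by itself prove an exponential shape.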

Step 2: 4-level mapping per M6.2

Decompose the process into applications → schemas → data → infrastructure.

mapping:
  process: PROC-SWP-001

  applications:
    - id: APP-DRV-EARN-SVC
      name: "driver-earnings-service (Go)"
      owner: "Data Platform · Priya"
      tier: "Tier-1"

    - id: APP-COMM-RECON
      name: "commission-reconciler (Python Airflow DAG)"
      owner: "Data Platform · Priya"
      tier: "Tier-1"

    - id: APP-SWP-API
      name: "swiftpay-api (Kotlin)"
      owner: "Platform Engineering · Yuki"
      tier: "Tier-1"

  schemas:
    - id: SCHEMA-FCT-DRV-EARN
      name: "snowflake.dl_marts.fct_driver_earnings"
      type: "Snowflake dbt mart"
      cdes_hosted: [CDE-SWR-003, CDE-SWR-006]

    - id: SCHEMA-SWP-PAYOUTS
      name: "aurora.swiftpay.payouts"
      type: "Aurora OLTP"
      cdes_hosted: [CDE-SWR-008, CDE-SWR-009]

    - id: SCHEMA-AUDIT-RECON
      name: "snowflake.audit.recon_runs"
      type: "Snowflake audit"
      cdes_hosted: []
      purpose: "Reconciliation evidence log"

  cdes:
    - id: CDE-SWR-003
      column: "gross_earnings_usd"
      role: "primary"
      data_tier_assigned: "Tier-1"
      derivation: "Worst-case across consuming processes [PROC-SWP-001, PROC-RECON-001, PROC-1099-001, PROC-ANALYTICS-007]"

    - id: CDE-SWR-006
      column: "commission_pct"
      role: "primary"
      data_tier_assigned: "Tier-1"
      derivation: "Worst-case across consuming processes [PROC-SWP-001, PROC-RECON-001]"

    - id: CDE-SWR-008
      column: "payout_amount_usd"
      role: "downstream sum"
      data_tier_assigned: "Tier-1"
      derivation: "Direct consumer; inherits Tier-1"

    - id: CDE-SWR-009
      column: "payout_status"
      role: "operational status flag"
      data_tier_assigned: "Tier-1"
      derivation: "Direct consumer; inherits Tier-1"

  infrastructure:
    - id: INFRA-SNOW-SWR-PROD
      name: "Snowflake account swr-prod"
      regions: [EU-WEST-1 (primary), EU-CENTRAL-1, US-EAST-1]
      tier: "Tier-1"

    - id: INFRA-AURORA-SWP
      name: "Aurora swiftpay-cluster"
      regions: [EU-WEST-1 (primary), US-EAST-1 (Global Database secondary)]
      tier: "Tier-1"

    - id: INFRA-S3-EVIDENCE
      name: "S3 swr-cde-evidence (Object Lock)"
      regions: [EU-WEST-1, US-EAST-1]
      tier: "Tier-1 (evidence persistence)"

    - id: INFRA-CONFLUENT-EVENTS
      name: "Confluent Cloud swr-events"
      topics: [driver.earnings.daily, payout.status.events]
      tier: "Tier-1"
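
The worst-case rule in the derivation fields can be sketched in a few lines. This is an illustration under assumptions: only PROC-SWP-001 is stated as Tier-1 in the lesson, so the tiers given to the other three processes are hypothetical, as are the function names.

```python
# Hypothetical process tiers: only PROC-SWP-001 (Tier-1) is stated in the lesson;
# the other three values are assumptions for illustration.
PROCESS_TIER = {
    "PROC-SWP-001": 1,
    "PROC-RECON-001": 2,
    "PROC-1099-001": 2,
    "PROC-ANALYTICS-007": 3,
}

# Consuming processes per CDE, copied from the derivation fields in the mapping.
CDE_CONSUMERS = {
    "CDE-SWR-003": ["PROC-SWP-001", "PROC-RECON-001", "PROC-1099-001", "PROC-ANALYTICS-007"],
    "CDE-SWR-006": ["PROC-SWP-001", "PROC-RECON-001"],
}


def derive_data_tier(consumers, process_tier):
    """Worst-case rule: the CDE takes the most critical (lowest-numbered) tier."""
    return min(process_tier[p] for p in consumers)


def silent_demotions(assigned, consumers_by_cde, process_tier):
    """Return CDEs whose assigned tier is less critical than the derived one."""
    return [
        cde
        for cde, consumers in consumers_by_cde.items()
        if assigned[cde] > derive_data_tier(consumers, process_tier)
    ]


if __name__ == "__main__":
    assigned = {"CDE-SWR-003": 1, "CDE-SWR-006": 1}  # data_tier_assigned from the mapping
    print(silent_demotions(assigned, CDE_CONSUMERS, PROCESS_TIER))
```

Keeping the consuming-process list next to each CDE is what makes the derivation auditable: the tier can be recomputed rather than taken on trust.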

Step 3: Derive data RTO/RPO from the M6.3 tier matrix

Apply tier defaults; document deviations.

data_tolerances:
  - cde_id: CDE-SWR-003 (gross_earnings_usd)
    process_rto: "1h"  # from PROC-SWP-001
    data_rto_default: "2h"  # M6.3 Tier-1 default
    data_rto_assigned: "2h"
    process_rpo: "5min"
    data_rpo_default: "5min"
    data_rpo_assigned: "5min"
    deviation_rationale: "None — defaults applied"

  - cde_id: CDE-SWR-006 (commission_pct)
    process_rto: "1h"
    data_rto_default: "2h"
    data_rto_assigned: "30min"
    process_rpo: "5min"
    data_rpo_default: "5min"
    data_rpo_assigned: "0 (zero loss)"
    deviation_rationale: "Tighter than default — commission_pct is formula multiplier; any data loss = miscalculated payouts; SOX-grade change management mandates RPO 0; sync replication justified specifically for this CDE."

  - cde_id: CDE-SWR-008 (payout_amount_usd)
    process_rto: "1h"
    data_rto_default: "2h"
    data_rto_assigned: "1h"
    process_rpo: "5min"
    data_rpo_default: "5min"
    data_rpo_assigned: "<1min"
    deviation_rationale: "Tighter than default: Aurora swiftpay-cluster replicates through Aurora Global Database (asynchronous, storage-level; cross-region lag typically under 1 second), so RPO <1min is achieved naturally; RTO 1h tracks Aurora failover capability."

  - cde_id: CDE-SWR-009 (payout_status)
    process_rto: "1h"
    data_rto_default: "2h"
    data_rto_assigned: "1h"
    process_rpo: "5min"
    data_rpo_default: "5min"
    data_rpo_assigned: "<1min"
    deviation_rationale: "Same as CDE-SWR-008; co-located Aurora cluster"
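
The tolerances above can be checked mechanically against the coherence rules used later in the self-check (data RTO below process RTO, data RPO not above process RPO). The sketch below is one possible check, not a prescribed tool; the parser only handles the duration spellings that appear in this step, and the function names are hypothetical.

```python
def to_minutes(value):
    """Parse the duration strings used in Step 3 ('2h', '30min', '<1min', '0 (zero loss)')."""
    text = value.strip().lstrip("<").split()[0]
    if text.endswith("min"):
        return float(text[:-3])
    if text.endswith("h"):
        return float(text[:-1]) * 60
    return float(text)  # bare number such as "0"


# Assigned values copied from data_tolerances above.
DATA_TOLERANCES = [
    {"cde": "CDE-SWR-003", "process_rto": "1h", "data_rto": "2h", "process_rpo": "5min", "data_rpo": "5min"},
    {"cde": "CDE-SWR-006", "process_rto": "1h", "data_rto": "30min", "process_rpo": "5min", "data_rpo": "0 (zero loss)"},
    {"cde": "CDE-SWR-008", "process_rto": "1h", "data_rto": "1h", "process_rpo": "5min", "data_rpo": "<1min"},
    {"cde": "CDE-SWR-009", "process_rto": "1h", "data_rto": "1h", "process_rpo": "5min", "data_rpo": "<1min"},
]


def coherence_findings(entries):
    """Map each CDE to the rules it does not satisfy; CDEs with no issues are omitted."""
    findings = {}
    for entry in entries:
        issues = []
        if to_minutes(entry["data_rto"]) >= to_minutes(entry["process_rto"]):
            issues.append("data RTO is not below process RTO")
        if to_minutes(entry["data_rpo"]) > to_minutes(entry["process_rpo"]):
            issues.append("data RPO exceeds process RPO")
        if issues:
            findings[entry["cde"]] = issues
    return findings


if __name__ == "__main__":
    for cde, issues in coherence_findings(DATA_TOLERANCES).items():
        print(cde, issues)
```

Run on the values as written, the strict rule flags CDE-SWR-003 (the 2h default against a 1h process RTO) and the two Aurora CDEs, which sit exactly at 1h with no buffer. Whether those need a tighter value or a documented exception is a call for the plan owner.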

Step 4: Recovery patterns + BCP workarounds per M6.5

For each asset, document the recovery pattern + business workaround.

recovery_patterns:
  - asset: INFRA-SNOW-SWR-PROD
    pattern: "Hot standby — Snowflake replication group + failover group"
    details: |
      - Replication group `RG_PROD_FAILOVER` replicates DL_MARTS, AUDIT databases.
      - Failover group `FG_SWR_PROD` covers account-level failover (URL aliasing).
      - Primary EU-WEST-1; replicas EU-CENTRAL-1 + US-EAST-1.
      - Failover trigger — account-level health check failure; manual trigger Risk Lead approval.
      - Time Travel 90d + Fail-Safe 7d on critical schemas.
    estimated_rto: "30 min (account alias flip + DNS propagation)"
    estimated_rpo: "up to one replication interval for DL_MARTS and AUDIT (Snowflake group replication is scheduled and asynchronous, not sync)"
    annual_incremental_cost: "$420K"

  - asset: INFRA-AURORA-SWP
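    pattern: "Active-passive via Aurora Global Database (asynchronous storage-level replication)"
    details: |
      - Primary EU-WEST-1; secondary US-EAST-1.
      - Aurora Global Database replicates writes asynchronously at the storage layer; cross-region lag is typically under 1 second.
      - Planned failover via aws rds switchover-global-cluster; unplanned (primary region down) via aws rds failover-global-cluster; estimated <1 min promote + DNS.
      - AWS Backup vault lock 7y compliance mode; cross-region copies daily.
    estimated_rto: "10 min"
    estimated_rpo: "typically <1 sec (replication lag; not guaranteed zero)"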
    annual_incremental_cost: "$1.2M"

  - asset: INFRA-S3-EVIDENCE
    pattern: "Cross-region replication + Object Lock Compliance Mode"
    details: |
      - S3 swr-cde-evidence primary EU-WEST-1; CRR to US-EAST-1.
      - Object Lock Compliance Mode 7y — immutable; tamper-proof even by AWS root.
      - Versioning enabled; MFA delete required (compliance mode bypass not allowed).
    estimated_rto: "5 min (DNS failover)"
    estimated_rpo: "near-zero, not guaranteed 0 (S3 cross-region replication is asynchronous; Multi-Region Access Points route requests but do not make writes synchronous)"
    annual_incremental_cost: "$60K"

  - asset: INFRA-CONFLUENT-EVENTS
    pattern: "Confluent Replicator + Schema Registry mirroring"
    details: |
      - Primary cluster EU-WEST-1; replica EU-CENTRAL-1 + US-EAST-1.
      - Confluent Replicator handles topic-level mirroring; <30s replication lag normal.
      - Schema Registry mirrored separately.
      - Consumer groups have idempotency — replay from replica safe.
    estimated_rto: "30 min (DNS + consumer-group offset reconciliation)"
    estimated_rpo: "<30 sec"
    annual_incremental_cost: "$240K"

bcp_workarounds:
  outage_0_to_30min:
    action: "Suspended-with-comms; in-app banner; payouts queued (Pattern 4)"
    authority: "IC + Customer Success Lead"
    comms_template: "C-1 (in-app + push + status page)"

  outage_30min_to_2h:
    action: "Failover to PayPal commercial agreement (pre-staged, Pattern 3)"
    authority: "CTO + CFO joint sign-off"
    comms_template: "C-2 (in-app + email + status page)"
    rate_limit: "PayPal API throttle ~100 TPS; sufficient for backlog processing"

  outage_2h_to_24h:
    action: "Customer service team manually processes top-100 emergency payouts via secured form (Pattern 1)"
    authority: "VP Customer + CFO joint"
    comms_template: "C-3 (in-app + email + Twitter + press release)"
    SOX_evidence: "Manual entry audit log in SecureForms (TLS + immutable)"

  outage_24h_plus:
    action: "MTPD approached; emergency board call; regulator notification; bank-partner failover"
    authority: "CEO + Board Chair joint"
    comms_template: "C-4 (full external comms — regulator + press + investor relations)"

crisis_production:
  bcbs_239_p5_compliance:
    - "Daily reconciliation continues via batch from Kafka stream backup"
    - "Alternative compute Snowflake DR account SWR-DR (cold standby; warmup 4h)"
    - "Manual aggregation script — Finance team sums payouts from Kafka topic last 24h"
    - "Joint Finance Lead + Risk Lead review before publication during crisis mode"
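
During an incident, the bcp_workarounds bands amount to a lookup from elapsed outage time to a workaround and its approving authority. A minimal sketch of that lookup follows; the band keys and authorities come from the YAML above, while the function name and the choice to put an exact boundary (for example 30 min) into the later band are assumptions.

```python
from bisect import bisect_right

# Band start times in minutes, taken from the bcp_workarounds keys in Step 4.
BAND_STARTS = [0, 30, 120, 1440]
BANDS = [
    ("outage_0_to_30min", "IC + Customer Success Lead"),
    ("outage_30min_to_2h", "CTO + CFO joint sign-off"),
    ("outage_2h_to_24h", "VP Customer + CFO joint"),
    ("outage_24h_plus", "CEO + Board Chair joint"),
]


def workaround_for(elapsed_minutes):
    """Return (band, authority) for the outage duration so far."""
    if elapsed_minutes < 0:
        raise ValueError("elapsed_minutes must be non-negative")
    # bisect_right places an exact boundary value into the later band (an assumption).
    return BANDS[bisect_right(BAND_STARTS, elapsed_minutes) - 1]


if __name__ == "__main__":
    for minutes in (10, 45, 300, 2000):
        band, authority = workaround_for(minutes)
        print(f"{minutes} min -> {band} (authority: {authority})")
```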

Step 5: Annual drill plan + evidence

Per M6.6, schedule quarterly drills + define evidence collection.

drill_plan_2026:
  - quarter: Q1
    drill_id: "DRILL-SWP-2026Q1-001"
    type: "Simulation"
    scope: "CDE-SWR-003 + CDE-SWR-006 failover sandbox"
    duration: "4h"
    success_criteria:
      - "Actual RTO ≤ 60 min"
      - "Actual RPO ≤ 5 min"
      - "Pre/post checksum identical for CDE-SWR-003 and CDE-SWR-006"
      - "All evidence artifacts stored to s3://swr-cde-evidence/drills/"
      - "Deviations log populated"
    status: "COMPLETED (47 min RTO; 0 RPO; 1 deviation closed)"

  - quarter: Q2
    drill_id: "DRILL-SWP-2026Q2-001"
    type: "Simulation (expanded)"
    scope: "All 4 CDEs (003, 006, 008, 009) + Confluent topic failover"
    duration: "6h"
    success_criteria: "RTO ≤ 90 min total; all checksums identical"
    planned_date: "2026-06-15"

  - quarter: Q3
    drill_id: "DRILL-SWP-2026Q3-001"
    type: "Simulation (cross-region full chain)"
    scope: "All 4 CDEs + bank-partner sandbox API + customer comms templates"
    duration: "6h"
    success_criteria: "Full chain end-to-end ≤ 120 min"
    planned_date: "2026-09-10"

  - quarter: Q4
    drill_id: "DRILL-SWP-2026Q4-001"
    type: "Cold simulation (no pre-warming)"
    scope: "Same as Q3 but no pre-staged replication; on-call handles cold"
    duration: "8h"
    success_criteria: "RTO ≤ 240 min (relaxed for cold realism); deviation log thorough"
    planned_date: "2026-12-05"

  - quarter: 2027-Q1
    drill_id: "DRILL-SWP-FULL-2027Q1-001"
    type: "FULL RESTORE — production failover"
    scope: "Production traffic shift to US-EAST-1; primary EU-WEST-1 offline; 24h operation; failback"
    duration: "24h"
    success_criteria: "Customer-facing impact <2 min; recovery RTO ≤ 30 min; zero data loss"
    planned_date: "2027-02-10 (post-IPO listing)"

evidence_collection_per_drill:
  required_artifacts:
    - "PagerDuty incident archive (JSON export)"
    - "Slack channel #incidents-tier1 archive"
    - "AWS RDS failover log"
    - "Snowflake ACCOUNT_USAGE replication events"
    - "Confluent Replicator metrics"
    - "Smoke-test results"
    - "Customer comms artifacts (templates sent or simulated)"
    - "Pre/post checksum baseline for each CDE"
    - "Deviation log per step"
    - "Post-drill review meeting minutes"
    - "Auditor observer notes (Big 4 invited semi-annual minimum)"
  storage: "s3://swr-cde-evidence/drills/{drill_id}/"
  retention: "7 years SOX compliance"
  index_table: "snowflake.audit.drill_index (queryable by drill_id, date, tier)"
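
Explicit success criteria make a drill verdict computable rather than a matter of opinion. The sketch below evaluates a drill result against the Q1 criteria (RTO ≤ 60 min, RPO ≤ 5 min, identical checksums, complete evidence). The short artifact keys are illustrative stand-ins for the required_artifacts list, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

# Short keys standing in for the required_artifacts list above (the keys are illustrative).
REQUIRED_ARTIFACTS = frozenset({
    "pagerduty_archive", "slack_archive", "rds_failover_log",
    "snowflake_replication_events", "replicator_metrics", "smoke_tests",
    "customer_comms", "checksum_baseline", "deviation_log",
    "review_minutes", "auditor_notes",
})


@dataclass
class DrillResult:
    drill_id: str
    actual_rto_min: float
    actual_rpo_min: float
    checksums_identical: bool
    artifacts: frozenset = field(default_factory=frozenset)


def evaluate(result, rto_target_min=60, rpo_target_min=5):
    """Return a list of unmet criteria; an empty list means the drill passed."""
    failures = []
    if result.actual_rto_min > rto_target_min:
        failures.append(f"RTO {result.actual_rto_min} min > {rto_target_min} min")
    if result.actual_rpo_min > rpo_target_min:
        failures.append(f"RPO {result.actual_rpo_min} min > {rpo_target_min} min")
    if not result.checksums_identical:
        failures.append("pre/post checksums differ")
    missing = sorted(REQUIRED_ARTIFACTS - result.artifacts)
    if missing:
        failures.append("missing evidence: " + ", ".join(missing))
    return failures


if __name__ == "__main__":
    # Q1 outcome as reported in the plan: 47 min RTO, 0 RPO.
    q1 = DrillResult("DRILL-SWP-2026Q1-001", 47, 0, True, REQUIRED_ARTIFACTS)
    print(evaluate(q1) or "PASS")
```

Any non-empty result would feed the remediation cycle instead of being overridden by a verbal verdict.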

Step 6: Self-check

self_check:
  bia_completeness:
    - "[x] Process tier asserted with rationale + Risk Committee sign-off date"
    - "[x] Impact curve >= 4 timepoints (1h, 4h, 12h, 24h minimum)"
    - "[x] Financial + operational + reputational + regulatory dimensions per timepoint"
    - "[x] MTD ≠ MTPD; RTO < MTD; RTO ≠ MTPD"
    - "[x] No linear-curve assumption — explicit cliffs documented"

  mapping_completeness:
    - "[x] All 4 levels populated (process, applications, schemas, data, infrastructure)"
    - "[x] Cardinality documented (multi-process CDE inheritance — worst-case rule)"
    - "[x] Infrastructure layer included (Snowflake + Aurora + S3 + Confluent)"
    - "[x] All material CDE flagged + linked to the registry (M4.5)"

  rto_rpo_coherence:
    - "[x] Data RTO < process RTO (buffer)"
    - "[x] Data RPO ≤ process RPO"
    - "[x] Deviations from M6.3 defaults justified per-CDE"
    - "[x] Recovery patterns aligned with tier (sync for Tier-1 ≠ daily backup)"
    - "[x] Mismatch checks: RTO + RPO in the same tier band"

  bcp_integration:
    - "[x] Manual workarounds documented per outage duration"
    - "[x] Communication trees with explicit time bounds per recipient"
    - "[x] Regulator notification timing aligned with DORA Art. 19 + PSD2 + GDPR Art. 33"
    - "[x] Decision rights documented (per role + delegate + escalation)"
    - "[x] Crisis production capability defined (BCBS 239 P5)"

  drill_governance:
    - "[x] Annual drill calendar Q1-Q4 + multi-year horizon (full restore 2027-Q1)"
    - "[x] Success criteria explicit per drill (RTO, RPO, checksum, evidence)"
    - "[x] Evidence collection list comprehensive + 7y retention"
    - "[x] Cold drill scheduled at least once annually"
    - "[x] Auditor observer invited semi-annual minimum"
    - "[x] Deviation closure SLA defined (30 days Tier-1)"

  audit_defensibility:
    - "[x] Full walk-through possible from BIA process tier → CDE → infrastructure → recovery pattern → drill outcome"
    - "[x] All recovery decisions cost-justified against the BIA impact-over-time curve"
    - "[x] All deviations from tier defaults documented with rationale"
    - "[x] Last drill outcome reflected in the plan (actual RTO 47 min vs target 60 min)"
    - "[x] Regulator-specific deadlines explicitly listed (DORA Art. 19, PSD2, BaFin etc.)"

Self-check criteria in detail

Criterion 1: Tier coherence

The process tier propagates correctly to the data tier through the worst-case rule. A cross-process CDE inherits the highest tier. No silent demotion (for example, a Tier-1 CDE listed as Tier-2 data); each assignment is flagged in its derivation.

Criterion 2: Mathematical consistency of RTO/RPO

RTO ≤ MTD × 0.7 (buffer); MTD ≤ MTPD × 0.5 (multi-level buffer). Data RTO < process RTO. Data RPO = process RPO unless a tighter value is justified. No RTO = RPO except by coincidence.
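
The two buffer inequalities are easy to test directly. The sketch below applies them to the PROC-SWP-001 tolerances (RTO 1h, MTD 4h, MTPD 24h); the function name is hypothetical, and the 0.7 and 0.5 factors are the ones stated in this criterion.

```python
def buffer_findings(rto_h, mtd_h, mtpd_h, rto_factor=0.7, mtd_factor=0.5):
    """Return the buffer rules that the given tolerances (in hours) do not satisfy."""
    findings = []
    if rto_h > mtd_h * rto_factor:
        findings.append("RTO exceeds MTD x 0.7")
    if mtd_h > mtpd_h * mtd_factor:
        findings.append("MTD exceeds MTPD x 0.5")
    return findings


if __name__ == "__main__":
    # PROC-SWP-001: 1h <= 4h x 0.7 and 4h <= 24h x 0.5, so no findings are expected.
    print(buffer_findings(rto_h=1, mtd_h=4, mtpd_h=24))
    # A counter-example: an RTO of 3h leaves too little buffer under a 4h MTD.
    print(buffer_findings(rto_h=3, mtd_h=4, mtpd_h=24))
```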

Criterion 3: Cost-impact defensibility

Sync replication is chosen only where the BIA impact curve justifies it (hour-1 impact > threshold X). Async is sufficient where hour-1 impact < X. The cost-benefit math is defensible before the Risk Committee.
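
The lesson does not define threshold X, so the sketch below uses a simpler comparison as a stand-in: it totals the four annual_incremental_cost line items from Step 4 and finds the first impact-curve timepoint at which a single outage would cost at least that much. The break-even framing and the function name are assumptions, not the course's method.

```python
# annual_incremental_cost per asset, from the Step 4 recovery_patterns.
ANNUAL_INCREMENTAL_COST_USD = {
    "INFRA-SNOW-SWR-PROD": 420_000,
    "INFRA-AURORA-SWP": 1_200_000,
    "INFRA-S3-EVIDENCE": 60_000,
    "INFRA-CONFLUENT-EVENTS": 240_000,
}

# (hours into outage, cumulative USD impact), from the Step 1 impact_curve.
IMPACT_CURVE = [(1, 50_000), (4, 800_000), (12, 4_500_000), (24, 12_000_000)]


def break_even_hours(total_cost, curve):
    """First timepoint at which one outage's impact reaches the annual cost, else None."""
    for hours, impact in curve:
        if impact >= total_cost:
            return hours
    return None


if __name__ == "__main__":
    total = sum(ANNUAL_INCREMENTAL_COST_USD.values())
    print(f"total annual incremental cost: ${total:,}")
    print("break-even timepoint (hours):", break_even_hours(total, IMPACT_CURVE))
```

A Risk Committee argument would also need an outage-frequency estimate, which this sketch deliberately leaves out.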

Criterion 4: Regulatory alignment

DORA Art. 11 RTO/RPO are documented for critical functions. DORA Art. 19 notification timing is built into the comms tree. BCBS 239 P5 crisis production is addressed. PCI-DSS scope is respected (Aurora swiftpay cluster). GDPR Art. 33 is accounted for (PII recovery).

Criterion 5: Audit-readiness evidence

Drill artifacts are comprehensive; 7y retention; queryable index; cross-referenced to the registry. The walk-through is ready: the auditor picks PROC-SWP-001 → the cascade is visible → recovery patterns are visible → the last drill outcome is visible.

Opt-in tooling lab

In parallel with the doc lab, actually configure the AWS Backup + Snowflake recovery features. The configurations below are sketches.

AWS Backup configuration (Tier-1 Aurora)

AWS Backup · v2026 GA · 2026-05
# aws-backup-plan-swiftpay-tier1.yaml
BackupPlan:
  BackupPlanName: "SwiftPay-Tier1-CDE-Backup"
  Rules:
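    - RuleName: "hourly-incremental"
      # Hypothetical unlocked vault: the locked vault below rejects jobs whose retention is under MinRetentionDays.
      TargetBackupVault: "swr-cde-operational-vault"
      ScheduleExpression: "cron(0 * * * ? *)"  # hourly
      StartWindowMinutes: 60
      CompletionWindowMinutes: 120
      Lifecycle:
        DeleteAfterDays: 35  # continuous (PITR) backups are retained for at most 35 days
      EnableContinuousBackup: true  # Aurora PITR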

    - RuleName: "daily-full-cross-region"
      TargetBackupVault: "swr-cde-evidence-vault"
      ScheduleExpression: "cron(0 1 * * ? *)"  # 01:00 UTC daily
      Lifecycle:
        MoveToColdStorageAfterDays: 30
        DeleteAfterDays: 2555  # 7y SOX
      CopyActions:
        - DestinationBackupVaultArn: "arn:aws:backup:us-east-1:...:backup-vault/swr-cde-evidence-vault-replica"

# Vault lock configuration:
BackupVault:
  BackupVaultName: "swr-cde-evidence-vault"
  LockConfiguration:
    MinRetentionDays: 2555
    MaxRetentionDays: 3650
    ChangeableForDays: 3  # cooling-off period; afterwards immutable

Key features used:

  • Vault lock compliance mode: even AWS root cannot delete before MinRetentionDays.
  • Aurora PITR: point-in-time recovery up to 35 days; complements Aurora Global Database replication.
  • Cross-region copies: automatic to the US-EAST-1 vault for DR.
  • Cold-storage (Glacier-tier) transition: after 30 days, ~80% storage cost saving; 7y SOX retention.
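
One interaction worth checking before applying a vault lock: AWS Backup rejects backup and copy jobs whose retention falls outside the vault's MinRetentionDays / MaxRetentionDays window. The sketch below is a plain pre-flight check with illustrative rule values (the short-retention rule name is hypothetical); it does not call any AWS API.

```python
def retention_conflicts(rules, min_days, max_days):
    """Names of rules whose retention falls outside the vault lock window."""
    return [
        name
        for name, delete_after_days in rules.items()
        if not min_days <= delete_after_days <= max_days
    ]


# Illustrative retention values in days; the window matches the LockConfiguration sketch.
RULES = {"short-retention-rule": 35, "daily-full-cross-region": 2555}
VAULT_MIN_DAYS, VAULT_MAX_DAYS = 2555, 3650

if __name__ == "__main__":
    print(retention_conflicts(RULES, VAULT_MIN_DAYS, VAULT_MAX_DAYS))
```

A rule flagged here would need either a longer retention or a separate vault without the 7-year lock.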

Snowflake Time Travel + Fail-Safe (Tier-1)

Snowflake · v2026 release notes · 2026-05
-- Snowflake Time Travel configuration for Tier-1 CDE schemas
ALTER DATABASE DL_MARTS
  SET DATA_RETENTION_TIME_IN_DAYS = 90;  -- Time Travel max for Enterprise edition

-- Per-schema explicit setting (matches database level)
ALTER SCHEMA DL_MARTS.FINANCE
  SET DATA_RETENTION_TIME_IN_DAYS = 90;

-- Per-table override for highest-criticality CDE
ALTER TABLE DL_MARTS.FINANCE.FCT_DRIVER_EARNINGS
  SET DATA_RETENTION_TIME_IN_DAYS = 90;

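-- Replication group setup (read-only replicas; no failover capability by itself)
-- Note: Snowflake does not allow an object to be in both a replication group and a
-- failover group, so in a real account DL_MARTS and AUDIT would go in only one of the two groups below.
-- ALLOWED_ACCOUNTS expects <org_name>.<account_name>; the org prefix is omitted in this sketch.
CREATE REPLICATION GROUP RG_PROD_FAILOVER
  OBJECT_TYPES = DATABASES, SHARES
  ALLOWED_DATABASES = DL_MARTS, AUDIT
  ALLOWED_ACCOUNTS = SWR_PROD_EU_CENTRAL, SWR_PROD_US_EAST
  REPLICATION_SCHEDULE = '10 MINUTE';  -- replication every 10 min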

CREATE FAILOVER GROUP FG_SWR_PROD
  OBJECT_TYPES = DATABASES, SHARES, ROLES, USERS, WAREHOUSES
  ALLOWED_DATABASES = DL_MARTS, AUDIT
  ALLOWED_ACCOUNTS = SWR_PROD_EU_CENTRAL, SWR_PROD_US_EAST
  REPLICATION_SCHEDULE = '5 MINUTE';

-- Failover invocation (DR drill or real incident):
-- ALTER FAILOVER GROUP FG_SWR_PROD PRIMARY;  -- run on target account

Key features used:

  • Time Travel 90d: restore to any prior state within 90 days; Enterprise edition maximum.
  • Fail-Safe 7d additional: Snowflake-managed recovery after Time Travel is exhausted; restore via Support ticket.
  • Replication group: explicit database-level replication; multi-region.
  • Failover group: account-level failover; includes users, roles, warehouses (not only data).

Limitations:

  • Fail-Safe is Snowflake-managed: the student team cannot self-service restore and must engage Snowflake Support; SLA ~24h. Not suitable for Tier-1 primary recovery; supplementary only.
  • Replication has a cost: replicated objects incur storage cost in the target region, plus compute costs for replication refreshes.
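
Because replication and failover groups refresh on a schedule, the achievable RPO on the Snowflake side is tied to REPLICATION_SCHEDULE. The sketch below uses a deliberately simple model, assumed here for illustration: worst-case data loss is roughly the schedule interval plus the time one refresh takes. The function names and the 2-minute refresh duration are hypothetical.

```python
def worst_case_loss_minutes(schedule_minutes, refresh_duration_minutes=0.0):
    """Upper bound on data age at the secondary under the simple interval-plus-refresh model."""
    return schedule_minutes + refresh_duration_minutes


def meets_rpo(schedule_minutes, rpo_target_minutes, refresh_duration_minutes=0.0):
    """True when the modelled worst-case loss stays within the RPO target."""
    return worst_case_loss_minutes(schedule_minutes, refresh_duration_minutes) <= rpo_target_minutes


# Schedules from the SQL sketch: replication group 10 min, failover group 5 min.
SCHEDULES = {"RG_PROD_FAILOVER": 10, "FG_SWR_PROD": 5}
RPO_TARGET_MIN = 5  # Tier-1 data RPO default from Step 3

if __name__ == "__main__":
    for group, interval in SCHEDULES.items():
        # refresh_duration_minutes=2 is a hypothetical refresh time, not a measured value.
        ok = meets_rpo(interval, RPO_TARGET_MIN, refresh_duration_minutes=2)
        print(group, "meets RPO" if ok else "misses RPO")
```

Under this model a 5-minute schedule only just matches a 5-minute RPO when refreshes are instantaneous, which is a reason to measure actual refresh times during drills.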

Confluent Replicator (Tier-1 events)

Confluent Cloud Replicator · v2026 · 2026-05
# confluent-replicator-config-swp-events.yaml
ReplicatorTask:
  name: "swp-events-replicator-eu-west-to-us-east"
  source:
    cluster: "swr-events-eu-west-1"
    bootstrap_servers: "..."
    schema_registry: "https://psrc-..."
  destination:
    cluster: "swr-events-us-east-1"
    bootstrap_servers: "..."
    schema_registry: "https://psrc-..."
  topics:
    - name: "driver.earnings.daily"
      replication_factor: 3
      partitions: 12
    - name: "payout.status.events"
      replication_factor: 3
      partitions: 8
  consumer_group_offset_translation: enabled  # tracks downstream consumer offsets across clusters
  replication_lag_threshold_ms: 30000  # alert if lag > 30s

Common mistakes

Mistake 1: Recovery plan ignores the BIA

Pattern: the recovery plan is built bottom-up ("we have Snowflake replication, so RTO is 30 min for every CDE"); the BIA process tier is never mentioned.

Fix: recovery plan derivation starts from the BIA; the tier is inherited downward; pattern choice follows the tier requirements.

Mistake 2: Recovery costs underestimated

Pattern: "sync replication enabled" but never budgeted; the CFO sees the AWS bill spike and pulls funding mid-cycle.

Fix: a cost section per recovery pattern; annual incremental cost totalled; pre-approved by Finance + Risk Committee; budget locked.

Mistake 3: Drill calendar without success criteria

Pattern: "quarterly drill" with no specific success metrics. The drill "passes" on the engineering team's verdict.

Fix: explicit success criteria per drill (RTO, RPO, checksum, evidence completeness, deviation log); failing any criterion triggers a remediation cycle.

Mistake 4: Crisis production conflated with normal operation

Pattern: the recovery plan addresses normal-mode RTO/RPO; crisis production (alternative compute, manual aggregation) is ignored.

Fix: a separate section on crisis production (BCBS 239 P5 alignment); drill-tested at least annually.

Mistake 5: Evidence is not queryable

Pattern: drill artifacts are stored in S3 with no index; the auditor asks "show me the last Tier-1 drill" and an engineer searches the S3 folder structure by hand.

Fix: a queryable snowflake.audit.drill_index table; columns (drill_id, tier, date, scope, status); 7y retention; cross-referenced to S3 evidence keys.
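
The auditor's "show me the last Tier-1 drill" request becomes a single query once an index exists. The sketch below uses an in-memory SQLite table as a local stand-in for snowflake.audit.drill_index. The column set follows the fix above plus an evidence prefix; the drill dates, the 2025 and Tier-2 rows, and the function names are hypothetical.

```python
import sqlite3

# Hypothetical rows: only DRILL-SWP-2026Q1-001 and the S3 prefix pattern appear in the plan.
ROWS = [
    ("DRILL-SWP-2025Q4-001", "Tier-1", "2025-12-04", "CDE-SWR-003", "COMPLETED",
     "s3://swr-cde-evidence/drills/DRILL-SWP-2025Q4-001/"),
    ("DRILL-SWP-2026Q1-001", "Tier-1", "2026-03-10", "CDE-SWR-003 + CDE-SWR-006", "COMPLETED",
     "s3://swr-cde-evidence/drills/DRILL-SWP-2026Q1-001/"),
    ("DRILL-T2-2026Q2-001", "Tier-2", "2026-04-20", "illustrative Tier-2 scope", "COMPLETED",
     "s3://swr-cde-evidence/drills/DRILL-T2-2026Q2-001/"),
]


def build_index(rows):
    """Create the in-memory stand-in table and load the drill rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE drill_index ("
        "drill_id TEXT PRIMARY KEY, tier TEXT, drill_date TEXT, "
        "scope TEXT, status TEXT, evidence_prefix TEXT)"
    )
    conn.executemany("INSERT INTO drill_index VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def latest_drill(conn, tier):
    """Return (drill_id, drill_date, status, evidence_prefix) of the newest drill for a tier."""
    return conn.execute(
        "SELECT drill_id, drill_date, status, evidence_prefix FROM drill_index "
        "WHERE tier = ? ORDER BY drill_date DESC LIMIT 1",
        (tier,),
    ).fetchone()


if __name__ == "__main__":
    print(latest_drill(build_index(ROWS), "Tier-1"))
```

Storing dates as ISO strings keeps the ordering correct in this stand-in; in Snowflake a DATE column would serve the same purpose.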

Mistake 6: Plan is never updated

Pattern: the recovery plan is published in Q1 and never updated. SwiftRide adds a new BU (SwiftAds, Q3); the plan does not reflect it.

Fix: annual refresh built into the Risk Committee cadence; ad-hoc trigger on significant change (new product, new region, regulatory change); cross-reference to the CDE registry refresh.

Bridge to M7 (evidence) and M8 (operating)

M7: Evidence & attestation. The recovery plan artifact (this lab) is an input to the M7 attestation cycle. The quarterly evidence package includes: BIA refresh log, drill outcomes for the quarter, deviation closure status, regulatory deadline compliance, recovery cost actual vs budget. M7 shows how evidence is packaged for quarterly CDO certification + the annual auditor walk-through.

M8: Operating model. Recovery plan governance: how the cadence is embedded organizationally. M8 covers: the Risk Committee dashboard including recovery posture; annual joint review of CDE tier vs application tier; SDLC gates ensuring that a new CDE automatically receives a recovery plan assignment; vendor exit strategies (DORA Art. 28, CTPP).

Summary

  • The lab synthesizes M6.1-M6.6 (BIA → mapping → tolerances → patterns → BCP → drill plan) into a single recovery plan artifact for a Tier-1 process.
  • 6 steps: assert process tier; 4-level mapping; derive data RTO/RPO; recovery patterns + BCP workarounds; drill plan + evidence; self-check against 5 criterion categories.
  • SwiftPay wallet is the primary scenario. Tier-1; RTO 1h; RPO 5 min; impact $4.5M at hour 12; sync replication justified ($2.4M annual incremental).
  • Self-check criteria: tier coherence; mathematical consistency of RTO/RPO; cost-impact defensibility; regulatory alignment; evidence audit-readiness.
  • Opt-in tooling: AWS Backup vault lock 7y + Snowflake Time Travel/Fail-Safe + replication/failover groups + Confluent Replicator. Configuration sketches.
  • Bridge: the artifact feeds the M7 evidence cycle; governance is embedded in the M8 operating model.

Module M6 is complete. The recovery plan for the SwiftPay wallet is a practical, audit-defensible artifact, integrated with the CDE registry + controls catalog + risk register. Next is M7: evidence & attestation for material weakness mitigation.


Check your understanding

Question 1 of 4 (applied). M6 Lab: recovery plan for the SwiftPay wallet (PROC-SWP-001, Tier-1). 6 steps + 5 sections. Which sequence of steps is correct, and why?
