Performance toolchain summary

Module 12 covered the stdlib performance toolchain: timeit, cProfile + pstats, tracemalloc, dis, asyncio, and GIL + threading + multiprocessing. Production work often relies on third-party tools that offer lower profiling overhead, richer output formats, and attach modes for running processes. This lesson recaps the stdlib tools and briefly introduces the third-party ones.

In this lesson:

M12 recap table: each stdlib tool and when to use it.
py-spy: sampling profiler for production.
line_profiler: line-by-line profiling.
memray: Bloomberg memory profiler (better suited to production than tracemalloc).
austin: frame-by-frame sampling profiler.
Forward-link to Phase 70: troubleshooting KB sources (Pitfalls 35-42).
Optional Run-on-Your-Machine #5: py-spy install demo.

M12 recap table — stdlib toolchain

Tool	Question answered	Pyodide-runnable	Run-on-Your-Machine?
`timeit` / `time.perf_counter`	"How fast is X?" (microbenchmark)	Bounded (timing is noisy in the browser, Pitfall 38)	Yes: lesson 01 timing baseline
`cProfile` + `pstats`	"Where does the program spend its time?" (function-level profile)	Parsing pre-captured stats is OK	Yes: lesson 02 host cProfile run
`tracemalloc`	"Where are the allocations?" (memory profiling + leak detection)	Bounded examples (small dict/list)	Yes: lesson 03 production-scale demo
`dis`	"What does this compile to in bytecode?" (opcode introspection)	Yes: `compile + dis.get_instructions` (Pattern 4)	Yes: lesson 04 visual `dis.dis` output
`asyncio`	"How to handle 1000+ concurrent I/O ops?"	No: Pitfall 41, `asyncio.run` conflict	Conceptual MDX prose only
GIL + threading + multiprocessing	"I/O or CPU? Threads or processes?"	No: both modules forbidden (Phase 65)	Conceptual MDX prose only
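As a compact reminder of how three of the tools in the table are driven from code, here is a minimal sketch that times, profiles, and disassembles one function. The function `build_squares` and the input sizes are made up for illustration; they do not come from the earlier lessons.

```python
import cProfile
import dis
import io
import pstats
import timeit


def build_squares(n):
    """Hypothetical workload: build a list of n squares."""
    return [i * i for i in range(n)]


# timeit: "How fast is X?" Take the best of several repeats to reduce noise.
best = min(timeit.repeat(lambda: build_squares(1_000), number=100, repeat=3))

# cProfile + pstats: "Where does the program spend its time?" per function.
profiler = cProfile.Profile()
profiler.enable()
build_squares(50_000)
profiler.disable()
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)

# dis: "What does this compile to?" Opcode names vary by Python version.
opnames = [instr.opname for instr in dis.get_instructions(build_squares)]

print(f"best of 3 runs x 100 calls: {best:.4f} s")
print(report.getvalue())
print(opnames)
```

The absolute numbers depend on the machine and interpreter version, so treat them as relative indicators only.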

Decision matrix: which tool for which task

[Decision flowchart. From START, branches for: microbenchmark of a single expression; profiling the whole program; memory leak / continuous growth; bytecode analysis; I/O concurrency at 1000+ operations; I/O concurrency at ≤100 operations; CPU parallelism.]
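The branches above can be read as a lookup from task to stdlib tool. The sketch below is one plausible reading, not the original diagram: the first five targets follow the recap table, while the threading and multiprocessing targets are an assumption based on the usual I/O-versus-CPU rule of thumb. The task labels and the `pick_tool` helper are hypothetical.

```python
# Assumption: the first five targets follow the recap table; the threading and
# multiprocessing targets follow the usual I/O-vs-CPU rule of thumb.
TOOL_FOR_TASK = {
    "microbenchmark": "timeit",
    "whole-program profile": "cProfile + pstats",
    "memory growth": "tracemalloc",
    "bytecode analysis": "dis",
    "io concurrency 1000+": "asyncio",
    "io concurrency <=100": "threading",
    "cpu parallelism": "multiprocessing",
}


def pick_tool(task):
    """Return the suggested stdlib tool, or raise for an unknown task."""
    try:
        return TOOL_FOR_TASK[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


print(pick_tool("memory growth"))
```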

`py-spy`: sampling profiler for production

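py-spy is a sampling profiler written in Rust. Key properties:

Attaches to a running process by PID and reads its memory from outside (on Linux via the process_vm_readv system call), with no changes to application code.
Low overhead (on the order of <2% slowdown), so it is generally safe to run in production.
Output formats: flame graph SVG, speedscope JSON, top-like live view.
Multi-process / multi-threaded: samples every thread and can follow child processes with --subprocesses.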

pip install 'py-spy>=0.3'

# Live top-like view
py-spy top --pid 12345

# Generate flame graph SVG
py-spy record -o profile.svg --pid 12345 -d 60

# Profile script directly
py-spy record -o profile.svg -- python my_app.py

Comparison vs cProfile:

Aspect	`cProfile`	`py-spy`
Approach	Deterministic (every call)	Sampling (~100 Hz default)
Overhead	~30% slowdown	<2%
Code modification	Wrap with `cProfile.Profile()` or relaunch under `python -m cProfile`	None: attach via PID
Output	Text via `pstats`	Flame graph SVG / speedscope
Production-safe	NO	YES

Decision rule: dev / pre-prod profiling → cProfile; production diagnostics → py-spy.
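To make the "sampling" row of the comparison concrete, here is a toy in-process sampler built on the stdlib only. It is not how py-spy works internally (py-spy samples from outside the process); it only illustrates the idea of periodically recording which function is running instead of hooking every call. It relies on `sys._current_frames()`, a CPython-specific function, and the `busy` workload is hypothetical.

```python
import collections
import sys
import threading
import time


def sample_thread(thread_id, interval_s, stop_event, counts):
    """Periodically record which function the target thread is executing."""
    while not stop_event.is_set():
        # CPython-specific: maps thread ids to their current top frame.
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval_s)


def busy(duration_s):
    """Hypothetical CPU-bound hot spot: spin until the deadline."""
    deadline = time.perf_counter() + duration_s
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


counts = collections.Counter()
stop_event = threading.Event()
sampler = threading.Thread(
    target=sample_thread,
    args=(threading.get_ident(), 0.005, stop_event, counts),
)
sampler.start()
busy(0.3)
stop_event.set()
sampler.join()

print(counts.most_common(3))
```

The profiled function is never wrapped or modified. A longer sampling interval lowers the cost further, at the price of a coarser picture.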

`line_profiler` — line-by-line profiling

line_profiler profiles each line of a function, whereas cProfile works per function. It is useful when you need to identify the specific slow line inside a hot function.

pip install 'line_profiler>=4.0'

# Decorate the function with @profile (kernprof -l injects it into builtins)
@profile
def slow_func(items):
    result = []                         # Line 1
    for item in items:                  # Line 2
        if expensive_check(item):       # Line 3 — hot line!
            result.append(transform(item))  # Line 4
    return result                        # Line 5

kernprof -l -v my_script.py             # runs + dumps timing per line

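The output shows, for each line, the hit count, total time, time per hit, and percentage of the function's time. The granularity is finer than cProfile's (line vs function).

Trade-off: higher overhead (around 50% slowdown, workload-dependent, because every line is instrumented), so use it in development only, not in production.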
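To see why per-line instrumentation costs so much, the sketch below counts line executions with the stdlib `sys.settrace` hook. This is not how line_profiler is implemented; it is a stdlib-only illustration of the same idea, in which a callback fires on every executed line. The `count_line_hits` helper and the `keep_even` function are hypothetical.

```python
import collections
import sys


def count_line_hits(func, *args):
    """Run func(*args) and count how many times each of its lines executes."""
    hits = collections.Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace lines in other functions
        if event == "line":
            # Store offsets relative to the `def` line for readability.
            hits[frame.f_lineno - code.co_firstlineno] += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return result, hits


def keep_even(items):
    result = []
    for item in items:
        if item % 2 == 0:
            result.append(item)
    return result


result, hits = count_line_hits(keep_even, [1, 2, 3, 4])
print(result)
for offset, count in sorted(hits.items()):
    print(f"line +{offset}: {count} hits")
```

Each executed line triggers a Python-level callback here, which is far more work than the line itself usually does. That is the source of the overhead.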

`memray` — Bloomberg memory profiler

memray (Bloomberg, 2022) is a memory profiler suited to production-scale analysis. Advantages over tracemalloc:

Native instrumentation: hooks the allocator at the C level, with lower overhead.
Structured output: tree visualization, flame graphs, allocator breakdown.
Live tracking mode: the UI updates in real time.
Native code support: tracks allocations made inside C extensions (numpy, etc.), including raw malloc calls that tracemalloc does not see.

pip install 'memray>=1.10'

# Profile script
memray run -o profile.bin my_app.py

# Generate flame graph HTML
memray flamegraph profile.bin

# Live mode
memray run --live my_app.py

Decision rule: dev / quick diagnostics → tracemalloc (stdlib, no install); production memory analysis → memray (better tooling, C-level tracking).
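For the tracemalloc side of this rule, a minimal leak check compares two snapshots and lists the source lines whose allocations grew the most. The `leaky_step` function and its sizes are hypothetical, chosen only to produce a visible difference.

```python
import tracemalloc


def leaky_step(cache, n):
    """Hypothetical leak: each call appends n 1 KiB blobs, never freed."""
    cache.extend(bytes(1024) for _ in range(n))


cache = []
tracemalloc.start()
before = tracemalloc.take_snapshot()
leaky_step(cache, 1_000)
after = tracemalloc.take_snapshot()
tracemalloc.stop()  # stop tracing so its overhead does not stay with the process

# Largest differences first, grouped by source line.
top = after.compare_to(before, "lineno")[:3]
for stat in top:
    print(stat)
```

The first entry should point at the line inside `leaky_step`, with a size difference of roughly 1 MiB.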

`austin` — frame-by-frame sampling profiler

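austin is a frame stack sampler for CPython written in C. Like py-spy, it runs outside the target process and reconstructs call stacks by reading the interpreter's memory, so the application needs no instrumentation.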

pip install 'austin-dist>=3.0'

austin -i 10000 python my_app.py    # -i is the interval in microseconds: 10000 us, about 100 Hz

Output: call stack samples, which are aggregated into flame graphs. It is production-safe (low overhead). It is less popular than py-spy but offers similar capabilities.
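How stack samples get combined for a flame graph can be sketched with a small aggregator. The example assumes the generic collapsed-stack text format that flame graph tools commonly consume (semicolon-separated frames, then a space and a numeric weight). The exact line format that austin emits may differ by version, and the sample data below is invented.

```python
import collections

# Hypothetical samples in collapsed-stack form: "frame;frame;frame weight".
SAMPLES = """\
main;load_data;parse_row 40
main;load_data 10
main;compute;hot_loop 120
main;compute;hot_loop 80
main;compute 5
"""


def aggregate(folded_text):
    """Return (self_weight, total_weight) counters keyed by frame name."""
    self_weight = collections.Counter()
    total_weight = collections.Counter()
    for line in folded_text.splitlines():
        if not line.strip():
            continue
        stack, _, weight_text = line.rpartition(" ")
        weight = int(weight_text)
        frames = stack.split(";")
        self_weight[frames[-1]] += weight  # leaf frame was executing when sampled
        for name in set(frames):  # count a recursive frame once per sample
            total_weight[name] += weight
    return self_weight, total_weight


self_weight, total_weight = aggregate(SAMPLES)
print(self_weight.most_common(2))
print(total_weight["main"])
```

In a flame graph, the total weight of a frame becomes the width of its box, while the self weight shows where the samples actually landed.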

Forward-link Phase 70 — troubleshooting KB

Phase 70 (Launch Polish) contains a troubleshooting knowledge base (ASMT-07 requirement): KB articles on common Python production issues. The pitfalls from Module 12 are direct source material:

Pitfall 38 (Pyodide cProfile timing unreliable) → KB "Why does my Pyodide profiler show wrong numbers?"
Pitfall 39 (tracemalloc overhead when tracing is never stopped) → KB "Why is my Python process slow after deploy?"
Pitfall 40 (dis opcode names version-sensitive) → KB "Why do my bytecode tests fail after Python upgrade?"
Pitfall 41 (Pyodide asyncio.run conflict) → KB "Why does asyncio.run raise RuntimeError in browser?"
Pitfall 42 (GIL release rules) → KB "Why does threading not speed up my CPU loop?"

Phase 70 KB articles cite the M12 lessons as their primary source, which closes the cross-module learning loop.

Optional Run-on-Your-Machine #5 — `py-spy` install demo

TIP

Run-on-Your-Machine: py-spy install + production-style profiling

Install py-spy (it is NOT part of the stdlib):

pip install 'py-spy>=0.3'
python --version  # >=3.11

Create the file slow_app.py (a long-running program):

import time

def hot_func():
    return sum(i*i for i in range(10_000_000))

def cold_func():
    time.sleep(0.5)

def main():
    while True:
        hot_func()                   # CPU-bound dominant
        cold_func()                  # I/O sleep — minor

if __name__ == '__main__':
    main()

Start it in a terminal (in the background):

python slow_app.py &
APP_PID=$!
echo "App running with PID $APP_PID"

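Profile with py-spy from the same shell, where $APP_PID is set (from another terminal, substitute the printed PID). The app itself is not modified. On Linux, attaching to an already-running process may require sudo, depending on the ptrace_scope setting: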

# Live top-like view (Ctrl+C to exit)
py-spy top --pid $APP_PID

# OR record flame graph 30 seconds
py-spy record -o profile.svg --pid $APP_PID -d 30

# Open profile.svg in a browser

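Expected flame graph: hot_func dominates (>95% of samples) and cold_func is at most a thin slice, because py-spy skips idle (sleeping) threads unless --idle is passed. main sits near the root, just above the module-level frame.

Key insight: profiling without code modification means production teams can profile any running Python service by PID. This is the advantage of sampling profilers over cProfile, which requires wrapping the code or relaunching the process under the profiler.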

Version pin py-spy>=0.3 (Pitfall 32 — Phase 69 baseline).

Cross-course → Spark 12/04 cost-optimization: a parallel from production cost optimization. Spark workers are profiled via JFR / async-profiler without modifying job code, following the same "attach + sample, no instrumentation" pattern.

M12 finish: bridge to M13

M12 covered the stdlib performance toolchain: measurement (timeit / perf_counter), profiling (cProfile / pstats), memory (tracemalloc / sys.getsizeof / gc), bytecode (dis), and concurrency (asyncio + GIL + threading + multiprocessing). Cross-course bridges: Spark / DataFusion / ClickHouse architectural patterns.

M13 (next plan) is Packaging & Environment: pyproject.toml + pip / uv / rye / poetry; requirements.txt + lockfiles; PEP 440 specifiers; PEP 735 dependency-groups. The performance lessons from M12 carry over: pip install cost matters, and uv (written in Rust) is roughly 10× faster than pip, which links to dependency management performance.

Pragmatic-DEEP principle: the stdlib is enough for about 80% of production performance work. Third-party tools (py-spy / memray) cover the remaining 20%: production-grade scenarios. M12 teaches the foundation; the third-party toolchain is an advanced extension.

Sources: py-spy GitHub repository, line_profiler documentation, memray documentation, austin GitHub repository.
