Performance toolchain summary
Module 12 covered the stdlib performance toolchain: timeit, cProfile + pstats, tracemalloc, dis, asyncio, and GIL + threading + multiprocessing. Production work often relies on third-party tools, which offer lower profiling overhead, richer output formats, and better attach modes. This lesson recaps the stdlib tools and briefly introduces the third-party ones.
In this lesson:
- M12 recap table: each stdlib tool and when to use it.
- py-spy: sampling profiler for production.
- line_profiler: line-by-line profiling.
- memray: Bloomberg memory profiler (better than tracemalloc for production).
- austin: frame-by-frame sampling profiler.
- Forward-link to Phase 70: troubleshooting KB sources (Pitfalls 35-42).
- Optional Run-on-Your-Machine #5: py-spy install demo.
M12 recap table — stdlib toolchain
| Tool | Question answered | Pyodide-runnable | Run-on-Your-Machine? |
|---|---|---|---|
| timeit / time.perf_counter | "How fast is X?" (microbenchmark) | Bounded (timing is noisy in the browser, Pitfall 38) | Yes: lesson 01 timing baseline |
| cProfile + pstats | "Where does the program spend its time?" (function-level profile) | Parsing pre-captured stats is OK | Yes: lesson 02 host cProfile run |
| tracemalloc | "Where are the allocations?" (memory profiling + leak detection) | Bounded examples (small dict/list) | Yes: lesson 03 production-scale demo |
| dis | "What does this compile to in bytecode?" (opcode introspection) | Yes: compile + dis.get_instructions (Pattern 4) | Yes: lesson 04 visual dis.dis output |
| asyncio | "How do I handle 1000+ concurrent I/O ops?" | NO: Pitfall 41 asyncio.run conflict | Conceptual MDX prose only |
| GIL + threading + multiprocessing | "I/O or CPU? Threads or processes?" | NO: both modules forbidden in Phase 65 | Conceptual MDX prose only |
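As a compact reminder of the three Pyodide-friendly rows in the table, the sketch below runs timeit, cProfile + pstats, and dis against one toy function. The function sum_squares and the chosen sizes are hypothetical, picked only to keep the run short; treat the numbers it prints as illustrative, not as a benchmark.

```python
import cProfile
import dis
import io
import pstats
import timeit


def sum_squares(n):
    """Toy workload (hypothetical): sum of squares below n."""
    return sum(i * i for i in range(n))


# 1. timeit: "How fast is X?" Take the best of a few repeats to reduce noise.
best = min(timeit.repeat(lambda: sum_squares(1_000), number=100, repeat=3))
print(f"best of 3: {best:.4f} s per 100 calls")

# 2. cProfile + pstats: "Where does the program spend its time?"
profiler = cProfile.Profile()
profiler.enable()
sum_squares(50_000)
profiler.disable()
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())

# 3. dis: "What does this compile to?" Opcode names vary by Python version.
opnames = [ins.opname for ins in dis.get_instructions(sum_squares)]
print(opnames)
```

Taking the minimum of several timeit repeats is a common way to damp scheduler noise; the dis output is printed rather than asserted on because opcode names change between Python releases.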
py-spy: sampling profiler for production
py-spy is a sampling profiler written in Rust. Key properties:
- Attaches to a running process (via /proc on Linux) without modifying app code.
- Low overhead (<2% slowdown), so it is production-safe.
- Output formats: flame graph SVG, speedscope JSON, top-like live view.
- Multi-process / multi-threaded: profiles all threads and can follow child processes with --subprocesses.
pip install 'py-spy>=0.3'
# Live top-like view
py-spy top --pid 12345
# Generate flame graph SVG
py-spy record -o profile.svg --pid 12345 -d 60
# Profile script directly
py-spy record -o profile.svg -- python my_app.py
Comparison vs cProfile:
| Aspect | cProfile | py-spy |
|---|---|---|
| Approach | Deterministic (every call) | Sampling (~100 Hz default) |
| Overhead | ~30% slowdown | <2% |
| Code modification | Wrap with cProfile.Profile() or relaunch under python -m cProfile | None (attach via PID) |
| Output | Text via pstats | Flame graph SVG / speedscope |
| Production-safe | NO | YES |
Decision rule: dev / pre-prod profiling → cProfile; production diagnostics → py-spy.
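To make the "sampling" row of the comparison concrete, here is a toy in-process sampler. It is only an analogy: py-spy samples from outside the process, whereas this sketch uses a helper thread and the CPython-specific sys._current_frames(). The names sample_thread and busy, and the 10 ms interval (about 100 Hz), are assumptions made for illustration.

```python
import collections
import sys
import threading
import time


def sample_thread(target_ident, counts, stop_event, interval=0.01):
    """Every `interval` seconds, record which function the target thread is in."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_ident)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def busy(seconds):
    """Hypothetical CPU-bound workload: spin until the deadline passes."""
    deadline = time.perf_counter() + seconds
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


counts = collections.Counter()
stop_event = threading.Event()
sampler = threading.Thread(
    target=sample_thread,
    args=(threading.get_ident(), counts, stop_event),
    daemon=True,
)
sampler.start()
busy(0.5)
stop_event.set()
sampler.join()

# Function names seen most often are where the sampled thread spent its time.
print(counts.most_common(3))
```

The workload itself is never instrumented: the sampler only peeks at the current frame on a timer, which is why the cost stays roughly constant regardless of how many function calls the workload makes.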
line_profiler — line-by-line profiling
line_profiler profiles each line of a function (cProfile works per function). It is useful when you need to identify the specific slow line inside a hot function.
pip install 'line_profiler>=4.0'
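# Decorate the function with @profile (kernprof -l injects this name into builtins)
# expensive_check and transform stand for your own functions
@profile
def slow_func(items):
    result = []                             # Line 1
    for item in items:                      # Line 2
        if expensive_check(item):           # Line 3: hot line!
            result.append(transform(item))  # Line 4
    return result                           # Line 5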
kernprof -l -v my_script.py # runs + dumps timing per line
The output shows hits, total time, time per hit, and % of time for each line. Granularity is finer than cProfile (line vs function).
Trade-off: higher overhead (~50% slowdown, because every line is instrumented), so use it in dev only, not in production.
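One practical wrinkle: a script decorated with @profile raises NameError when run without kernprof. A common workaround, sketched below, is a no-op fallback decorator. The bodies of expensive_check and transform are hypothetical stand-ins so that the script runs end to end.

```python
try:
    profile  # kernprof -l injects this decorator into builtins
except NameError:
    def profile(func):
        """No-op stand-in so the script also runs without kernprof."""
        return func


def expensive_check(item):
    """Hypothetical slow predicate: trial-division primality test."""
    if item < 2:
        return False
    return all(item % d for d in range(2, int(item ** 0.5) + 1))


def transform(item):
    """Hypothetical cheap transformation."""
    return item * item


@profile
def slow_func(items):
    result = []
    for item in items:
        if expensive_check(item):
            result.append(transform(item))
    return result


if __name__ == "__main__":
    print(slow_func(range(20)))
```

Run it as plain python my_script.py for normal behavior, or as kernprof -l -v my_script.py to get the per-line report; the same file serves both cases.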
memray — Bloomberg memory profiler
memray (Bloomberg, 2022) is a memory profiler for production. Advantages over tracemalloc:
- Native instrumentation — C-level allocator hook, lower overhead.
- Structured output — tree visualization, flame graphs, allocator breakdown.
- Live tracking mode — UI обновляется real-time.
- Native code support: tracks allocations made inside C extensions (numpy, etc.), including direct malloc calls that tracemalloc does not see.
pip install 'memray>=1.10'
# Profile script
memray run -o profile.bin my_app.py
# Generate flame graph HTML
memray flamegraph profile.bin
# Live mode
memray run --live my_app.py
Decision rule: dev / quick diagnostics — tracemalloc (stdlib, no install); production memory analysis — memray (better tooling, C-level tracking).
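For the "quick diagnostics" half of that rule, a minimal tracemalloc sketch looks like the following. The build_cache function and its size are hypothetical; the pattern to note is snapshot, do the work, snapshot again, compare, and always stop tracing afterwards.

```python
import tracemalloc


def build_cache(n):
    """Hypothetical allocation-heavy step: n small strings kept alive in a dict."""
    return {i: str(i) * 10 for i in range(n)}


tracemalloc.start()
try:
    before = tracemalloc.take_snapshot()
    cache = build_cache(50_000)
    after = tracemalloc.take_snapshot()
    current, peak = tracemalloc.get_traced_memory()
finally:
    # Stop tracing when done: leaving it on keeps paying the overhead (Pitfall 39).
    tracemalloc.stop()

# Snapshots stay usable after tracing stops; compare them line by line.
top = after.compare_to(before, "lineno")[:3]
for stat in top:
    print(stat)
print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
```

This sees only allocations routed through Python's allocators, which is the gap memray's native tracking is meant to close.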
austin — frame-by-frame sampling profiler
austin is a frame stack sampler for CPython written in C. It samples stacks frame by frame by reading the interpreter's memory from outside the process, so the target needs no instrumentation.
pip install 'austin-dist>=3.0'
austin -i 10000 python my_app.py # -i is the sampling interval in microseconds: 10000 µs ≈ 100 Hz
Output: call stack samples that can be aggregated into flame graphs. Production-safe (low overhead). Less popular than py-spy, but with similar capabilities.
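How stack samples get combined for a flame graph can be sketched in a few lines. The sample text below is hypothetical and uses the generic "collapsed stack" layout (frames joined by ";" from root to leaf, then a weight); it is not a parser for austin's actual output, which carries extra fields such as process and thread identifiers.

```python
import collections

# Hypothetical samples in the generic collapsed-stack form:
# frames separated by ";" from root to leaf, then a space and a weight.
SAMPLES = """\
main;hot_func;<genexpr> 7
main;hot_func 2
main;cold_func 1
main;hot_func;<genexpr> 5
"""


def fold(lines):
    """Sum the weights of identical stacks."""
    totals = collections.Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stack, _, weight = line.rpartition(" ")
        totals[stack] += int(weight)
    return totals


def inclusive_share(totals, frame):
    """Fraction of total weight whose stack contains `frame` anywhere."""
    total = sum(totals.values())
    hit = sum(w for stack, w in totals.items() if frame in stack.split(";"))
    return hit / total


totals = fold(SAMPLES.splitlines())
print(totals.most_common())
print(f"hot_func share: {inclusive_share(totals, 'hot_func'):.0%}")
```

A flame graph renderer does essentially this folding and then draws each frame with a width proportional to its inclusive weight.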
Forward-link Phase 70 — troubleshooting KB
Phase 70 (Launch Polish) contains a troubleshooting knowledge base (ASMT-07 requirement): KB articles on common Python production issues. The pitfalls from Module 12 are direct source material:
- Pitfall 38 (Pyodide cProfile timing unreliable) → KB "Why does my Pyodide profiler show wrong numbers?"
- Pitfall 39 (tracemalloc overhead when tracing is never stopped) → KB "Why is my Python process slow after deploy?"
- Pitfall 40 (dis opcode names version-sensitive) → KB "Why do my bytecode tests fail after Python upgrade?"
- Pitfall 41 (Pyodide asyncio.run conflict) → KB "Why does asyncio.run raise RuntimeError in browser?"
- Pitfall 42 (GIL release rules) → KB "Why does threading not speed up my CPU loop?"
Phase 70 KB articles cite the M12 lessons back as their primary source, which completes the cross-module learning loop.
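As an illustration of the kind of reproduction a KB article could carry, the sketch below triggers the Pitfall 41 class of error in plain CPython by calling asyncio.run from inside an already running loop. That Pyodide fails in the same way is taken from the pitfall description, not verified here; fetch_value and outer are hypothetical names.

```python
import asyncio


async def fetch_value():
    """Hypothetical coroutine standing in for real I/O."""
    await asyncio.sleep(0)
    return 42


async def outer():
    # A loop is already running here, similar to a browser or notebook host.
    coro = fetch_value()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "never awaited" warning for the rejected coroutine
        message = str(exc)
    else:
        message = "no error"
    # The fix inside a running loop: await the coroutine directly.
    value = await fetch_value()
    return message, value


message, value = asyncio.run(outer())
print(message)
print(value)
```

The takeaway for the KB entry: when the host already owns an event loop, use await (or schedule a task) instead of starting a second loop with asyncio.run.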
Optional Run-on-Your-Machine #5 — py-spy install demo
Run-on-Your-Machine: py-spy install + production-style profiling
Install py-spy (NOT part of the stdlib):
pip install 'py-spy>=0.3'
python --version # >=3.11
Create the file slow_app.py (long-running):
import time
def hot_func():
    return sum(i*i for i in range(10_000_000))
def cold_func():
    time.sleep(0.5)
def main():
    while True:
        hot_func()   # CPU-bound, dominant
        cold_func()  # sleep standing in for I/O, minor
if __name__ == '__main__':
    main()
Run it in one terminal:
python slow_app.py &
APP_PID=$!
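echo "App running with PID $APP_PID"
Profile with py-spy (no modification to the app). $APP_PID is set only in the shell that launched the app, so in another terminal substitute the printed PID; attaching by PID may also require sudo: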
# Live top-like view (Ctrl+C to exit)
py-spy top --pid $APP_PID
# OR record flame graph 30 seconds
py-spy record -o profile.svg --pid $APP_PID -d 30
# Open profile.svg in a browser
Expected flame graph: hot_func dominates (>95% of samples) and main is the root frame. cold_func is a thin slice at most, because it only sleeps and py-spy skips idle threads unless --idle is passed.
Key insight: profiling needs no code modification, so production teams can profile any running Python service by PID. This is the advantage of sampling profilers over cProfile, which requires wrapping the code or relaunching the process under the profiler.
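The same attach-by-PID flow can be driven from Python, which avoids copying the PID between terminals. This is a minimal sketch: build_record_cmd and profile_script are hypothetical helper names, and the only py-spy flags used are the ones already shown in this lesson (record, -o, --pid, -d).

```python
import shutil
import subprocess
import sys


def build_record_cmd(pid, output="profile.svg", duration=30):
    """Command line for `py-spy record`, mirroring the flags used in this lesson."""
    return ["py-spy", "record", "-o", output, "--pid", str(pid), "-d", str(duration)]


def profile_script(script, output="profile.svg", duration=30):
    """Start `script`, attach py-spy for `duration` seconds, then stop the app."""
    if shutil.which("py-spy") is None:
        raise RuntimeError("py-spy is not on PATH; run: pip install 'py-spy>=0.3'")
    app = subprocess.Popen([sys.executable, script])
    try:
        subprocess.run(build_record_cmd(app.pid, output, duration), check=True)
    finally:
        # Always stop the long-running app, even if py-spy fails.
        app.terminate()
        app.wait()
    return output


print(build_record_cmd(12345))
```

Calling profile_script("slow_app.py") would then produce profile.svg in one step; depending on the OS and its ptrace settings, the py-spy call may still need elevated privileges.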
Version pin py-spy>=0.3 (Pitfall 32 — Phase 69 baseline).
Cross-course → Spark 12/04 cost-optimization: the production parallel is profiling Spark workers with JFR / async-profiler without modifying job code, the same "attach + sample, no instrumentation" pattern.
M12 finish: bridge to M13
M12 covered the stdlib performance toolchain: measurement (timeit / perf_counter), profiling (cProfile / pstats), memory (tracemalloc / sys.getsizeof / gc), bytecode (dis), and concurrency (asyncio + GIL + threading + multiprocessing). Cross-course bridges link to Spark / DataFusion / ClickHouse architectural patterns.
M13 (next plan) is Packaging & Environment: pyproject.toml + pip / uv / rye / poetry; requirements.txt + lockfiles; PEP 440 specifiers; PEP 735 dependency-groups. The performance lessons from M12 carry over: pip install cost matters, and uv (written in Rust) is roughly 10× faster than pip, which links to dependency management performance.
Pragmatic-DEEP principle: the stdlib is enough for 80% of production performance work. Third-party tools (py-spy / memray) cover the remaining 20%, the production-grade scenarios. M12 teaches the foundation; the third-party toolchain is an advanced extension.
Sources: py-spy GitHub repository, line_profiler documentation, memray documentation, austin GitHub repository.