`cProfile` + `pstats` — function-level profiling

В прошлом уроке мы научились мерить отдельный кусок кода через timeit. Но в production typical вопрос другой: «Программа занимает 30 секунд — где она тратит время?». Здесь нужен profiler — он instruments каждый function call и аггрегирует timing. Stdlib предоставляет cProfile (deterministic profiler, написан в C для low overhead) + pstats (для анализа собранных данных).

В этом уроке:

cProfile.run vs cProfile.Profile() — top-level API + context-manager (cross-link M06 урок 04).
pstats.Stats API — sort_stats, print_stats, output columns.
Deterministic vs sampling — cProfile vs py-spy / austin trade-offs.
Observer effect — cProfile overhead сам по себе.
Pyodide caveat (Pitfall 38) — почему challenges parse pre-captured stats.
Code-challenge py-m12-02-code-1 — Pattern 2 regex parsing pstats text output.
Run-on-Your-Machine #2 — host cProfile real run.

`cProfile.run` vs `cProfile.Profile()` context-manager

Самый простой entry-point — cProfile.run(statement, filename=None, sort=-1):

import cProfile

def main():
    sum(range(1_000_000))
    [x*2 for x in range(100_000)]

cProfile.run('main()', '/tmp/stats.prof')

cProfile.run принимает строку statement (eval’ится в caller’s globals) и optional filename для сохранения результата (binary marshal format). Удобно для quick CLI usage, но плохо масштабируется.

Production pattern — cProfile.Profile() context-manager (cross-link M06 урок 04 — __enter__ / __exit__ protocol):

import cProfile

with cProfile.Profile() as profile:
    main()                      # любой code — не string-eval
    do_other_work()

profile.dump_stats('/tmp/stats.prof')

Преимущества:

Не string-eval — типобезопасно, IDE-friendly, works со closures / locals.
Composable — можно открыть Profile вручную: p = cProfile.Profile(); p.enable(); ...; p.disable().
Selective regions — wrap только hot path, не всю программу.
Контекст-менеджер автоматически enable() / disable() через __enter__ / __exit__ (M06 урок 04 protocol — Profile реализует tri-arg __exit__).

Cross-link M06 урок 04: context-manager protocol — это именно __enter__ устанавливает invariant (profiler enabled) + __exit__ гарантирует cleanup (disabled даже при exception). Тот же pattern что with open(...), with lock, with tracemalloc.taking_snapshot() (урок 03).

Cite docs.python.org/3/library/profile.html.

`pstats.Stats` — `sort_stats`, `print_stats`, output columns

Сохранённый .prof файл анализируется через pstats.Stats:

import pstats

stats = pstats.Stats('/tmp/stats.prof')
stats.strip_dirs()                          # remove leading paths from filenames
stats.sort_stats('cumulative')               # sort options below
stats.print_stats(20)                        # top-20 functions

sort_stats(key) — что использовать как primary sort:

Key	Meaning	When to use
`'cumulative'`	Cumulative time (включая subcalls)	Default for top-down analysis — find slow chains
`'tottime'`	Total time внутри функции (без subcalls)	Find self-time hotspots — leaf bottlenecks
`'ncalls'`	Call count	Find functions called too often
`'name'`	Sort by function name	Alphabetic — debugging
`'pcalls'`	Primitive call count (без recursion)	Distinguish от ncalls для recursive code

Output columns (4 первых time-related):

   ncalls   tottime   percall   cumtime   percall   filename:lineno(function)
        1     0.040     0.040     0.045     0.045   slow_demo.py:8(main)

ncalls — total call count (если recursion: 5/3 = 5 total / 3 primitive).
tottime — total time внутри функции (без subcalls); это self-time.
percall (после tottime) — tottime / ncalls.
cumtime — cumulative time (включая subcalls); это «полное» время функции.
percall (после cumtime) — cumtime / primitive ncalls.

Production rule: start с sort_stats('cumulative').print_stats(20) — top-down view (что в целом долго); затем переключаться на sort_stats('tottime').print_stats(20) — bottom-up view (где self-time hotspot).

Deterministic vs sampling profiling

cProfile — deterministic: instruments every function call/return через CPython hook (sys.setprofile). Каждый call увеличивает counter + записывает timestamp. Точность — perfect; overhead — significant (~30% slowdown).

Sampling profilers — periodically dump call stack (e.g. 100 раз в секунду). Examples: py-spy, austin. Точность — statistical (увидите hot functions, но точные ncalls могут быть приблизительными); overhead — низкий (<2%) — suitable для production.

Approach	Overhead	Precision	Production-safe
`cProfile` (deterministic)	~30% slowdown	Exact ncalls + tottime + cumtime	NO — too slow для prod
`py-spy` (sampling)	<2%	Statistical (proportional accuracy)	YES — eBPF-like, attaches к running process via /proc Linux
`austin` (sampling)	<5%	Frame-by-frame	YES
`line_profiler` (deterministic, line-level)	~50% slowdown	Per-line timing	Dev only

Decision rule: dev / pre-prod analysis → cProfile (точные numbers). Production diagnostics → py-spy (low overhead, attaches без code change). Cross-course → Spark UI deep dive показывает аналогичный split: deterministic JVM profiler (off) vs sampling JFR / async-profiler (production).

Cross-course → Spark 04/01 shuffle-optimization — distributed equivalent of cProfile call graph; Spark UI показывает per-stage timing, аналогично pstats sort_stats('cumulative').

Cross-course → Spark 01/04 dag-scheduler — DAG visualization для distributed jobs — параллельная idea к cProfile call graph для single-process.

Python UDF overhead — где cProfile-логика встречает Spark DataFusion EXPLAIN — query plan как call graph

Observer effect — cProfile overhead сам по себе

cProfile overhead distorts results — особенно для low-overhead functions (1µs functions могут показаться 5µs из-за hook cost). Это observer effect:

Tight loops — for i in range(N): tiny_op(i) cProfile overhead > tiny_op cost; result misleading.
C-extension calls — numpy.dot (если был bы доступен — Phase 65 forbid) tracked одной call записью; overhead minimal.
Recursive functions — overhead amplified per call frame.

Production rule: для tight loops — timeit (минимальный overhead); для function-level — cProfile (overhead приемлем для finding bottlenecks); для production — sampling.

Pyodide caveat (Pitfall 38) — challenges парсят pre-captured stats

В Pyodide WASM (browser) cProfile работает (Python implementation в lib/pstats.py + lib/profile.py portable), НО timing — unreliable (предыдущий урок Pitfall 38). Browser event-loop interleaving делает per-call timing noisy.

Решение для Phase 69: challenges НЕ запускают cProfile.run('expensive_func()') runtime — мы парсим pre-captured pstats text output (data, snapshot’ed на host). Это:

Детерминированно — text input → text output, не зависит от timer noise.
Pure stdlib — re модуль для regex parsing; никакой cProfile runtime invocation.
Pedagogically valid — учит читать pstats output (real production skill — DevOps часто видят stats text не bracketed dashboards).
Pattern 2 challenge ниже именно это и делает.

Real cProfile run → Run-on-Your-Machine.

Code-challenge `py-m12-02-code-1` — Pattern 2 regex pstats parsing

Quiz JSON 02-cprofile-pstats.json содержит challenge:

Дан pre-captured pstats text output (несколько data lines). Распарсите через regex, отсортируйте по cumtime descending, верните top-3 как [(funcname, cumtime), ...].

Pattern 2 — regex line parsing → tuple aggregate → sort + slice. Solution skeleton (revealed после submission):

import re

def solve(stats_text: str) -> list[tuple[str, float]]:
    pattern = re.compile(
        r'^\s*\d+\s+[\d.]+\s+[\d.]+\s+([\d.]+)\s+[\d.]+\s+\S+:\d+\((\w+)\)',
        re.MULTILINE,
    )
    matches = pattern.findall(stats_text)
    parsed = [(funcname, float(cumtime)) for cumtime, funcname in matches]
    parsed.sort(key=lambda t: t[1], reverse=True)
    return parsed[:3]

Что регэкс ловит (per pstats print_stats output):

^\s*\d+ — ncalls (integer)
\s+[\d.]+\s+[\d.]+ — tottime, percall
\s+([\d.]+) — cumtime (capture group 1)
\s+[\d.]+ — second percall
\s+\S+:\d+\((\w+)\) — filename:lineno(funcname) — funcname capture group 2

Pyodide-runnable: re — stdlib, solve(stats_text) принимает уже-готовый text input → детерминированный output. НЕ instantiate’им cProfile runtime в browser (Pitfall 38 — timing noisy).

3 testCases (2 visible + 1 hidden):

tc1 — 3 функции slow_io/parse/setup → [('slow_io', 0.5), ('parse', 0.3), ('setup', 0.1)].
tc2 — 5 функций → top-3 by cumtime.
tc3 (hidden) — 1 функция → [('only_func', 0.42)].

Cross-course → Spark UI executor metrics — same idea (parse Spark UI Stage Details text dump → sort → top-N) — параллель к Spark 04/01 shuffle-optimization analytics.

Run-on-Your-Machine #2 — host cProfile real run

TIP

Run-on-Your-Machine: cProfile real run на slow_demo.py

Установите (cProfile уже stdlib):

python --version  # >=3.11

Создайте файл slow_demo.py:

import cProfile
import pstats

def slow_io():
    return sum(range(1_000_000))

def parse():
    return [x * 2 for x in range(100_000)]

def setup():
    return {i: i for i in range(10_000)}

def main():
    slow_io()
    parse()
    setup()

with cProfile.Profile() as profile:
    main()

stats = pstats.Stats(profile)
stats.strip_dirs().sort_stats('cumulative').print_stats(10)

Запустите:

python slow_demo.py

Ожидаемый вывод (точные числа зависят от CPU):

         4 function calls in 0.045 seconds

   Ordered by: cumulative time
   List reduced from 8 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.045    0.045 slow_demo.py:13(main)
        1    0.030    0.030    0.030    0.030 slow_demo.py:4(slow_io)
        1    0.010    0.010    0.010    0.010 slow_demo.py:7(parse)
        1    0.005    0.005    0.005    0.005 slow_demo.py:10(setup)

Эту самую text-форму challenge py-m12-02-code-1 парсит через regex.

В browser challenge мы парсим pre-captured text — потому что Pyodide WASM event-loop interleaving делает live timing data unreliable (Pitfall 38). Real cProfile run только на host. Version pin Python>=3.11 (Pitfall 32 — Phase 69 baseline).

Что в следующем уроке

Урок 03 — tracemalloc — memory profiling, take_snapshot() / compare_to() для leak detection. Cross-link M02 урок 02 sys.getsizeof baseline.

Pragmatic-DEEP принцип: cProfile отвечает «время, где». tracemalloc отвечает «память, где». Together — full performance picture.

Cite docs.python.org/3/library/profile.html + docs.python.org/3/library/pstats.html.

cProfile + pstats — function-level profiling

cProfile.run vs cProfile.Profile() context-manager

pstats.Stats — sort_stats, print_stats, output columns

Deterministic vs sampling profiling

Observer effect — cProfile overhead сам по себе

Pyodide caveat (Pitfall 38) — challenges парсят pre-captured stats

Code-challenge py-m12-02-code-1 — Pattern 2 regex pstats parsing

Run-on-Your-Machine #2 — host cProfile real run

Что в следующем уроке

Закончили урок?

`cProfile` + `pstats` — function-level profiling

`cProfile.run` vs `cProfile.Profile()` context-manager

`pstats.Stats` — `sort_stats`, `print_stats`, output columns

Code-challenge `py-m12-02-code-1` — Pattern 2 regex pstats parsing