dis — bytecode introspection + generators-vs-list-comp
In M05 lesson 02 (PyGenObject) we saw that a generator function is not "a function with a special yield" but a separate class of objects (PyGenObject) with its own execution machinery. Bytecode makes this difference visible: [x*2 for x in range(3)] (list comprehension) and (x*2 for x in range(3)) (generator expression) compile to different bytecode with different opcodes. The stdlib dis module shows this.
In this lesson:
- dis.dis / dis.get_instructions / dis.Bytecode: three API entry points.
- Common opcodes overview: LOAD_FAST, LOAD_CONST, BINARY_OP, RESUME, CALL, RETURN_VALUE.
- Generators vs list comp at bytecode level: LIST_APPEND vs YIELD_VALUE.
- Pragmatic NOT D-07: we stay with what is observable through the public dis API and do not deep-dive into PyCodeObject internals.
- Pitfall 40 (PEP 659): opcode names are version-sensitive (specialized adaptive interpreter).
- Code-challenge py-m12-04-code-1: Pattern 4 opcode counting with Counter.
- Run-on-Your-Machine #4: real dis.dis output for 3 patterns.
dis.dis / dis.get_instructions / dis.Bytecode
Three entry points, each with a different purpose:
| API | Returns | Use case |
|---|---|---|
| dis.dis(x) | None (prints to stdout) | Quick CLI inspection |
| dis.get_instructions(x) | Iterator of Instruction namedtuples | Programmatic analysis (what Pattern 4 does) |
| dis.Bytecode(x) | Bytecode object with .info() and .dis() methods | Mid-level: programmatic access plus pretty printing |
x can be:
- Code object: compile('expr', '<test>', 'eval') returns a code object.
- Function: dis.dis(my_func) disassembles my_func.__code__.
- Source string: dis.dis('x + 1') compiles the string, then disassembles it.
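As an illustration of the three input kinds and the three entry points together, here is a minimal sketch. The function name add_one and the filename label "<test>" are placeholders chosen for this example, not part of the lesson's code.

```python
import dis
import io

def add_one(x):
    return x + 1

# 1. Function: dis reads add_one.__code__.
from_function = [ins.opname for ins in dis.get_instructions(add_one)]

# 2. Code object produced by compile() in eval mode.
code_obj = compile("x + 1", "<test>", "eval")
from_code = [ins.opname for ins in dis.get_instructions(code_obj)]

# 3. Source string: dis compiles it before disassembling.
from_source = [ins.opname for ins in dis.Bytecode("x + 1")]

# dis.dis prints; pass file= to capture the listing instead of writing to stdout.
buffer = io.StringIO()
dis.dis(add_one, file=buffer)
listing = buffer.getvalue()

# dis.Bytecode gives both a metadata summary and a formatted listing as strings.
bytecode = dis.Bytecode(add_one)
summary = bytecode.info()
pretty = bytecode.dis()
```

Capturing into a StringIO is handy in environments where printed output is hard to inspect; the opcode names in each list still depend on the interpreter version.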
Recipe:
import dis
from collections import Counter

# Inspect a function
def square(x):
    return x * x

dis.dis(square)
# Output on Python 3.11/3.12 (3.13+ fuses the two loads into one LOAD_FAST_LOAD_FAST):
# RESUME 0
# LOAD_FAST 0 (x)
# LOAD_FAST 0 (x)
# BINARY_OP 5 (*)
# RETURN_VALUE

# Programmatic: count opcodes
instrs = dis.get_instructions(square)
counter = Counter(instr.opname for instr in instrs)
# On 3.11/3.12: Counter({'LOAD_FAST': 2, 'RESUME': 1, 'BINARY_OP': 1, 'RETURN_VALUE': 1})
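The Instruction namedtuple has .opname, .argval, .offset, and .starts_line, which is enough for structural analysis. Note that .starts_line is a line number (or None) before Python 3.13 and a boolean flag from 3.13 on, where .line_number carries the number.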
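A short sketch of reading those fields follows. The function greet is a made-up example; the code relies only on .offset, .opname, .argval, and the truthiness of .starts_line, so it should behave the same across recent versions.

```python
import dis

def greet(name):
    message = "hello, " + name
    return message.upper()

rows = []
line_starts = []
for ins in dis.get_instructions(greet):
    rows.append((ins.offset, ins.opname, ins.argval))
    # Truthy both as a line number (before 3.13) and as a boolean flag (3.13+).
    if ins.starts_line:
        line_starts.append(ins.opname)

# Offsets are byte positions inside co_code, so they only grow.
offsets = [offset for offset, _, _ in rows]

# argval is the resolved argument: a constant, a variable name, a jump target.
constants_loaded = [argval for _, opname, argval in rows if opname == "LOAD_CONST"]
```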
Common opcodes overview
dis.Bytecode(...).info() shows metadata (filename, names, varnames). The opcodes themselves are instructions for the CPython interpreter loop (Python/ceval.c). The key ones:
| Opcode | Meaning | When emitted |
|---|---|---|
| RESUME | Function entry / generator resume marker (3.11+) | Every function/coroutine |
| LOAD_FAST n | Load local variable by slot n | Reading a function-local variable |
| LOAD_CONST k | Load constant from co_consts[k] | Literals (1, 'str', None) |
| LOAD_GLOBAL | Load global / builtin | print, module-level vars |
| STORE_FAST n | Store top-of-stack to local slot n | x = ... assignment |
| BINARY_OP op | Binary arithmetic/bitwise operator (+, -, *, etc.); comparisons such as < use COMPARE_OP | 3.11+ unified opcode (was BINARY_ADD/BINARY_SUBTRACT/etc.) |
| CALL n | Call a callable with n arguments taken from the stack | Function call (3.11+ unified) |
| RETURN_VALUE | Pop top-of-stack, return | return expr |
| LIST_APPEND i | Append top-of-stack to the list i positions down the stack | List comprehension iteration |
| YIELD_VALUE | Yield top-of-stack | Generator expression / yield statement |
| GET_ITER | Get iterator from iterable | Start of for ... in ...: |
| FOR_ITER | Fetch the next item, jump past the loop if exhausted | Each iteration of for ... in ...: |
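To see a few of these opcodes in a real function, the sketch below disassembles a small loop and checks only names that have stayed stable across recent CPython releases. The function print_all is a hypothetical example.

```python
import dis

def print_all(items):
    for item in items:
        print(item)

opnames = {ins.opname for ins in dis.get_instructions(print_all)}

# The for statement needs an iterator (GET_ITER) and a loop step (FOR_ITER).
loop_opcodes = {"GET_ITER", "FOR_ITER"} & opnames

# print is not a local, so it is looked up as a global/builtin.
uses_global_lookup = "LOAD_GLOBAL" in opnames
```

Other rows of the table, such as LOAD_FAST and STORE_FAST, may show up fused into superinstructions on newer interpreters, which is why this sketch does not check for them by exact name.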
Cross-course → Spark 02 catalyst-tungsten: the Catalyst optimizer plus Tungsten code generation produce JVM bytecode for an optimized query plan. It is the same idea (compile a high-level expression into low-level instructions). Spark Catalyst and Python dis both show the bridge from a declarative DSL to an executable VM.
Pragmatic NOT D-07: we are NOT deep-diving into the PyCodeObject struct (that is M02-M03 territory); we stay with what is observable through the public dis API. What the bytecode actually does is public; how it is stored in memory belongs to D-07 / Phase 65.
Cite docs.python.org/3/library/dis.html.
Generators vs list comp at bytecode level
The most instructive example compares a list comprehension with a generator expression:
import dis
# LIST COMPREHENSION
dis.dis('[x*2 for x in range(3)]')
Output (simplified, as on Python 3.11; from 3.12 on, PEP 709 inlines the list comprehension into the enclosing code object, so there is no separate <listcomp> and the loop opcodes appear directly in the outer listing):
RESUME 0
LOAD_CONST 0 (<code object <listcomp> ...>)
MAKE_FUNCTION
LOAD_NAME ... (range)
LOAD_CONST 1 (3)
CALL 1
GET_ITER
CALL 0
RETURN_VALUE
Disassembly of <code object <listcomp> ...>:
...
FOR_ITER ...
STORE_FAST 1 (x)
LOAD_FAST 1 (x)
LOAD_CONST 0 (2)
BINARY_OP 5 (*)
LIST_APPEND 2 <-- IMPORTANT: append to the intermediate list
JUMP_BACKWARD ...
The key opcode is LIST_APPEND: every iteration mutates the intermediate list. Memory: the list grows incrementally and the final list is returned.
# GENERATOR EXPRESSION
dis.dis('(x*2 for x in range(3))')
Output (simplified):
RESUME 0
LOAD_CONST 0 (<code object <genexpr> ...>)
MAKE_FUNCTION
LOAD_NAME ... (range)
LOAD_CONST 1 (3)
CALL 1
GET_ITER
CALL 0
RETURN_VALUE
Disassembly of <code object <genexpr> ...>:
...
FOR_ITER ...
STORE_FAST 1 (x)
LOAD_FAST 1 (x)
LOAD_CONST 0 (2)
BINARY_OP 5 (*)
YIELD_VALUE ... <-- IMPORTANT: yield, not append
RESUME 1
POP_TOP
JUMP_BACKWARD ...
The key opcode is YIELD_VALUE: every iteration yields a value to the caller without allocating an intermediate list. Memory: O(1) extra space, since only the current value exists at any time.
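The contrast can also be checked programmatically. The following is a minimal sketch, assuming a hypothetical helper all_opnames that collects opcode names from a code object and from any code objects nested in its constants, so it works whether or not the interpreter inlines the comprehension.

```python
import dis
from types import CodeType

def all_opnames(code):
    """Collect opcode names from a code object and everything nested in it."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, CodeType):
            names |= all_opnames(const)
    return names

listcomp_ops = all_opnames(compile("[x*2 for x in range(3)]", "<demo>", "eval"))
genexpr_ops = all_opnames(compile("(x*2 for x in range(3))", "<demo>", "eval"))

print("list comp:", "LIST_APPEND" in listcomp_ops, "YIELD_VALUE" in listcomp_ops)
print("genexpr:  ", "LIST_APPEND" in genexpr_ops, "YIELD_VALUE" in genexpr_ops)
```

Each form contains exactly one of the two opcodes, which is the structural difference the listings above illustrate.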
Cross-link M05 lesson 02 (PyGenObject): the generator's "state machine" is the PyGenObject saving its frame state at every YIELD_VALUE and continuing from the RESUME opcode that follows it. The bytecode confirms that a generator is not magic but a sequence of opcodes specific to the coroutine machinery.
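The suspend-and-continue cycle is observable from Python without touching internals. This sketch uses inspect.getgeneratorstate and the generator's gi_frame attribute; the generator function doubled is a made-up example.

```python
import inspect

def doubled(limit):
    for x in range(limit):
        yield x * 2

gen = doubled(2)
states = [inspect.getgeneratorstate(gen)]    # nothing has run yet

first = next(gen)                            # runs up to the first yield
states.append(inspect.getgeneratorstate(gen))
resume_point = gen.gi_frame.f_lasti          # bytecode offset where the frame is paused

second = next(gen)
remaining = list(gen)                        # drains the generator
states.append(inspect.getgeneratorstate(gen))
```

The recorded states move from GEN_CREATED through GEN_SUSPENDED to GEN_CLOSED; the exact value of resume_point depends on the interpreter version.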
Production rule: for huge iterables (millions of elements) a generator wins on memory; for small, multi-pass data a list comprehension wins (no need to re-create the iterator). Cross-link lesson 03: tracemalloc shows the memory difference; dis shows why (LIST_APPEND vs YIELD_VALUE).
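A rough way to see both halves of this rule is sketched below. It uses sys.getsizeof, which measures only the container object itself, so treat the numbers as an indication rather than a full memory profile; the size n is an arbitrary choice.

```python
import sys

n = 100_000
as_list = [x * 2 for x in range(n)]
as_gen = (x * 2 for x in range(n))

list_bytes = sys.getsizeof(as_list)   # the list object only, not the ints it references
gen_bytes = sys.getsizeof(as_gen)     # the generator object

# Multi-pass: a list can be iterated repeatedly, a generator only once.
list_first, list_second = sum(as_list), sum(as_list)
gen_first = sum(as_gen)
gen_second = sum(as_gen)              # the generator is already exhausted
```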
Cite docs.python.org/3/library/dis.html#opcode-LIST_APPEND + YIELD_VALUE.
Pitfall 40 — PEP 659 specialized adaptive interpreter
Python 3.11 introduced PEP 659, the specializing adaptive interpreter. The idea: opcodes can rewrite themselves at runtime into type-specific fast paths:
# Generic opcode
BINARY_OP 5 (*) # multiply — works for any types
# After type-feedback collection (PEP 659 specialization):
BINARY_OP_MULTIPLY_INT # specialized — known both args int
BINARY_OP_MULTIPLY_FLOAT # specialized — known both floats
This gives roughly a 10-30% speedup for tight loops, but it has side effects on dis output:
- Opcode names change between Python versions (3.11 introduced the unified BINARY_OP together with its first specialized variants; 3.12 and 3.13 revised the set further).
- The same source code can disassemble into different opcodes on different versions.
- Specialized opcodes appear in dis output only when you pass adaptive=True and specialization has already been triggered (after warm-up runs); by default dis shows the generic names.
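To compare the generic and the specialized view on your own interpreter, a sketch like the following can help. It assumes the adaptive keyword of dis.get_instructions (available from Python 3.11) and falls back to the generic view on older versions; whether any specialized names actually appear depends on the interpreter build and on how much warm-up it needs.

```python
import dis

def multiply(a, b):
    return a * b

# Warm-up: the interpreter specializes only code that has run repeatedly.
for _ in range(10_000):
    multiply(3, 4)

generic = [ins.opname for ins in dis.get_instructions(multiply)]
try:
    # adaptive=True asks dis to show specialized instructions where they exist.
    adaptive = [ins.opname for ins in dis.get_instructions(multiply, adaptive=True)]
except TypeError:
    adaptive = generic  # older interpreter without the adaptive parameter

# Names that show up only in the adaptive view (may be empty).
specialized_only = sorted(set(adaptive) - set(generic))
print(generic)
print(specialized_only)
```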
Implication for testing: assertions of the form "equals this exact opcode list" are fragile. The solution is matchMode='contains' in testCases: assert the presence of generic opcodes ('BINARY_OP', 'LIST_APPEND', 'YIELD_VALUE'), which is robust to version drift.
Pitfall 40: never write assert opnames == ['LOAD_FAST', 'BINARY_OP', ...]; it will break when Python is upgraded. Write assert 'YIELD_VALUE' in opnames instead: a membership check survives version transitions.
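One possible shape for such a membership check is sketched below. The helper missing_opcodes and the generator countdown are hypothetical names for this illustration, not part of the challenge's test harness.

```python
import dis

def missing_opcodes(func, required):
    """Return the required opcode names that do not occur in func's bytecode."""
    present = {ins.opname for ins in dis.get_instructions(func)}
    return set(required) - present

def countdown(n):
    while n > 0:
        yield n
        n -= 1

# Robust: only the generic opcode that matters is required to be present.
yield_missing = missing_opcodes(countdown, {"YIELD_VALUE"})

# A name the function should not contain is reported back.
append_missing = missing_opcodes(countdown, {"LIST_APPEND"})
```

Returning the missing names instead of a bare boolean makes a failing test message say which opcode was expected.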
In the Pattern 4 challenge below, all testCases use matchMode='contains': they assert that a specific opcode is present, not that the exact dict matches.
Code-challenge py-m12-04-code-1 — Pattern 4 opcode counting
The quiz JSON 04-dis-bytecode.json contains the challenge:
Given a Python expression string (e.g., '1 + 2'), compile it in eval mode, disassemble it with dis.get_instructions, and return a Counter converted to a dict mapping opname → count.
Pattern 4: compile + walk nested code objects + Counter. Solution skeleton (revealed after submission):
import dis
from collections import Counter
from types import CodeType

def solve(code_str: str) -> dict:
    code_obj = compile(code_str, '<test>', 'eval')
    counter = Counter()
    stack = [code_obj]
    while stack:
        co = stack.pop()
        for instr in dis.get_instructions(co):
            counter[instr.opname] += 1
        # Descend into nested code objects (genexpr / lambda / comprehension bodies live in co.co_consts)
        for const in co.co_consts:
            if isinstance(const, CodeType):
                stack.append(const)
    return dict(counter)
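An important nuance: generator expressions and lambdas always compile into nested code objects (instances of types.CodeType stored in co.co_consts), and on Python 3.11 list comprehensions do too. For those, LIST_APPEND / YIELD_VALUE / FOR_ITER are visible only when you walk the nested constants; otherwise the outer code object contains just MAKE_FUNCTION + CALL (creating and invoking the comprehension or genexpr). From Python 3.12 on, PEP 709 inlines list, dict, and set comprehensions, so their LIST_APPEND shows up directly in the outer code object; the walk handles both cases.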
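To see which constructs actually produce nested code objects on a given interpreter, the following sketch lists the names of code objects found in co_consts. The helper nested_code_names is a hypothetical name for this example.

```python
from types import CodeType

def nested_code_names(code):
    """Names of all code objects nested (at any depth) inside code.co_consts."""
    names = []
    for const in code.co_consts:
        if isinstance(const, CodeType):
            names.append(const.co_name)
            names.extend(nested_code_names(const))
    return names

genexpr_children = nested_code_names(compile("(x for x in range(3))", "<demo>", "eval"))
lambda_children = nested_code_names(compile("lambda v: v + 1", "<demo>", "eval"))
listcomp_children = nested_code_names(compile("[x for x in range(3)]", "<demo>", "eval"))

print(genexpr_children, lambda_children, listcomp_children)
```

The last list is where versions differ: it holds a <listcomp> entry on interpreters that compile the comprehension as a nested function and is empty on interpreters that inline it.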
3 testCases (using matchMode='contains' per Pitfall 40):
- tc1: 'sum([x*y for x, y in pairs])' → assert 'LIST_APPEND' in result.
- tc2: '[x*2 for x in range(3)]' → assert 'LIST_APPEND' in result.
- tc3 (hidden): '(x for x in range(3))' → assert 'YIELD_VALUE' in result.
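Pitfall (constant folding): '1 + 2' (a literal-only expression) is folded by the compiler into the constant 3, so no BINARY_OP is emitted. On Python 3.12 and 3.13 the result appears as RETURN_CONST with the value 3; on 3.11 it is LOAD_CONST followed by RETURN_VALUE. That is why the testCases use variable references (x*y, x*2, x) rather than constant literals. Constant folding is a compile-time optimization and is separate from PEP 659's runtime specialization.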
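The folding is easy to confirm without depending on any particular opcode name. This sketch compares a literal-only expression with one that references a variable; the helper names opnames and has_binary_op are illustrative.

```python
import dis

def opnames(expr):
    code = compile(expr, "<demo>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]

def has_binary_op(names):
    # BINARY_OP on 3.11+, BINARY_ADD and friends on older releases.
    return any(name.startswith("BINARY_") for name in names)

folded = opnames("1 + 2")     # literals only: the compiler computes the sum itself
unfolded = opnames("x + 2")   # x is unknown at compile time, so the addition stays

folded_value = eval(compile("1 + 2", "<demo>", "eval"))
```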
Pyodide-runnable: dis, compile, Counter, and types.CodeType are all stdlib; the result is a dict (deterministic and serializable). compile(s, '<test>', 'eval') uses eval mode, which accepts a single expression; multi-statement source needs 'exec'.
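The difference between the two compile modes can be sketched as follows; the variable names and the "<demo>" filename label are placeholders for this example.

```python
# eval mode: exactly one expression, and eval() returns its value.
single = compile("x * 2", "<demo>", "eval")
value = eval(single, {"x": 21})

# eval mode rejects statements such as assignments.
try:
    compile("y = x * 2\ny + 1", "<demo>", "eval")
    eval_accepts_statements = True
except SyntaxError:
    eval_accepts_statements = False

# exec mode accepts statements; results are left in the namespace.
namespace = {"x": 21}
exec(compile("y = x * 2\nz = y + 1", "<demo>", "exec"), namespace)
```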
Real dis.dis(...) printed output (visual) → Run-on-Your-Machine.
Run-on-Your-Machine #4: real dis.dis output for 3 patterns
Run-on-Your-Machine: dis output for list comp / generator / function call
Setup (dis is part of the stdlib, so only check the Python version):
python --version # >=3.11
Create the file dis_demo.py:
import dis

print('=== List comprehension: [x*2 for x in range(3)] ===')
dis.dis('[x*2 for x in range(3)]')

print('\n=== Generator expression: (x*2 for x in range(3)) ===')
dis.dis('(x*2 for x in range(3))')

print('\n=== Simple function: f(1, 2) ===')
def f(a, b):
    return a + b

dis.dis(f)
Run:
python dis_demo.py
Expected output (fragment; exact offsets and some opcode names depend on the Python version, for example 3.12+ inlines the list comprehension and prints no <listcomp> line):
=== List comprehension: [x*2 for x in range(3)] ===
0 RESUME 0
1 LOAD_CONST 0 (<code object <listcomp> ...>)
...
LIST_APPEND 2 <-- key opcode
JUMP_BACKWARD ...
RETURN_VALUE
=== Generator expression: (x*2 for x in range(3)) ===
0 RESUME 0
1 LOAD_CONST 0 (<code object <genexpr> ...>)
...
YIELD_VALUE ... <-- key opcode (not LIST_APPEND!)
RESUME 1
POP_TOP
JUMP_BACKWARD ...
=== Simple function: f(1, 2) ===
RESUME 0
LOAD_FAST 0 (a)
LOAD_FAST 1 (b)
BINARY_OP 0 (+)
RETURN_VALUE
In the browser challenge we do NOT show the visual output (Pitfall 38: Pyodide may interleave output); the challenge counts opcodes with Counter and matchMode='contains' (Pitfall 40: opcode names are version-sensitive under PEP 659).
Version pin Python>=3.11 (Pitfall 32: PEP 659, the specializing adaptive interpreter, added specialized opcodes, and some opcode names exist only in 3.11+; for example, BINARY_OP replaced BINARY_ADD/BINARY_SUBTRACT/etc.).
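If you want a script to fail early on an interpreter below the pin, a guard along these lines is one option. It is a sketch: the function name require_modern_dis and the error message are made up, and it checks both the version number and the presence of BINARY_OP in dis.opmap.

```python
import dis
import sys

MEETS_VERSION_PIN = sys.version_info >= (3, 11)
HAS_UNIFIED_BINARY_OP = "BINARY_OP" in dis.opmap   # opmap maps opcode names to numbers

def require_modern_dis():
    """Raise early if the interpreter predates the opcode names used in this lesson."""
    if not (MEETS_VERSION_PIN and HAS_UNIFIED_BINARY_OP):
        found = ".".join(map(str, sys.version_info[:3]))
        raise RuntimeError(f"This lesson expects Python >= 3.11, found {found}")
```

Checking dis.opmap in addition to the version number tests the feature the lesson depends on, not just the label on the interpreter.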
What's in the next lesson
Lesson 05: asyncio event loop overview (CONCEPTUAL). Coroutines as generators (cross-link M05 lesson 02: await ≈ an advanced yield from); when async helps (I/O-bound) vs hurts (CPU-bound); Pyodide async caveats (Pitfall 41).
Pragmatic-DEEP principle: dis gives the mechanical view (what gets compiled); asyncio gives the architectural view (why async: single-threaded scheduling vs threads). Together they cover bytecode and control flow.
Cite docs.python.org/3/library/dis.html + PEP 659 — specialized adaptive interpreter.