GIL + threading vs multiprocessing

The previous lesson (asyncio) covered single-threaded concurrency: an event loop schedules thousands of coroutines in one thread. This lesson covers the other two approaches: threading (multiple OS threads, shared memory) and multiprocessing (multiple processes, isolated memory). The key constraint is the GIL (Global Interpreter Lock), which prevents threads from parallelizing CPU-bound work. Multiprocessing is the workaround.

In this lesson:

GIL semantics: bytecode-level switching.
Threading for I/O-bound work: the GIL is released on I/O syscalls.
Multiprocessing for CPU-bound work: separate processes + fork/spawn.
Pitfall 42: GIL release rules are nuanced. I/O + C-extensions YES; pure-Python CPU NO.
concurrent.futures: ThreadPoolExecutor vs ProcessPoolExecutor.
PEP 703 / PEP 779 free-threaded: supported in Python 3.14 (opt-in build).
Cross-course → Spark / DataFusion: distributed engines that avoid the GIL.

Conceptual MDX prose only: examples with import threading / import multiprocessing render as syntax-highlighted text and are NOT runnable in the Pyodide browser runtime (carried over from Phase 65: both modules are forbidden in challenge code by the Wave 0 lints; the restriction does not apply to prose).

GIL semantics — bytecode-level switching

The GIL (Global Interpreter Lock) is a mutex that protects CPython interpreter state. Only one Python thread can execute Python bytecode at any time. The GIL is transparent to the programmer but has observable consequences:

Threading does NOT speed up a pure-Python CPU loop (only one thread runs Python code at a time).
The GIL is released periodically: since Python 3.2 the running thread is asked to give it up after a time-based switch interval (default 5 ms, configurable via sys.setswitchinterval) and does so at the next bytecode boundary. The older rule of switching every 100 bytecode ticks (sys.setcheckinterval) applied to Python 3.1 and earlier.
The GIL is released on I/O syscalls (socket.recv, file.read, time.sleep): while the operation is blocked, other threads can run.
The GIL is released by some C-extensions: their operations release it via the Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS macros (e.g., NumPy ndarray ops; numpy is FORBIDDEN per Phase 65, so this is a conceptual mention only).

Switch interval (Python 3.2+):

import sys
sys.getswitchinterval()                       # 0.005 (5ms default)
sys.setswitchinterval(0.001)                   # 1ms — more frequent switches

This means that while one Python thread runs a CPU-bound loop, the interpreter gives other threads a chance to run every 5 ms (but if they are CPU-bound too, the GIL serializes them and there is no real parallelism).

Cite docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock + Python GIL FAQ.

Threading for I/O-bound work

threading.Thread creates OS-level threads. When a thread is blocked on I/O (socket, file, sleep), the GIL is released and other threads run. Threading wins for concurrent I/O:

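import threading
import urllib.request

# Conceptual: NOT runnable in Pyodide (browser security model)
urls = ["https://example.com/"] * 10  # placeholder URLs for illustration

def fetch(url):
    # Blocks on the network; the GIL is released while waiting
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# Thread discards fetch()'s return value; collect results via a shared
# list or ThreadPoolExecutor when they are needed
threads = [threading.Thread(target=fetch, args=(url,)) for url in urls]
for t in threads:
    t.start()  # all fetches begin concurrently
for t in threads:
    t.join()   # wait for every fetch to finish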

Ten threads perform 10 concurrent fetches; while each one waits for the network response, the GIL is released. Total time ≈ slowest_fetch (vs about 10× a single fetch if sequential).
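The timing claim can be checked without a network. The sketch below is an illustration, not the lesson's own benchmark: fake_fetch is a hypothetical stand-in that uses time.sleep as the blocking call (it releases the GIL the same way a socket read does), and the delay values are arbitrary.

```python
import threading
import time

def fake_fetch(delay, results, index):
    """Hypothetical stand-in for a network call: time.sleep blocks and releases the GIL."""
    time.sleep(delay)
    results[index] = delay

delays = [0.2, 0.2, 0.2, 0.2, 0.2]
results = [None] * len(delays)

start = time.perf_counter()
threads = [
    threading.Thread(target=fake_fetch, args=(d, results, i))
    for i, d in enumerate(delays)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Wall time should be close to the slowest single delay, not the sum of all delays.
print(f"threaded: {elapsed:.2f}s, sequential would take about {sum(delays):.2f}s")
```

Each thread writes to its own slot of the shared list, so no lock is needed for this particular pattern.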

queue.Queue is a thread-safe FIFO for coordinating workers:

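import threading
import queue

def worker(q):
    # None is the shutdown sentinel; a bare truthiness test would also stop on 0 or ''
    while (item := q.get()) is not None:
        process(item)   # process() is a placeholder for the real work
        q.task_done()

q = queue.Queue()
threads = [threading.Thread(target=worker, args=(q,)) for _ in range(4)]
for t in threads:
    t.start()
# Producer side: q.put(item) per item, q.join() to wait, then q.put(None) once per worker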

Limit: roughly 100 to 1,000 OS threads are practical; beyond that the memory overhead is high (~1 MB per thread); for 10,000+ concurrent operations async wins.

Multiprocessing for CPU-bound work

multiprocessing.Process is a separate OS process with its own Python interpreter and its own GIL. It gives real parallelism for CPU work. Cost: process spawn overhead + IPC (inter-process communication).

import multiprocessing

# Conceptual: NOT runnable in Pyodide (no fork/spawn in WASM)
def crunch(n):
    return sum(i*i for i in range(n))

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(crunch, [10**6, 10**6, 10**6, 10**6])

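Four processes run crunch in parallel; each has its own GIL, so there is no serialization. Time ≈ slowest_crunch plus startup and IPC overhead, versus the sum of all four calls when sequential: close to a real 4x speedup for CPU-bound work, given 4 free cores.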

Process spawn methods (multiprocessing.set_start_method):

Method	OS support	Cost	Caveat
`fork`	POSIX (Linux/macOS); default on Linux through Python 3.13	~10ms	Inherits all parent state; can deadlock on locks held by other threads at fork time
`spawn`	Cross-platform (default on Windows, and on macOS since 3.8)	~50-100ms	Fresh interpreter; all args must be picklable
`forkserver`	POSIX (e.g. Linux); default on Linux since Python 3.14	~30ms	An isolated server process forks the workers

Production rule: for long-running services prefer spawn (deterministic, cross-platform); use fork for quick scripts on Linux.
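One way to apply this rule without touching the global default is a start-method context. The sketch below only inspects and selects the method; it does not launch any process, and the choice of "spawn" simply mirrors the rule above.

```python
import multiprocessing

# Start methods supported on this platform; "spawn" is available everywhere.
available = multiprocessing.get_all_start_methods()

# A context pins the start method for pools and processes created from it,
# leaving the process-wide default (set_start_method) unchanged.
ctx = multiprocessing.get_context("spawn")

print("available:", available)
print("context start method:", ctx.get_start_method())
```

A pool created as ctx.Pool(processes=4) would then use spawn regardless of the platform default, which keeps library code from interfering with an application's own set_start_method call.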

Pitfall 42 — GIL release rules nuanced

GIL released for:

I/O syscalls — socket.recv, file.read, time.sleep, select.select, os.read — anything that blocks on kernel.
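Some C-extensions: numpy.dot (numpy is FORBIDDEN per Phase 65; concept only), Cython with nogil: blocks, ctypes foreign-function calls in general.
gc.collect(): not a dependable release point. The collector itself runs with the GIL held, so do not count on a long collection to let other threads run.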
Pickle / json operations — partly (mostly held).

GIL NOT released for:

Pure Python CPU loops — for i in range(10**8): x += i*i — no I/O, no C-extension → GIL serialization.
Most pure-Python operations (list/dict/string ops) run while holding the GIL.

Pitfall 42: the programmer thinks "threading will parallelize my_compute()", which is wrong if my_compute() is pure-Python CPU work. The right fix is multiprocessing or native code (Cython, a C-extension).

Empirical demonstration (requires host Python, NOT Pyodide):

import threading
import time

def cpu_work():
    sum(i*i for i in range(10_000_000))

# Sequential
t0 = time.perf_counter()
cpu_work()
cpu_work()
print(f'Sequential: {time.perf_counter()-t0:.2f}s')

# Threading — should be parallel? NOT for CPU-bound
t0 = time.perf_counter()
t1 = threading.Thread(target=cpu_work)
t2 = threading.Thread(target=cpu_work)
t1.start(); t2.start()
t1.join(); t2.join()
print(f'Threaded:   {time.perf_counter()-t0:.2f}s')
# Output: Threaded ≈ 1.05× Sequential (no real parallelism)

The threaded version is no faster than the sequential one. Reason: the GIL serializes pure-Python CPU work.

`concurrent.futures` — `ThreadPoolExecutor` vs `ProcessPoolExecutor`

concurrent.futures (Python 3.2+, PEP 3148) is a high-level API over threading and multiprocessing with a single interface:

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# I/O-bound — ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=10) as ex:
    results = list(ex.map(fetch_url, urls))

# CPU-bound — ProcessPoolExecutor (same API!)
with ProcessPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(crunch, inputs))

The submit() / map() / as_completed() API is the same; switch the executor class by workload type. Production rule:

Workload type	Executor	Workers count rule
I/O-bound	`ThreadPoolExecutor`	`min(32, cpu_count + 4)` (default), or more for high-latency I/O
CPU-bound	`ProcessPoolExecutor`	`cpu_count()` (no more: extra workers only add context-switching cost)
Mixed	Often `ProcessPoolExecutor`, with each child process using threading inside	Varies

as_completed(futures) yields futures in completion order (not submission order), which is useful for streaming results.
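A minimal sketch of completion-order iteration, assuming a hypothetical slow_echo task that just sleeps; the delays are arbitrary and chosen so that submission order and completion order differ.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_echo(delay):
    """Hypothetical task: sleep for `delay` seconds, then return it."""
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]   # submission order
completed = []             # filled in completion order

with ThreadPoolExecutor(max_workers=3) as ex:
    futures = [ex.submit(slow_echo, d) for d in delays]
    for fut in as_completed(futures):
        # result() re-raises any exception from the worker, so failures surface here
        completed.append(fut.result())

print(completed)
```

With ex.map the results would come back in submission order instead, so the first result would be held up by the 0.3 s task.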

Cite docs.python.org/3/library/concurrent.futures.html.

PEP 703 / PEP 779: free-threaded Python (supported in 3.14)

PEP 703 (Making the Global Interpreter Lock Optional in CPython, accepted 2023) introduced the opt-in --disable-gil build flag. PEP 779 (Criteria for supported status for free-threaded Python) moved the free-threaded build from experimental (3.13) to officially supported (Python 3.14, released Oct 7, 2025). This is Phase II of PEP 703.

What changed in 3.14:

No longer experimental: the free-threaded build (python3.14t) is supported but still opt-in (not the default; standard python3.14 keeps the GIL).
Single-thread overhead dropped from ~40% (3.13) to 5–10% (3.14): the specializing adaptive interpreter is now enabled in free-threaded mode as well, and the C API changes are complete.
No GIL means real CPU parallelism for threading in the free-threaded build.
Migration path: opt-in build, with gradual ecosystem adoption still under way; major libraries (NumPy, PyTorch) have already adapted.

Status (April 2026): the free-threaded build is production-ready for use cases where a 5–10% single-thread regression is acceptable. Ecosystem coverage is growing; FastAPI, polars, and other major DE libraries support it.

Pragmatic forward look: free-threaded Python is a legitimate CPU parallelization tool now (3.14), not "in 2-3 years". Multiprocessing won’t disappear (it remains useful for process isolation, fault containment, and true memory separation), but for shared-memory CPU parallelism free-threaded threading is becoming a viable first choice, especially for DE workers within a single host.
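Because the free-threaded build is opt-in, code sometimes needs to know which interpreter it is running on. The sketch below is one way to check; gil_status is a hypothetical helper name, and sys._is_gil_enabled is an underscore-prefixed function available from Python 3.13, so the code falls back to assuming the GIL is on when it is missing.

```python
import sys
import sysconfig

def gil_status():
    """Return (free_threaded_build, gil_enabled_now) for the running interpreter."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; older versions always have the GIL.
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled_now = probe() if probe is not None else True
    return free_threaded_build, gil_enabled_now

build, enabled = gil_status()
print(f"free-threaded build: {build}, GIL enabled right now: {enabled}")
```

The two values can differ: a free-threaded build may still run with the GIL enabled at runtime, so the second value is the one to consult before expecting parallel speedups from threads.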

Cite Python 3.14 What’s New — free-threading + PEP 779 + Python free-threading guide.

PEP 734: sub-interpreters in the stdlib (Python 3.14)

PEP 734 (Multiple Interpreters in the Stdlib) was accepted and implemented in Python 3.14 (October 2025) as the concurrent.interpreters module, a high-level API over the sub-interpreter C-API that CPython has had since version 1.5 (1997). This is a third parallelism strategy in Python: besides threading (single GIL, no real CPU parallelism) and multiprocessing (separate OS processes, with fork/spawn cost and pickling), there is now a per-interpreter GIL, meaning a separate GIL for each sub-interpreter, all within one OS process.

Key properties of sub-interpreters:

Per-interpreter GIL (PEP 684, implemented in 3.12 as an opt-in C-API): each sub-interpreter has its own interpreter state and its own GIL. Real CPU parallelism between sub-interpreters in one process.
Isolation guarantees: each sub-interpreter has its own sys.modules, its own globals, and its own GC state. A mutation in one sub-interpreter is not visible in another.
Same OS process: a single address space and shared system resources (file descriptors, sockets), but isolated Python state.
No pickle for shareable data: interpreters.Queue passes basic immutable types (int, str, bytes, tuple-of-shareable) without pickling them. Mutable shared memory is available through a memoryview over a buffer (similar to multiprocessing.shared_memory but without the separate-process overhead).

# Conceptual: Python 3.14+ runtime (NOT Pyodide); module concurrent.interpreters from PEP 734
import threading
from concurrent import interpreters

# Create a sub-interpreter
interp = interpreters.create()

# exec() runs code in the isolated interpreter (own GIL, own globals);
# it blocks the calling thread and returns None, so run it in a thread
code = 'x = sum(i*i for i in range(10_000_000))'
t = threading.Thread(target=interp.exec, args=(code,))
t.start()
# the main interpreter can run its own CPU loop here at the same time: real parallelism
t.join()

# Cleanup
interp.close()

interpreters.Queue provides communication between interpreters:

from concurrent import interpreters

q = interpreters.create_queue()
interp = interpreters.create()

# Pass data through the queue: shareable types (int/str/bytes/tuple) need no pickle
q.put((1, 2, 3))
# Queue objects are themselves shareable, so bind the queue into the
# sub-interpreter's __main__ namespace rather than looking it up by id
interp.prepare_main(q=q)
interp.exec("""
data = q.get()
""")

Comparison: sub-interpreters vs multiprocessing:

Criterion	`multiprocessing`	`interpreters` (PEP 734)
OS process	Separate (fork / spawn)	Same process
Startup cost	50–100 ms (spawn) / 10 ms (fork)	~1–5 ms (lightweight)
Memory overhead	Full Python interpreter copy (~10–30 MB)	Shared C runtime, isolated Python state (~3–8 MB)
Shared data	Pickle for everything	No pickle for shareable types
Shared memory	`Array` / `shared_memory` (explicit)	`memoryview` over a C buffer + `interpreters.Queue`
Fault isolation	Good: a crash in a child does not take down the parent	Weaker: a segfault in a C-extension takes down the whole process
GIL	Own GIL per process	Own GIL per sub-interpreter (PEP 684)
Cross-platform	Yes (fork only on Linux/macOS; spawn everywhere)	Yes (single C-runtime path)

When to choose sub-interpreters over multiprocessing:

You need real CPU parallelism plus lightweight startup (e.g., dynamically spawning workers under load).
You pass large immutable data structures (tuple of bytes), where pickle overhead dominates in multiprocessing.
Embedded scenarios with a single-OS-process requirement (e.g., shared file descriptors, container resource limits).

When multiprocessing is still the better choice:

Fault isolation: a crash in a worker must not take down the master (separate processes give natural isolation).
C-extensions without sub-interpreter support: many libraries (numpy <2.0, pandas, lxml) are not isolation-safe (global C state). Use multiprocessing for compatibility.
Cross-machine workloads: sub-interpreters are single-host only; the multiprocessing model extends to distributed frameworks (Dask, Ray) with a similar API.

Pitfall: C-extension support is uneven. A sub-interpreter requires each extension module to be isolation-safe and to declare the Py_mod_multiple_interpreters slot with the value Py_MOD_PER_INTERPRETER_GIL_SUPPORTED. Many older C-extensions keep module-level state in C globals and fail or crash when initialized in a second sub-interpreter. Status April 2026: numpy 2.x supports it, pyarrow is in transition, and pure-Python libraries work automatically. Cite PEP 684 Section “Module Slots” + Python 3.14 What’s New (interpreters).

Production status (April 2026): the PEP 734 stdlib module concurrent.interpreters is supported in Python 3.14 (released Oct 7, 2025). The free-threaded build (PEP 779, see above) is an alternative path to real CPU parallelism through threads (no GIL at all). The two strategies coexist:

Free-threaded build (python3.14t): no GIL, threads share state freely, single-thread overhead 5–10%.
interpreters module (default python3.14): a GIL is retained per interpreter, isolation is strong, single-thread overhead 0%.

The ecosystem is evolving; for production CPU-parallel workloads consider both paths and choose by compatibility and safety priorities.

Cite PEP 734 — Multiple Interpreters in the Stdlib + PEP 684 — Per-Interpreter GIL + Python 3.14 What’s New.

Cross-course → Spark / DataFusion — distributed avoiding GIL

Cross-course → Spark: 01/02 driver-executor-model. Spark distributes CPU work across separate JVM workers (executors), so the heavy computation never depends on Python threads in the PySpark driver, which the GIL would serialize. Executor JVMs run in parallel and communicate with the Python driver over RPC. It is the same architectural pattern: "GIL? Fine, we don't use Python threading; we use separate processes/workers".

Cross-course → DataFusion: 02/06 crate-architecture. DataFusion (Rust) uses a Tokio thread pool for query execution; Rust has no GIL, so threads run truly in parallel (the architecture never had the single-thread limit that PEP 703 removes). DataFusion can saturate all CPU cores from a single process, which standard (GIL) Python needs multiprocessing to achieve.

Decision matrix synthesis:

Workload	Single-process Python	Distributed
I/O-bound, ≤100 concurrent	threading	—
I/O-bound, 100-1000+	asyncio	—
CPU-bound, single machine	multiprocessing	—
CPU-bound, big data	—	Spark / DataFusion

The GIL constraint explains why Spark / DataFusion architectures decentralize: they bypass Python's single-process GIL limitation through separate workers or through Rust's lack of a GIL.

What's in the next lesson

Lesson 07: toolchain summary plus a brief mention of py-spy / line_profiler / memray / austin. M12 recap, a bridge to M13 (packaging), and a forward link to Phase 70.

Pragmatic-DEEP principle: the GIL is the single most misunderstood Python feature. Knowing when the GIL is released (I/O / C-extensions) and when it is not (pure-Python CPU) is a production-critical decision skill. PEP 703 / PEP 779: the free-threaded build is supported in 3.14 with 5–10% single-thread overhead; it is meaningful now, not in the distant future.
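The decision matrix can be read as a small lookup function. This is only a sketch of the table above: pick_strategy and its parameters are hypothetical names, and the 100-operation threshold is taken directly from the matrix rather than from any measurement.

```python
def pick_strategy(kind, concurrency=1, big_data=False):
    """Map a workload description to the row of the decision matrix.

    kind: "io" or "cpu"; concurrency: number of concurrent operations;
    big_data: True when the data does not fit a single machine.
    """
    if kind == "io":
        # Up to about 100 concurrent operations threads are fine; beyond that asyncio scales better.
        return "threading" if concurrency <= 100 else "asyncio"
    if kind == "cpu":
        # A single machine uses separate processes; big data goes to a distributed engine.
        return "Spark / DataFusion" if big_data else "multiprocessing"
    raise ValueError(f"unknown workload kind: {kind!r}")

print(pick_strategy("io", concurrency=50))
print(pick_strategy("io", concurrency=5000))
print(pick_strategy("cpu"))
print(pick_strategy("cpu", big_data=True))
```

The function deliberately ignores the free-threaded and sub-interpreter options discussed earlier, because the matrix itself lists only the classic choices.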

Cite Python GIL FAQ + PEP 703 — no-GIL + PEP 779 — free-threaded supported status + PEP 3148 — concurrent.futures + docs.python.org/3/library/concurrent.futures.html.
