GIL + threading vs multiprocessing
In the previous lesson (asyncio) we saw single-threaded concurrency: one event loop schedules thousands of coroutines in a single thread. This lesson covers the other two approaches: threading (multiple OS threads, shared memory) and multiprocessing (multiple processes, isolated memory). The key constraint is the GIL (Global Interpreter Lock), which prevents threads from parallelizing CPU-bound work. Multiprocessing is the workaround.
In this lesson:
- GIL semantics: bytecode-level switching.
- Threading for I/O-bound work: the GIL is released on I/O syscalls.
- Multiprocessing for CPU-bound work: separate processes + fork/spawn.
- Pitfall 42: GIL release rules are nuanced. I/O and C-extensions: YES; pure-Python CPU work: NO.
- concurrent.futures: ThreadPoolExecutor vs ProcessPoolExecutor.
- PEP 703 / PEP 779 free-threaded Python: supported in Python 3.14 (opt-in build).
- PEP 734: sub-interpreters in the stdlib (Python 3.14).
- Cross-course → Spark / DataFusion: distributed execution that avoids the GIL.
Conceptual MDX prose only: examples with import threading / import multiprocessing are rendered as syntax-highlighted text and are NOT runnable in the Pyodide browser environment (carried over from Phase 65: both modules are forbidden in challenge code by the Wave 0 lints; the restriction does not apply to prose).
GIL semantics — bytecode-level switching
The GIL (Global Interpreter Lock) is a mutex that protects CPython interpreter state. Only one thread can execute Python bytecode at any time. The GIL is transparent to the programmer but has observable consequences:
- Threading does NOT speed up a pure-Python CPU loop (only one thread runs Python code at a time).
- The GIL is released periodically: the interpreter asks the running thread to give it up after a time-based switch interval (5 ms by default, configurable via sys.setswitchinterval). Before Python 3.2 the switch was counted in bytecode instructions (every 100 by default).
- The GIL is released on I/O syscalls (socket.recv, file.read, time.sleep): while the operation is blocked, other threads can run.
- The GIL is released inside some C-extensions: they drop it via the Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS macros (e.g., NumPy ndarray operations; numpy is FORBIDDEN per Phase 65, so this is a conceptual mention only).
Switch interval (Python 3.2+):

    import sys

    sys.getswitchinterval()       # 0.005 (5 ms default)
    sys.setswitchinterval(0.001)  # 1 ms: more frequent switches

This means that while one thread runs a CPU-bound loop, the interpreter gives other threads a chance to run every 5 ms. If those threads are CPU-bound too, the GIL serializes them and there is no real parallelism.
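Changing the switch interval affects the whole process, so it is usually worth restoring it afterwards. A minimal sketch, assuming a hypothetical helper named `switch_interval` (not part of the standard library) built on the two `sys` functions shown above:

```python
import sys
from contextlib import contextmanager

@contextmanager
def switch_interval(seconds):
    """Temporarily change the GIL switch interval, then restore the old value."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield previous
    finally:
        sys.setswitchinterval(previous)

default_interval = sys.getswitchinterval()
with switch_interval(0.001) as previous:
    inside = sys.getswitchinterval()   # value in effect inside the block
after = sys.getswitchinterval()        # restored value

print(f"default={default_interval}, inside={inside}, after={after}")
```

A shorter interval makes threads hand over the GIL more often; it does not add parallelism for CPU-bound threads.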
Cite docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock + Python GIL FAQ.
Threading for I/O-bound work
threading.Thread creates OS-level threads. When a thread is blocked on I/O (socket, file, sleep), the GIL is released and other threads run. Threading wins for concurrent I/O:
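
    import threading
    import urllib.request

    # Conceptual: NOT runnable in Pyodide (browser security model)
    urls = ["https://example.com/", "https://example.org/"]  # placeholder URLs
    results = {}

    def fetch(url):
        # urlopen blocks on the network; the GIL is released while it waits
        with urllib.request.urlopen(url) as resp:
            results[url] = resp.read()

    threads = [threading.Thread(target=fetch, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
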
With 10 URLs, 10 threads run 10 fetches concurrently; while each one waits for a network response, the GIL is released. Total time ≈ the slowest fetch (versus the sum of all 10 fetch times if run sequentially).
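The timing claim can be checked without a network. The sketch below is an illustration only: `fake_fetch` is a hypothetical stand-in that uses `time.sleep`, which releases the GIL the same way a blocking socket read does.

```python
import threading
import time

def fake_fetch(delay, results, index):
    """Stand-in for a network call: sleeping releases the GIL like blocking I/O."""
    time.sleep(delay)
    results[index] = delay

delays = [0.1] * 5
results = [None] * len(delays)

start = time.perf_counter()
threads = [
    threading.Thread(target=fake_fetch, args=(d, results, i))
    for i, d in enumerate(delays)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"threaded: {elapsed:.2f}s, sequential would take about {sum(delays):.2f}s")
```

The threaded run should finish in roughly the time of one sleep rather than the sum of all five.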
queue.Queue is a thread-safe FIFO for coordinating workers:
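
    import threading
    import queue

    def worker(q):
        # None is the shutdown sentinel; process() is the caller's own handler
        while (item := q.get()) is not None:
            process(item)
            q.task_done()
        q.task_done()  # mark the sentinel itself as done

    q = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q,)) for _ in range(4)]
    for t in threads:
        t.start()
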
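Limit: roughly 100-1000 OS threads are practical. Beyond that, memory overhead grows (each thread reserves its own stack, on the order of 1 MB or more depending on the platform), and for 10,000+ concurrent operations async wins.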
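A complete worker-pool round trip might look like the sketch below. The squaring task, the `NUM_WORKERS` constant and the use of `None` as a shutdown sentinel are assumptions chosen for illustration; note that the loop tests `is not None`, so a legitimate item such as `0` does not stop a worker.

```python
import queue
import threading

def worker(tasks, results):
    """Pull numbers until the None sentinel arrives, pushing squares to results."""
    while (item := tasks.get()) is not None:
        results.put(item * item)
        tasks.task_done()
    tasks.task_done()  # account for the sentinel

NUM_WORKERS = 4
tasks = queue.Queue()
results = queue.Queue()

threads = [
    threading.Thread(target=worker, args=(tasks, results))
    for _ in range(NUM_WORKERS)
]
for t in threads:
    t.start()

for n in range(10):
    tasks.put(n)
for _ in threads:
    tasks.put(None)    # one sentinel per worker

tasks.join()           # returns once every put() has a matching task_done()
for t in threads:
    t.join()

squares = sorted(results.get() for _ in range(10))
print(squares)
```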
Multiprocessing for CPU-bound work
multiprocessing.Process starts a separate OS process with its own Python interpreter and its own GIL, which gives real parallelism for CPU work. The cost: process startup overhead plus IPC (inter-process communication).

    import multiprocessing

    # Conceptual: NOT runnable in Pyodide (no fork/spawn in WASM)
    def crunch(n):
        return sum(i * i for i in range(n))

    if __name__ == '__main__':
        with multiprocessing.Pool(processes=4) as pool:
            results = pool.map(crunch, [10**6, 10**6, 10**6, 10**6])

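The 4 processes run crunch in parallel; each has its own GIL, so there is no serialization. With at least 4 free cores, total time ≈ the slowest single crunch call, which is about 1/4 of the sequential time (close to a 4x speedup for CPU-bound work, minus process startup and IPC overhead).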
Process start methods (multiprocessing.set_start_method); the costs are rough orders of magnitude:
| Method | OS support | Cost | Caveat |
|---|---|---|---|
| fork | POSIX (Linux, macOS) | ~10 ms | Inherits all parent state; can deadlock on locks held by other threads at fork time |
| spawn | Cross-platform (default on Windows and, since 3.8, macOS) | ~50-100 ms | Fresh interpreter; all arguments must be picklable |
| forkserver | POSIX (default on Linux since 3.14) | ~30 ms | A separate server process forks the workers |
Production rule: for long-running services prefer spawn (deterministic, cross-platform); use fork for quick scripts on Linux.
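One way to apply this rule without touching the process-wide default is multiprocessing.get_context. The sketch below only selects a method and builds a context; `pick_start_method` is a hypothetical helper, not a library function.

```python
import multiprocessing

def pick_start_method(preferred=("forkserver", "spawn")):
    """Return the first preferred start method that this platform supports."""
    available = multiprocessing.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    return available[0]

method = pick_start_method(preferred=("spawn",))

# get_context() returns an object with the same API as the multiprocessing
# module, without changing the global default start method.
ctx = multiprocessing.get_context(method)
print(method, ctx.get_start_method())
```

The context can then be used as `ctx.Pool(...)` or passed as `mp_context=ctx` to `ProcessPoolExecutor`, inside an `if __name__ == '__main__':` guard.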
Pitfall 42 — GIL release rules nuanced
The GIL is released for:
- I/O syscalls: socket.recv, file.read, time.sleep, select.select, os.read; anything that blocks in the kernel.
- Some C-extensions: numpy.dot (numpy is FORBIDDEN per Phase 65; concept only), Cython with nogil: blocks, and ctypes foreign function calls in general.
Mostly held, with caveats:
- gc.collect(): the collector itself runs with the GIL held; other threads get a turn only when finalizers execute Python code.
- pickle / json operations: the C accelerators mostly keep the GIL.
The GIL is NOT released for:
- Pure-Python CPU loops: for i in range(10**8): x += i*i has no I/O and no C-extension call, so threads are serialized by the GIL.
- Most pure-Python operations: list/dict/string operations run under the GIL.
Pitfall 42: the programmer assumes that threading will parallelize my_compute(). That is wrong if my_compute() is pure-Python CPU work. The right fix is multiprocessing or native code (Cython, a C-extension).
Empirical demonstration (requires a host Python, NOT Pyodide):

    import threading
    import time

    def cpu_work():
        sum(i * i for i in range(10_000_000))

    # Sequential
    t0 = time.perf_counter()
    cpu_work()
    cpu_work()
    print(f'Sequential: {time.perf_counter() - t0:.2f}s')

    # Threading: should be parallel? NOT for CPU-bound
    t0 = time.perf_counter()
    t1 = threading.Thread(target=cpu_work)
    t2 = threading.Thread(target=cpu_work)
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(f'Threaded: {time.perf_counter() - t0:.2f}s')
    # Output: Threaded ≈ 1.05× Sequential (no real parallelism)

The threaded version is no faster than the sequential one. The reason: the GIL serializes pure-Python CPU work.
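The opposite case, C code that drops the GIL, can be illustrated with hashlib, whose documentation states that the GIL is released while hashing data larger than 2047 bytes. The sketch below only checks that threaded and sequential hashing agree; any speedup depends on the number of free cores, so none is asserted. The buffer sizes are arbitrary.

```python
import hashlib
import threading

# Four 8 MB buffers, each filled with a different byte value.
buffers = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]

def digest(data):
    return hashlib.sha256(data).hexdigest()

# Sequential baseline
sequential = [digest(buf) for buf in buffers]

# Threaded: each thread writes its digest into its own slot
threaded = [None] * len(buffers)

def worker(index):
    threaded[index] = digest(buffers[index])

threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(buffers))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(threaded == sequential)
```

Wrapping both variants in time.perf_counter() calls, as in the demonstration above, shows whether the threads actually overlapped on a given machine.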
concurrent.futures — ThreadPoolExecutor vs ProcessPoolExecutor
concurrent.futures (Python 3.2+, PEP 3148) is a high-level API over threading and multiprocessing with a shared interface:

    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    # fetch_url, urls, crunch and inputs are placeholders for your own code

    # I/O-bound: ThreadPoolExecutor
    with ThreadPoolExecutor(max_workers=10) as ex:
        results = list(ex.map(fetch_url, urls))

    # CPU-bound: ProcessPoolExecutor (same API!)
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(crunch, inputs))

Both executors share the same submit() / map() methods, and as_completed() works with futures from either one; switch the executor class by workload type. Production rule:
| Workload type | Executor | Workers count rule |
|---|---|---|
| I/O-bound | ThreadPoolExecutor | min(32, cpu_count + 4) (the default), or more for high-latency I/O |
| CPU-bound | ProcessPoolExecutor | cpu_count() (no more: extra workers only add context-switching cost) |
| Mixed | Often ProcessPoolExecutor, with each child process using threading inside | Varies |
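The worker-count rules in the table can be written down as a small function. This is a sketch only: `suggested_workers` and its "io" / "cpu" labels are hypothetical names, and the formulas are the rules of thumb from the table, not a tuning guarantee.

```python
import os

def suggested_workers(workload):
    """Rule-of-thumb worker count for an executor, by workload type."""
    cpus = os.cpu_count() or 1   # cpu_count() can return None
    if workload == "io":
        return min(32, cpus + 4)
    if workload == "cpu":
        return cpus
    raise ValueError(f"unknown workload type: {workload!r}")

print(suggested_workers("io"), suggested_workers("cpu"))
```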
as_completed(futures) yields futures in completion order (not submission order), which is useful for streaming results.
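A small sketch of the difference between submission order and completion order. `slow_echo` and the delay values are made up for the illustration; `time.sleep` stands in for real I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_echo(delay):
    """Simulated I/O task: sleep, then return the delay it was given."""
    time.sleep(delay)
    return delay

delays = [0.45, 0.05, 0.25]   # submission order
completion_order = []

with ThreadPoolExecutor(max_workers=3) as ex:
    futures = [ex.submit(slow_echo, d) for d in delays]
    for future in as_completed(futures):
        completion_order.append(future.result())

print("submitted:", delays)
print("completed:", completion_order)
```

With ex.map() the results would come back in submission order instead, so the fastest task would wait behind the slowest one.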
Cite docs.python.org/3/library/concurrent.futures.html.
PEP 703 / PEP 779: free-threaded Python (supported in 3.14)
PEP 703 (Making the Global Interpreter Lock Optional in CPython, accepted in 2023) introduced the opt-in --disable-gil build flag. PEP 779 (Criteria for supported status for free-threaded Python) moved the free-threaded build from experimental (3.13) to officially supported (Python 3.14, released Oct 7, 2025). This is Phase II of PEP 703.
What changed in 3.14:
- No longer experimental: the free-threaded build (python3.14t) is supported but still opt-in (not the default; the standard python3.14 keeps the GIL).
- Single-thread overhead dropped from ~40% (3.13) to 5–10% (3.14): the specializing adaptive interpreter is now enabled in free-threaded mode too, and the C API changes are complete.
- No GIL means real CPU parallelism for threading in the free-threaded build.
- Migration path: an opt-in build, with gradual ecosystem adoption still in progress; major libraries (NumPy, PyTorch) have already adapted.
Status (April 2026): the free-threaded build is production-ready for use cases where a 5–10% single-thread regression is acceptable. Ecosystem coverage is growing; FastAPI, polars, and other major DE libraries support it.
Pragmatic forward look: free-threaded Python (3.14) is a legitimate CPU parallelization tool today, not "in 2-3 years". Multiprocessing won't disappear (it is still useful for process isolation, fault containment, and true memory separation), but for shared-memory CPU parallelism free-threaded threading is becoming a viable first choice, especially for DE workers inside a single host.
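Code that wants to adapt to the build it runs on can probe for free-threading at runtime. A minimal sketch, assuming a hypothetical helper `free_threading_status`; it relies on the `Py_GIL_DISABLED` build variable and on `sys._is_gil_enabled()`, which exists only on Python 3.13+, hence the `getattr` fallback.

```python
import sys
import sysconfig

def free_threading_status():
    """Report whether this is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; 0 or None otherwise.
    build_supports = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # Older Pythons have no probe function and always run with the GIL.
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(probe()) if probe is not None else True
    return {"free_threaded_build": build_supports, "gil_enabled": gil_enabled}

status = free_threading_status()
print(status)
```

On a free-threaded build the GIL can still be re-enabled at runtime, which is why the two values are reported separately.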
Cite Python 3.14 What’s New — free-threading + PEP 779 + Python free-threading guide.
PEP 734: sub-interpreters in the stdlib (Python 3.14)
PEP 734 (Multiple Interpreters in the Stdlib) was accepted and implemented in Python 3.14 (October 2025) as the concurrent.interpreters module, a high-level API over the sub-interpreter C-API that has existed since Python 1.5 (1997). This is a third parallelism strategy in Python: besides threading (a single GIL, so no real CPU parallelism) and multiprocessing (separate OS processes, with fork/spawn cost plus pickling), there is now a per-interpreter GIL: each sub-interpreter gets its own GIL, all within one OS process.
Key properties of sub-interpreters:
- Per-interpreter GIL (PEP 684, implemented in 3.12 as an opt-in C-API): each sub-interpreter has its own interpreter state and its own GIL. This gives real CPU parallelism between sub-interpreters in one process.
- Isolation guarantees: each sub-interpreter has its own sys.modules, its own globals, and its own GC state. A mutation in one sub-interpreter is not visible in another.
- Same OS process: a single address space and shared system resources (file descriptors, sockets), but isolated Python state.
- No pickle for natively shareable data: an interpreters Queue passes the basic immutable types (int, str, bytes, tuples of shareable items) without pickle serialization. Mutable shared memory is available through a memoryview over a buffer (similar to multiprocessing.shared_memory, but without the separate-process overhead).

    # Conceptual: Python 3.14+ runtime (NOT Pyodide); concurrent.interpreters comes from PEP 734
    from concurrent import interpreters

    # Create a sub-interpreter
    interp = interpreters.create()

    # Execute code in the isolated interpreter (own GIL, own globals).
    # exec() returns None and blocks the calling thread until the code finishes.
    interp.exec('x = sum(i*i for i in range(10_000_000))')

    # For real parallelism, call exec() from a separate threading.Thread so the
    # main interpreter can run its own CPU loop at the same time.

    # Cleanup
    interp.close()

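Because exec() blocks its caller, parallel execution needs one OS thread per sub-interpreter. The sketch below is a hedged illustration of that pattern: it assumes Python 3.14+ and skips itself on older versions. The `WORK` snippet and the `run_in_subinterpreters` helper are made-up names, and a queue bound with prepare_main() carries the results back.

```python
import threading

try:
    from concurrent import interpreters   # Python 3.14+
except ImportError:
    interpreters = None                   # older Python: the demo is skipped

# Code executed inside each sub-interpreter; q and n are bound by prepare_main().
WORK = "q.put(sum(i * i for i in range(n)))"

def run_in_subinterpreters(count, n):
    """Run WORK in `count` sub-interpreters, one thread each; return the results."""
    q = interpreters.create_queue()
    interps = [interpreters.create() for _ in range(count)]
    for interp in interps:
        interp.prepare_main(q=q, n=n)     # bind q and n as globals in __main__
    threads = [
        threading.Thread(target=interp.exec, args=(WORK,))
        for interp in interps
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    results = [q.get() for _ in range(count)]
    for interp in interps:
        interp.close()
    return results

results = run_in_subinterpreters(2, 1000) if interpreters is not None else None
print(results)
```

Each thread holds only its own sub-interpreter's GIL while it runs, so the two sums can execute on different cores.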
Queues created with interpreters.create_queue() carry data between interpreters:

    from concurrent import interpreters

    q = interpreters.create_queue()
    interp = interpreters.create()

    # Pass data through the queue: shareable types (int/str/bytes/tuple) need no pickle
    q.put((1, 2, 3))

    # Queue objects are themselves shareable: bind q into the sub-interpreter's __main__
    interp.prepare_main(q=q)
    interp.exec("""
    data = q.get()
    """)

Comparison: sub-interpreters vs multiprocessing (startup and memory figures are rough estimates):
| Criterion | multiprocessing | interpreters (PEP 734) |
|---|---|---|
| OS process | Separate (fork / spawn) | Same process |
| Startup cost | 50–100 ms (spawn) / 10 ms (fork) | ~1–5 ms (lightweight) |
| Memory overhead | Full Python interpreter copy (~10–30 MB) | Shared C runtime, isolated Python state (~3–8 MB) |
| Shared data | Pickle for everything | No pickle for shareable types |
| Shared memory | Array / shared_memory (explicit) | memoryview over a C buffer + interpreters.Queue |
| Fault isolation | Good: a crash in a child does not take down the parent | Weaker: a segfault in a C-extension takes down the whole process |
| GIL | Own GIL per process | Own GIL per sub-interpreter (PEP 684) |
| Cross-platform | Yes (fork only on POSIX; spawn everywhere) | Yes (single C-runtime path) |
When to choose sub-interpreters over multiprocessing:
- You need real CPU parallelism plus lightweight startup (e.g., dynamically spawning workers under load).
- You pass large immutable data structures (tuples of bytes), where pickle overhead dominates in multiprocessing.
- Embedded scenarios with a single-OS-process requirement (e.g., shared file descriptors, container resource limits).
When multiprocessing is still better:
- Fault isolation: a crash in a worker must not take down the master (separate processes give natural isolation).
- C-extensions without sub-interpreter support: many libraries (numpy <2.0, pandas, lxml) are not isolation-safe (global C state). Use multiprocessing for compatibility.
- Cross-machine workloads: sub-interpreters are single-host only, while the multiprocessing model extends to distributed frameworks (Dask, Ray) with a similar API.
Pitfall: C-extension support is uneven. A sub-interpreter with its own GIL requires each extension module to be isolation-safe and to declare it through the Py_mod_multiple_interpreters slot set to Py_MOD_PER_INTERPRETER_GIL_SUPPORTED. Many older C-extensions keep module-level state in C globals; importing them in such a sub-interpreter fails (typically with ImportError) rather than working. Status April 2026: numpy 2.x supports it, pyarrow is in transition, and pure-Python libraries work automatically. Cite PEP 684 Section “Module Slots” + Python 3.14 What’s New: interpreters.
Production status (April 2026): the PEP 734 stdlib module concurrent.interpreters is supported in Python 3.14 (released Oct 7, 2025). The free-threaded build (PEP 779, see above) is an alternative path to real CPU parallelism through threads (no GIL at all). The two strategies coexist:
- Free-threaded build (python3.14t): no GIL, threads share state freely, single-thread overhead 5–10%.
- interpreters module (default python3.14): one GIL retained per interpreter, strong isolation, single-thread overhead 0%.
The ecosystem is still evolving; for production CPU-parallel workloads consider both paths and choose by compatibility and safety priorities.
Cite PEP 734 — Multiple Interpreters in the Stdlib + PEP 684 — Per-Interpreter GIL + Python 3.14 What’s New.
Cross-course → Spark / DataFusion — distributed avoiding GIL
Cross-course → Spark: 01/02 driver-executor-model. Spark distributes CPU work across separate JVM worker processes (executors), so the GIL in the Python (PySpark) driver, which cannot thread-parallelize CPU work, never becomes the bottleneck. Worker JVMs run in parallel and communicate with the Python driver via RPC. The architectural pattern is the same: “GIL? Fine, we don't rely on Python threading; we use separate processes/workers”.
Cross-course → DataFusion: 02/06 crate-architecture. DataFusion (Rust) uses a Tokio thread pool for query execution. Rust has no GIL, so it gets real thread parallelism (an architecture without the single-thread limit that Python had before PEP 703). DataFusion can saturate all CPU cores from a single process, which on a GIL build of Python requires multiprocessing.
Decision matrix synthesis:
| Workload | Single-process Python | Distributed |
|---|---|---|
| I/O-bound, ≤100 concurrent | threading | — |
| I/O-bound, 100-1000+ | asyncio | — |
| CPU-bound, single machine | multiprocessing | — |
| CPU-bound, big data | — | Spark / DataFusion |
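The matrix can be read as a small lookup function. The sketch below is only a restatement of the table: `choose_strategy`, its argument names and the 100-connection threshold are illustrative, not a library API.

```python
def choose_strategy(workload, concurrency=1, big_data=False):
    """Map a workload description to the matching row of the decision matrix."""
    if workload == "io":
        return "threading" if concurrency <= 100 else "asyncio"
    if workload == "cpu":
        return "Spark / DataFusion" if big_data else "multiprocessing"
    raise ValueError(f"unknown workload type: {workload!r}")

cases = [("io", 50, False), ("io", 5000, False), ("cpu", 1, False), ("cpu", 1, True)]
for case in cases:
    print(case, "->", choose_strategy(*case))
```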
The GIL constraint explains why Spark and DataFusion architectures look the way they do: they bypass Python's single-process GIL limitation through separate workers (Spark) or a GIL-free language (Rust).
What's in the next lesson
Lesson 07: toolchain summary plus a brief mention of py-spy / line_profiler / memray / austin. M12 recap, a bridge to M13 (packaging), and a forward link to Phase 70.
Pragmatic-DEEP principle: the GIL is the single most misunderstood Python feature. Knowing when the GIL is released (I/O, C-extensions) and when it is not (pure-Python CPU work) is a production-critical decision skill. PEP 703 / PEP 779: the free-threaded build is supported in 3.14 with a 5–10% single-thread overhead; it is meaningful today, not in the distant future.
Cite Python GIL FAQ + PEP 703 — no-GIL + PEP 779 — free-threaded supported status + PEP 3148 — concurrent.futures + docs.python.org/3/library/concurrent.futures.html.