Table of Contents

Threads vs. Processes

Problems with multiprocessing vs. multithreading:

Threads all share the same startup code. Consider adding initialization logic for all threads vs. all processes: with processes, every entry point into the program needs to be updated, which can be cumbersome. Recently I tried adding signal handlers to a program to aid in debugging. Having lots of processes (like crons) made it annoying to track them all down. Additionally, with multiprocessing it may not even be possible, since you may not control main() everywhere. In my case, the Python rq library spawned the Python VM and invoked my function directly, leaving no chance to set up the handlers first.
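To make the contrast concrete, here is a minimal sketch of the threaded case: a debugging handler (the stack-dumping behavior and `SIGUSR1` choice are my illustration, not from the original program) installed once in main() that covers every thread, because signal dispositions are process-wide state.

```python
import signal
import sys
import threading
import traceback

dumped = []  # collect stack dumps here instead of printing, for inspection

def dump_stacks(signum, frame):
    # sys._current_frames() sees every thread in the process, so one
    # handler registration covers all of them.
    for tid, stack in sys._current_frames().items():
        dumped.append((tid, traceback.format_stack(stack)))

def main():
    # One registration in the single entry point. With separate
    # processes, each entry point would need its own copy of this setup.
    signal.signal(signal.SIGUSR1, dump_stacks)

    workers = [threading.Thread(target=threading.Event().wait, daemon=True)
               for _ in range(2)]
    for w in workers:
        w.start()

    signal.raise_signal(signal.SIGUSR1)  # simulate `kill -USR1 <pid>`
    return dumped

if __name__ == "__main__":
    print(f"captured {len(main())} thread stacks")
```

With processes, the equivalent would require repeating the `signal.signal` call in every program that might host the code.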

Problems with fork()

Using fork() (especially with Python) is an easy way around the GIL's limitation on CPU-bound tasks. However, it brings some complications.
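As a sketch of the GIL point: a CPU-bound loop run in forked worker processes via `multiprocessing`. Threads would serialize on the GIL; each forked process gets its own interpreter and runs in parallel. The function and parameters here are illustrative.

```python
import multiprocessing as mp

def burn(n):
    # Pure-Python CPU-bound work; under threads this would hold the GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

def demo(workers=4, n=200_000):
    # Explicitly request the fork start method (POSIX-only).
    ctx = mp.get_context("fork")
    with ctx.Pool(workers) as pool:
        return pool.map(burn, [n] * workers)
```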

The main reason I see for using fork() is that you don't need to serialize (pickle) the data shared between forked processes. This is especially useful for sharing lambdas and other closures, which cannot be pickled or imported by name.
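A minimal sketch of that point: under the "fork" start method the child inherits the parent's memory image, so it can run a closure that pickle could never serialize (the same code fails with the "spawn" start method). The variable names here are illustrative.

```python
import multiprocessing as mp

def demo():
    secret = 42
    # A closure over local state: not importable by name, not picklable.
    work = lambda q: q.put(secret * 2)

    ctx = mp.get_context("fork")  # POSIX-only; "spawn" would raise on `work`
    q = ctx.Queue()
    p = ctx.Process(target=work, args=(q,))
    p.start()                     # child inherits `work` via fork, no pickling
    result = q.get()
    p.join()
    return result                 # 84
```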

Process Wide

Here are some things that are shared across the process:

  1. Environment variables
  2. Argv
  3. Signal Handlers
  4. FDs
  5. Linked Libraries
  6. Statically initialized storage
  7. Page Table
    1. TLB?
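A quick sketch of items 1 and 4 above: because environment variables and file descriptors are process-wide, a change made by one thread is immediately visible to every other thread. The variable name is hypothetical.

```python
import os
import threading

def demo():
    seen = {}

    def setter():
        # Mutates the single, process-wide environment.
        os.environ["DEMO_FLAG"] = "on"  # hypothetical variable

    def getter():
        # A different thread observes the same environment.
        seen["value"] = os.environ.get("DEMO_FLAG")

    t1 = threading.Thread(target=setter); t1.start(); t1.join()
    t2 = threading.Thread(target=getter); t2.start(); t2.join()
    return seen["value"]  # "on"
```

After a fork(), by contrast, the child gets its own copy of this state, and changes in one process no longer affect the other.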