Table of Contents

Threads vs. Processes

Problems with multiprocessing vs. multithreading:

Threads all share the same startup code. Consider adding initialization logic for all threads vs. all processes: with processes, every entry point into the program needs to be updated, which can be cumbersome. Recently I tried adding signal handlers to a program to aid in debugging. Having lots of processes (like crons) made it annoying to track them all down. Additionally, with multiprocessing it may not even be possible, since you may not control main() everywhere. In my case, the Python rq library spawned the Python VM and invoked my function directly, leaving no chance to set up the handlers first.
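To make the contrast concrete, here is a minimal sketch of the threaded case: a debugging handler (the stack-dumping behavior and `SIGUSR1` choice are my illustration, not from the original program) installed once in main() that covers every thread, because signal dispositions are process-wide state.

```python
import signal
import sys
import threading
import traceback

dumped = []  # collect stack dumps here instead of printing, for inspection

def dump_stacks(signum, frame):
    # sys._current_frames() sees every thread in the process, so one
    # handler registration covers all of them.
    for tid, stack in sys._current_frames().items():
        dumped.append((tid, traceback.format_stack(stack)))

def main():
    # One registration in the single entry point. With separate
    # processes, each entry point would need its own copy of this setup.
    signal.signal(signal.SIGUSR1, dump_stacks)

    workers = [threading.Thread(target=threading.Event().wait, daemon=True)
               for _ in range(2)]
    for w in workers:
        w.start()

    signal.raise_signal(signal.SIGUSR1)  # simulate `kill -USR1 <pid>`
    return dumped

if __name__ == "__main__":
    print(f"captured {len(main())} thread stacks")
```

With processes, the equivalent would require repeating the `signal.signal` call in every program that might host the code.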

Problems with fork()

Using fork() (especially with Python) is an easy way around the GIL's limitation on CPU-bound tasks. However, it brings some complications.
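As a sketch of the GIL point: a CPU-bound loop run in forked worker processes via `multiprocessing`. Threads would serialize on the GIL; each forked process gets its own interpreter and runs in parallel. The function and parameters here are illustrative.

```python
import multiprocessing as mp

def burn(n):
    # Pure-Python CPU-bound work; under threads this would hold the GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

def demo(workers=4, n=200_000):
    # Explicitly request the fork start method (POSIX-only).
    ctx = mp.get_context("fork")
    with ctx.Pool(workers) as pool:
        return pool.map(burn, [n] * workers)
```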

The main reason I see for using fork() is that you don't need to serialize (pickle) the data shared between forked processes. This is especially useful for sharing lambdas and other closures, which cannot be pickled or imported by name.
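A minimal sketch of that point: under the "fork" start method the child inherits the parent's memory image, so it can run a closure that pickle could never serialize (the same code fails with the "spawn" start method). The variable names here are illustrative.

```python
import multiprocessing as mp

def demo():
    secret = 42
    # A closure over local state: not importable by name, not picklable.
    work = lambda q: q.put(secret * 2)

    ctx = mp.get_context("fork")  # POSIX-only; "spawn" would raise on `work`
    q = ctx.Queue()
    p = ctx.Process(target=work, args=(q,))
    p.start()                     # child inherits `work` via fork, no pickling
    result = q.get()
    p.join()
    return result                 # 84
```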

Process Wide

Here are some things that are shared across the process:

  1. Environment variables
  2. Argv
  3. Signal Handlers
  4. FDs
  5. Linked Libraries
  6. Statically initialized storage
  7. Page Table
    1. TLB?
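A quick sketch of items 1 and 4 above: because environment variables and file descriptors are process-wide, a change made by one thread is immediately visible to every other thread. The variable name is hypothetical.

```python
import os
import threading

def demo():
    seen = {}

    def setter():
        # Mutates the single, process-wide environment.
        os.environ["DEMO_FLAG"] = "on"  # hypothetical variable

    def getter():
        # A different thread observes the same environment.
        seen["value"] = os.environ.get("DEMO_FLAG")

    t1 = threading.Thread(target=setter); t1.start(); t1.join()
    t2 = threading.Thread(target=getter); t2.start(); t2.join()
    return seen["value"]  # "on"
```

After a fork(), by contrast, the child gets its own copy of this state, and changes in one process no longer affect the other.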