Java and Python Differences
Java and Python are both garbage-collected languages that run in a VM. Each has rich library support and a large user base. Both work well when multiple people on a team share the same codebase.
However, there are some hard-to-reconcile differences between the two.
Parallelism Challenges
The de facto way to parallelize code in Python is multiprocessing. This can be done either by calling os.fork() and managing the child processes directly, or by using concurrent.futures.ProcessPoolExecutor.
To pass work between processes, the data must be serialized, typically with the pickle module. This has important consequences, because some language-level constructs cannot be serialized and are therefore unusable across process boundaries. When a task needs to be split up (or split off), the program gathers the function name and arguments, then spawns or forks other processes to handle the work.
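As a rough illustration of what serialization means here, the sketch below round-trips a task description through pickle and then probes two objects that pickle rejects. The helper is_picklable and the task dictionary are hypothetical names made up for this example, not part of any library.

```python
import pickle
import threading


def is_picklable(obj: object) -> bool:
    """Return True if pickle.dumps accepts obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A task as it might cross a process boundary: a name plus arguments.
# Plain data round-trips, but the receiver gets a copy, not the same object.
task = {"name": "count_zeros", "args": [0, 1, 2, 3]}
copy = pickle.loads(pickle.dumps(task))
print(copy == task, copy is task)  # True False

# Some language-level constructs cannot be serialized at all.
print(is_picklable(lambda x: x + 1))   # False
print(is_picklable(threading.Lock()))  # False
```

Because the receiving process works on a copy, any mutation it makes is invisible to the sender unless the result is explicitly sent back.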
Functions can't be serialized
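One problem is that the body of a function cannot be serialized. pickle records only metadata about a function (its module and qualified name), not the code itself, so the receiving process must be able to look the function up by that name. This means closures and other locally defined functions don't work with multiprocessing: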
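import concurrent.futures


def do_work(items: list[int]) -> int:
    # Nested (local) function: pickle cannot look it up by qualified name.
    def _worker(chunk: list[int]) -> int:
        count = 0
        for item in chunk:
            if item == 0:
                count += 1
        return count

    # Split the input into two halves, one chunk per task.
    mid = len(items) // 2
    chunks = [items[:mid], items[mid:]]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Fails: _worker must be pickled to be sent to a worker process.
        return sum(executor.map(_worker, chunks))


if __name__ == "__main__":
    print(do_work([0, 1, 2, 3]))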
Trying to run this, we get:
File "/usr/lib/python3.11/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'do_work.<locals>._worker'
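One common way around this error is to move the worker to module level, so that pickle can refer to it by name. The following is a minimal sketch of that idea, not the only possible fix. The name count_zeros is a hypothetical replacement for _worker, and each half of the list is handed to a worker as one chunk.

```python
import concurrent.futures


def count_zeros(chunk: list[int]) -> int:
    """Module-level worker: pickle sends only its qualified name."""
    return sum(1 for item in chunk if item == 0)


def do_work(items: list[int]) -> int:
    # Split the input into two halves, one chunk per task.
    mid = len(items) // 2
    chunks = [items[:mid], items[mid:]]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Only the name "count_zeros" and each chunk are pickled.
        return sum(executor.map(count_zeros, chunks))


if __name__ == "__main__":
    print(do_work([0, 1, 2, 3]))  # 1
```

The cost is that the worker can no longer capture variables from the enclosing scope. Anything it needs has to be passed as an explicit, picklable argument.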
Threads vs. Processes contains an overview of how threads and processes are treated differently. Python can use threads for I/O-bound work, but asyncio appears to be the more common choice for that.
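For contrast, threads share the parent's memory, so nothing is pickled and a nested function is accepted as a worker. The sketch below assumes an I/O-bound task and simulates it with time.sleep. The names fetch_all and _fetch are hypothetical.

```python
import concurrent.futures
import time


def fetch_all(names: list[str], delay: float = 0.05) -> list[str]:
    # A nested closure is fine here: threads share memory, nothing is pickled.
    def _fetch(name: str) -> str:
        time.sleep(delay)  # stand-in for a blocking I/O call
        return f"{name}: done"

    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # map() yields results in the same order as the inputs.
        return list(executor.map(_fetch, names))


print(fetch_all(["a", "b", "c"]))  # ['a: done', 'b: done', 'c: done']
```

Note that _fetch captures delay from the enclosing scope, which is exactly what the process-based version cannot do.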
