
Python

Python is a programming language common in the data science space. Here is a list of gotchas I have run into while using it, coming from the Java world.

Wishlist

Type safety is complex

Static type checkers are okay, but the runtime type system is pretty weak. There is no equivalent of Java's type capture (e.g. ''static <T> List<T> makeList(T item) {…}'').
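The closest Python spelling of that Java method is a function that is generic over a ''TypeVar''. This is a minimal sketch, assuming Python 3.9 or later; ''make_list'' is a hypothetical mirror of ''makeList''. The type parameter helps static checkers, but at runtime the returned list records no element type and the annotation only holds the unbound ''T''.

```python
from typing import TypeVar, get_args

T = TypeVar("T")

def make_list(item: T) -> list[T]:
    # T is only meaningful to static checkers such as mypy; inside the
    # function there is no runtime object telling us what T was bound to.
    return [item]

ints = make_list(1)    # a static checker infers list[int]
print(type(ints))      # <class 'list'>: no element type is recorded
print(get_args(make_list.__annotations__["return"]))  # (~T,)
```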

''isinstance()''

''isinstance()'' raises a ''TypeError'' if its second argument is a parameterized generic type, for example ''isinstance(var, dict[str, Any])''. This is annoying because you only find out at runtime that it's broken.
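A minimal sketch of the failure and one common workaround, assuming Python 3.9 or later (for ''dict[str, Any]''): check against the bare class, then validate the contents by hand. ''is_str_keyed_dict'' is a hypothetical helper, not a library function.

```python
from typing import Any

def is_str_keyed_dict(value: Any) -> bool:
    # Hypothetical helper: isinstance() only accepts the bare class, so check
    # the container type first, then validate the keys by hand.
    return isinstance(value, dict) and all(isinstance(k, str) for k in value)

try:
    isinstance({"a": 1}, dict[str, Any])
except TypeError as exc:
    # Raised only when this line actually executes.
    print("TypeError:", exc)

print(is_str_keyed_dict({"a": 1}))  # True
print(is_str_keyed_dict({1: "a"}))  # False
```

Note that this only checks the keys; validating values (or nested generics) needs the same kind of manual walk.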

Lack of Queue implementations

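Porting Java's ''ThreadPoolExecutor'' needs a wider range of queue implementations than Python provides. The standard library's ''queue'' module offers ''Queue'', ''LifoQueue'', ''PriorityQueue'' and ''SimpleQueue'', but nothing like Java's ''SynchronousQueue'', where each ''put'' waits for a matching ''take''.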
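A hand-off queue can be approximated on top of ''queue.Queue''. This is a hypothetical sketch (''HandoffQueue'' is not a standard class) that only covers blocking ''put''/''take'', not Java's timed ''offer''/''poll'' variants. Each ''put'' gets its own ''threading.Event'', so a producer is released only when its own item has been taken.

```python
import queue
import threading
from typing import Generic, Tuple, TypeVar

T = TypeVar("T")

class HandoffQueue(Generic[T]):
    """Hypothetical hand-off queue, loosely modelled on Java's SynchronousQueue:
    put() does not return until a consumer has taken that exact item."""

    def __init__(self) -> None:
        # Each entry pairs an item with the Event its producer is waiting on.
        self._pending: "queue.Queue[Tuple[T, threading.Event]]" = queue.Queue()

    def put(self, item: T) -> None:
        taken = threading.Event()
        self._pending.put((item, taken))
        taken.wait()  # block until take() has removed this item

    def take(self) -> T:
        item, taken = self._pending.get()  # block until a producer arrives
        taken.set()  # release the producer that handed this item over
        return item


if __name__ == "__main__":
    q: "HandoffQueue[str]" = HandoffQueue()
    producer = threading.Thread(target=q.put, args=("job-1",))
    producer.start()
    print(q.take())  # job-1
    producer.join()  # returns because the item has been taken
```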

Problems with Multiprocessing

Caching comes in many forms, and one surprising place it shows up is connection caching (a.k.a. pooling). Without special handling, connections cannot be shared between processes. When a Python program starts another process (with the ''spawn'' start method, for example), the new process does not inherit the parent's file descriptors (fds) and thus has to recreate its connections itself. This causes problems, for example:

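Multiprocessing skips ''atexit'' hooks in child processes (at least with the ''fork'' start method, whose children exit via ''os._exit()''), so process-wide cleanup work, such as closing pooled connections, cannot be done there. The hooks are skipped silently.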
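A minimal sketch tying both points together, assuming a POSIX system (the ''fork'' start method is not available on Windows). The hypothetical ''worker'' opens its own SQLite connection in the child instead of sharing one with the parent, and registers an ''atexit'' hook that does not fire in a forked child; the cleanup that does run is the worker's own ''try''/''finally''.

```python
import atexit
import multiprocessing as mp
import sqlite3

def worker(results) -> None:
    # Open the connection inside the child; nothing is shared with the parent.
    conn = sqlite3.connect(":memory:")
    # A fork-started child exits via os._exit(), which skips atexit hooks,
    # so this cleanup hook never runs.
    atexit.register(results.send, "atexit hook ran")
    try:
        (value,) = conn.execute("SELECT 6 * 7").fetchone()
        results.send(f"query returned {value}")
    finally:
        # Cleanup placed in the worker itself does run.
        conn.close()
        results.send("connection closed in finally")

def run_demo() -> list[str]:
    ctx = mp.get_context("fork")  # POSIX only
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=worker, args=(send_end,))
    proc.start()
    proc.join()
    send_end.close()  # drop the parent's copy so recv() hits EOF at the end
    messages = []
    try:
        while True:
            messages.append(recv_end.recv())
    except EOFError:
        pass
    return messages

if __name__ == "__main__":
    print(run_demo())
```

Putting cleanup in the worker (or in a context manager around its body) avoids depending on interpreter shutdown hooks at all.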

Problems with Descriptors

Evaluation order matters. For example:

a: Descriptor
b: Descriptor

foo(a, b)

Here, if evaluating ''a'' or ''b'' has side effects, such as raising an exception, the order in which they are evaluated matters. Python evaluates call arguments left to right, so ''a'' is always evaluated before ''b''.
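A runnable sketch of the same situation, with hypothetical names (''Logged'', ''Example''; ''foo'' stands in for the call above). The descriptor's ''__get__'' records each read and can be told to raise, which makes the left-to-right order visible: when reading ''a'' fails, ''b'' is never read.

```python
class Logged:
    """Hypothetical descriptor that records each read and can be made to fail."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        obj.reads.append(self.name)
        if self.name in obj.failing:
            raise RuntimeError(f"cannot read {self.name}")
        return self.name.upper()

class Example:
    a = Logged()
    b = Logged()

    def __init__(self, failing=()):
        self.reads = []
        self.failing = set(failing)

def foo(x, y):
    return x + y

ok = Example()
print(foo(ok.a, ok.b))  # AB
print(ok.reads)         # ['a', 'b']

bad = Example(failing={"a"})
try:
    foo(bad.a, bad.b)
except RuntimeError as exc:
    print(exc)          # cannot read a
print(bad.reads)        # ['a']: b was never evaluated
```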

Futures