Python

Python is a programming language common in the data science space. Here is a list of gotchas that I have encountered while using it, coming from the Java world.

Wishlist

Type safety is complex.

Static checkers are okay, but the runtime type system is pretty weak. There is no equivalent of type capture in Java (e.g. static <T> List<T> makeList(T item) {…}).
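To make the gap concrete, here is a minimal sketch of the closest analogue I know of, typing.TypeVar, assuming Python 3.9 or later. The names make_list, names, and mixed are illustrative only. The type variable is visible to static checkers, but nothing enforces it when the code runs.

```python
from typing import TypeVar

T = TypeVar("T")


def make_list(item: T) -> list[T]:
    # A static checker binds T to the argument type, much like Java's <T>.
    return [item]


names: list[str] = make_list("a")

mixed = make_list("a")
# A static checker would flag the next line (int appended to list[str]),
# but the interpreter accepts it because T is not checked at runtime.
mixed.append(42)
```

Running a checker such as mypy over this file is what catches the bad append; the interpreter alone never will.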

isinstance()

isinstance() raises a TypeError if the second argument is a parameterized generic type, for example isinstance(var, dict[str, Any]). This is annoying because you only find out at runtime that it's broken.
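A possible workaround, sketched below under the assumption that checking only the outer container type is good enough, is to strip the type parameters with typing.get_origin before calling isinstance(). The helper name is_instance_of_origin is hypothetical. It does not validate keys or values, and it does not handle special forms such as Union.

```python
from typing import Any, get_origin


def is_instance_of_origin(value, tp):
    """Check value against the unparameterized form of tp (dict[str, Any] -> dict)."""
    origin = get_origin(tp) or tp  # get_origin() returns None for plain classes
    return isinstance(value, origin)


def plain_isinstance_fails():
    """Show that the direct call raises instead of returning False."""
    try:
        isinstance({}, dict[str, Any])
    except TypeError:
        return True
    return False
```

The trade-off is that {1: 2} also passes a dict[str, Any] check here, so this only moves the failure from an exception to a weaker test.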

Lack of Queue implementations

Implementing an equivalent of Java's ThreadPoolExecutor needs more diverse Queue implementations than the standard library offers.

  • queue.Queue raises queue.Empty when a non-blocking or timed-out get() finds the queue empty, rather than returning None
  • Blocking queue.Queue calls can't be interrupted.
  • No way to have a handoff, similar to SynchronousQueue in Java. This class is the key to a cached ThreadPoolExecutor.
  • queue.SimpleQueue timeout doesn't work. The task-tracking part of Queue and the timeout handling come as a package deal, meaning you get both or neither.
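Two of the gaps above can be papered over with small wrappers. The sketch below is an approximation, not a port of the Java classes. poll() and Handoff are hypothetical names. poll() cannot distinguish an empty queue from a queued None. Handoff assumes a single producer and has no equivalent of SynchronousQueue's non-blocking offer(), which is the part a cached thread pool actually relies on.

```python
import queue
import threading


def poll(q, timeout=None):
    """Return the next item, or None if the queue is empty (or stays empty until timeout)."""
    try:
        if timeout is None:
            return q.get_nowait()
        return q.get(timeout=timeout)
    except queue.Empty:
        return None


class Handoff:
    """Rough handoff: put() returns only after a consumer has taken the item."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)

    def put(self, item):
        self._q.put(item)  # blocks while a previous item is still in the slot
        self._q.join()     # blocks until the consumer calls task_done()

    def take(self, timeout=None):
        item = self._q.get(timeout=timeout)  # raises queue.Empty on timeout
        self._q.task_done()                  # releases the waiting producer
        return item


def demo():
    handoff = Handoff()
    received = []
    consumer = threading.Thread(target=lambda: received.append(handoff.take(timeout=5)))
    consumer.start()
    handoff.put("job")  # returns once the consumer thread has taken "job"
    consumer.join()
    return received
```

Note that Handoff leans on the task-tracking half of queue.Queue (join() and task_done()) to get the blocking behaviour, so it only works with the full Queue class.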

Problems with Multiprocessing

Caching comes in many forms, but one surprising way that it shows up is in connection caching (a.k.a. pooling). Without special handling, connections cannot be shared between processes. When a Python program spawns another process (for example with multiprocessing's spawn start method), the new process does not inherit the parent's file descriptors (fds) and thus has to re-create them itself. This causes several problems:

  • For socket fds, reconnecting is slow for the caller.
  • For socket fds, re-establishing an SSL connection is very slow and CPU intensive.
  • For socket fds, reconnecting causes the remote endpoint to burn resources as well.
  • For file fds, it's not possible to synchronize access to them. (If Python had better shared memory support and the correct atomics to modify the memory, this might be possible.)
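This does not remove the reconnect cost, but a common mitigation is to pay it at most once per process instead of once per task. The sketch below assumes a hypothetical zero-argument factory, connect, that opens whatever socket or client the worker needs. get_connection and the module-level variables are illustrative names, not part of any library.

```python
import os

_connection = None  # connection cached for the current process
_owner_pid = None   # pid of the process that created the cached connection


def get_connection(connect):
    """Return this process's connection, creating it on first use.

    `connect` is a hypothetical factory supplied by the caller.
    """
    global _connection, _owner_pid
    pid = os.getpid()
    if _connection is None or _owner_pid != pid:
        # First use in this process, or the cached state was copied from a
        # parent: connect once here and reuse the result afterwards.
        _connection = connect()
        _owner_pid = pid
    return _connection
```

Each worker still opens its own connection, so the remote endpoint sees one connection per process rather than one shared connection.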
python.1756569537.txt.gz · Last modified: by carl