====== Python ======

Python is a programming language common in the data science space. Here is a list of gotchas that I have encountered while using it, coming from the Java world.

===== Wishlist =====

==== Typesafety is complex. ====

Static checkers are okay, but the runtime type system is pretty weak. There is no equivalent of Java's type capture (e.g. ''static <T> List<T> makeList(T item) {...}'').

==== ''isinstance()'' ====

''isinstance()'' [[https://www.reddit.com/r/learnpython/comments/1dz56ep/isinstancex_listlistint_doesnt_work/|throws]] a ''TypeError'' if the second argument is a parameterized generic type, for example ''isinstance(var, dict[str, Any])''. This is annoying because you only find out at runtime that it's broken.

==== Lack of Queue implementations ====

Reimplementing Java's ThreadPoolExecutor in Python needs more diverse queue implementations than the standard library offers:

  * ''queue.Queue'' raises ''queue.Empty'' when a non-blocking or timed ''get()'' finds the queue empty, rather than returning ''None''.
  * ''queue.Queue'' calls can't be interrupted.
  * There is no way to do a handoff, similar to ''SynchronousQueue'' in Java. That class is the key to a cached [[java:threadpoolexecutor|ThreadPoolExecutor]].
  * ''queue.SimpleQueue'' timeout doesn't work. The task-tracking part of ''Queue'' and the timeout handling come as a package deal, meaning you get both or neither.

==== Problems with Multiprocessing ====

Caching comes in many forms, but one surprising way that it shows up is in connection caching (a.k.a. pooling). Without special handling, connections cannot be shared between processes. When a Python program spawns another process, the new process does not inherit the file descriptors (fds) and thus has to re-create them itself. This causes several problems:

  * For socket fds, reconnecting is slow for the caller.
  * For socket fds, re-establishing an SSL connection is very slow and CPU intensive.
  * For socket fds, reconnecting causes the remote endpoint to burn resources as well.
  * For file fds, it's not possible to synchronize access to them (if Python had better shared memory support and the correct atomics to modify the memory, this might be possible).

Multiprocessing skips the ''atexit'' hooks, meaning it's not possible to do process-wide cleanup work. The hooks are silently skipped, with no warning or error.

==== Problems with Descriptors ====

Evaluation order matters. For example:

  a: Descriptor
  b: Descriptor
  foo(a, b)

Here, if evaluating ''a'' or ''b'' has side effects, such as raising an exception, the order in which ''a'' and ''b'' are evaluated matters.

==== Futures ====

  * Futures don't have a ''getstate()'' method. This means that trying to find out whether a future is running, finished, cancelled, or pending is racy, because each state has to be queried with a separate call. The lock (condition) on the future is also private, so it's not safe to hold it while querying each of the state methods.
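
One partial workaround for the missing ''getstate()'' is to query the public methods in a fixed order and re-check the terminal states at the end. The following is only a minimal sketch: ''best_effort_state'' is a hypothetical helper name, not part of the standard library. It assumes the futures are ''concurrent.futures.Future'' objects and that their state only moves forward (pending to running to finished, or pending to cancelled). Under that assumption the returned value was true at some instant during the call, but it may already be stale when the caller acts on it.

```python
from concurrent.futures import Future


def best_effort_state(future: Future) -> str:
    """Return 'cancelled', 'finished', 'running', or 'pending'.

    Hypothetical helper: uses only public Future methods and relies on
    the assumption that a future's state never moves backwards.
    """
    # Terminal states never change once reached, so check them first.
    if future.cancelled():
        return "cancelled"
    if future.done():
        return "finished"
    if future.running():
        return "running"
    # running() was False: the future was either still pending, or it
    # reached a terminal state between the calls above. Re-check the
    # terminal states to tell the two cases apart.
    if future.cancelled():
        return "cancelled"
    if future.done():
        return "finished"
    return "pending"
```

This does not remove the race described above, it only narrows it: the state can still change right after the helper returns, so the result is best used for logging or monitoring rather than for deciding what to do next.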