Hi all!

I would like to load a heavy object (think ML model) into memory so that it is available in a ParDo for quick predictions. What is the preferred way of doing this without loading the model on every ParDo call, which is slow and will flood memory on the nodes?

I don't seem to be able to do it in the DoFn's __init__ either. My guess is that __init__ runs only once, before the DoFn is distributed to the nodes, and that it then breaks when the DoFn is replicated internally. This happens even on the DirectRunner, so I suspect the DoFn is pickled and this object cannot be pickled. If I load the model as a side input, it still seems to be loaded into memory separately for each ParDo.

If there is a better way to handle this in Java, I'm happy to do it there instead. It was just easier to attack the problem with Python, as the models were developed in Python.

Any pointers or tips are welcome!

Thanks!
Vilhelm von Ehrenheim
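
PS: To make the suspected pickling problem concrete, here is a minimal plain-Python sketch. It does not use Beam at all: HeavyModel, EagerFn, LazyFn and can_pickle are made-up names for illustration, and it assumes the runner serializes the DoFn instance with something pickle-like, which is only my guess. It contrasts loading in __init__ with deferring the load until first use.

```python
import pickle
import threading


class HeavyModel:
    """Hypothetical stand-in for a model that cannot be pickled."""

    def __init__(self):
        # A lock is a simple example of state that pickle rejects.
        self._lock = threading.Lock()

    def predict(self, x):
        return x * 2


class EagerFn:
    """Loads the model in __init__, so it becomes part of the pickled state."""

    def __init__(self):
        self.model = HeavyModel()

    def process(self, element):
        yield self.model.predict(element)


class LazyFn:
    """Defers loading to first use and drops the model when pickled."""

    def __init__(self):
        self._model = None

    def __getstate__(self):
        # Copy the instance state, but never ship the loaded model.
        state = self.__dict__.copy()
        state["_model"] = None
        return state

    def process(self, element):
        if self._model is None:
            # Loaded once per instance, then reused for later elements.
            self._model = HeavyModel()
        yield self._model.predict(element)


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
```

Run as a script, I would expect can_pickle(EagerFn()) to fail and can_pickle(LazyFn()) to succeed. Whether a real runner behaves the same way, and whether lazy loading like this is the preferred pattern, is exactly what I am unsure about.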
