Why not use a singleton-like pattern: a function that either loads and
caches the ML model from a side input, or returns the cached instance if
it has already been loaded?
You'll want some form of locking to ensure that the model really is
loaded only once.
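
Something along these lines, as a rough sketch rather than tested pipeline
code. The names get_model, load_fn, model_source and PredictFn are made up
for illustration, and I'm assuming the loaded model is callable. In a real
pipeline PredictFn would subclass beam.DoFn and model_source would be passed
in as the side input; I've left Beam out so the snippet runs on its own.

```python
import threading

_model = None                    # one cached model per Python worker process
_model_lock = threading.Lock()   # guards the first (expensive) load


def get_model(load_fn, model_source):
    """Return the cached model, calling load_fn(model_source) at most once."""
    global _model
    if _model is None:               # fast path: skip the lock once loaded
        with _model_lock:
            if _model is None:       # re-check: another thread may have won
                _model = load_fn(model_source)
    return _model


class PredictFn(object):
    """Hypothetical DoFn-shaped class; would subclass beam.DoFn in practice."""

    def __init__(self, load_fn):
        # Only the loader function is stored (and pickled), never the model.
        self._load_fn = load_fn

    def process(self, element, model_source):
        # model_source stands in for the side input (e.g. a path to the model).
        model = get_model(self._load_fn, model_source)
        yield model(element)         # assumes the model is callable
```

Since the model lives in a module-level variable and not on the DoFn
instance, it never has to be pickled. The cache is per process, so each
worker process still loads its own copy once.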

On Wed, May 24, 2017 at 6:18 AM, Vilhelm von Ehrenheim <
vonehrenh...@gmail.com> wrote:

> Hi all!
> I would like to load a heavy object (think ML model) into memory that
> should be available in a ParDo for quick predictions.
>
> What is the preferred way of doing this without loading the model for each
> ParDo call (which is slow and will flood memory on the nodes)? I don't seem
> to be able to do it in the DoFn's __init__ block either, as this is only
> done once for all nodes (that is my guess, though) and then it breaks when
> replicated internally (even on the DirectRunner; I suspect it is pickled and
> this object cannot be pickled). If I load it as a side input, it still seems
> to be loaded into memory separately for each ParDo.
>
> If there is a better way to handle it in Java I'm happy to do it there
> instead. It was just easier to attack the problem with Python as the models
> were developed in Python.
>
> Any sort of pointers or tips are welcome!
>
> Thanks!
> Vilhelm von Ehrenheim
>
