Hi mates !

I keep moving in my research of new features of PyFlink and I'm really
excited about that functionality.
My main goal is to understand how to integrate our ML registry, powered by
ML Flow and PyFlink jobs and what restrictions we have.

I need to bootstrap the UDF function on it's startup when it's instantiated
in the Apache Beam process, but I don't know how it's called by PyFlink in
single thread fashion or shared among multiple threads. In other words, I
want to know, should I care about synchronization of my bootstrap logic or
not.

Here is a code example of my UDF function:













*class MyFunction(ScalarFunction):    def __init__(self):
self.initialized = False    def __bootstrap(self):        return
"bootstrapped"    def eval(self, urls):        if self.initialized:
        self.__bootstrap()        return "my-result"my_function =
udf(MyFunction(), [DataTypes.ARRAY(DataTypes.STRING())],
DataTypes.STRING())*


Thx a lot for your help !

Reply via email to