Hi Rinat,

The UDF is called in a single-threaded fashion, so there is no need for synchronization.
Besides, ScalarFunction has a pair of open/close methods. You could override them and perform the initialization work in the open method.

Regards,
Dian

> On Oct 15, 2020, at 3:19 AM, Sharipov, Rinat <r.shari...@cleverdata.ru> wrote:
>
> Hi mates!
>
> I keep moving forward in my research of the new PyFlink features, and I'm
> really excited about that functionality.
> My main goal is to understand how to integrate our ML registry, powered by
> MLflow, with PyFlink jobs, and what restrictions we have.
>
> I need to bootstrap the UDF on its startup, when it's instantiated in the
> Apache Beam process, but I don't know whether PyFlink calls it in a
> single-threaded fashion or shares it among multiple threads. In other words,
> I want to know whether I should care about synchronizing my bootstrap logic.
>
> Here is a code example of my UDF:
>
> from pyflink.table import DataTypes
> from pyflink.table.udf import ScalarFunction, udf
>
> class MyFunction(ScalarFunction):
>     def __init__(self):
>         self.initialized = False
>
>     def __bootstrap(self):
>         return "bootstrapped"
>
>     def eval(self, urls):
>         # Bootstrap lazily, only on the first call.
>         if not self.initialized:
>             self.__bootstrap()
>             self.initialized = True
>         return "my-result"
>
> my_function = udf(MyFunction(), [DataTypes.ARRAY(DataTypes.STRING())],
>                   DataTypes.STRING())
>
> Thx a lot for your help!
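
For reference, here is a minimal sketch of the open/close approach suggested in the reply, applied to the UDF from the original message. It is an illustration, not a definitive implementation. The `model` attribute and the `_bootstrap` helper are hypothetical names chosen for this example, and the fallback `ScalarFunction` class is only a stand-in so the snippet runs where PyFlink is not installed. It assumes PyFlink calls open once before the first eval.

```python
try:
    from pyflink.table.udf import ScalarFunction
except ImportError:
    # Stand-in so the sketch runs without PyFlink installed. This class is an
    # assumption for illustration only and is not part of PyFlink.
    class ScalarFunction(object):
        def open(self, function_context):
            pass

        def close(self):
            pass


class MyFunction(ScalarFunction):
    def __init__(self):
        # Hypothetical attribute holding whatever the bootstrap produces.
        self.model = None

    def open(self, function_context):
        # Initialization hook: the bootstrap runs here instead of in eval(),
        # so eval() needs neither an "initialized" flag nor a lock.
        self.model = self._bootstrap()

    def _bootstrap(self):
        # Placeholder for the real work, e.g. loading a model from a registry.
        return "bootstrapped"

    def eval(self, urls):
        # self.model has already been set by open() at this point.
        return "my-result"

    def close(self):
        # Release whatever open() acquired.
        self.model = None
```

The registration call from the original message (`udf(MyFunction(), ...)`) would stay unchanged. Only the place where the bootstrap logic lives moves from eval to open.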