Re: ModuleNotFound when loading udf from another py file

2021-04-27 Thread Dian Fu
Thanks a lot~ > 2021年4月28日 上午8:25,Yik San Chan 写道: > > Hi Dian, > > I follow up with this PR https://github.com/apache/flink/pull/15790 > > On Tue, Apr 27, 2021 at 11:03 PM Dian Fu > wrote: > Hi Yik San, > > Make sens

Re: ModuleNotFound when loading udf from another py file

2021-04-27 Thread Yik San Chan
Hi Dian, I follow up with this PR https://github.com/apache/flink/pull/15790 On Tue, Apr 27, 2021 at 11:03 PM Dian Fu wrote: > Hi Yik San, > > Make sense to me. :) > > Regards, > Dian > > 2021年4月27日 下午9:50,Yik San Chan 写道: > > Hi Dian, > > Wow, this is unexpected 😮 How about adding documentati

Re: ModuleNotFound when loading udf from another py file

2021-04-27 Thread Dian Fu
Hi Yik San, Make sense to me. :) Regards, Dian > 2021年4月27日 下午9:50,Yik San Chan 写道: > > Hi Dian, > > Wow, this is unexpected 😮 How about adding documentations to Python UDF about > this? Again it can be time consuming to figure this out. Maybe: > > To be able to run Python UDFs in any non-l

Re: ModuleNotFound when loading udf from another py file

2021-04-27 Thread Yik San Chan
Hi Dian, Wow, this is unexpected 😮 How about adding documentations to Python UDF about this? Again it can be time consuming to figure this out. Maybe: To be able to run Python UDFs in any non-local mode, it is recommended to include your UDF definitions using -pyfs config option, if your UDFs liv

Re: ModuleNotFound when loading udf from another py file

2021-04-27 Thread Dian Fu
I guess this is the magic of cloud pickle. PyFlink depends on cloud pickle to serialize the Python UDF. For the latter case, I guess the whole Python UDF implementation will be serialized. However, for the previous case, only the path of the class is serialized. Regards, Dian > 2021年4月27日 下午

Re: ModuleNotFound when loading udf from another py file

2021-04-27 Thread Yik San Chan
Hi Dian, Thanks! Adding -pyfs definitely helps. However, I am curious. If I define my udf this way: ```python @udf(input_types=[DataTypes.STRING()], result_type=DataTypes.STRING()) def decrypt(s): import pandas as pd d = pd.read_csv('resources.zip/resources/crypt.csv', header=None, index_col= 0,

Re: ModuleNotFound when loading udf from another py file

2021-04-27 Thread Dian Fu
Hi Yik San, From the exception message, it’s clear that it could not find module `decrypt_fun` during execution. You need to specify file `decrypt_fun.py` as a dependency during submitting the job, e.g. via -pyfs command line arguments. Otherwise, this file will not be available during executi

ModuleNotFound when loading udf from another py file

2021-04-27 Thread Yik San Chan
Hi, Here's the reproducible code sample: https://github.com/YikSanChan/pyflink-quickstart/tree/83526abca832f9ed5b8ce20be52fd506c45044d3 I implement my Python UDF by extending the ScalarFunction base class in a separate file named decrypt_fun.py, and try to import the udf into my main python file