Re: PyFlink Table API and UDF Limitations

2020-11-27 Thread Niklas Wilcke
Hi Xingbo, thanks for sharing. This is very interesting. Regards, Niklas > On 27. Nov 2020, at 03:05, Xingbo Huang wrote: > > Hi Niklas, > > Thanks a lot for supporting PyFlink. In fact, your requirement for multiple > input and multiple output is essentially Table Aggregation Functions[1].

Re: PyFlink Table API and UDF Limitations

2020-11-26 Thread Xingbo Huang
Hi Niklas, Thanks a lot for supporting PyFlink. In fact, your requirement for multiple input and multiple output is essentially Table Aggregation Functions[1]. Although PyFlink does not support it yet, we have listed it in the release 1.13 plan. In addition, row-based operations[2] that are very

Re: PyFlink Table API and UDF Limitations

2020-11-26 Thread Niklas Wilcke
Hi Xingbo, thanks for taking care and letting me know. I was about to share an example, how to reproduce this. Now I will wait for the next release candidate and give it a try. Regards, Niklas -- niklas.wil...@uniberg.com Mobile: +49 160 9793 2593 Office: +49 40 2380 6523

Re: PyFlink Table API and UDF Limitations

2020-11-25 Thread Xingbo Huang
Hi Niklas, Regarding `Exception in thread "grpc-nio-worker-ELG-3-2" java.lang.NoClassDefFoundError: org/apache/beam/vendor/grpc/v1p26p0/io/netty/buffer/PoolArena$1`, it does not affect the correctness of the result. The reason is that some resources are released asynchronously when Grpc Server is

Re: PyFlink Table API and UDF Limitations

2020-11-16 Thread Dian Fu
Hi Niklas, > How can I ingest data in a batch table from Kafka or even better > Elasticsearch. Kafka is only offering a Streaming source and Elasticsearch > isn't offering a source at all. > The only workaround which comes to my mind is to use the Kafka streaming > source and to apply a single

Re: PyFlink Table API and UDF Limitations

2020-11-13 Thread Dian Fu
Hi Niklas, Good to know that this solution may work for you. Regarding to the questions you raised, please find my reply inline. Regards, Dian > 在 2020年11月13日,下午8:48,Niklas Wilcke 写道: > > Hi Dian, > > thanks again for your response. In the meantime I tried out your proposal > using the

Re: PyFlink Table API and UDF Limitations

2020-11-13 Thread Niklas Wilcke
Hi Dian, thanks again for your response. In the meantime I tried out your proposal using the UDAF feature of PyFlink 1.12.0-rc1 and it is roughly working, but I am facing some issues, which I would like to address. If this goes too far, please let me know and I will open a new thread for each

Re: PyFlink Table API and UDF Limitations

2020-11-12 Thread Dian Fu
Hi Niklas, Python DataStream API will also be supported in coming release of 1.12.0 [1]. However, the functionalities are still limited for the time being compared to the Java DataStream API, e.g. it will only support the stateless operations, such as map, flat_map, etc. [1]

Re: PyFlink Table API and UDF Limitations

2020-11-12 Thread Niklas Wilcke
Hi Dian, thank you very much for this valuable response. I already read about the UDAF, but I wasn't aware of the fact that it is possible to return and UNNEST an array. I will definitely have a try and hopefully this will solve my issue. Another question that came up to my mind is whether

Re: PyFlink Table API and UDF Limitations

2020-11-11 Thread Dian Fu
Hi Niklas, You are correct that the input/output length of Pandas UDF must be of the same size and that Flink will split the input data into multiple bundles for Pandas UDF and the bundle size is non-determinstic. Both of the above two limitations are by design and so I guess Pandas UDF could

PyFlink Table API and UDF Limitations

2020-11-11 Thread Niklas Wilcke
Hi Flink Community, I'm currently trying to implement a parallel machine learning job with Flink. The goal is to train models in parallel for independent time series in the same data stream. For that purpose I'm using a Python library, which lead me to PyFlink. Let me explain the use case a