write into parquet with variable number of columns

2021-08-05 Thread Sharipov, Rinat
va:334)* Maybe someone has tried this feature and can guess what's wrong with the current code and how to make it work. Anyway I have a fallback - accumulate a batch of events, define the schema for them and write into the file system manually, but I still hope that I can do this in a more elegant way. Thx for your advice and time ! -- Best Regards, *Sharipov Rinat* CleverDATA make your data clever
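The fallback described here — buffer a batch of events, derive a schema from the batch, and write it out manually — might be sketched like this. Everything below is illustrative (the `EventBatcher` name, the batch size, the dict-shaped events are all assumptions, not code from the thread); a real job would hand the flushed rows to a Parquet writer such as pyarrow:

```python
class EventBatcher:
    """Accumulates dict-shaped events and derives a column schema
    from the union of keys seen in the batch (fallback approach:
    buffer, infer the schema, then write the batch out manually)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.rows = []

    def add(self, event):
        self.rows.append(event)
        if len(self.rows) >= self.batch_size:
            return self.flush()
        return None

    def flush(self):
        # The union of all keys defines the (variable) column set.
        columns = sorted({k for row in self.rows for k in row})
        # Pad missing fields with None so every row matches the schema;
        # these padded rows are what a Parquet writer would receive.
        table = [[row.get(c) for c in columns] for row in self.rows]
        self.rows = []
        return columns, table
```

This sidesteps the variable-column problem by fixing the schema per batch rather than per event.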

Re: [Flink::Test] access registered accumulators via harness

2020-10-30 Thread Sharipov, Rinat
e processing > } > > } > > That way you can easily test your MyLogic class including interactions > with the counter, by passing a mock counter. > > Best, > > Dawid > > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/dev/us
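Dawid's suggestion amounts to dependency injection: extract the business logic into its own class that receives the counter, so tests never need a harness. A minimal Python sketch of the same idea (`MyLogic`, `inc()` and the doubling logic are stand-ins, not Flink API):

```python
class MyLogic:
    """Business logic extracted from the Flink function so it can be
    unit-tested directly: the counter is injected, so a test can pass
    any object exposing an inc()-like method instead of a real
    Flink accumulator."""

    def __init__(self, counter):
        self.counter = counter

    def process(self, value):
        # Count every processed element, then apply the transformation.
        self.counter.inc()
        return value * 2
```

A test then passes a fake counter and asserts on both the result and the recorded interactions.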

[Flink::Test] access registered accumulators via harness

2020-10-27 Thread Sharipov, Rinat
Hi mates ! I guess that I'm doing something wrong, but I couldn't find a way to access registered accumulators and their values via the org.apache.flink.streaming.util.ProcessFunctionTestHarness function wrapper that I'm using to test my functions. During the code research I've found that

Re: PyFlink :: Bootstrap UDF function

2020-10-15 Thread Sharipov, Rinat
on and > you could also override them and perform the initialization work in the > open method. > > Regards, > Dian > > On Oct 15, 2020, at 3:19 AM, Sharipov, Rinat wrote: > > Hi mates ! > > I keep moving in my research of new features of PyFlink and I'm really
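The pattern Dian points at — keep the constructor lightweight and do the expensive bootstrap in the open method — can be sketched without PyFlink like this (`ModelUdf` and its stand-in model are hypothetical; in a real PyFlink `ScalarFunction`, `open(function_context)` is the lifecycle hook the runtime calls before any `eval`):

```python
class ModelUdf:
    """Mimics the PyFlink UDF lifecycle: the constructor only stores
    lightweight config; heavy bootstrap (e.g. loading an ML model from
    a registry) happens once in open(), before eval() is ever called."""

    def __init__(self, model_name):
        self.model_name = model_name
        self.model = None  # not loaded yet: constructors must stay cheap

    def open(self):
        # Hypothetical bootstrap; a real job might pull the model
        # from an MLflow registry here.
        self.model = lambda x: x + 1  # stand-in for a loaded model

    def eval(self, x):
        if self.model is None:
            raise RuntimeError("open() was not called")
        return self.model(x)
```

The point of the split is that serialization and job submission only ever see the cheap constructor state, while the heavy model load runs on the worker.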

PyFlink :: Bootstrap UDF function

2020-10-14 Thread Sharipov, Rinat
Hi mates ! I keep moving in my research of new features of PyFlink and I'm really excited about that functionality. My main goal is to understand how to integrate our ML registry, powered by ML Flow and PyFlink jobs and what restrictions we have. I need to bootstrap the UDF function on its

Re: PyFlink 1.11.2 couldn’t configure [taskmanager.memory.task.off-heap.size] property when registering custom UDF function

2020-10-13 Thread Sharipov, Rinat
pi to set configuration: > table_env.get_config().get_configuration().set_string("taskmanager.memory.task.off-heap.size", > '80m') > > The flink-conf.yaml way will only take effect when submitted through flink > run, and the minicluster way(python xxx.py) will not take effec
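As a standalone fragment, the advice in this reply is roughly the following (a sketch assuming PyFlink's Table API with an existing `table_env`; the `80m` value is the example from the thread, to be tuned per UDF, and it only takes effect for the mini-cluster / `python xxx.py` case as the reply notes):

```python
# Raise task off-heap memory so Python UDF workers can start
# (PyFlink Table API configuration; only relevant when not
# submitting through `flink run`).
table_env.get_config().get_configuration().set_string(
    "taskmanager.memory.task.off-heap.size", "80m")
```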

PyFlink 1.11.2 couldn’t configure [taskmanager.memory.task.off-heap.size] property when registering custom UDF function

2020-10-12 Thread Sharipov, Rinat
Hi mates ! I'm very new at pyflink and trying to register a custom UDF function using the python API. Currently I'm facing an issue in both the server env and my local IDE environment. When I try to execute the example below I get an error message: *The configured Task Off-Heap Memory 0 bytes is less

Re: [PyFlink] update udf functions on the fly

2020-10-12 Thread Sharipov, Rinat
me custom strategy. It's the behavior of the > UDF. > > Regards, > Dian > > On Oct 12, 2020, at 5:51 PM, Sharipov, Rinat wrote: > > Hi Arvid, thx for your reply. > > We are already using the approach with control streams to propagate > business rules through our data-pipeline. &g

Re: [PyFlink] update udf functions on the fly

2020-10-12 Thread Sharipov, Rinat
tream API, the common way to > simulate side inputs (which is what you need) is to use a broadcast. There > is an example on SO [1]. > > [1] > https://stackoverflow.com/questions/54667508/how-to-unit-test-broadcastprocessfunction-in-flink-when-processelement-depends-o > > On Sat, Oc
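The broadcast pattern from the linked SO answer can be sketched in plain Python (all names here are illustrative; in Flink this would be a `BroadcastProcessFunction` whose `processBroadcastElement` updates broadcast state that `processElement` reads):

```python
class RulesFunction:
    """Plain-Python analogue of a BroadcastProcessFunction:
    control-stream elements update shared broadcast state,
    data-stream elements are evaluated against the latest rules."""

    def __init__(self):
        self.rules = {}  # stands in for Flink's broadcast MapState

    def process_broadcast_element(self, rule_id, threshold):
        # Control stream: install or update a rule.
        self.rules[rule_id] = threshold

    def process_element(self, value):
        # Data stream: emit the ids of all rules the value satisfies.
        return [rid for rid, t in self.rules.items() if value >= t]
```

Factoring the function this way also makes the unit-testing question from the SO thread easy: a test just calls both methods directly, no cluster required.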

Re: [PyFlink] register udf functions with different versions of the same library in the same job

2020-10-12 Thread Sharipov, Rinat
what would be even better with different > python environments and they won't clash > > In a PyFlink job, all nodes use the same python environment path currently. So > there is no way to make each UDF use a different python execution > environment. Maybe you need to use multiple jobs to a

[PyFlink] update udf functions on the fly

2020-10-10 Thread Sharipov, Rinat
Hi mates ! I'm at the beginning of the road of building a recommendation pipeline on top of Flink. I'm going to register a list of UDF python functions on job startup, where each UDF is an ML model. Over time new model versions appear in the ML registry and I would like to update my UDF
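One hedged sketch of the "update the model behind a UDF on the fly" idea, in the spirit of the reply that treats reloading as the UDF's own behavior (everything here — `ReloadingUdf`, the registry interface, the polling interval — is hypothetical and not from the thread):

```python
class ReloadingUdf:
    """A UDF wrapper that re-checks a model registry every
    `check_every` calls and swaps in the new model version if one
    has appeared, so models update without a job restart."""

    def __init__(self, registry, check_every=100):
        self.registry = registry
        self.check_every = check_every
        self.calls = 0
        # Assumed registry contract: latest() -> (version, callable model)
        self.version, self.model = registry.latest()

    def eval(self, x):
        self.calls += 1
        if self.calls % self.check_every == 0:
            version, model = self.registry.latest()
            if version != self.version:
                self.version, self.model = version, model
        return self.model(x)
```

Polling by call count keeps the hot path cheap; a time-based check would work the same way.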

[PyFlink] register udf functions with different versions of the same library in the same job

2020-10-09 Thread Sharipov, Rinat
Hi mates ! I've just read an amazing article about PyFlink and I'm absolutely delighted. I got some questions about udf registration, and it seems that it's possible to specify the list of libraries that

“feedback loop” and checkpoints in iterative streams

2020-04-04 Thread Sharipov, Rinat
Hi mates, for some reason it's necessary to create a feedback loop in my streaming application. The best choice to implement it was an iterative stream, but at the moment of job implementation (flink version is 1.6.1) it wasn't checkpointed. So I decided to write this output into kafka. As I see,