Re: BucketingSink never closed

2017-09-13 Thread Flavio Pompermaier
I've just looked at Robert's presentation at FF [1] and that's exactly what I was waiting for regarding streaming planning/training... Very useful ;) [1] https://www.youtube.com/watch?v=8l8dCKMMWkw On Wed, Sep 13, 2017 at 12:04 PM, Flavio Pompermaier wrote: > Hi Gordon, > thanks for your feedback. The main

Re: BucketingSink never closed

2017-09-13 Thread Flavio Pompermaier
Hi Gordon, thanks for your feedback. The main problem for me is that moving from batch to stream should be much easier IMHO. Rows should be a first-class citizen in Flink and it should be VERY easy to read/write them, while at the moment it seems that Tuples are the dominating type... I don't want to

Re: BucketingSink never closed

2017-09-13 Thread Tzu-Li (Gordon) Tai
Ah, sorry, one correction. Just realized there’s already some analysis of the BucketingSink closing issue in this mail thread. Please ignore my request for relevant logs :) On 13 September 2017 at 10:56:10 AM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote: Hi Flavio, Let me try to understan

Re: BucketingSink never closed

2017-09-13 Thread Tzu-Li (Gordon) Tai
Hi Flavio, Let me try to understand / look at some of the problems you have encountered. checkpointing: it's not clear which checkpointing system to use and how to tune/monitor it and avoid OOM exceptions. What do you mean by which "checkpointing system" to use? Do you mean state backends? Typic

Re: BucketingSink never closed

2017-09-12 Thread Flavio Pompermaier
For the moment I give up on streaming... too many missing/unclear features wrt batch. For example: - checkpointing: it's not clear which checkpointing system to use and how to tune/monitor it and avoid OOM exceptions. Moreover, is it really necessary to use it? For example if I read a fil

Re: BucketingSink never closed

2017-09-08 Thread Aljoscha Krettek
Hi, Expanding a bit on Kostas' answer. Yes, your analysis is correct: the problem is that the job is shutting down before a last checkpoint can "confirm" the written bucket data by moving it to the final state. The problem, as Kostas noted, is that a user function (and thus also BucketingSink) d

Re: BucketingSink never closed

2017-09-08 Thread Kostas Kloudas
Hi Flavio, If I understand correctly, I think you bumped into this issue: https://issues.apache.org/jira/browse/FLINK-2646 There is also a similar discussion on the BucketingSink here: http://apache-flink-mailing-list-archive.1008284.n3.nabble

BucketingSink never closed

2017-09-08 Thread Flavio Pompermaier
Hi to all, I'm trying to test a streaming job but the files written by the BucketingSink are never finalized (they remain in the pending state). Is this caused by the fact that the job finishes before the checkpoint? Shouldn't the sink properly close anyway? This is my code: @Test public void t