stream keyBy without repartition

2016-05-24 Thread Bart Wyatt
(migrated from IRC) Hello All, My situation is this: I have a large amount of data partitioned in kafka by "session" (natural partitioning). After I read the data, I would like to do as much as possible before incurring re-serialization or network traffic due to the size of the data. I am

Re: stream keyBy without repartition

2016-05-24 Thread Kostas Kloudas
Hi Bart, From what I understand, you want to do a partial (per node) aggregation before shipping the result for the final one at the end. In addition, the keys do not seem to change between aggregations, right? If this is the case, this is the functionality of the Combiner in batch. In Batch

Re: stream keyBy without repartition

2016-05-25 Thread Aljoscha Krettek
Hi, what Kostas said is correct. You can however, hack it. You would have to manually instantiate a WindowOperator and apply it on the non-keyed DataStream while still providing a key-selector (and serializer) for state. This might sound complicated but I'll try and walk you through the steps. Ple

Re: stream keyBy without repartition

2016-05-25 Thread Bart Wyatt
constructor that uses a ForwardPartitioner? This was what I was going to try this morning until you gave me a path that doesn't involve editing flink code. -Bart From: Aljoscha Krettek Sent: Wednesday, May 25, 2016 4:07 AM To: user@flink.apache.org

Re: stream keyBy without repartition

2016-05-25 Thread Aljoscha Krettek
doesn't involve editing flink code. > > > -Bart > > > > -- > *From:* Aljoscha Krettek > *Sent:* Wednesday, May 25, 2016 4:07 AM > *To:* user@flink.apache.org > *Subject:* Re: stream keyBy without repartition > > Hi, > what

Re: stream keyBy without repartition

2016-05-25 Thread Bart Wyatt
heers, -Bart From: Aljoscha Krettek Sent: Wednesday, May 25, 2016 9:14 AM To: user@flink.apache.org Subject: Re: stream keyBy without repartition In the long run we probably have to provide a hook in the API for this, yes. On Wed, 25 May 2016 at 15:54 Bart