stream keyBy without repartition

2016-05-24 Thread Bart Wyatt
(migrated from IRC) Hello All, My situation is this: I have a large amount of data partitioned in kafka by "session" (natural partitioning). After I read the data, I would like to do as much as possible before incurring re-serialization or network traffic due to the size of the data. I am

Re: stream keyBy without repartition

2016-05-25 Thread Bart Wyatt
(or an example with the same characteristics / communication patterns if the real code is not possible) so that we can have a look and potentially find other parts of the pipeline that can be optimized. For example, given that you are concerned with the serialization overhead, it may be worth s

enableObjectReuse and multiple downstream operators

2016-05-25 Thread Bart Wyatt
(For reference, I'm in 1.0.3) I have a job that looks like this:

    DataStream input = ...
    input.map(MapFunction...).addSink(...);
    input.map(MapFunction...).addSink(...);

If I do not call enableObjectReuse() it works, if I do call enableObjectReuse() it throws: java
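The snippet above is cut off before the exception, so its actual cause is not known here. As a general illustration only, the sketch below shows one hazard of sharing a single object instance between two downstream functions when a stream fans out. It is plain Java and does not use Flink; the `Event` type, the `fanOut` helper, and both functions are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ObjectReuseSketch {
    /** Mutable record standing in for a stream element (hypothetical type). */
    static final class Event {
        int value;
        Event(int value) { this.value = value; }
        Event copy() { return new Event(value); }
    }

    /**
     * Hands one element to two downstream functions, the way a fan-out would.
     * With reuse == true both functions receive the same instance;
     * with reuse == false each function receives its own copy.
     */
    static List<Integer> fanOut(Event element, boolean reuse) {
        // First branch mutates its input in place before returning a result.
        Function<Event, Integer> first = e -> { e.value *= 2; return e.value; };
        // Second branch only reads its input.
        Function<Event, Integer> second = e -> e.value + 1;

        List<Integer> results = new ArrayList<>();
        results.add(first.apply(reuse ? element : element.copy()));
        results.add(second.apply(reuse ? element : element.copy()));
        return results;
    }

    public static void main(String[] args) {
        System.out.println(fanOut(new Event(10), false)); // [20, 11]
        System.out.println(fanOut(new Event(10), true));  // [20, 21]
    }
}
```

With copies, the second branch sees the original value; with a shared instance, it sees the first branch's mutation. This is why functions that run with object reuse enabled are usually written so that they neither modify their input nor keep references to it.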

Re: stream keyBy without repartition

2016-05-25 Thread Bart Wyatt
heers, -Bart
From: Aljoscha Krettek
Sent: Wednesday, May 25, 2016 9:14 AM
To: user@flink.apache.org
Subject: Re: stream keyBy without repartition
In the long run we probably have to provide a hook in the API for this, yes.
On Wed, 25 May 2016 at 15:54 Bart

Ordering expectations of data

2016-08-12 Thread Bart Wyatt
Hello all, We have a Kafka topic with lots of partitions where data is partitioned by an upstream publisher on "session". In Flink we read this topic and another single-partition topic which contains configuration definitions for a little flatMap-based operation. We also do a little bit