Re: Iteration feedback partitioning does not work properly

2015-10-08 Thread Paris Carbone
Yes it does break it since it is based on backwards partitioning preservation which was the case before Aljischa’s refactoring. I will focus on a 0.10 patch for the samoa connector right after the 0.10 release to see how we can do this. To be honest the whole thing confuses me a bit. From my und

Re: Iteration feedback partitioning does not work properly

2015-10-08 Thread Gyula Fóra
I agree that there are many things that needs to be figured out properly for iterations, and I am okay with postponing them for the next release if we want to get this one out quickly. The only problem is that this probably breaks the SAMOA connector. Paris can you confirm this? Stephan Ewen ez

Re: Iteration feedback partitioning does not work properly

2015-10-08 Thread Stephan Ewen
For me as an outsider to the iterations, I would say that both approaches are in some way tricky with some unexpected behavior. Parallelism implicitly from the predecessor (input) or the successor (head task - what happens if there are multiple with different parallelism?) can confuse in either wa

Re: Iteration feedback partitioning does not work properly

2015-10-08 Thread Gyula Fóra
The feedback tuples might get rebalanced but the normal input should not. But still the main problem is the fact that partitioning is not handled transparently, and actually does not work when you set the way you expect. Gyula Aljoscha Krettek ezt írta (időpont: 2015. okt. 8., Cs, 16:33): > Ok

Re: Iteration feedback partitioning does not work properly

2015-10-08 Thread Aljoscha Krettek
Ok, I see your point. But I think there will be problems no matter what parallelism is chosen for the iteration source/sink. If the parallelism of the head is chosen then there will be an implicit rebalance from the operation right before the iteration to the iteration head. I think this should bre

Re: Iteration feedback partitioning does not work properly

2015-10-06 Thread Gyula Fóra
Hi, This is just a workaround, which actually breaks input order from my source. I think the iteration construction should be reworked to set the parallelism of the source/sink to the parallelism of the head operator (and validate that all heads have the same parallelism). I thought this was the

Re: Iteration feedback partitioning does not work properly

2015-10-06 Thread Aljoscha Krettek
Hi, I think what you would like to to can be achieved by: IterativeStream it = in.map(IdentityMap).setParallelism(2).iterate() DataStream mapped = it.map(...) it.closeWith(mapped.partitionByHash(someField)) The input is rebalanced to the map inside the iteration as in your example and the feedba

Iteration feedback partitioning does not work properly

2015-10-05 Thread Gyula Fóra
Hey, This question is mainly targeted towards Aljoscha but maybe someone can help me out here: I think the way feedback partitioning is handled does not work, let me illustrate with a simple example: IterativeStream it = ... (parallelism 1) DataStream mapped = it.map(...) (parallelism 2) // this