Hi Gyula,

Thanks a lot. That's enough for my case.

I really love the Flink streaming model compared to Spark Streaming.

So is it fair to think of an operator as an actor (as in the actor model) in this system? Is that the right way to put it?

Cheers

On Fri, Jul 3, 2015 at 5:29 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hey,
>
> 1.
> Yes, if you use partitionBy the same key will always go to the same
> downstream operator instance.
>
> 2.
> There is only a partial ordering guarantee, meaning that data received from
> one input is FIFO. This means that if the same key is coming from multiple
> inputs then there is no ordering guarantee across them, only inside one input.
>
> Gyula
>
> Welly Tambunan <if05...@gmail.com> wrote (on Fri, Jul 3, 2015, 11:51):
>
>> Hi Gyula,
>>
>> Thanks for your response.
>>
>> So if I use partitionBy, then data points with the same key will be
>> received by exactly the same operator instance?
>>
>> Another question: if I execute a reduce() operator after partitionBy,
>> will that reduce operator guarantee ordering within the same key?
>>
>> Cheers
>>
>> On Fri, Jul 3, 2015 at 4:14 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>>> Hey!
>>>
>>> Both groupBy and partitionBy will trigger a shuffle over the network
>>> based on some key, ensuring that elements with the same key end up on the
>>> same downstream processing operator.
>>>
>>> The difference between the two is that groupBy, in addition to this,
>>> returns a GroupedDataStream, which lets you execute some special operations,
>>> such as key-based rolling aggregates.
>>>
>>> partitionBy is useful when you are using simple operators but still want
>>> to control the messages received by parallel instances (in a mapper, for
>>> example).
>>>
>>> Cheers,
>>> Gyula
>>>
>>> tambunanw <if05...@gmail.com> wrote (on Fri, Jul 3, 2015, 10:32):
>>>
>>>> Hi All,
>>>>
>>>> I'm trying to digest the difference between these two. From my
>>>> experience in Spark, GroupBy will cause shuffling on the network. Is
>>>> that also the case in Flink?
>>>>
>>>> I've watched videos and read a couple of docs about Flink saying that
>>>> Flink compiles the user code into its own optimized graph structure, so I
>>>> think the Flink engine will take care of this one?
>>>>
>>>> From the docs for Partitioning:
>>>>
>>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#partitioning
>>>>
>>>> Is it true that GroupBy is more advanced than PartitionBy? Can someone
>>>> elaborate?
>>>>
>>>> This one is really confusing for me, coming from the Spark world. Any
>>>> help would be really appreciated.
>>>>
>>>> Cheers

--
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>
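
For reference, the two guarantees discussed in this thread (same key goes to the same downstream instance, and FIFO order holds only within one input) can be illustrated with a small standalone simulation. This is only a sketch and not Flink code: the class `KeyPartitionSketch`, the helpers `targetInstance` and `route`, and the choice of "hash of the key modulo parallelism" are assumptions made for illustration, and Flink's real partitioner may compute the target differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone illustration of key-based partitioning; it does not use Flink.
class KeyPartitionSketch {

    // Hypothetical rule: pick the downstream instance from the key's hash.
    // floorMod keeps the result in [0, parallelism) even for negative hashes.
    static int targetInstance(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Route {key, value} records from ONE input to per-instance lists.
    // Records are appended in arrival order, so order within this single
    // input is preserved for every key.
    static List<List<String>> route(List<String[]> records, int parallelism) {
        List<List<String>> instances = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            instances.add(new ArrayList<>());
        }
        for (String[] record : records) {
            int target = targetInstance(record[0], parallelism);
            instances.get(target).add(record[0] + "=" + record[1]);
        }
        return instances;
    }

    public static void main(String[] args) {
        List<String[]> input = Arrays.asList(
                new String[] {"sensor-a", "1"},
                new String[] {"sensor-b", "7"},
                new String[] {"sensor-a", "2"},
                new String[] {"sensor-a", "3"});

        List<List<String>> instances = route(input, 4);
        for (int i = 0; i < instances.size(); i++) {
            System.out.println("instance " + i + ": " + instances.get(i));
        }
    }
}
```

All `sensor-a` records land in the same list and keep their arrival order. If a second upstream input were also sending `sensor-a` records, each input's records would still be in order relative to each other, but nothing in this scheme fixes how the two inputs interleave, which matches the partial ordering guarantee described above.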