Re: Flink Streaming : PartitionBy vs GroupBy differences

Welly Tambunan Fri, 03 Jul 2015 04:03:09 -0700

Hi Gyula,

Thanks a lot. That's enough for my case.


I do really love Flink Streaming model compare to Spark Streaming.

So is that true that i can think that Operator as an Actor model in this
system ? Is that a right way to put it ?



Cheers

On Fri, Jul 3, 2015 at 5:29 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hey,
>
> 1.
> Yes, if you use partitionBy the same key will always go to the same
> downstream operator instance.
>
> 2.
> There is only partial ordering guarantee, meaning that data received from
> one input is FIFO. This means that if the same key is coming from multiple
> inputs than there is no ordering guarantee there, only inside one input.
>
> Gyula
>
> Welly Tambunan <if05...@gmail.com> ezt írta (időpont: 2015. júl. 3., P,
> 11:51):
>
>> Hi Gyula,
>>
>> Thanks for your response.
>>
>> So if i use partitionBy then data point with the same will receive
>> exactly by the same instance of operator ?
>>
>>
>> Another question is if i execute reduce() operator on after partitionBy,
>> will that reduce operator guarantee ordering within the same key ?
>>
>>
>> Cheers
>>
>> On Fri, Jul 3, 2015 at 4:14 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>>> Hey!
>>>
>>> Both groupBy and partitionBy will trigger a shuffle over the network
>>> based on some key, assuring that elements with the same keys end up on the
>>> same downstream processing operator.
>>>
>>> The difference between the two is that groupBy in addition to this
>>> returns a GroupedDataStream which lets you execute some special operations,
>>> such as key based rolling aggregates.
>>>
>>> PartitionBy is useful when you are using simple operators but still want
>>> to control the messages received by parallel instances (in a mapper for
>>> example).
>>>
>>> Cheers,
>>> Gyula
>>>
>>> tambunanw <if05...@gmail.com> ezt írta (időpont: 2015. júl. 3., P,
>>> 10:32):
>>>
>>>> Hi All,
>>>>
>>>> I'm trying to digest what's the difference between this two. From my
>>>> experience in Spark GroupBy will cause shuffling on the network. Is
>>>> that the
>>>> same case in Flink ?
>>>>
>>>> I've watch videos and read a couple docs about Flink that's actually
>>>> Flink
>>>> will compile the user code into it's own optimized graph structure so i
>>>> think Flink engine will take care of this one ?
>>>>
>>>> From the docs for Partitioning
>>>>
>>>>
>>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#partitioning
>>>>
>>>> Is that true that GroupBy is more advanced than PartitionBy ? Can
>>>> someone
>>>> elaborate ?
>>>>
>>>> I think this one is really confusing for me that come from Spark world.
>>>> Any
>>>> help would be really appreciated.
>>>>
>>>> Cheers
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Streaming-PartitionBy-vs-GroupBy-differences-tp1927.html
>>>> Sent from the Apache Flink User Mailing List archive. mailing list
>>>> archive at Nabble.com.
>>>>
>>>
>>
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>

Re: Flink Streaming : PartitionBy vs GroupBy differences

Reply via email to