Cogroup is only the first half of join.  It collects the records with the 
matching key together.  It does not do the cross product of records with 
matching keys.
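
To make the difference concrete, here is a minimal sketch in plain Python 
(not Pig).  The function names cogroup and join and the sample data are 
made up for illustration; the sketch only mimics the semantics described 
above and says nothing about how Pig implements either operator.

```python
from collections import defaultdict
from itertools import product

def cogroup(left, right):
    """Collect records from both inputs by key: key -> (left_bag, right_bag)."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return dict(groups)

def join(left, right):
    """Inner join: cogroup first, then cross the two bags for each key."""
    out = []
    for key, (lbag, rbag) in cogroup(left, right).items():
        # An empty bag on either side yields no output rows for that key.
        for l, r in product(lbag, rbag):
            out.append((key, l, r))
    return out

# Hypothetical sample inputs as (key, value) pairs.
a = [(1, "a1"), (1, "a2"), (2, "a3")]
b = [(1, "b1"), (1, "b2"), (3, "b3")]

grouped = cogroup(a, b)   # one entry per key, each holding two bags
joined = join(a, b)       # one row per matching pair of records
```

Here grouped has one entry for each of the three keys, while joined 
contains only the four pairs produced for key 1.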

If you are going to do a join (that is, you want to produce the matching 
records), join is usually the better choice, because it has a number of 
optimizations (skew join, fragment-replicate join) that aren’t available for 
cogroup.  But if you don’t need to actually materialize the joined records, 
cogroup can be faster.  For example, if you just want to count the number of 
matching records, doing a cogroup and passing the resulting bags to COUNT 
gives you your answer.
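
As a rough illustration of the counting case, again in plain Python rather 
than Pig: if "number of matching records" is taken to mean the number of 
rows a join would produce, it can be computed from the per-key bag sizes 
alone.  The function name count_join_rows and the sample data are 
hypothetical, and that reading of the question is an assumption.

```python
from collections import Counter

def count_join_rows(left, right):
    """Count the rows an inner join would emit without building them."""
    left_sizes = Counter(key for key, _ in left)    # size of each left bag
    right_sizes = Counter(key for key, _ in right)  # size of each right bag
    # Each key contributes (left bag size) * (right bag size) joined rows;
    # a key missing on the right has size 0 and contributes nothing.
    return sum(n * right_sizes[key] for key, n in left_sizes.items())

# Hypothetical sample inputs as (key, value) pairs.
a = [(1, "a1"), (1, "a2"), (2, "a3")]
b = [(1, "b1"), (1, "b2"), (3, "b3")]

total = count_join_rows(a, b)
```

Only the bag sizes are needed, so the joined rows themselves are never 
built.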

Alan.

> On Sep 28, 2016, at 07:15, Kashif Hussain <kash.t...@gmail.com> wrote:
> 
> Hi,
> 
> I want to know in which cases co group can perform better than join ?
> What is the advantage of co group ?
> 
> Regards,
> Kashif
