Cogroup is only the first half of a join. It collects the records with matching keys together into one bag per input, but it does not produce the cross product of those records.
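
To make that concrete, here is a rough sketch in plain Python (not Pig Latin, and not how Pig implements either operator). The function names `cogroup` and `join_from_cogroup` and the sample data are made up for illustration. It shows cogroup producing one pair of bags per key, and the join as the extra cross-product step on top of that.

```python
from collections import defaultdict
from itertools import product


def cogroup(left, right, key=lambda rec: rec[0]):
    """Collect records from both inputs into per-key bags (no cross product)."""
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[key(rec)][0].append(rec)
    for rec in right:
        groups[key(rec)][1].append(rec)
    return dict(groups)


def join_from_cogroup(groups):
    """Second half of an inner join: cross the two bags for each key."""
    out = []
    for left_bag, right_bag in groups.values():
        # An empty bag on either side yields no joined records for that key.
        out.extend(l + r for l, r in product(left_bag, right_bag))
    return out


# Hypothetical sample inputs, keyed on the first field.
users = [(1, "ann"), (2, "bob"), (3, "cy")]
clicks = [(1, "home"), (1, "cart"), (3, "home")]

groups = cogroup(users, clicks)
joined = join_from_cogroup(groups)

# Bag sizes are available straight from the cogroup result,
# without building any joined record.
bag_sizes = {k: (len(l), len(r)) for k, (l, r) in groups.items()}
```

Note that key 2 still shows up in `groups` with an empty right-hand bag, but contributes nothing to `joined`.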

If you are going to do a join (that is, you want to produce the matching records), join is usually better, since there are a number of join optimizations available (skew join, fragment-replicate) which aren’t there for cogroup. But if you don’t need to actually instantiate the joined records, cogroup can be faster. For example, say you just wanted to count the number of matching records; doing a cogroup and passing the resulting bags to COUNT would give you your answer.

Alan.

> On Sep 28, 2016, at 07:15, Kashif Hussain <kash.t...@gmail.com> wrote:
>
> Hi,
>
> I want to know in which cases co group can perform better than join ?
> What is the advantage of co group ?
>
> Regards,
> Kashif