Will a cogroup followed by a filter be equivalent to a join? That is, will Pig optimize the former so that it performs as well as the latter? I assume a single MapReduce job will be spawned in both cases.
On Wed, Sep 28, 2016 at 11:14 PM, Alan Gates <alanfga...@gmail.com> wrote:
> Cogroup is only the first half of join. It collects the records with the
> matching key together. It does not do the cross product of records with
> matching keys.
>
> If you are going to do a join (that is, you want to produce the matching
> records) join is usually better as there are a number of join optimizations
> available (skew join, fragment/replicate) which aren’t there for cogroup.
> But if you don’t need to actually instantiate the records, cogroup can be
> faster. For example, say you just wanted to count the number of matching
> records, then doing a cogroup and passing the resulting bags to COUNT would
> give you your answer.
>
> Alan.
>
> > On Sep 28, 2016, at 07:15, Kashif Hussain <kash.t...@gmail.com> wrote:
> >
> > Hi,
> >
> > I want to know in which cases cogroup can perform better than join.
> > What is the advantage of cogroup?
> >
> > Regards,
> > Kashif
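To make Alan's distinction concrete, here is a small pure-Python model of the two operators' semantics. It is a sketch, not Pig itself: the toy relations `a` and `b` and the helpers `cogroup`, `join` and `count_matches` are hypothetical. Cogroup only gathers each relation's records for a key into bags. Join additionally takes the cross product of those bags, and keys with an empty bag on either side drop out. Counting matches needs only the bag sizes, so the joined records are never built; the number of rows a join would emit for a key is simply the product of the two bag sizes.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy relations: each record is a (key, value) tuple.
a = [(1, "a1"), (1, "a2"), (2, "a3")]
b = [(1, "b1"), (1, "b2"), (1, "b3"), (3, "b4")]


def cogroup(left, right):
    """Map key -> (bag of left records, bag of right records).

    Keys present in only one relation get an empty bag on the other side,
    modelling the default (outer) grouping of a cogroup.
    """
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[rec[0]][0].append(rec)
    for rec in right:
        groups[rec[0]][1].append(rec)
    return dict(groups)


def join(left, right):
    """Inner join modelled as cogroup plus a per-key cross product."""
    out = []
    for _key, (lbag, rbag) in cogroup(left, right).items():
        # An empty bag on either side yields no pairs, so the key drops out.
        out.extend(l + r for l, r in product(lbag, rbag))
    return out


def count_matches(left, right):
    """Per key: (left bag size, right bag size, rows a join would emit).

    Uses only bag sizes; no joined records are materialized.
    """
    return {
        key: (len(lbag), len(rbag), len(lbag) * len(rbag))
        for key, (lbag, rbag) in cogroup(left, right).items()
    }


if __name__ == "__main__":
    print(len(join(a, b)))       # 6: key 1 contributes 2 x 3 pairs
    print(count_matches(a, b))   # {1: (2, 3, 6), 2: (1, 0, 0), 3: (0, 1, 0)}
```

The join materializes six 4-field records here, while `count_matches` reaches the same total from three small integer tuples. That gap is the saving Alan describes when the matching records themselves are not needed.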