Filters can be pushed above cogroup, the same as they can above join, if that’s what you’re asking.
The number of MapReduce jobs depends on the operator. A cogroup will always result in one job. Some joins result in multiple jobs (skew joins), some in map-only jobs (fragment-replicate).

Alan.

> On Sep 28, 2016, at 17:06, Kashif Hussain <kash.t...@gmail.com> wrote:
>
> Will a co group with filter be equivalent to join ?
> I mean will pig optimize the former to achieve performance equivalent to
> latter ? I assume that single map reduce job will be spawned in both cases.
>
> On Wed, Sep 28, 2016 at 11:14 PM, Alan Gates <alanfga...@gmail.com> wrote:
>
>> Cogroup is only the first half of join. It collects the records with the
>> matching key together. It does not do the cross product of records with
>> matching keys.
>>
>> If you are going to do a join (that is, you want to produce the matching
>> records) join is usually better as there are a number of join optimizations
>> available (skew join, fragment/replicate) which aren’t there for cogroup.
>> But if you don’t need to actually instantiate the records, cogroup can be
>> faster. For example, say you just wanted to count the number of matching
>> records, then doing a cogroup and passing the resulting bags to COUNT would
>> give you your answer.
>>
>> Alan.
>>
>>> On Sep 28, 2016, at 07:15, Kashif Hussain <kash.t...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I want to know in which cases co group can perform better than join ?
>>> What is the advantage of co group ?
>>>
>>> Regards,
>>> Kashif
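
For anyone reading the thread later, the distinction between cogroup and join can be sketched in plain Python. This is only an illustration of the semantics described above, not Pig code and not how Pig implements either operator. The function names (cogroup, join_from_cogroup, count_matches) and the sample records are hypothetical, and counting matches as the product of the two bag sizes is one reading of "count the number of matching records".

```python
from collections import defaultdict
from itertools import product


def cogroup(left, right, key=lambda rec: rec[0]):
    """First half of a join: collect records from both inputs into per-key bags.

    Returns {key: (left_bag, right_bag)}. No cross product is computed.
    """
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[key(rec)][0].append(rec)
    for rec in right:
        groups[key(rec)][1].append(rec)
    return dict(groups)


def join_from_cogroup(groups):
    """Second half of an inner join: cross product of the two bags for each key."""
    return [
        l + r
        for left_bag, right_bag in groups.values()
        for l, r in product(left_bag, right_bag)
    ]


def count_matches(groups):
    """Count joined rows per key from bag sizes alone, without building the rows.

    Assumption: "matching records" means the rows an inner join would produce.
    """
    return {
        k: len(left_bag) * len(right_bag)
        for k, (left_bag, right_bag) in groups.items()
        if left_bag and right_bag
    }


if __name__ == "__main__":
    # Hypothetical sample data: (key, value) tuples.
    users = [(1, "ann"), (2, "bob"), (3, "cy")]
    orders = [(1, "book"), (1, "pen"), (2, "ink"), (4, "cup")]

    grouped = cogroup(users, orders)
    print(grouped)                     # per-key bags, including unmatched keys
    print(join_from_cogroup(grouped))  # materialized joined rows
    print(count_matches(grouped))      # counts only, no joined rows built
```

The point of the sketch is that count_matches never materializes the joined rows, which is the case where cogroup can be cheaper than join.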