Filters can be pushed above cogroup, the same as they can above join, if 
that’s what you’re asking.

The number of MapReduce jobs depends on the operator.  A cogroup will always 
result in one job.  Some joins result in multiple jobs (skew joins), some in 
map-only jobs (fragment-replicate).
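
As a rough illustration of the point quoted below, that cogroup is only the 
first half of join, here is a minimal Python sketch.  It is not Pig code and 
it is not how Pig implements either operator; the function names (cogroup, 
join_from_cogroup, count_matches) and the sample data are made up for the 
example.  It only models the semantics: cogroup collects matching records 
into bags, join additionally crosses those bags, and a count can be computed 
from the bag sizes without building the joined records.

```python
from collections import defaultdict
from itertools import product


def cogroup(left, right):
    """First half of a join: collect values per key into two bags, no cross product."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return dict(groups)


def join_from_cogroup(grouped):
    """Second half of an inner join: cross the two bags for every key."""
    return [
        (key, l, r)
        for key, (left_bag, right_bag) in grouped.items()
        for l, r in product(left_bag, right_bag)
    ]


def count_matches(grouped):
    """Number of joined records per key, computed from bag sizes only."""
    return {key: len(lb) * len(rb) for key, (lb, rb) in grouped.items()}


# Hypothetical sample relations as (key, value) pairs.
left = [(1, "a"), (1, "b"), (2, "c")]
right = [(1, "x"), (1, "y"), (3, "z")]

grouped = cogroup(left, right)
joined = join_from_cogroup(grouped)
counts = count_matches(grouped)
```

Here the total of counts equals the number of records in joined, but 
count_matches never instantiates those records, which is the case where 
cogroup can come out ahead.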

Alan.

> On Sep 28, 2016, at 17:06, Kashif Hussain <kash.t...@gmail.com> wrote:
> 
> Will a cogroup with a filter be equivalent to a join?
> I mean, will Pig optimize the former to achieve performance equivalent to
> the latter? I assume that a single MapReduce job will be spawned in both
> cases.
> 
> On Wed, Sep 28, 2016 at 11:14 PM, Alan Gates <alanfga...@gmail.com> wrote:
> 
>> Cogroup is only the first half of join.  It collects the records with the
>> matching key together.  It does not do the cross product of records with
>> matching keys.
>> 
>> If you are going to do a join (that is, you want to produce the matching
>> records) join is usually better as there are a number of join optimizations
>> available (skew join, fragment/replicate) which aren’t there for cogroup.
>> But if you don’t need to actually instantiate the records, cogroup can be
>> faster.  For example, say you just wanted to count the number of matching
>> records, then doing a cogroup and passing the resulting bags to COUNT would
>> give you your answer.
>> 
>> Alan.
>> 
>>> On Sep 28, 2016, at 07:15, Kashif Hussain <kash.t...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I want to know in which cases cogroup can perform better than join.
>>> What is the advantage of cogroup?
>>> 
>>> Regards,
>>> Kashif
>> 
>> 
