Will a cogroup followed by a filter be equivalent to a join?
I mean, will Pig optimize the former to achieve performance equivalent to the
latter? I assume that a single MapReduce job will be spawned in both cases.

On Wed, Sep 28, 2016 at 11:14 PM, Alan Gates <alanfga...@gmail.com> wrote:

> Cogroup is only the first half of join.  It collects the records with the
> matching key together.  It does not do the cross product of records with
> matching keys.
>
> If you are going to do a join (that is, you want to produce the matching
> records) join is usually better as there are a number of join optimizations
> available (skew join, fragment/replicate) which aren’t there for cogroup.
> But if you don’t need to actually instantiate the records, cogroup can be
> faster.  For example, say you just wanted to count the number of matching
> records, then doing a cogroup and passing the resulting bags to COUNT would
> give you your answer.
>
> Alan.
>
> > On Sep 28, 2016, at 07:15, Kashif Hussain <kash.t...@gmail.com> wrote:
> >
> > Hi,
> >
> > I want to know in which cases cogroup can perform better than join.
> > What is the advantage of cogroup?
> >
> > Regards,
> > Kashif
>
>
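
To make the distinction in Alan's reply concrete, here is a minimal sketch in plain Python (not Pig Latin, and not how Pig implements either operator). The helper names cogroup, join_from_cogroup and count_matches, and the sample users/orders data, are hypothetical and chosen only for illustration. It shows cogroup as the "first half" that collects records into per-key bags, the cross product as the "second half" that turns those bags into joined rows, and how a count can be read off the bag sizes without building the joined rows.

```python
from collections import defaultdict
from itertools import product


def cogroup(left, right, key=lambda rec: rec[0]):
    """Collect records from both inputs into per-key bags (no cross product)."""
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[key(rec)][0].append(rec)
    for rec in right:
        groups[key(rec)][1].append(rec)
    return dict(groups)


def join_from_cogroup(groups):
    """Second half of an inner join: cross product of the two bags per key."""
    return [l + r
            for lbag, rbag in groups.values()
            for l, r in product(lbag, rbag)]


def count_matches(groups):
    """Count the joined rows from bag sizes alone, without materializing them."""
    return sum(len(lbag) * len(rbag) for lbag, rbag in groups.values())


# Hypothetical sample data: (key, value) tuples.
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (1, "pen"), (2, "ink"), (4, "cup")]

groups = cogroup(users, orders)
joined = join_from_cogroup(groups)
```

Keys that appear on only one side (3 and 4 here) still show up in the cogroup output with one empty bag, and they contribute nothing to the cross product. That is why filtering out empty bags after a cogroup selects the same keys as an inner join, but still leaves the per-key cross product to be done before the result has the shape of a join.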
