Generally, MapReduce jobs take some time to get set up and distributed. Did
you account for that time?
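
One way to do that bookkeeping is sketched below in Java. `PhaseTimer` is a hypothetical helper, not part of Pig or Hadoop. It sums named durations and reports how much of the wall-clock total none of them explain. The main method plugs in your figures, and it assumes the 27-second LOAD-only run already includes the job's setup overhead:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical bookkeeping helper (not part of Pig or Hadoop): it sums named
// durations and reports how much of a wall-clock total none of them explain.
public class PhaseTimer {
    // Phase name -> accumulated nanoseconds, in insertion order.
    private final Map<String, Long> phases = new LinkedHashMap<>();

    // Adds a duration in nanoseconds to a named phase.
    void record(String phase, long nanos) {
        phases.merge(phase, nanos, Long::sum);
    }

    // Times one piece of work with System.nanoTime() and adds it to `phase`.
    // Nanoseconds are kept so many short calls (e.g. per-tuple exec()) don't round to zero.
    void time(String phase, Runnable work) {
        long start = System.nanoTime();
        work.run();
        record(phase, System.nanoTime() - start);
    }

    // Wall-clock nanoseconds not covered by any recorded phase.
    long unaccounted(long totalNanos) {
        long accounted = 0;
        for (long nanos : phases.values()) {
            accounted += nanos;
        }
        return totalNanos - accounted;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Figures from this thread: 1:20 in total, 27 s for the LOAD-only run,
        // about 5 s summed over all exec() calls. Using the LOAD-only run as a
        // baseline assumes it already includes the job's setup overhead.
        timer.record("LOAD-only baseline", TimeUnit.SECONDS.toNanos(27));
        timer.record("MyUDF exec() calls", TimeUnit.SECONDS.toNanos(5));
        long left = timer.unaccounted(TimeUnit.SECONDS.toNanos(80));
        System.out.println("unaccounted: " + TimeUnit.NANOSECONDS.toSeconds(left) + " s");
    }
}
```

This prints `unaccounted: 48 s`, which matches the gap you describe. Wrapping the body of exec() in `time(...)` gives the per-call sum directly. Whatever `unaccounted` then reports is time spent outside exec().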

Thanks
On Jul 24, 2014 12:18 PM, "Rodrigo Ferreira" <[email protected]> wrote:

> You are right, Paul. No doubt about that. Unfortunately, the project I'm
> involved in is closely related to Pig, so I have to get the most out of it.
>
> Pig is great, don't get me wrong. I'm just trying to understand if there's
> still something that can be done to tune its performance or if this is the
> best I can get.
>
> Thanks,
> Rodrigo.
>
>
> 2014-07-24 18:06 GMT+02:00 Paul Houle <[email protected]>:
>
> > I don't think anybody uses Pig because it makes efficient use of a
> > computer cluster. Instead, people use it because it makes efficient
> > use of their time.
> >
> > If you're getting to the point where CPU performance matters, you can
> > generally write a plain Hadoop job that is faster, particularly if
> > you think a lot about the algorithms and data structures.
> >
> > On Thu, Jul 24, 2014 at 9:11 AM, Rodrigo Ferreira <[email protected]>
> > wrote:
> > > Hi everyone,
> > >
> > > I have a question for you guys.
> > >
> > > Well, I've started doing some experiments with the UDFs that I've
> > > created, and at this point I'm interested in assessing their performance.
> > >
> > > I have something like:
> > >
> > > A = LOAD ... using JsonLoader();
> > >
> > > B = FOREACH A GENERATE MyUDF();
> > >
> > > This code, which is translated into a single map task (no reduce), takes
> > > 1:20 to execute. If I comment out the projection and just load the data,
> > > it takes 27 seconds. So the first assumption is that the rest of the time
> > > was spent in MyUDF, right? Not quite.
> > >
> > > I printed the timings (using System.nanoTime()) of all the calls to
> > > exec(), and they don't add up to more than 5 seconds. So where have the
> > > other 48 seconds gone?
> > >
> > > The output of my UDF is a bag. Basically, for each input tuple I "create"
> > > several output tuples and put them in a bag.
> > >
> > > Thanks,
> > >
> > > Rodrigo Ferreira.
> >
> >
> >
> > --
> > Paul Houle
> > Expert on Freebase, DBpedia, Hadoop and RDF
> > (607) 539 6254    paul.houle on Skype   [email protected]
> >
>
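
On the bag question: one possibility, which is an assumption and was not measured anywhere in this thread, is that part of the missing time goes to handling the bags after exec() returns. Below is a stdlib-only Java analogue of that split. It is not Pig code: `BagCostSketch`, `exec` and `serialize` are hypothetical stand-ins. It times building each bag (what your exec() timings cover) separately from serializing it (standing in for whatever happens to the bag afterwards):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Stdlib-only analogue, NOT Pig code: exec() builds a bag of output tuples per input,
// and serialize() stands in for the framework's handling of the bag after exec() returns.
public class BagCostSketch {
    // Hypothetical UDF body: expands one input value into `fanOut` two-field tuples.
    static List<long[]> exec(long input, int fanOut) {
        List<long[]> bag = new ArrayList<>(fanOut);
        for (int i = 0; i < fanOut; i++) {
            bag.add(new long[] {input, input * i});
        }
        return bag;
    }

    // Writes the bag size and every field as raw bytes; returns the bytes written.
    static int serialize(List<long[]> bag, DataOutputStream out) {
        try {
            int before = out.size();
            out.writeInt(bag.size());
            for (long[] tuple : bag) {
                for (long field : tuple) {
                    out.writeLong(field);
                }
            }
            return out.size() - before;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        long execNanos = 0;
        long serializeNanos = 0;

        for (long input = 0; input < 200_000; input++) {
            long t0 = System.nanoTime();
            List<long[]> bag = exec(input, 10);      // what per-call exec() timing sees
            long t1 = System.nanoTime();
            serialize(bag, out);                     // work that exec() timing misses
            long t2 = System.nanoTime();
            execNanos += t1 - t0;
            serializeNanos += t2 - t1;
            buffer.reset();                          // discard bytes so memory stays flat
        }

        System.out.printf("inside exec(): %d ms, after exec(): %d ms%n",
                execNanos / 1_000_000, serializeNanos / 1_000_000);
    }
}
```

The absolute numbers depend on the machine. The point is that only the first figure would show up in timings taken inside exec().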
