Generally, MapReduce jobs take some time to get set up and distributed. Did you account for that time?
Thanks

On Jul 24, 2014 12:18 PM, "Rodrigo Ferreira" <[email protected]> wrote:

> You are right, Paul. No doubt about that. Unfortunately, the project I'm
> involved in is closely related to Pig so I have to get the best from it.
>
> Pig is great, don't get me wrong. I'm just trying to understand if there's
> still something that can be done to tune its performance or if this is the
> best I can get.
>
> Thanks,
> Rodrigo.
>
>
> 2014-07-24 18:06 GMT+02:00 Paul Houle <[email protected]>:
>
> > I don't think anybody uses Pig because it is efficient use of a
> > computer cluster. Instead people use it because it is an efficient
> > use of their time.
> >
> > If you're getting to the point where CPU performance matters you can
> > generally write a plain Hadoop job that is faster, particularly if
> > you think a lot about the algorithms and data structures.
> >
> > On Thu, Jul 24, 2014 at 9:11 AM, Rodrigo Ferreira <[email protected]>
> > wrote:
> > > Hi everyone,
> > >
> > > I have a question for you guys.
> > >
> > > Well, I've started doing some experiments with the UDFs that I've
> > > created. And at this point I'm interested in assessing their
> > > performance.
> > >
> > > I have something like:
> > >
> > > A = LOAD ... using JsonLoader();
> > >
> > > B = FOREACH A GENERATE MyUDF();
> > >
> > > This code, that is translated into a single Map task (no reduce) takes
> > > 1:20 to execute. If I comment the projection and just load the data it
> > > takes 27 seconds. So the first assumption is that the rest of the time
> > > was spent in MyUDF right? Not quite.
> > >
> > > I printed (using System.nanoTime()) all the calls to exec() and they
> > > don't sum up more than 5 seconds. So where have the other 48 seconds
> > > gone?
> > >
> > > The output of my UDF is a bag. Basically for each input tuple I
> > > "create" several output tuples and put them in a bag.
> > >
> > > Thanks,
> > >
> > > Rodrigo Ferreira.
> >
> > --
> > Paul Houle
> > Expert on Freebase, DBpedia, Hadoop and RDF
> > (607) 539 6254 paul.houle on Skype [email protected]
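
For reference, the time accounting discussed in the thread can be sketched in plain Java. This is only an illustration under assumptions: TimeBudget, timed() and unaccountedSeconds() are hypothetical names, not part of Pig, and the wrapped function merely stands in for a UDF's exec(). The figures in main() are the ones quoted above (1:20 total, 27 seconds load-only, at most about 5 seconds summed over exec()).

```java
import java.util.function.Function;

/** Hypothetical helper (not a Pig API): separates time spent inside a function from the rest. */
final class TimeBudget {
    private long execNanos = 0; // total time measured inside the wrapped function
    private long calls = 0;     // number of timed calls

    /** Wraps a function so that every call is timed with System.nanoTime(). */
    <I, O> Function<I, O> timed(Function<I, O> exec) {
        return input -> {
            long start = System.nanoTime();
            try {
                return exec.apply(input);
            } finally {
                // Only time inside exec is counted; anything the framework
                // does before or after the call is invisible to this timer.
                execNanos += System.nanoTime() - start;
                calls++;
            }
        };
    }

    long execNanos() { return execNanos; }

    long calls() { return calls; }

    /** Seconds explained neither by the load-only run nor by the summed exec() time. */
    static long unaccountedSeconds(long totalSeconds, long loadOnlySeconds, long execSeconds) {
        return totalSeconds - loadOnlySeconds - execSeconds;
    }

    public static void main(String[] args) {
        // Figures from the thread: 80 s total, 27 s load-only, about 5 s in exec().
        System.out.println(unaccountedSeconds(80, 27, 5) + " s unaccounted for");
    }
}
```

With the thread's numbers this prints "48 s unaccounted for". A timer of this kind only sees the body of the wrapped call, so it cannot say where the remaining time goes; it just bounds how much of it is outside exec().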
