You are right, Paul. No doubt about that. Unfortunately, the project I'm involved in is closely related to Pig so I have to get the best from it.
Pig is great, don't get me wrong. I'm just trying to understand if there's still something that can be done to tune its performance or if this is the best I can get.

Thanks,
Rodrigo.

2014-07-24 18:06 GMT+02:00 Paul Houle <[email protected]>:

> I don't think anybody uses Pig because it is efficient use of a
> computer cluster. Instead people use it because it is an efficient
> use of their time.
>
> If you're getting to the point where CPU performance matters you can
> generally write a plain Hadoop job that is faster, particularly if
> you think a lot about the algorithms and data structures.
>
> On Thu, Jul 24, 2014 at 9:11 AM, Rodrigo Ferreira <[email protected]> wrote:
> > Hi everyone,
> >
> > I have a question for you guys.
> >
> > Well, I've started doing some experiments with the UDFs that I've created.
> > And at this point I'm interested in assessing their performance.
> >
> > I have something like:
> >
> > A = LOAD ... using JsonLoader();
> >
> > B = FOREACH A GENERATE MyUDF();
> >
> > This code, that is translated into a single Map task (no reduce) takes 1:20
> > to execute. If I comment the projection and just load the data it takes 27
> > seconds. So the first assumption is that the rest of the time was spent in
> > MyUDF right? Not quite.
> >
> > I printed (using System.nanoTime()) all the calls to exec() and they don't
> > sum up more than 5 seconds. So where have the other 48 seconds gone?
> >
> > The output of my UDF is a bag. Basically for each input tuple I "create"
> > several output tuples and put them in a bag.
> >
> > Thanks,
> >
> > Rodrigo Ferreira.
>
> --
> Paul Houle
> Expert on Freebase, DBpedia, Hadoop and RDF
> (607) 539 6254 paul.houle on Skype [email protected]
