Re: Pigs don't fly

Paul Houle Thu, 24 Jul 2014 09:07:30 -0700

I don't think anybody uses Pig because it is an efficient use of a
computer cluster.  Instead, people use it because it is an efficient
use of their time.


If you're getting to the point where CPU performance matters, you can
generally write a plain Hadoop job that is faster, particularly if
you think a lot about the algorithms and data structures.

On Thu, Jul 24, 2014 at 9:11 AM, Rodrigo Ferreira <[email protected]> wrote:
> Hi everyone,
>
> I have a question for you guys.
>
> Well, I've started doing some experiments with the UDFs that I've created.
> And at this point I'm interested in assessing their performance.
>
> I have something like:
>
> A = LOAD ... using JsonLoader();
>
> B = FOREACH A GENERATE MyUDF();
>
> This code, which is translated into a single Map task (no reduce), takes
> 1:20 to execute. If I comment out the projection and just load the data, it
> takes 27 seconds. So the first assumption is that the rest of the time was
> spent in MyUDF, right? Not quite.
>
> I timed (using System.nanoTime()) all the calls to exec() and they don't
> sum up to more than 5 seconds. So where have the other 48 seconds gone?
>
> The output of my UDF is a bag. Basically for each input tuple I "create"
> several output tuples and put them in a bag.
>
> Thanks,
>
> Rodrigo Ferreira.
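
To make the numbers in the quoted message concrete, here is a minimal
sketch of the time budget it describes. The figures are the ones reported
above; the function and variable names are illustrative only and are not
part of Pig or Hadoop.

```python
# Time budget for the map-only job described in the quoted message.
# All figures come from that message; the names are hypothetical.

def unaccounted_seconds(total_s, load_only_s, udf_exec_s):
    """Wall-clock time explained neither by loading nor by time inside exec()."""
    return total_s - load_only_s - udf_exec_s

total_s = 1 * 60 + 20   # full script (LOAD + FOREACH): 1:20
load_only_s = 27        # LOAD alone, projection commented out
udf_exec_s = 5          # reported upper bound on the summed exec() timings

gap_s = unaccounted_seconds(total_s, load_only_s, udf_exec_s)
print(f"unaccounted: {gap_s} s ({gap_s / total_s:.0%} of the run)")
# prints "unaccounted: 48 s (60% of the run)"
```

Because the 5 seconds is an upper bound, 48 seconds is a lower bound on
the gap. That time is spent outside exec(), so timing exec() alone cannot
explain it. Handling of the output bag after the UDF returns is one
candidate, but that is a guess and would need a profiler to confirm.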



-- 
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype   [email protected]
