Pigs don't fly

Rodrigo Ferreira Thu, 24 Jul 2014 06:12:52 -0700

Hi everyone,

I have a question for you guys.


Well, I've started running some experiments with the UDFs I've created,
and at this point I'm interested in assessing their performance.

I have something like:

A = LOAD ... using JsonLoader();

B = FOREACH A GENERATE MyUDF();

This code, which is translated into a single map task (no reduce), takes 1:20
to execute. If I comment out the projection and just load the data, it takes 27
seconds. So the first assumption would be that the remaining 53 seconds were
spent in MyUDF, right? Not quite.

I timed every call to exec() with System.nanoTime(), and together they add up
to no more than 5 seconds. So where have the other 48 seconds gone?
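For anyone reproducing the measurement, here is a minimal sketch of the timing setup described above, assuming the UDF's per-tuple work can be wrapped in a plain Java function. TimedUdf and unaccountedSeconds are hypothetical names, not part of Pig's API. The timer brackets only the call itself, so anything that happens to the returned value after exec() returns is, by construction, not counted. The helper also reproduces the arithmetic: 80 s total, minus 27 s load-only, minus 5 s measured inside exec(), leaves 48 s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical wrapper (not Pig's EvalFunc): accumulates wall-clock time
// spent inside each call, the way System.nanoTime() was used around exec().
class TimedUdf<I, O> {
    private final Function<I, O> body; // stands in for the real exec() logic
    private long totalNanos = 0;        // time measured inside body only
    private long calls = 0;

    TimedUdf(Function<I, O> body) {
        this.body = body;
    }

    // Runs the wrapped function once and adds its elapsed time to the total.
    O exec(I input) {
        long start = System.nanoTime();
        try {
            return body.apply(input);
        } finally {
            totalNanos += System.nanoTime() - start;
            calls++;
        }
    }

    double totalSeconds() {
        return totalNanos / 1e9;
    }

    long calls() {
        return calls;
    }

    // Job time explained neither by loading alone nor by time inside exec().
    static double unaccountedSeconds(double jobSeconds, double loadOnlySeconds,
                                     double udfSeconds) {
        return jobSeconds - loadOnlySeconds - udfSeconds;
    }

    public static void main(String[] args) {
        // Figures from the message: 1:20 total, 27 s load-only, <= 5 s in exec().
        System.out.println(unaccountedSeconds(80, 27, 5)); // prints 48.0

        // Hypothetical one-to-many body: each input n yields n outputs,
        // loosely mirroring "several output tuples per input tuple".
        TimedUdf<Integer, List<Integer>> expand = new TimedUdf<>(n -> {
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                out.add(i);
            }
            return out;
        });
        for (int k = 0; k < 1000; k++) {
            expand.exec(10);
        }
        System.out.println(expand.calls() + " calls, "
                + expand.totalSeconds() + " s inside exec()");
    }
}
```

Comparing the accumulated total against the job's wall-clock time is what exposes the gap; the sketch says nothing about where that gap comes from.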

The output of my UDF is a bag. Basically, for each input tuple I "create"
several output tuples and put them into a bag.

Thanks,

Rodrigo Ferreira.
