I am going to try a few things today. I have a JAXBContext object that
marshals the XML; it is a static instance, but my guess at this point is
that since it lives in a separate jar from the one the job runs in, and I
used DistributedCache.addClassPath, this context is being created on every
call for some
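If the context really is being rebuilt per call within one classloader, the usual fix is to cache it behind a lazy holder so the expensive construction runs exactly once. Below is a minimal sketch of that pattern; the class names are illustrative, and a counter stands in for the expensive JAXBContext.newInstance(...) call so the once-only behavior is visible. (Note: if each task attempt gets a fresh classloader for the DistributedCache jar, statics are re-initialized regardless, and no in-JVM caching pattern can help.)

```java
import java.util.concurrent.atomic.AtomicInteger;

// Initialization-on-demand holder idiom: the expensive object
// (e.g. a JAXBContext) is built exactly once per classloader,
// no matter how many times get() is called.
public class ContextCache {
    // Counts how many times the factory actually runs (illustration only).
    static final AtomicInteger buildCount = new AtomicInteger();

    // Stand-in for the expensive factory, e.g. JAXBContext.newInstance(...).
    static Object buildExpensiveContext() {
        buildCount.incrementAndGet();
        return new Object();
    }

    // The JVM defers initializing Holder until CONTEXT is first accessed,
    // and class initialization is guaranteed to be thread-safe.
    private static class Holder {
        static final Object CONTEXT = buildExpensiveContext();
    }

    public static Object get() {
        return Holder.CONTEXT;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            ContextCache.get(); // same cached instance every call
        }
        System.out.println("builds=" + buildCount.get()); // prints builds=1
    }
}
```

JAXBContext itself is documented as thread-safe, so sharing one instance across map calls this way is safe; the per-record Marshaller objects are the cheap, non-shared part.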
I think I've found the problem. There was one line of code that caused this
issue :) and that was output.collect(key, value);
I had to add more logging to the code to get to it. For some reason kill
-QUIT didn't send the stack trace to userLogs/job/attempt/syslog; I
searched all the logs and
I am comparing the runtime of similar logic. The logic is exactly the same,
but surprisingly the MapReduce job that I submit is 100x slower. For Pig I
use a UDF, and for Hadoop I use a mapper only, with the same logic as the
Pig version. Even the splits on the admin page are the same. Not sure why
it's so slow. I am submitting
It would be great if we can take a look at what you are doing in the UDF vs
the Mapper.
100x slower does not make sense for the same job/logic; it's either the
Mapper code, or maybe the cluster was busy at the time you scheduled the
MapReduce job?
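One quick way to act on that suggestion is to run the shared per-record logic in a plain standalone loop, outside Hadoop entirely: if the loop is fast on its own but the job is 100x slower, the cost is in job/task setup (for example, rebuilding something expensive like a JAXBContext per record) rather than in the record logic itself. A hypothetical harness, with a trivial stand-in for the real per-record transform:

```java
// Hypothetical standalone harness: time the per-record logic shared by
// the Pig UDF and the Mapper, with no Hadoop machinery involved.
public class RecordLoopBench {
    // Stand-in for the real shared per-record logic (illustrative only).
    static String process(String line) {
        return line.toUpperCase();
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long start = System.nanoTime();
        long checksum = 0; // consume results so the loop isn't optimized away
        for (int i = 0; i < n; i++) {
            checksum += process("record-" + i).length();
        }
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(n + " records in " + micros + " us, checksum=" + checksum);
    }
}
```

Comparing this number against the job's records/sec from the task counters on the admin page narrows the problem to either the framework path or the logic itself.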
Thanks,
Prashant
On Tue, Feb 28, 2012 at 4:11 PM,