Re: 100x slower mapreduce compared to pig

2012-02-28 Thread Prashant Kommireddi
It would be great if we could take a look at what you are doing in the UDF vs. the Mapper. A 100x slowdown does not make sense for the same job/logic; it's either the Mapper code, or maybe the cluster was busy at the time you scheduled the MapReduce job? Thanks, Prashant On Tue, Feb 28, 2012 at 4:11 PM, Mohit

Re: 100x slower mapreduce compared to pig

2012-02-29 Thread Mohit Anchlia
I am going to try a few things today. I have a JAXBContext object that marshals the XML. It is a static instance, but my guess at this point is that, since it lives in a separate jar from the one where the job runs and I used DistributeCache.addClassPath, this context is being created on every call for some re
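
One way to test a guess like this is to count how often the expensive object is actually constructed. The following is only a minimal sketch, not the poster's code: `ContextHolder` and `ExpensiveContext` are hypothetical names, and `ExpensiveContext` is a plain stand-in for something costly to build such as a JAXBContext, so the example runs without any JAXB or Hadoop jars.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ContextHolder {
    // Counts how many times the expensive object has been constructed.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Hypothetical stand-in for an object that is costly to build,
    // such as a JAXBContext. It only wraps a value in a tag.
    static final class ExpensiveContext {
        ExpensiveContext() {
            CREATED.incrementAndGet();
        }

        String marshal(String value) {
            return "<value>" + value + "</value>";
        }
    }

    // Built once when the class is initialized, then shared by every call.
    private static final ExpensiveContext SHARED = new ExpensiveContext();

    // Reuses the shared instance; no construction happens here.
    static String marshalShared(String value) {
        return SHARED.marshal(value);
    }

    // For comparison only: builds a fresh instance on every call.
    static String marshalFresh(String value) {
        return new ExpensiveContext().marshal(value);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            marshalShared("record-" + i);
        }
        System.out.println("constructions after shared calls: " + CREATED.get());

        for (int i = 0; i < 1000; i++) {
            marshalFresh("record-" + i);
        }
        System.out.println("constructions after fresh calls: " + CREATED.get());
    }
}
```

If the counter keeps growing while only the shared path is used, the instance is being rebuilt somewhere; if it stays flat, the slowdown is more likely elsewhere.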

Re: 100x slower mapreduce compared to pig

2012-02-29 Thread Mohit Anchlia
I can't seem to find what's causing this slowness. Nothing in the logs. It's just painfully slow. However, the Pig job that has the same logic performs very well. Here is the mapper code and the Pig code: public static class Map extends MapReduceBase implements Mapper { public vo

Re: 100x slower mapreduce compared to pig

2012-02-29 Thread Mohit Anchlia
I think I've found the problem. There was one line of code that caused this issue :) It was output.collect(key, value); I had to add more logging to the code to get to it. For some reason kill -QUIT didn't send the stack trace to userLogs///syslog; I searched all the logs and couldn't find
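
The "add more logging" approach can be sketched as a small per-step timer that accumulates elapsed time for each named step, which makes a single slow line stand out. This is an illustrative sketch only: `StepTimer` and the step names are hypothetical, and the "collect" step below appends to a `StringBuilder` as a stand-in rather than calling Hadoop's `output.collect`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StepTimer {
    // Total elapsed nanoseconds and call counts, keyed by step name.
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();
    private final Map<String, Long> calls = new LinkedHashMap<>();

    // Runs the body and adds its elapsed time to the named step,
    // even if the body throws.
    void time(String step, Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            long elapsed = System.nanoTime() - start;
            totalNanos.merge(step, elapsed, Long::sum);
            calls.merge(step, 1L, Long::sum);
        }
    }

    long calls(String step) {
        return calls.getOrDefault(step, 0L);
    }

    long totalNanos(String step) {
        return totalNanos.getOrDefault(step, 0L);
    }

    // One line per step: name, number of calls, total milliseconds.
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : totalNanos.entrySet()) {
            sb.append(String.format("%s: %d calls, %.3f ms total%n",
                    e.getKey(), calls(e.getKey()), e.getValue() / 1_000_000.0));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StepTimer timer = new StepTimer();
        StringBuilder sink = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            final int record = i;
            // Stand-ins for the real per-record work in a map() method.
            timer.time("parse", () -> Integer.toString(record));
            timer.time("collect", () -> sink.append(record));
        }
        System.out.print(timer.report());
    }
}
```

Writing the report to the task log once per task, rather than logging every record, keeps the instrumentation itself from distorting the measurement.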