Re: pig script takes much longer than java MR job

2011-06-17 Thread Dexin Wang
Yeah, that sounds like a lot to dump if it takes 15 minutes to run. That alone can take a long time. I once forgot to comment out a debug line in my UDF. When it ran on production data, it was not only slow, it blew up the cluster: it simply ran out of log space :) On Jun 17, 2011, at 5:06 PM, Jonatha

Re: pig script takes much longer than java MR job

2011-06-17 Thread Jonathan Coveney
A couple of possibilities that I'm kicking around off the top of my head...
1) Does your MR job also sort afterwards? In Pig, that's going to kick off another MR job.
2) Does your MR job compile all the results into one job?
My guess is that the Order+Dump are making it take longer. 2011/6/17 Sujee Maniyam

pig script takes much longer than java MR job

2011-06-17 Thread Sujee Maniyam
I have log files like this:
#timestamp (ms), server, user, action, domain, x, y, z
126233288, 7, 50817, 2, yahoo.com, 31, blahblah, foobar
1262332800017, 2, 373168, 0, google.com, 67, blahblah, foobar
1262332800025, 8, 172910, 1, facebook.com, 135, blahblah, foobar
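
For anyone who wants to sanity-check the input outside of Pig or MR, here is a rough Python sketch of parsing one line of the log format shown above. The names `LogRecord` and `parse_line` are made up for illustration, and the field types are assumptions read off the sample rows (x looks numeric, y and z look like free text); the original post does not spell them out.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    """One parsed log line. Field types are guessed from the sample rows."""
    timestamp_ms: int
    server: int
    user: int
    action: int
    domain: str
    x: int
    y: str
    z: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse a comma-separated log line; return None for blanks and '#' headers."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Fields are separated by commas with optional surrounding spaces.
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    ts, server, user, action, domain, x, y, z = fields
    return LogRecord(int(ts), int(server), int(user), int(action),
                     domain, int(x), y, z)


if __name__ == "__main__":
    sample = "1262332800017, 2, 373168, 0, google.com, 67, blahblah, foobar"
    print(parse_line(sample))
```

Stripping whitespace per field matters here because the sample has spaces after the commas, which would otherwise end up inside the domain string and break any grouping on it.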