Yeah, sounds like a lot to dump if it takes 15 minutes to run. That alone can
take a long time.
I once forgot to comment out a debug line in my UDF. When run with
production data, not only was it slow, it blew up the cluster: it simply ran
out of log space :)
On Jun 17, 2011, at 5:06 PM, Jonatha
A couple of possibilities that I'm kicking around off the top of my head...
1) Does your MR job also sort afterwards? That's going to kick off another
MR job.
2) Does your MR job compile all the results into one job?
My guess is the Order+Dump are making it take longer.
2011/6/17 Sujee Maniyam
I have log files like this:
#timestamp (ms), server, user, action, domain, x, y, z
126233288, 7, 50817, 2, yahoo.com, 31, blahblah, foobar
1262332800017, 2, 373168, 0, google.com, 67, blahblah, foobar
1262332800025, 8, 172910, 1, facebook.com, 135, blahblah, foobar
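For anyone who wants to poke at a sample of these lines outside the cluster, here is a rough Python sketch of a parser for the format above. The LogRecord and parse_line names are made up for illustration, and the column types (integers for server, user, action and x, strings for the rest) are only guesses from the three sample rows.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    """One log line; field types are assumptions based on the sample rows."""
    timestamp_ms: int
    server: int
    user: int
    action: int
    domain: str
    x: int
    y: str
    z: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one comma-separated log line; return None for blanks and # comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Fields are separated by commas with inconsistent spacing, so strip each one.
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    ts, server, user, action, domain, x, y, z = fields
    return LogRecord(int(ts), int(server), int(user), int(action), domain, int(x), y, z)


if __name__ == "__main__":
    sample = "1262332800017, 2, 373168, 0, google.com, 67, blahblah, foobar"
    print(parse_line(sample))
```

Running a few hundred lines through something like this locally is a cheap way to sanity-check the data before paying for a full job and a large dump.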