I no longer remember the code well enough to give you details, but a lot of jobs are actually reduce-bound.
Sent from my phone.

On Oct 9, 2014 11:07 AM, "Yang" <teddyyyy...@gmail.com> wrote:

> my Q-Job MR job shows as 100% mapper complete (it's a map-only job) very
> quickly, but the job itself does not finish, until about 10 minutes later.
> this is rather surprising. my input is a sparse vector of 37000 rows, and
> the column count is 8000, with each row usually having < 10 elements set to
> non-zero. so the input size is fairly small.
>
> I looked at the Q-job code, it seems rather normal, i.e. it's not doing
> anything special after the map() function is completed. so I wonder why
> it's lagging so long after 100% ?
>
> here is the syslog from hadoop:
>
> 2014-10-09 10:37:40,504 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 2014-10-09 10:37:40,538 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
> 2014-10-09 10:37:40,548 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
> 2014-10-09 10:37:40,548 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
> 2014-10-09 10:37:40,549 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
> 2014-10-09 10:39:39,143 WARN [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Error reading the stream java.io.IOException: No such process
> 2014-10-09 10:40:09,117 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> 2014-10-09 10:46:23,991 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> 2014-10-09 10:46:31,219 INFO [LeaseRenewer:yyan...@apollo-phx-nn.vip.ebay.com:8020] org.apache.hadoop.ipc.Client: Retrying connect to server: apollo-phx-nn.vip.ebay.com/10.115.201.75:8020. Already tried 0 time(s); maxRetries=45
> 2014-10-09 10:47:45,241 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> 2014-10-09 10:47:46,571 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1412781120464_7857_m_000000_0 is done. And is in the process of committing
> 2014-10-09 10:47:46,739 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1412781120464_7857_m_000000_0 is allowed to commit now
> 2014-10-09 10:47:47,389 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1412781120464_7857_m_000000_0' to hdfs://apollo-phx-nn.vip.ebay.com:8020/user/yyang15/CIReco/shoes/ssvd/tmp/ssvd/Q-job/_temporary/1/task_1412781120464_7857_m_000000
> 2014-10-09 10:47:47,574 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1412781120464_7857_m_000000_0' done.
> 2014-10-09 10:47:47,575 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
> 2014-10-09 10:47:47,576 INFO [ganglia] org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: ganglia thread interrupted.
> 2014-10-09 10:47:47,576 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
> 2014-10-09 10:47:47,576 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
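
One way to see where the time goes in a syslog like the one quoted above is to list the longest silences between consecutive timestamps. Below is a minimal sketch of that idea, not anything taken from Mahout or Hadoop. The helper name `largest_gaps` and the `SAMPLE` excerpt (five lines copied from the quoted log) are only illustrative, and it assumes every entry carries a `YYYY-MM-DD HH:MM:SS,mmm` timestamp. It shows when the task went quiet, not why.

```python
import re
from datetime import datetime

# Matches Hadoop-style syslog timestamps such as "2014-10-09 10:37:40,504".
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")

# Illustrative excerpt: five entries copied from the quoted syslog.
SAMPLE = """\
2014-10-09 10:37:40,549 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
2014-10-09 10:39:39,143 WARN [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Error reading the stream java.io.IOException: No such process
2014-10-09 10:40:09,117 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2014-10-09 10:46:23,991 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
2014-10-09 10:47:45,241 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
"""


def largest_gaps(log_text, top=3):
    """Return up to `top` (gap, earlier, later) tuples, longest gap first."""
    stamps = [
        datetime.strptime(f"{m.group(1)}.{m.group(2)}", "%Y-%m-%d %H:%M:%S.%f")
        for m in TIMESTAMP.finditer(log_text)
    ]
    # Pair each timestamp with the one that follows it in the log.
    gaps = [(later - earlier, earlier, later)
            for earlier, later in zip(stamps, stamps[1:])]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


if __name__ == "__main__":
    for gap, earlier, later in largest_gaps(SAMPLE):
        print(f"{gap.total_seconds():9.3f}s  {earlier.time()} -> {later.time()}")
```

On this excerpt the longest silence is the stretch between the 10:40:09 compressor line and the 10:46:23 decompressor line, so that is probably the first place to look.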