Have you checked the logs? Is there a task that is taking a long time? What is that task doing?
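One rough way to answer the second question is to compare each task's running time with the median and flag the outliers. The sketch below is in Python, and the task names, durations, and 3x threshold are made up for illustration. Real numbers would have to come from the job's task list or the logs.

```python
from statistics import median


def find_stragglers(durations, factor=3.0):
    """Return (task_id, seconds) pairs for tasks slower than
    `factor` times the median duration, slowest first."""
    if not durations:
        return []
    typical = median(durations.values())
    slow = [
        (task, secs)
        for task, secs in durations.items()
        if secs > factor * typical
    ]
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Hypothetical per-task durations in seconds (not taken from any real job).
sample_durations = {
    "reduce_0": 85.0,
    "reduce_1": 90.0,
    "reduce_2": 88.0,
    "reduce_3": 390.0,
}

# Lists the tasks worth looking at first in the logs.
print(find_stragglers(sample_durations))
```

Any task this flags is the one to open in the logs to see what it is doing.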
There are two basic possibilities:

a) You have a skewed join, like the other Ted mentioned. In this case, the straggler will be seen to be working on data.

b) You have a hung process. This can be more difficult to diagnose, but it indicates that there is a problem with your cluster.

On Fri, Apr 26, 2013 at 2:21 AM, Han JU <ju.han.fe...@gmail.com> wrote:

> Hi,
>
> I've implemented an algorithm with Hadoop, it's a series of 4 jobs. My
> question is that in one of the jobs, map and reduce tasks show 100% finished
> in about 1m 30s, but I have to wait another 5m for this job to finish.
> This job writes about 720mb compressed data to HDFS with replication
> factor 1, in sequence file format. I've tried copying these data to hdfs,
> it takes only < 20 seconds. What happened during this 5 more minutes?
>
> Any idea on how to optimize this part?
>
> Thanks.
>
> --
> *JU Han*
>
> UTC - Université de Technologie de Compiègne
> *GI06 - Fouille de Données et Décisionnel*
>
> +33 0619608888