If the per-record processing time is very high, you will need to periodically report status. Without a status update from the task to the TaskTracker, it will be killed as a dead task after the default timeout of 10 minutes (600 s).
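For instance, a reducer doing heavy per-record work can signal liveness every N records. A minimal self-contained sketch of that cadence - the `Progressable` interface and the interval here are illustrative stand-ins; in a real job you would call `Reporter.progress()` (old API) or `Context.progress()` (new API) at the same point in the loop:

```java
import java.util.Collections;
import java.util.List;

public class ProgressDemo {
    // Stand-in for Hadoop's Reporter/Context; assumed for illustration only.
    interface Progressable {
        void progress();
    }

    // Heartbeat every REPORT_INTERVAL records so the TaskTracker does not
    // kill the task after mapred.task.timeout (600 s by default).
    static final int REPORT_INTERVAL = 1000;

    static int processAll(List<String> values, Progressable reporter) {
        int n = 0;
        for (String v : values) {
            // ... expensive per-record work goes here ...
            n++;
            if (n % REPORT_INTERVAL == 0) {
                reporter.progress(); // "I'm alive" - no percentage change needed
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> data = Collections.nCopies(2500, "gps");
        final int[] beats = {0};
        int n = processAll(data, () -> beats[0]++);
        System.out.println(n + " records, " + beats[0] + " heartbeats");
    }
}
```

Calling progress() is cheap, so reporting every record is also fine; batching just avoids the call overhead in very tight loops.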
Also, beware of holding too much memory in a reduce JVM - you're still limited there. Best to let the framework do the sort or secondary sort.

On Fri, Jan 11, 2013 at 10:58 AM, yaotian <yaot...@gmail.com> wrote:

> Yes, you are right. The data is GPS traces keyed by the corresponding uid.
> The reducer sorts per user to get results of the form: uid, gps1,
> gps2, gps3, ...
> Yes, the GPS data is big - this is 30 GB of data.
>
> How do I solve this?
>
> 2013/1/11 Mahesh Balija <balijamahesh....@gmail.com>
>
>> Hi,
>>
>> 2 reducers completed successfully and 1498 were killed. I assume you
>> have a data issue (either the data is huge or there is some problem
>> with the data you are trying to process).
>> One possibility is that you have many values associated with a single
>> key, which can cause this kind of issue depending on the operation you
>> do in your reducer.
>> Can you put some logging in your reducer and try to trace what is
>> happening?
>>
>> Best,
>> Mahesh Balija,
>> Calsoft Labs.
>>
>> On Fri, Jan 11, 2013 at 8:53 AM, yaotian <yaot...@gmail.com> wrote:
>>
>>> I have 1 Hadoop master, where the NameNode runs, and 2 slaves, where
>>> the DataNodes run.
>>>
>>> If I choose a small dataset like 200 MB, it completes.
>>>
>>> But if I run 30 GB of data, the map phase finishes, but the reduce
>>> phase reports errors. Any suggestions?
>>>
>>> This is the information:
>>> *Black-listed TaskTrackers:* 1
>>>
>>> Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
>>> map     100.00%     450         0        0        450       0       0 / 1
>>> reduce  100.00%     1500        0        0        2         1498    12 / 3
>>>
>>> task_201301090834_0041_r_000001  (0.00% complete)
>>> Start: 10-Jan-2013 04:18:54  Finish: 10-Jan-2013 06:46:38 (2hrs, 27mins, 44sec)
>>> Task attempt_201301090834_0041_r_000001_0 failed to report status for 600 seconds. Killing!
>>> Task attempt_201301090834_0041_r_000001_1 failed to report status for 602 seconds. Killing!
>>> Task attempt_201301090834_0041_r_000001_2 failed to report status for 602 seconds. Killing!
>>> Task attempt_201301090834_0041_r_000001_3 failed to report status for 602 seconds. Killing!
>>>
>>> task_201301090834_0041_r_000002  (0.00% complete)
>>> Start: 10-Jan-2013 04:18:54  Finish: 10-Jan-2013 06:46:38 (2hrs, 27mins, 43sec)
>>> Task attempt_201301090834_0041_r_000002_0 failed to report status for 601 seconds. Killing!
>>> Task attempt_201301090834_0041_r_000002_1 failed to report status for 600 seconds. Killing!
>>>
>>> task_201301090834_0041_r_000003  (0.00% complete)
>>> Start: 10-Jan-2013 04:18:57  Finish: 10-Jan-2013 06:46:38 (2hrs, 27mins, 41sec)
>>> Task attempt_201301090834_0041_r_000003_0 failed to report status for 602 seconds. Killing!
>>> Task attempt_201301090834_0041_r_000003_1 failed to report status for 602 seconds. Killing!
>>> Task attempt_201301090834_0041_r_000003_2 failed to report status for 602 seconds. Killing!
>>>
>>> task_201301090834_0041_r_000005  (0.00% complete)
>>> Start: 10-Jan-2013 06:11:07  Finish: 10-Jan-2013 06:46:38 (35mins, 31sec)
>>> Task attempt_201301090834_0041_r_000005_0 failed to report status for 600 seconds. Killing!
>>
>>
>
-- 
Harsh J
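The "let the framework do the sort" advice refers to secondary sort: pack the ordering field (here, the GPS timestamp) into a composite key so that values reach the reducer already ordered, instead of buffering all of a uid's points in reducer memory. In Hadoop this takes a composite WritableComparable key plus a partitioner and grouping comparator on the uid; the sketch below only simulates the resulting sort order in plain Java (the `GpsKey` record and field names are illustrative, not Hadoop APIs):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SecondarySortDemo {
    // Composite key: partition/group by uid, sort within a uid by timestamp.
    record GpsKey(String uid, long ts) {}

    // Mimics the shuffle's sort: primary on uid, secondary on timestamp.
    // Because the framework does this ordering, the reducer can stream each
    // uid's points in time order without holding them all in memory.
    static List<GpsKey> sortForReduce(List<GpsKey> records) {
        List<GpsKey> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing(GpsKey::uid)
                              .thenComparingLong(GpsKey::ts));
        return sorted;
    }

    public static void main(String[] args) {
        List<GpsKey> records = List.of(
                new GpsKey("u1", 30), new GpsKey("u2", 10),
                new GpsKey("u1", 10), new GpsKey("u1", 20));

        // "Reduce" side: values for each uid now arrive in timestamp order.
        for (GpsKey r : sortForReduce(records)) {
            System.out.println(r.uid() + " " + r.ts());
        }
    }
}
```

With this layout the reducer emits each point as it arrives rather than collecting `gps1, gps2, gps3, ...` into one in-memory list, which is what exhausts the reduce JVM on a 30 GB input.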