my hadoop job failed sometimes

2011-08-16 Thread Jianxin Wang
Hi, my job runs once every day, but it fails sometimes. I checked the log in the JobTracker. It seems to be an HDFS error? Thanks a lot! 2011-08-16 21:07:13,247 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201106021431_1719_r_000498_1: org.apache.hadoop.ipc.RemoteException: java.io.IOE

Re: my hadoop job failed sometimes

2011-08-16 Thread Harsh J
Do you notice anything related in the NameNode logs? One reason for this is that the NameNode may be in safe mode for some reason, but there are many other reasons so the NameNode's log would be the best place to look for exactly why the complete()-op fails. On Wed, Aug 17, 2011 at 8:20 AM, Jianxi
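For the safe mode possibility mentioned above, a quick check can be scripted. This is only a minimal sketch: it assumes the `hadoop` CLI is on the PATH and that `hadoop dfsadmin -safemode get` prints a line such as "Safe mode is OFF". The function names `safemode_is_on` and `query_safemode` are made up for illustration.

```python
import subprocess


def safemode_is_on(output):
    """Interpret the text printed by 'hadoop dfsadmin -safemode get'.

    Assumes the output contains 'Safe mode is ON' or 'Safe mode is OFF'.
    """
    text = output.strip().lower()
    if "safe mode is on" in text:
        return True
    if "safe mode is off" in text:
        return False
    raise ValueError("unrecognized safemode output: %r" % output)


def query_safemode(cmd=("hadoop", "dfsadmin", "-safemode", "get")):
    """Run the CLI (assumed to be installed) and return True if safe mode is on."""
    raw = subprocess.check_output(list(cmd))
    return safemode_is_on(raw.decode("utf-8", "replace"))
```

Calling `query_safemode()` on a cluster node only tells you the current state; if the failures are intermittent, the NameNode log around the failure time is still the place to look.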

Re: my hadoop job failed sometimes

2011-08-16 Thread Jianxin Wang
Thanks Harsh :) The Hadoop system was started 3 months ago, so I think it is not in safe mode. I found some old tasks that were started 10 days ago; they seem to be blocked for some unknown reason. I have killed these tasks now, but I don't know why a task can be blocked and exist for so long. I found another type of exc

Re: my hadoop job failed sometimes

2011-08-22 Thread Jianxin Wang
Hi Harsh, I have pasted the NameNode's log here. The problem occurs occasionally. Thanks a lot! 2011-08-22 14:41:05,939 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=tpic,users,dialout ip=/172.28.1.29 cmd=delete src=/walter/send_albums/110822_143455/_temporary dst=null perm=nul
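When reading a long NameNode log, it can help to pull the audit entries into fields and filter them, for example to see who deleted a job's `_temporary` directory and when. The following is a rough sketch, assuming the audit lines use whitespace-separated `key=value` pairs after `FSNamesystem.audit:` as in the line quoted above; `parse_audit_line` and `deletes_under` are hypothetical helper names.

```python
import re

# key=value pairs; values run up to the next whitespace
AUDIT_FIELD = re.compile(r"(\w+)=(\S+)")
AUDIT_MARKER = "FSNamesystem.audit:"


def parse_audit_line(line):
    """Return the audit fields of one log line as a dict, or None if not an audit line."""
    idx = line.find(AUDIT_MARKER)
    if idx == -1:
        return None
    return dict(AUDIT_FIELD.findall(line[idx + len(AUDIT_MARKER):]))


def deletes_under(lines, prefix):
    """Yield audit entries whose cmd is 'delete' and whose src starts with prefix."""
    for line in lines:
        fields = parse_audit_line(line)
        if not fields:
            continue
        if fields.get("cmd") == "delete" and fields.get("src", "").startswith(prefix):
            yield fields
```

For instance, `list(deletes_under(open("namenode.log"), "/walter/send_albums/"))` (with `namenode.log` standing in for the real log path) would list the delete operations under that directory, which can then be compared against the timestamps of the failed reduce attempts.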