As expected, it's failing during the shuffle. It seems the DNS names of the slave nodes could not be resolved.
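One quick way to check name resolution is to run something like the following on each node. This is only a rough sketch, not a definitive diagnosis. It assumes the host names master, slave1, slave2 and slave3 that appear in the JobTracker log; `CLUSTER_HOSTS` and `check_hosts` are names made up for this example. A host name that does not resolve, or that maps to a loopback address such as 127.0.0.1 or 127.0.1.1 in /etc/hosts, is a commonly reported cause of this shuffle error, though the logs alone do not confirm that it applies here.

```python
import ipaddress
import socket

# Host names taken from the JobTracker log in this thread (assumption:
# these are the names the nodes use to reach each other).
CLUSTER_HOSTS = ["master", "slave1", "slave2", "slave3"]


def check_hosts(hostnames, resolve=socket.gethostbyname):
    """Return {hostname: status}.

    status is "ok <ip>", "loopback <ip>" or "unresolved".
    The resolver is injectable so the function can be tested offline.
    """
    report = {}
    for name in hostnames:
        try:
            addr = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            report[name] = "unresolved"
            continue
        if ipaddress.ip_address(addr).is_loopback:
            # Covers the whole 127.0.0.0/8 range, including 127.0.1.1.
            report[name] = "loopback " + addr
        else:
            report[name] = "ok " + addr
    return report


if __name__ == "__main__":
    for host, status in check_hosts(CLUSTER_HOSTS).items():
        print(host, status)
```

If the output differs between nodes, comparing the /etc/hosts files on those nodes is a reasonable next step.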
Have you configured your slaves' host names correctly?

2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'

On Tue, Dec 31, 2013 at 4:42 PM, navaz <navaz....@gmail.com> wrote:

> Hi
>
> My hdfs-site is configured for 4 nodes (one master and 3 slaves).
>
> <property>
> <name>dfs.replication</name>
> <value>4</value>
>
> Running start-dfs.sh and stop-mapred.sh doesn't solve the problem.
>
> I also tried running the program after formatting the namenode (master),
> which also fails.
>
> My JobTracker logs on the master (name node) are given below.
>
> 2013-12-31 14:27:35,534 INFO org.apache.hadoop.mapred.JobInProgress: job_201312311107_0004: nMaps=3 nReduces=1 max=-1
> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Job job_201312311107_0004 added successfully for user 'hduser' to queue 'default'
> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.AuditLogger: USER=hduser IP=155.98.39.28 OPERATION=SUBMIT_JOB TARGET=job_201312311107_0004 RESULT=SUCCESS
> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201312311107_0004
> 2013-12-31 14:27:35,595 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201312311107_0004
> 2013-12-31 14:27:35,785 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /app/hadoop/tmp/mapred/system/job_201312311107_0004/jobToken
> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201312311107_0004 = 3671523. Number of splits = 3
> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/master
> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave2
> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave1
> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000000 has split on node:/default-rack/slave3
> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/master
> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave1
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave3
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000001 has split on node:/default-rack/slave2
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/master
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave1
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave2
> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201312311107_0004_m_000002 has split on node:/default-rack/slave3
> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: job_201312311107_0004 LOCALITY_WAIT_FACTOR=1.0
> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201312311107_0004 initialized successfully with 3 map tasks and 1 reduce tasks.
> 2013-12-31 14:27:35,913 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201312311107_0004_m_000004_0' to tip task_201312311107_0004_m_000004, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
> 2013-12-31 14:27:40,876 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000004_0' has completed task_201312311107_0004_m_000004 successfully.
> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000000_0' to tip task_201312311107_0004_m_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000000
> 2013-12-31 14:27:40,907 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000001_0' to tip task_201312311107_0004_m_000001, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
> 2013-12-31 14:27:40,908 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000001
> 2013-12-31 14:27:41,122 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000002_0' to tip task_201312311107_0004_m_000002, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
> 2013-12-31 14:27:41,123 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000002
> 2013-12-31 14:27:49,659 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000002_0' has completed task_201312311107_0004_m_000002 successfully.
> 2013-12-31 14:27:49,662 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0004_r_000000_0' to tip task_201312311107_0004_r_000000, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
> 2013-12-31 14:27:50,338 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000000_0' has completed task_201312311107_0004_m_000000 successfully.
> 2013-12-31 14:27:51,168 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000001_0' has completed task_201312311107_0004_m_000001 successfully.
> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
> hduser@pc228:/usr/local/hadoop/logs$
>
> I am following the document below to configure the Hadoop cluster.
>
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>
> Did I miss something? Please guide.
>
> Thanks
> Navaz
>
> On Tue, Dec 31, 2013 at 3:25 PM, Hardik Pandya <smarty.ju...@gmail.com> wrote:
>
>> What does your job log say? Is your hdfs-site configured properly to
>> find 3 data nodes? The job could very well be getting stuck in the shuffle phase.
>>
>> Last thing to try: does stop-all and start-all help? Even worse, try
>> formatting the namenode.
>>
>> On Tue, Dec 31, 2013 at 11:40 AM, navaz <navaz....@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I am running a Hadoop cluster with 1 name node and 3 data nodes.
>>>
>>> My HDFS looks like this.
>>>
>>> hduser@nm:/usr/local/hadoop$ hadoop fs -ls /user/hduser/getty/gutenberg
>>> Warning: $HADOOP_HOME is deprecated.
>>>
>>> Found 7 items
>>> -rw-r--r-- 4 hduser supergroup 343691 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg132.txt
>>> -rw-r--r-- 4 hduser supergroup 594933 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg1661.txt
>>> -rw-r--r-- 4 hduser supergroup 1945886 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg19699.txt
>>> -rw-r--r-- 4 hduser supergroup 674570 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg20417.txt
>>> -rw-r--r-- 4 hduser supergroup 1573150 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg4300.txt
>>> -rw-r--r-- 4 hduser supergroup 1423803 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg5000.txt
>>> -rw-r--r-- 4 hduser supergroup 393968 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg972.txt
>>> hduser@nm:/usr/local/hadoop$
>>>
>>> When I start the MapReduce wordcount program, the map phase reaches 100%
>>> and the reduce hangs at 14%.
>>>
>>> hduser@nm:~$ hadoop jar chiu-wordcount2.jar WordCount /user/hduser/getty/gutenberg /user/hduser/getty/gutenberg_out3
>>> Warning: $HADOOP_HOME is deprecated.
>>>
>>> 13/12/31 09:31:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>> 13/12/31 09:31:07 INFO input.FileInputFormat: Total input paths to process : 7
>>> 13/12/31 09:31:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>> 13/12/31 09:31:08 WARN snappy.LoadSnappy: Snappy native library not loaded
>>> 13/12/31 09:31:08 INFO mapred.JobClient: Running job: job_201312310929_0001
>>> 13/12/31 09:31:09 INFO mapred.JobClient: map 0% reduce 0%
>>> 13/12/31 09:31:29 INFO mapred.JobClient: map 14% reduce 0%
>>> 13/12/31 09:31:34 INFO mapred.JobClient: map 32% reduce 0%
>>> 13/12/31 09:31:35 INFO mapred.JobClient: map 75% reduce 0%
>>> 13/12/31 09:31:36 INFO mapred.JobClient: map 90% reduce 0%
>>> 13/12/31 09:31:37 INFO mapred.JobClient: map 99% reduce 0%
>>> 13/12/31 09:31:38 INFO mapred.JobClient: map 100% reduce 0%
>>> 13/12/31 09:31:43 INFO mapred.JobClient: map 100% reduce 14%
>>>
>>> <HANGS HERE>
>>>
>>> Could you please help me resolve this issue?
>>>
>>> Thanks & Regards
>>> *Abdul Navaz*
>>>
>>
>
> --
> *Abdul Navaz*
> *Masters in Network Communications*
> *University of Houston*
> *Houston, TX - 77204-4020*
> *Ph - 281-685-0388*
> *fabdulna...@uh.edu*