Do you have your hostnames properly configured in /etc/hosts? Have you tried 192.168.?.? instead of localhost 127.0.0.1?
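For example, a minimal /etc/hosts sketch (the 192.168.0.x addresses below are placeholders only; substitute your nodes' real, routable IPs), kept identical on the master and on every slave, with no 127.0.0.1 or 127.0.1.1 entry for any of the cluster hostnames:

    127.0.0.1    localhost
    192.168.0.1  master
    192.168.0.2  slave1
    192.168.0.3  slave2
    192.168.0.4  slave3

The 'tracker_slaveN:localhost/127.0.0.1:...' entries in the jobtracker logs below suggest the TaskTrackers are currently registering themselves as localhost, which would explain why the reduce-side fetches keep failing.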
On Wed, Jan 1, 2014 at 11:33 AM, navaz <navaz....@gmail.com> wrote:
> Thanks. But I wonder why map succeeds 100%. How does it resolve the hostname?
>
> Now reduce reaches 100%, but slave2 and slave3 are bailing out. (Mapping succeeded on these nodes.)
>
> Does it look up the hostname only for reduce?
>
> 14/01/01 09:09:38 INFO mapred.JobClient: Running job: job_201401010908_0001
> 14/01/01 09:09:39 INFO mapred.JobClient: map 0% reduce 0%
> 14/01/01 09:10:00 INFO mapred.JobClient: map 33% reduce 0%
> 14/01/01 09:10:01 INFO mapred.JobClient: map 66% reduce 0%
> 14/01/01 09:10:05 INFO mapred.JobClient: map 100% reduce 0%
> 14/01/01 09:10:14 INFO mapred.JobClient: map 100% reduce 22%
> 14/01/01 09:17:32 INFO mapred.JobClient: map 100% reduce 0%
> 14/01/01 09:17:35 INFO mapred.JobClient: Task Id : attempt_201401010908_0001_r_000000_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/01/01 09:17:46 INFO mapred.JobClient: map 100% reduce 11%
> 14/01/01 09:17:50 INFO mapred.JobClient: map 100% reduce 22%
> 14/01/01 09:25:06 INFO mapred.JobClient: map 100% reduce 0%
> 14/01/01 09:25:10 INFO mapred.JobClient: Task Id : attempt_201401010908_0001_r_000000_1, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/01/01 09:25:34 INFO mapred.JobClient: map 100% reduce 100%
> 14/01/01 09:25:42 INFO mapred.JobClient: Job complete: job_201401010908_0001
> 14/01/01 09:25:42 INFO mapred.JobClient: Counters: 29
>
> Job Tracker logs:
> 2014-01-01 09:09:59,874 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000002_0' has completed task_201401010908_0001_m_000002 successfully.
> 2014-01-01 09:10:04,231 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000001_0' has completed task_201401010908_0001_m_000001 successfully.
> 2014-01-01 09:17:30,527 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201401010908_0001_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2014-01-01 09:17:30,528 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_0'
> 2014-01-01 09:17:30,529 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201401010908_0001_r_000000_0' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave3:localhost/127.0.0.1:44663'
> 2014-01-01 09:17:35,130 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_0'
> 2014-01-01 09:17:35,213 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201401010908_0001_r_000000_1' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:51438'
> 2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201401010908_0001_r_000000_1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_1'
> 2014-01-01 09:25:05,494 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201401010908_0001_r_000000_1' to tip task_201401010908_0001_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:51438'
> 2014-01-01 09:25:10,087 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201401010908_0001_r_000000_1'
> 2014-01-01 09:25:10,109 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201401010908_0001_r_000000_2' to tip task_201401010908_0001_r_000000, for tracker 'tracker_master:localhost/127.0.0.1:57156'
> 2014-01-01 09:25:33,340 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_r_000000_2' has completed task_201401010908_0001_r_000000 successfully.
> 2014-01-01 09:25:33,462 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_CLEANUP) 'attempt_201401010908_0001_m_000003_0' to tip task_201401010908_0001_m_000003, for tracker 'tracker_master:localhost/127.0.0.1:57156'
> 2014-01-01 09:25:42,304 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201401010908_0001_m_000003_0' has completed task_201401010908_0001_m_000003 successfully.
>
> On Tue, Dec 31, 2013 at 4:56 PM, Hardik Pandya <smarty.ju...@gmail.com> wrote:
>
>> As expected, it is failing during shuffle.
>>
>> It seems like HDFS could not resolve the DNS names of the slave nodes.
>>
>> Have you configured your slaves' hostnames correctly?
>>
>> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
>> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>>
>> On Tue, Dec 31, 2013 at 4:42 PM, navaz <navaz....@gmail.com> wrote:
>>
>>> Hi
>>>
>>> My hdfs-site is configured for 4 nodes. (One is the master and 3 are slaves.)
>>>
>>> <property>
>>> <name>dfs.replication</name>
>>> <value>4</value>
>>>
>>> start-dfs.sh and stop-mapred.sh don't solve the problem.
>>>
>>> I also tried running the program after formatting the namenode (master), which also fails.
>>>
>>> My jobtracker logs on the master (name node) are given below.
>>> >>> >>> >>> 2013-12-31 14:27:35,534 INFO org.apache.hadoop.mapred.JobInProgress: >>> job_201312311107_0004: nMaps=3 nReduces=1 max=-1 >>> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: Job >>> job_201312311107_0004 added successfully for user 'hduser' to queue >>> 'default' >>> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.AuditLogger: >>> USER=hduser IP=155.98.39.28 OPERATION=SUBMIT_JOB TARGET=job_201312 >>> 311107_0004 RESULT=SUCCESS >>> 2013-12-31 14:27:35,594 INFO org.apache.hadoop.mapred.JobTracker: >>> Initializing job_201312311107_0004 >>> 2013-12-31 14:27:35,595 INFO org.apache.hadoop.mapred.JobInProgress: >>> Initializing job_201312311107_0004 >>> 2013-12-31 14:27:35,785 INFO org.apache.hadoop.mapred.JobInProgress: >>> jobToken generated and stored with users keys in /app/hadoop/tmp/map >>> red/system/job_201312311107_0004/jobToken >>> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: >>> Input size for job job_201312311107_0004 = 3671523. Number of splits >>> = 3 >>> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: >>> tip:task_201312311107_0004_m_000000 has split on node:/default-rack/ >>> master >>> 2013-12-31 14:27:35,795 INFO org.apache.hadoop.mapred.JobInProgress: >>> tip:task_201312311107_0004_m_000000 has split on node:/default-rack/ >>> slave2 >>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: >>> tip:task_201312311107_0004_m_000000 has split on node:/default-rack/ >>> slave1 >>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: >>> tip:task_201312311107_0004_m_000000 has split on node:/default-rack/ >>> slave3 >>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: >>> tip:task_201312311107_0004_m_000001 has split on node:/default-rack/ >>> master >>> 2013-12-31 14:27:35,796 INFO org.apache.hadoop.mapred.JobInProgress: >>> tip:task_201312311107_0004_m_000001 has split on node:/default-rack/ >>> slave1 >>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: >>> tip:task_201312311107_0004_m_000001 has split on node:/default-rack/ >>> slave3 >>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: >>> tip:task_201312311107_0004_m_000001 has split on node:/default-rack/ >>> slave2 >>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: >>> tip:task_201312311107_0004_m_000002 has split on node:/default-rack/ >>> master >>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: >>> tip:task_201312311107_0004_m_000002 has split on node:/default-rack/ >>> slave1 >>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: >>> tip:task_201312311107_0004_m_000002 has split on node:/default-rack/ >>> slave2 >>> 2013-12-31 14:27:35,797 INFO org.apache.hadoop.mapred.JobInProgress: >>> tip:task_201312311107_0004_m_000002 has split on node:/default-rack/ >>> slave3 >>> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: >>> job_201312311107_0004 LOCALITY_WAIT_FACTOR=1.0 >>> 2013-12-31 14:27:35,798 INFO org.apache.hadoop.mapred.JobInProgress: Job >>> job_201312311107_0004 initialized successfully with 3 map tasks >>> and 1 reduce tasks. 
>>> 2013-12-31 14:27:35,913 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201312311107_0004_m_000004_0' to tip task_201312311107_0004_m_000004, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>>> 2013-12-31 14:27:40,876 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000004_0' has completed task_201312311107_0004_m_000004 successfully.
>>> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000000_0' to tip task_201312311107_0004_m_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>>> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000000
>>> 2013-12-31 14:27:40,907 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000001_0' to tip task_201312311107_0004_m_000001, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
>>> 2013-12-31 14:27:40,908 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000001
>>> 2013-12-31 14:27:41,122 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312311107_0004_m_000002_0' to tip task_201312311107_0004_m_000002, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
>>> 2013-12-31 14:27:41,123 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_201312311107_0004_m_000002
>>> 2013-12-31 14:27:49,659 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000002_0' has completed task_201312311107_0004_m_000002 successfully.
>>> 2013-12-31 14:27:49,662 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0004_r_000000_0' to tip task_201312311107_0004_r_000000, for tracker 'tracker_slave3:localhost/127.0.0.1:46845'
>>> 2013-12-31 14:27:50,338 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000000_0' has completed task_201312311107_0004_m_000000 successfully.
>>> 2013-12-31 14:27:51,168 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201312311107_0004_m_000001_0' has completed task_201312311107_0004_m_000001 successfully.
>>> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312311107_0003_r_000000_0: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>>> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>>> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201312311107_0003_r_000000_0' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave2:localhost/127.0.0.1:52677'
>>> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201312311107_0003_r_000000_0'
>>> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201312311107_0003_r_000000_1' to tip task_201312311107_0003_r_000000, for tracker 'tracker_slave1:localhost/127.0.0.1:57492'
>>> hduser@pc228:/usr/local/hadoop/logs$
>>>
>>> I am referring to the document below to configure the Hadoop cluster:
>>>
>>> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>>>
>>> Did I miss something? Please guide.
>>>
>>> Thanks
>>> Navaz
>>>
>>> On Tue, Dec 31, 2013 at 3:25 PM, Hardik Pandya <smarty.ju...@gmail.com> wrote:
>>>
>>>> What does your job log say? Is your hdfs-site configured properly to find the 3 data nodes? This could very well be getting stuck in the shuffle phase.
>>>>
>>>> Last thing to try: do stop-all and start-all help? Even worse, try formatting the namenode.
>>>>
>>>> On Tue, Dec 31, 2013 at 11:40 AM, navaz <navaz....@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I am running a Hadoop cluster with 1 name node and 3 data nodes.
>>>>>
>>>>> My HDFS looks like this.
>>>>>
>>>>> hduser@nm:/usr/local/hadoop$ hadoop fs -ls /user/hduser/getty/gutenberg
>>>>> Warning: $HADOOP_HOME is deprecated.
>>>>>
>>>>> Found 7 items
>>>>> -rw-r--r-- 4 hduser supergroup 343691 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg132.txt
>>>>> -rw-r--r-- 4 hduser supergroup 594933 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg1661.txt
>>>>> -rw-r--r-- 4 hduser supergroup 1945886 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg19699.txt
>>>>> -rw-r--r-- 4 hduser supergroup 674570 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg20417.txt
>>>>> -rw-r--r-- 4 hduser supergroup 1573150 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg4300.txt
>>>>> -rw-r--r-- 4 hduser supergroup 1423803 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg5000.txt
>>>>> -rw-r--r-- 4 hduser supergroup 393968 2013-12-30 19:12 /user/hduser/getty/gutenberg/pg972.txt
>>>>> hduser@nm:/usr/local/hadoop$
>>>>>
>>>>> When I start the MapReduce wordcount program, mapping reaches 100% and reduce hangs at 14%.
>>>>>
>>>>> hduser@nm:~$ hadoop jar chiu-wordcount2.jar WordCount /user/hduser/getty/gutenberg /user/hduser/getty/gutenberg_out3
>>>>> Warning: $HADOOP_HOME is deprecated.
>>>>>
>>>>> 13/12/31 09:31:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>> 13/12/31 09:31:07 INFO input.FileInputFormat: Total input paths to process : 7
>>>>> 13/12/31 09:31:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>>>> 13/12/31 09:31:08 WARN snappy.LoadSnappy: Snappy native library not loaded
>>>>> 13/12/31 09:31:08 INFO mapred.JobClient: Running job: job_201312310929_0001
>>>>> 13/12/31 09:31:09 INFO mapred.JobClient: map 0% reduce 0%
>>>>> 13/12/31 09:31:29 INFO mapred.JobClient: map 14% reduce 0%
>>>>> 13/12/31 09:31:34 INFO mapred.JobClient: map 32% reduce 0%
>>>>> 13/12/31 09:31:35 INFO mapred.JobClient: map 75% reduce 0%
>>>>> 13/12/31 09:31:36 INFO mapred.JobClient: map 90% reduce 0%
>>>>> 13/12/31 09:31:37 INFO mapred.JobClient: map 99% reduce 0%
>>>>> 13/12/31 09:31:38 INFO mapred.JobClient: map 100% reduce 0%
>>>>> 13/12/31 09:31:43 INFO mapred.JobClient: map 100% reduce 14%
>>>>>
>>>>> <HANGS HERE>
>>>>>
>>>>> Could you please help me resolve this issue?
>>>>>
>>>>> Thanks & Regards
>>>>> Abdul Navaz
>>>>>
>>>
>>> --
>>> Abdul Navaz
>>> Masters in Network Communications
>>> University of Houston
>>> Houston, TX - 77204-4020
>>> Ph - 281-685-0388
>>> fabdulna...@uh.edu
>>>
>
> --
> Abdul Navaz
> Masters in Network Communications
> University of Houston
> Houston, TX - 77204-4020
> Ph - 281-685-0388
> fabdulna...@uh.edu
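One more check worth running, since the reducers pull map output over HTTP from every TaskTracker: each node has to be able to resolve and reach every other node by the hostname it advertises. A rough sanity-check sketch (hostnames taken from the logs above; the commands assume the default TaskTracker HTTP port, 50060) could look like:

    # run on every node; each name should resolve to a routable address,
    # not 127.0.0.1 or 127.0.1.1
    for h in master slave1 slave2 slave3; do
        getent hosts "$h"
    done

    # the shuffle fetch goes to the TaskTracker HTTP server
    # (mapred.task.tracker.http.address, port 50060 by default);
    # confirm it answers from the other nodes:
    curl -sS -o /dev/null -w "%{http_code}\n" http://slave2:50060/

If any hostname resolves to 127.0.0.1, fix /etc/hosts as suggested at the top of this thread, restart the daemons, and rerun the job.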