In the last 4-5 days, the TaskTracker on one of my slave machines has gone
down a couple of times. It had been working fine for the past 4-5 months.

The cluster configuration is:
4-machine cluster on AWS
1 m2.xlarge master
3 m2.xlarge slaves

The cluster is dedicated to running Hive queries, with the data residing on S3.

The slave on which the TaskTracker went down had the following log:

*******************************************************************
2013-06-11 00:26:30,968 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60659, bytes: 38, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005693_0, duration: 279198
2013-06-11 00:26:30,971 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.191.**.***:37605, bytes: 38, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005700_0, duration: 193135
2013-06-11 00:26:30,971 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60630, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005700_0, duration: 192011
2013-06-11 00:26:30,972 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005693_0, duration: 178209
2013-06-11 00:26:30,973 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.8.***.**:45321, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005694_0, duration: 186452
2013-06-11 00:26:30,973 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005694_0, duration: 157360
2013-06-11 00:26:30,974 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.8.***.**:45321, bytes: 38, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005700_0, duration: 157555
2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM Not killed jvm_201306071409_0151_m_-435659475 but just removed
2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201306071409_0151_m_-435659475 exited with exit code 0. Number of tasks it ran: 0
2013-06-11 00:26:30,991 ERROR org.apache.hadoop.mapred.JvmManager: Caught Throwable in JVMRunner. Aborting TaskTracker.
org.apache.hadoop.fs.FSError: java.io.IOException: Broken pipe
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:200)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
	at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:315)
	at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:148)
	at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
	at java.io.BufferedWriter.close(BufferedWriter.java:265)
	at java.io.PrintWriter.close(PrintWriter.java:312)
	at org.apache.hadoop.mapred.TaskController.writeCommand(TaskController.java:231)
	at org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.java:126)
	at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:497)
	at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:471)
Caused by: java.io.IOException: Broken pipe
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:297)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:198)
	... 13 more
2013-06-11 00:26:31,007 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201306071409_0151_m_-495709221
2013-06-11 00:26:31,008 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005694_0, duration: 222430
2013-06-11 00:26:31,008 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60653, bytes: 38, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005693_0, duration: 154027
2013-06-11 00:26:31,008 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005700_0, duration: 132067
2013-06-11 00:26:31,326 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201306071409_0151_m_-495709221 spawned.
2013-06-11 00:26:31,328 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /mnt/app/hadoop-tmp/ttprivate/taskTracker/piyushv/jobcache/job_201306071409_0151/attempt_201306071409_0151_m_005717_0/taskjvm.sh
2013-06-11 00:26:31,331 INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060, dest: 10.190.***.***:60656, bytes: 38, op: MAPRED_SHUFFLE, cliID: attempt_201306071409_0151_m_005700_0, duration: 437236
2013-06-11 00:26:31,332 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at ip-10-191-**-***/10.191.**.***
************************************************************/

-- 
RAVI SHETYE