[ https://issues.apache.org/jira/browse/MAPREDUCE-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved MAPREDUCE-1296.
-----------------------------------------
    Resolution: Fixed

Since "fixed". (New problem is that execution capacity should decrease due to less IO capability, but that's a different JIRA.)

> Tasks fail after the first disk (/grid/0/) of all TTs reaches 100%, even though other disks still have space.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1296
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1296
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: capacity-sched
>    Affects Versions: 0.20.2
>            Reporter: Iyappan Srinivasan
>
> Tasks fail after the first disk (/grid/0/) of all TTs reaches 100%, even
> though other disks still have space.
> In a cluster, data is distributed almost uniformly. Disk /grid/0/ reaches
> 100% first because it also holds extra data such as logs. After it
> reaches 100%, tasks start to fail with the error:
> java.lang.Throwable: Child Error
>       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:516)
> Caused by: java.io.IOException: Task process exit with nonzero status of 1.
>       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:503)
> This happens even though the other disks are still at 80% and could be
> filled further.
> Steps to reproduce:
> 1) Bring up a cluster with the Linux task controller.
> 2) Start filling the DFS with data using randomwriter or teragen.
> 3) Once the first disk reaches 100%, tasks start to fail.

--
This message was sent by Atlassian JIRA
(v6.2#6252)