[ 
https://issues.apache.org/jira/browse/HDFS-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated HDFS-5022:
------------------------

    Description: 
Currently, if "dfs.datanode.du.reserved" is set and a datanode runs out of 
configured disk space, it goes out of service silently, and there is no way 
for the user to analyze what happened to the datanode. In fact, the user will 
not even notice that the datanode is out of service, since no warning message 
appears in either the namenode or the datanode log.

For example, if there is only a single datanode and we run an MR job writing 
a large amount of data into HDFS, then when the disk fills up we can only 
observe an error message like: 
{noformat}
java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
{noformat}
with no indication of what happened or how to resolve the issue.

We should improve this by adding a more explicit error message to both the 
datanode log and the message returned to the MR application.
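For reference, the reserved-space setting involved here lives in 
hdfs-site.xml; an illustrative (not prescriptive) configuration reserving 
10 GB per volume for non-HDFS use might look like:
{noformat}
<property>
  <!-- Bytes reserved per volume for non-DFS use; 10737418240 = 10 GB -->
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>
{noformat}
With such a setting, the datanode can refuse writes well before the disk is 
physically full, which is exactly the situation where the silent 
out-of-service behavior described above becomes confusing.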

  was:
Currently, if a datanode runs out of configured disk space, it goes out of 
service silently, and there is no way for the user to analyze what happened 
to the datanode. In fact, the user will not even notice that the datanode is 
out of service, since no warning message appears in either the namenode or 
the datanode log.

For example, if there is only a single datanode and we run an MR job writing 
a large amount of data into HDFS, then when the disk fills up we can only 
observe an error message like: 
{noformat}
java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
{noformat}
with no indication of what happened or how to resolve the issue.

We should improve this by adding a more explicit error message to both the 
datanode log and the message returned to the MR application.

    
> Add explicit error message in log when datanode went out of service because 
> of low disk space
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5022
>                 URL: https://issues.apache.org/jira/browse/HDFS-5022
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Minor
>
> Currently, if "dfs.datanode.du.reserved" is set and a datanode runs out of 
> configured disk space, it goes out of service silently, and there is no way 
> for the user to analyze what happened to the datanode. In fact, the user 
> will not even notice that the datanode is out of service, since no warning 
> message appears in either the namenode or the datanode log.
> For example, if there is only a single datanode and we run an MR job 
> writing a large amount of data into HDFS, then when the disk fills up we 
> can only observe an error message like: 
> {noformat}
> java.io.IOException: File xxx could only be replicated to 0 nodes instead of 1
> {noformat}
> with no indication of what happened or how to resolve the issue.
> We should improve this by adding a more explicit error message to both the 
> datanode log and the message returned to the MR application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
