Could you set some reserved space for non-DFS usage, just to avoid the disks getting full? In hdfs-site.xml:
<property>
  <name>dfs.datanode.du.reserved</name>
  <value></value>
  <description>Reserved space in bytes per volume. Always leave this much
  space free for non dfs use.</description>
</property>

2014-10-09 14:01 GMT+08:00 SF Hadoop <sfhad...@gmail.com>:
> I'm not sure if this is an HBase issue or a Hadoop issue, so if this is
> "off-topic" please forgive.
>
> I am having a problem with Hadoop maxing out drive space on a select few
> nodes when I am running an HBase job. The scenario is this:
>
> - The job is a data import using MapReduce / HBase
> - The data is being imported to one table
> - The table only has a couple of regions
> - As the job runs, HBase? / Hadoop? begins placing the data in HDFS on the
>   datanode / regionserver that is hosting the regions
> - As the job progresses (and more data is imported), the two datanodes
>   hosting the regions start to get full, and eventually drive space hits
>   100% utilization, whilst the other nodes in the cluster are at 40% or
>   less drive space utilization
> - The job in Hadoop then begins to hang with multiple "out of space"
>   errors and eventually fails.
>
> I have tried running hadoop balancer during the job run and this helped,
> but it only really succeeded in prolonging the eventual job failure.
>
> How can I get Hadoop / HBase to distribute the data to HDFS more evenly
> when it is favoring the nodes that the regions are on?
>
> Am I missing something here?
>
> Thanks for any help.

--
Bing Jiang
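As a concrete sketch of the property above (the 10 GB figure is only an illustrative assumption -- pick a value that fits your disk sizes, and note the DataNodes typically need a restart to pick it up):

```xml
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- Illustrative value only: 10 GB, expressed in bytes -->
  <value>10737418240</value>
  <description>Reserved space in bytes per volume. Always leave this much
  space free for non dfs use.</description>
</property>
```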
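Since the value must be given in bytes, a quick shell one-liner avoids arithmetic mistakes (10 GB is again just an example figure):

```shell
# Convert a GB figure to the byte value expected by dfs.datanode.du.reserved
reserved=$((10 * 1024 * 1024 * 1024))
echo "$reserved"   # 10737418240
```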