Hi Gabriel,

What max heap do you give the various daemons? It is really odd that you are seeing OOMEs; I would like to know what is actually consuming the memory. Are you saying the Hadoop DataNodes themselves crash with the OOME?
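For reference, on a stock CDH3-style install the daemon heaps are normally set in the env scripts; the values below are only placeholders, not a recommendation:

    # conf/hadoop-env.sh
    export HADOOP_HEAPSIZE=1000                                  # default max heap (MB) for the Hadoop daemons
    export HADOOP_DATANODE_OPTS="-Xmx1g $HADOOP_DATANODE_OPTS"   # override just the DataNode heap

    # conf/hbase-env.sh
    export HBASE_HEAPSIZE=2000                                   # max heap (MB) for the HBase daemons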
Lars

On Mon, Dec 6, 2010 at 9:02 AM, Gabriel Reid <gabriel.r...@gmail.com> wrote:
> Hi,
>
> We're currently running into issues with running a MapReduce job over
> a complete HBase table - we can't seem to find a balance between
> having dfs.datanode.max.xcievers set too low (and getting
> "xceiverCount X exceeds the limit of concurrent xcievers") and getting
> OutOfMemoryErrors on datanodes.
>
> When trying to run a MapReduce job on the complete table we inevitably
> get one of the two above errors eventually -- using a more restrictive
> Scan with a startRow and stopRow for the job runs without problems.
>
> An important note is that the table being scanned has a large
> disparity in the size of the values being stored -- one column family
> contains values that are all generally around 256 kB in size, while
> the other column families in the table contain values that are closer
> to 256 bytes. The hbase.hregion.max.filesize setting is still at the
> default (256 MB), meaning that we have HFiles for the big column that
> are around 256 MB, and HFiles for the other columns that are around
> 256 kB. The dfs.datanode.max.xcievers setting is currently at 2048,
> and this is running on a 5-node cluster.
>
> The table in question has about 7 million rows, and we're using
> Cloudera CDH3 (HBase 0.89.20100924 and Hadoop 0.20.2).
>
> As far as I have been able to discover, the correct thing to do (or to
> have done) is to set hbase.hregion.max.filesize to a larger value so
> that we have a smaller number of regions, which as I understand would
> probably solve the issue here.
>
> My questions are:
> 1. Is my analysis about having a larger hbase.hregion.max.filesize correct?
> 2. Is there something else that we can do to resolve this?
> 3. Am I correct in assuming that the best way to resolve this now is
> to make the hbase.hregion.max.filesize setting larger, and then use
> the org.apache.hadoop.hbase.util.Merge tool as discussed at
> http://osdir.com/ml/general/2010-12/msg00534.html ?
>
> Any help on this would be greatly appreciated.
>
> Thanks,
>
> Gabriel
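For reference, the two settings discussed above live in hdfs-site.xml and hbase-site.xml, and the Merge tool is run from the hbase script against a stopped cluster. The snippets below are only a sketch with placeholder values, not recommendations for this particular cluster:

    <!-- hdfs-site.xml: raise the xceiver cap (DataNodes need a restart) -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>

    <!-- hbase-site.xml: larger regions mean fewer regions and fewer open store files -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>1073741824</value> <!-- 1 GB, placeholder -->
    </property>

    # Offline merge of two adjacent regions (HBase must be shut down first);
    # the region names here are placeholders for the full region names from .META.
    $ bin/hbase org.apache.hadoop.hbase.util.Merge <table-name> <region-1-name> <region-2-name>

Note that raising hbase.hregion.max.filesize only affects regions created from that point on; existing regions keep their current boundaries, which is why the Merge tool comes into play for regions that already exist.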