Regarding data storage in HBase

Praveen Sripati Thu, 19 Jan 2012 03:35:00 -0800

Hi,

According to `Hadoop: The Definitive Guide`:


Writes arriving at a regionserver are first appended to a commit log and
then are added to an in-memory memstore. When a memstore fills, its content
is flushed to the filesystem.
The commit log is hosted on HDFS, so it remains available through a
regionserver crash.
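To check my understanding of the quoted write path, here is a toy Python model of it. It is only a sketch under my own assumptions, not HBase's actual code. The class `ToyRegionServer` and its fields are hypothetical names. The "commit log" and "flushed files" are plain lists standing in for files on a filesystem, and "full" is measured as a count of distinct rows rather than bytes.

```python
class ToyRegionServer:
    """Hypothetical, simplified model of the write path quoted above."""

    def __init__(self, flush_threshold: int) -> None:
        # Assumption: "full" means this many distinct rows, not a byte size.
        self.flush_threshold = flush_threshold
        # Stands in for the commit log (hosted on HDFS in real HBase).
        self.commit_log: list[tuple[str, str]] = []
        # In-memory memstore: latest value per row.
        self.memstore: dict[str, str] = {}
        # Stands in for the files written when the memstore is flushed.
        self.flushed_files: list[list[tuple[str, str]]] = []

    def put(self, row: str, value: str) -> None:
        # 1. Append to the commit log first, so the edit survives a crash.
        self.commit_log.append((row, value))
        # 2. Then add the edit to the in-memory memstore.
        self.memstore[row] = value
        # 3. When the memstore fills, flush its contents.
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the memstore out as one sorted "file" and start a fresh memstore.
        self.flushed_files.append(sorted(self.memstore.items()))
        self.memstore.clear()


rs = ToyRegionServer(flush_threshold=2)
rs.put("r1", "a")
rs.put("r2", "b")  # memstore reaches 2 rows and is flushed
rs.put("r3", "c")  # starts a new memstore
print(rs.flushed_files)
print(rs.memstore)
print(len(rs.commit_log))
```

Note that in this model the commit log keeps every edit even after a flush, which is what leads to question 4 below.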

A couple of questions:

1. When the memstore fills, is it flushed to HDFS or to the local file system?

2. If the region size (hbase.hregion.max.filesize) is set to 200MB and the
HDFS block size is set to 64MB, will the region be split across 4 data
nodes? I know that it doesn't make sense to split a single region's data
across nodes in HDFS, but how is this handled in HBase?

3. Is region size (hbase.hregion.max.filesize) the size of commit log or
the size of the file that has been flushed?

4. The commit log might become big over time. Is there a concept similar to
checkpointing in HBase for the commit logs?
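For question 2, this is the arithmetic behind the "4" I used. It is a sketch that assumes, for illustration only, that the region's data sits in a single file of exactly 200MB; the helper name `hdfs_blocks_for` is hypothetical. Whether those blocks actually land on 4 distinct datanodes is up to HDFS block placement (and replication), so 4 blocks does not by itself mean 4 datanodes.

```python
MB = 1024 * 1024


def hdfs_blocks_for(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of HDFS blocks needed for one file (ceiling division)."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


region_size = 200 * MB  # hbase.hregion.max.filesize from question 2
block_size = 64 * MB    # HDFS block size from question 2

# Three full 64MB blocks plus one final 8MB block.
print(hdfs_blocks_for(region_size, block_size))
```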

I am familiar with HDFS and trying to map it to HBase.

Regards,
Praveen
