Hi,

According to "Hadoop: The Definitive Guide":
> Writes arriving at a regionserver are first appended to a commit log and then are added to an in-memory memstore. When a memstore fills, its content is flushed to the filesystem. The commit log is hosted on HDFS, so it remains available through a regionserver crash.

A couple of questions:

1. When the memstore fills, is it flushed to HDFS or to the local filesystem?
2. If the region size (hbase.hregion.max.filesize) is set to 200MB and the HDFS block size is set to 64MB, will the region be split across 4 data nodes? I know it doesn't make sense to split a single region's data across nodes in HDFS, but how is this handled in HBase?
3. Is the region size (hbase.hregion.max.filesize) the size of the commit log or the size of the file that has been flushed?
4. The commit log might grow large over time. Is there a concept similar to checkpointing in HBase for the commit logs?

I am familiar with HDFS and am trying to map it to HBase.

Regards,
Praveen