Hi Asaf,

I believe HDFS will see 35.5MB worth of data. The 132.7MB is the size of the data in the memstore, including the overhead of the ConcurrentSkipListMap, which is a pointer-heavy data structure.
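As a back-of-the-envelope check, you can divide the flush size and the file size from the quoted log by the entry count (entries=214652). This is only a sketch. It assumes the log's "m" suffix means MiB (1024 * 1024 bytes), which matches flushsize=139105600 being printed as "132.7m". The helper `bytes_per_entry` is hypothetical.

```python
# Back-of-the-envelope sketch; values are copied from the quoted log.
# Assumption: the log's "m" suffix means MiB (1024 * 1024 bytes), which is
# consistent with flushsize=139105600 being printed as "132.7m".
MIB = 1024 * 1024


def bytes_per_entry(total_bytes: float, entries: int) -> float:
    """Hypothetical helper: average bytes per KeyValue for a given total."""
    return total_bytes / entries


flush_size_bytes = 139_105_600  # memstore snapshot, includes skip-list overhead
file_size_bytes = 35.5 * MIB    # "filesize=35.5m", already rounded by the log
entries = 214_652               # "entries=214652"

in_memstore = bytes_per_entry(flush_size_bytes, entries)
on_disk = bytes_per_entry(file_size_bytes, entries)

print(f"memstore size: {flush_size_bytes / MIB:.1f} MiB")
print(f"memstore:      {in_memstore:.1f} bytes/entry")
print(f"on disk:       {on_disk:.1f} bytes/entry")
print(f"ratio:         {flush_size_bytes / file_size_bytes:.2f}x")
```

That works out to roughly 648 bytes per entry in the memstore versus roughly 173 bytes per entry in the HFile. The on-disk number alone can't tell you how much of that gap is skip-list overhead and how much is compression.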
Are you using compression? If so, then 35.5MB is the compressed size, and you should see a storefileUncompressedSizeMB metric on the regionserver's rs-status page. Let's say you are using compression and your storefileUncompressedSizeMB=70MB. Then:

* you start with 132.7MB in the memstore
* the memstore flusher rewrites the data into blocks (totaling 70MB), compresses them, and writes 35.5MB of data to HDFS
* when the blocks are read back, they are uncompressed and stored in the block cache, with a total size of 70MB

Matt

On Thu, Jun 28, 2012 at 10:17 AM, Asaf Mesika <asaf.mes...@gmail.com> wrote:

> Hi,
>
> I'm trying to figure out some discrepancies I'm witnessing in the HBase
> Region Server log file.
>
> It states that a flush was requested, and then a memstore flush is started.
> It says the flush size, after snapshotting, is 139105600 (~132.7m).
> In the log message below, the file the memstore was flushed to is not the
> same size (132.7m):
>
> [Quote]
> Added
> hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/data/8aa2974bdd7c4222a783b9b1558f9915,
> entries=214652, sequenceid=1068343, filesize=*35.5m*
> [/Quote]
>
> I'm curious about this difference in size, since I eventually would like to
> know the HDFS write throughput HBase is experiencing while flushing the
> memstore to disk.
>
>
> *Logs quote*
>
> 2012-06-25 16:23:11,905 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on food_logs,,1340630544300.64fd0d8da5714f03eb67f7f788a99960.
> 2012-06-25 16:23:11,906 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for food_logs,,1340630544300.64fd0d8da5714f03eb67f7f788a99960., current region memstore size 132.7m
> 2012-06-25 16:23:11,906 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting food_logs,,1340630544300.64fd0d8da5714f03eb67f7f788a99960., commencing wait for mvcc, flushsize=139105600
> 2012-06-25 16:23:11,906 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores
> 2012-06-25 16:23:11,938 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file:hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/.tmp/8aa2974bdd7c4222a783b9b1558f9915with permission:rwxrwxrwx
> 2012-06-25 16:23:11,960 DEBUG org.apache.hadoop.hbase.io.hfile.HFileWriterV2: Initialized with CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false]
> 2012-06-25 16:23:11,960 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/.tmp/8aa2974bdd7c4222a783b9b1558f9915: CompoundBloomFilterWriter
> 2012-06-25 16:23:12,218 DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
> 2012-06-25 16:23:13,043 INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/.tmp/8aa2974bdd7c4222a783b9b1558f9915)
> 2012-06-25 16:23:13,043 INFO org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=1068343, memsize=132.7m, into tmp file hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/.tmp/8aa2974bdd7c4222a783b9b1558f9915
> 2012-06-25 16:23:13,050 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/.tmp/8aa2974bdd7c4222a783b9b1558f9915 to hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/data/8aa2974bdd7c4222a783b9b1558f9915
> 2012-06-25 16:23:13,068 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/data/8aa2974bdd7c4222a783b9b1558f9915, entries=214652, sequenceid=1068343, filesize=35.5m
> 2012-06-25 16:23:13,071 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~132.7m/139105600, currentsize=9.2m/9603424 for region food_logs,,1340630544300.64fd0d8da5714f03eb67f7f788a99960. in 1165ms, sequenceid=1068343, compaction requested=true
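For the HDFS write throughput question, a first approximation is the flushed file size divided by the flush duration reported in the last log line ("in 1165ms"). The sketch below pulls both numbers out of the two quoted lines. The regular expressions are tailored to these particular lines, and other HBase versions may format them differently, which is an assumption. The helpers `parse_size` and `flush_throughput_mib_s` are hypothetical, and 1024-based units are assumed. The flush duration also covers snapshotting, building and compressing blocks, and renaming the tmp file. The result is therefore a lower bound on how fast HDFS absorbed the bytes, not a direct measurement.

```python
import re

MIB = 1024 * 1024

# Two lines from the quoted RegionServer log, each joined onto a single line.
ADDED = ("2012-06-25 16:23:13,068 INFO org.apache.hadoop.hbase.regionserver.Store: "
         "Added hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/"
         "data/8aa2974bdd7c4222a783b9b1558f9915, entries=214652, "
         "sequenceid=1068343, filesize=35.5m")
FINISHED = ("2012-06-25 16:23:13,071 INFO org.apache.hadoop.hbase.regionserver.HRegion: "
            "Finished memstore flush of ~132.7m/139105600, currentsize=9.2m/9603424 for "
            "region food_logs,,1340630544300.64fd0d8da5714f03eb67f7f788a99960. in "
            "1165ms, sequenceid=1068343, compaction requested=true")

# Assumption: size suffixes in these log lines are 1024-based.
UNITS = {"k": 1024, "m": MIB, "g": 1024 * MIB}


def parse_size(text: str) -> float:
    """Hypothetical helper: turn a size such as "35.5m" into bytes."""
    value, unit = float(text[:-1]), text[-1].lower()
    return value * UNITS[unit]


def flush_throughput_mib_s(added_line: str, finished_line: str) -> float:
    """Hypothetical helper: flushed file size / total flush time, in MiB/s."""
    size = re.search(r"filesize=([\d.]+[kmg])", added_line)
    millis = re.search(r" in (\d+)ms", finished_line)
    if size is None or millis is None:
        raise ValueError("log lines do not match the expected format")
    file_bytes = parse_size(size.group(1))
    seconds = int(millis.group(1)) / 1000
    return file_bytes / MIB / seconds


print(f"{flush_throughput_mib_s(ADDED, FINISHED):.1f} MiB/s")
```

For this flush that comes to roughly 30 MiB/s (35.5 MiB in 1.165 s). Taking the window between the FSUtils "Creating file" line (16:23:11,938) and the Store "Flushed ... into tmp file" line (16:23:13,043) instead gives 1105ms, or about 32 MiB/s. That window trims the snapshot and rename steps but still includes block building and compression.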