Hi Asaf,

I believe HDFS will see the 35.5MB worth of data.  The 132.7MB is the size
of the data in the memstore with the overhead of the ConcurrentSkipListMap
which is a pointer-heavy data structure.

Are you using compression?  If so then the 35.5 is the compressed size, and
you should see a metric on the regionserver's rs-status page with
the storefileUncompressedSizeMB.

Let's say you are using compression and
your storefileUncompressedSizeMB=70MB, then:

* start with 132.7MB in the memstore
* the memstore flusher will rewrite the data into blocks (totaling 70MB),
compress them, and write 35.5MB of data to HDFS
* when reading the blocks back, they will be uncompressed and stored in the
block cache with total size of 70MB

Matt


On Thu, Jun 28, 2012 at 10:17 AM, Asaf Mesika <asaf.mes...@gmail.com> wrote:

> Hi,
>
> I'm trying to figure out some discrepancies I'm witnessing in the HBase
> Region Server log file.
>
> It states that a flush was requested, and then a memstore flush is started.
> It says the flush size, after snapshotting is 139105600 (~132.7m).
> In the log message below, the file size of the file the memstore was
> flushed too is not the same size (132.7m):
>
> [Quote]
> Added
>
> hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/data/8aa2974bdd7c4222a783b9b1558f9915,
> entries=214652, sequenceid=1068343, filesize=*35.5m*
> [/Quote]
>
> I'm curios to this difference in size, since I eventually would like to
> know the HDFS write throughput HBase is experiencing while flushing
> memstore to disk.
>
>
> *Logs quote*
>
> 2012-06-25 16:23:11,905 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Flush requested on
> food_logs,,1340630544300.64fd0d8da5714f03eb67f7f788a99960.
> 2012-06-25 16:23:11,906 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Started memstore flush for
> food_logs,,1340630544300.64fd0d8da5714f03eb67f7f788a99960., current region
> memstore size 132.7m
> 2012-06-25 16:23:11,906 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Finished snapshotting
> food_logs,,1340630544300.64fd0d8da5714f03eb67f7f788a99960., commencing wait
> for mvcc, flushsize=139105600
> 2012-06-25 16:23:11,906 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Finished snapshotting, commencing flushing stores
> 2012-06-25 16:23:11,938 DEBUG org.apache.hadoop.hbase.util.FSUtils:
> Creating
>
> file:hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/.tmp/8aa2974bdd7c4222a783b9b1558f9915with
> permission:rwxrwxrwx
> 2012-06-25 16:23:11,960 DEBUG
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2: Initialized with
> CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false]
> [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false]
> [cacheEvictOnClose=false] [cacheCompressed=false]
> 2012-06-25 16:23:11,960 INFO
> org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter
> type for
>
> hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/.tmp/8aa2974bdd7c4222a783b9b1558f9915:
>  CompoundBloomFilterWriter
> 2012-06-25 16:23:12,218 DEBUG
> org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
> 2012-06-25 16:23:13,043 INFO
> org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO
> DeleteFamily was added to HFile (
>
> hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/.tmp/8aa2974bdd7c4222a783b9b1558f9915
> )
> 2012-06-25 16:23:13,043 INFO org.apache.hadoop.hbase.regionserver.Store:
> Flushed , sequenceid=1068343, memsize=132.7m, into tmp file
>
> hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/.tmp/8aa2974bdd7c4222a783b9b1558f9915
> 2012-06-25 16:23:13,050 DEBUG org.apache.hadoop.hbase.regionserver.Store:
> Renaming flushed file at
>
> hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/.tmp/8aa2974bdd7c4222a783b9b1558f9915
>  to
>
> hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/data/8aa2974bdd7c4222a783b9b1558f9915
> 2012-06-25 16:23:13,068 INFO org.apache.hadoop.hbase.regionserver.Store:
> Added
>
> hdfs://dror.foo.local:8020/hbase/food_logs/64fd0d8da5714f03eb67f7f788a99960/data/8aa2974bdd7c4222a783b9b1558f9915,
> entries=214652, sequenceid=1068343, filesize=35.5m
> 2012-06-25 16:23:13,071 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Finished memstore flush of ~132.7m/139105600, currentsize=9.2m/9603424 for
> region food_logs,,1340630544300.64fd0d8da5714f03eb67f7f788a99960. in
> 1165ms, sequenceid=1068343, compaction requested=true
>

Reply via email to