On Tue, Jul 3, 2012 at 2:17 PM, Sever Fundatureanu <[email protected]> wrote:
> Right, forgot about the timestamps. These should be a long value each, so 8
> bytes. The versioning is set to 1 so it shouldn't count.
> Note the column qualifier is also void on each entry.
>
> So now we get (33+1+8)x1.5*10^9 = 63GB, still a 19GB difference...
>

What about regionserver WAL logs? Are you including these in your math, or
are you just du'ing the table dir? The table dir can have tmp dirs for
compaction and split work.

And following up on Michael Segel's point, the KV has a type byte as well as
some lengths for finding offsets in the KV; take a looksee w/ the hfile tool:
http://hbase.apache.org/book.html#hfile_tool2

St.Ack
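
A rough sketch of what the type byte and length fields add per cell, in case
it helps with the arithmetic. It assumes the uncompressed KeyValue layout
(4-byte key length, 4-byte value length, 2-byte row length, 1-byte family
length, 8-byte timestamp, 1-byte type) and assumes the 33+1 from the quoted
estimate is row/family/value payload only; that split is a guess. The names
below are made up for illustration. It ignores HFile block indexes, WALs, tmp
dirs and compression, so treat the result as a ballpark, not as an explanation
of the whole gap.

```python
# Fixed per-KeyValue fields in the uncompressed layout, in bytes.
KEY_LEN_FIELD = 4      # int: length of the key part
VALUE_LEN_FIELD = 4    # int: length of the value part
ROW_LEN_FIELD = 2      # short: length of the row key
FAMILY_LEN_FIELD = 1   # byte: length of the column family name
TIMESTAMP = 8          # long
KEY_TYPE = 1           # byte: Put, Delete, ...

# Framing that the quoted estimate leaves out (the timestamp is already in it).
FRAMING = (KEY_LEN_FIELD + VALUE_LEN_FIELD + ROW_LEN_FIELD
           + FAMILY_LEN_FIELD + KEY_TYPE)


def estimate_bytes(payload_per_cell, cells, include_framing=True):
    """Estimate raw KeyValue bytes for `cells` cells of a given payload size."""
    per_cell = payload_per_cell + TIMESTAMP
    if include_framing:
        per_cell += FRAMING
    return per_cell * cells


CELLS = 1_500_000_000   # 1.5*10^9 entries, from the thread
PAYLOAD = 33 + 1        # from the thread; what each term covers is assumed

thread_estimate = estimate_bytes(PAYLOAD, CELLS, include_framing=False)
with_framing = estimate_bytes(PAYLOAD, CELLS)

print(f"thread estimate: {thread_estimate / 1e9:.0f} GB")
print(f"with framing:    {with_framing / 1e9:.0f} GB")
```

Under those assumptions the framing is 12 bytes per cell, which moves the
estimate from 63GB to 81GB. The hfile tool will show the real per-KV lengths.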
