I was only du'ing the table dir. The tmp dirs only had a couple of hundred
bytes in my case.
The HFile tool only reports avgKeyLen=46. This does not include the 4-byte
KeyLength and 4-byte ValueLength fields stored with each KV.
With those I do get a total of 54 bytes/KV * 1.5 billion ~= 81GB. Probably
there are also leftovers from HDFS blocks not being fully occupied.
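
For reference, a rough sketch of the arithmetic. The key layout used here
(2-byte row length, row, 1-byte family length, family, qualifier, 8-byte
timestamp, 1-byte type) is the standard KeyValue key format; reading the
"33" and "1" from my earlier estimate as the row key and column family
lengths is an assumption, as is the empty value. The function names are
made up for illustration.

```python
def key_len(row_len, family_len, qualifier_len):
    """Length of the key part of a KeyValue, in bytes."""
    # 2-byte row length + row + 1-byte family length + family
    # + qualifier + 8-byte timestamp + 1-byte type
    return 2 + row_len + 1 + family_len + qualifier_len + 8 + 1


def kv_len(key_length, value_length):
    """Full KeyValue length: two 4-byte length fields, then key and value."""
    return 4 + 4 + key_length + value_length


# Assumed: 33-byte row key, 1-byte family, empty qualifier, empty value.
key = key_len(row_len=33, family_len=1, qualifier_len=0)
per_kv = kv_len(key, value_length=0)

# 1.5 billion entries, as in the thread.
total_bytes = per_kv * 1_500_000_000

print("key bytes:", key)
print("bytes per KV:", per_kv)
print("total GB (10^9 bytes):", total_bytes / 1e9)
```

Under those assumptions the key length comes out equal to the avgKeyLen
the tool reported, and the per-KV size matches the 54 bytes above.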

Thanks,
Sever


On Tue, Jul 3, 2012 at 2:29 PM, Stack <[email protected]> wrote:

> On Tue, Jul 3, 2012 at 2:17 PM, Sever Fundatureanu
> <[email protected]> wrote:
> > Right, forgot about the timestamps. These should be a long value each, so 8
> > bytes. The versioning is set to 1 so it shouldn't count.
> > Note the column qualifier is also void on each entry.
> >
> > So now we get (33+1+8)x1.5*10^9 = 63GB, still a 19GB difference...
> >
>
> What about regionserver WAL logs?  You including these in your math or
> are you just du'ing the table dir?  The table dir can have tmp dirs
> for compaction and split work.  And after Michael Segel, the KV has a
> type byte as well as some lengths for finding offsets in KV; take a
> looksee w/ the hfile tool:
> http://hbase.apache.org/book.html#hfile_tool2
>
> St.Ack
>



-- 
Sever Fundatureanu

Vrije Universiteit Amsterdam
E-mail: [email protected]
