Right, forgot about the timestamps. These should be a long value each, so 8
bytes. The versioning is set to 1 so it shouldn't count.
Note the column qualifier is also void on each entry.

So now we get (33+1+8)x1.5*10^9 = 63GB, still a 19GB difference...

Thanks,
Sever

On Tue, Jul 3, 2012 at 1:48 PM, Michael Segel <[email protected]>wrote:

> Timestamps on the cells themselves?
> # Versions?
>
> On Jul 3, 2012, at 4:54 AM, Sever Fundatureanu wrote:
>
> > Hello,
> >
> > I have a simpel table with 1.5 billion rows and one column familiy 'F'.
> > Each row key is 33 bytes and the cell values are void. By doing the math
> I
> > would expect this table to take up (33+1)x1.5*10^9 = 51GB. However if I
> do
> > a "hadoop dfs -du" I get that the table takes up ~82GB. This is after
> > running major compactions a couple of times. Can someone explain where
> this
> > difference might come from?
> >
> > Regards,
> > --
> > Sever Fundatureanu
> >
> > Vrije Universiteit Amsterdam
> > E-mail: [email protected]
>
>


-- 
Sever Fundatureanu

Vrije Universiteit Amsterdam
E-mail: [email protected]

Reply via email to