Right, forgot about the timestamps. These should be a long value each, so 8 bytes. The versioning is set to 1 so it shouldn't count. Note the column qualifier is also void on each entry.
So now we get (33+1+8)x1.5*10^9 = 63GB, still a 19GB difference... Thanks, Sever On Tue, Jul 3, 2012 at 1:48 PM, Michael Segel <[email protected]>wrote: > Timestamps on the cells themselves? > # Versions? > > On Jul 3, 2012, at 4:54 AM, Sever Fundatureanu wrote: > > > Hello, > > > > I have a simpel table with 1.5 billion rows and one column familiy 'F'. > > Each row key is 33 bytes and the cell values are void. By doing the math > I > > would expect this table to take up (33+1)x1.5*10^9 = 51GB. However if I > do > > a "hadoop dfs -du" I get that the table takes up ~82GB. This is after > > running major compactions a couple of times. Can someone explain where > this > > difference might come from? > > > > Regards, > > -- > > Sever Fundatureanu > > > > Vrije Universiteit Amsterdam > > E-mail: [email protected] > > -- Sever Fundatureanu Vrije Universiteit Amsterdam E-mail: [email protected]
