HBase 6x bigger than raw data

2014-01-27 Thread Nick Xie
I'm importing a set of data into HBase. The CSV file contains 82 entries per line. Each line starts with an 8-byte ID, followed by a 16-byte date; the remaining 80 entries are numbers of 4 bytes each. The current HBase schema is: ID as the row key, the date in a 'date' family with a 'value' qualifier, the rest is in another

Re: HBase 6x bigger than raw data

2014-01-27 Thread Ted Yu
Which HBase release are you using? On Mon, Jan 27, 2014 at 2:12 PM, Nick Xie nick.xie.had...@gmail.com wrote: I'm importing a set of data into HBase. The CSV file contains 82 entries per line. Starting with 8 byte ID, followed by 16 byte date and the rest are 80 numbers with 4 bytes each.

Re: HBase 6x bigger than raw data

2014-01-27 Thread Tom Brown
I believe each cell stores its own copy of the entire row key, column qualifier, and timestamp. Could that account for the increase in size? --Tom On Mon, Jan 27, 2014 at 3:12 PM, Nick Xie nick.xie.had...@gmail.com wrote: I'm importing a set of data into HBase. The CSV file contains 82
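
Tom's point can be turned into a rough back-of-the-envelope estimate. The sketch below assumes the classic KeyValue layout, in which every cell carries a 4-byte key length, a 4-byte value length, a 2-byte row length, a 1-byte family length, an 8-byte timestamp and a 1-byte key type, plus its own copy of the row key, family and qualifier. The function names, and the family and qualifier lengths used for the 80 numeric columns, are placeholders, since the thread does not show them. It ignores HFile indexes, HDFS replication, and any encoding or compression, and it measures the raw row by the binary widths quoted in the thread rather than by the CSV text.

```python
# Fixed per-cell bytes in the classic HBase KeyValue layout:
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1)
KEYVALUE_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_size(row_len, family_len, qualifier_len, value_len):
    """Approximate serialized size of one cell, before encoding or compression."""
    return KEYVALUE_FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def row_size(row_len, cells):
    """Sum the cell sizes of one logical row.

    cells is a list of (family_len, qualifier_len, value_len) tuples.
    """
    return sum(cell_size(row_len, f, q, v) for f, q, v in cells)


# From the thread: an 8-byte ID as row key, a 16-byte date stored in
# family 'date' with qualifier 'value', and 80 numbers of 4 bytes each.
ROW_KEY_LEN = 8
date_cell = (len("date"), len("value"), 16)

# ASSUMPTION: the second family and its qualifiers are not shown in the
# thread; a 1-byte family name and 2-byte qualifiers are placeholders.
number_cells = [(1, 2, 4)] * 80

raw_bytes = 8 + 16 + 80 * 4
stored_bytes = row_size(ROW_KEY_LEN, [date_cell] + number_cells)
print(raw_bytes, stored_bytes, round(stored_bytes / raw_bytes, 1))
```

With a 4-byte value, most of each cell is key material under these assumptions, so the result is very sensitive to the row key, family and qualifier lengths; changing the placeholder lengths shows how much.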

Re: HBase 6x bigger than raw data

2014-01-27 Thread Nick Xie
Hi Ted, it is 0.92.1. Does the version matter? Thanks, Nick On Mon, Jan 27, 2014 at 2:32 PM, Ted Yu yuzhih...@gmail.com wrote: Which HBase release are you using ? On Mon, Jan 27, 2014 at 2:12 PM, Nick Xie nick.xie.had...@gmail.com wrote: I'm importing a set of data into HBase. The

Re: HBase 6x bigger than raw data

2014-01-27 Thread Nick Xie
Tom, Yes, you are right. According to this analysis (http://prafull-blog.blogspot.in/2012/06/how-to-calculate-record-size-of-hbase.html), if it is correct, the overhead is quite large when the cell value occupies only a small portion of the cell. In the analysis in that link, the overhead is actually 10x(the

RE: HBase 6x bigger than raw data

2014-01-27 Thread Vladimir Rodionov
From: Nick Xie [nick.xie.had...@gmail.com] Sent: Monday, January 27, 2014 2:40 PM To: user@hbase.apache.org Subject: Re: HBase 6x bigger than raw data Tom, Yes, you are right. According to this analysis ( http://prafull

Re: HBase 6x bigger than raw data

2014-01-27 Thread Ted Yu
To make better use of block cache, see HBASE-4218, Data Block Encoding of KeyValues (aka delta encoding / prefix compression), which is in 0.94 and above. To reduce the size of HFiles, please see: http://hbase.apache.org/book.html#compression On Mon, Jan 27, 2014 at 2:40 PM, Nick Xie

Re: HBase 6x bigger than raw data

2014-01-27 Thread Tom Brown
Does enabling compression include prefix compression (HBASE-4218), or is there a separate switch for that? --Tom On Mon, Jan 27, 2014 at 3:48 PM, Ted Yu yuzhih...@gmail.com wrote: To make better use of block cache, see: HBASE-4218 Data Block Encoding of KeyValues (aka delta encoding /

Re: HBase 6x bigger than raw data

2014-01-27 Thread Nick Xie
...@gmail.com] Sent: Monday, January 27, 2014 2:40 PM To: user@hbase.apache.org Subject: Re: HBase 6x bigger than raw data Tom, Yes, you are right. According to this analysis ( http://prafull-blog.blogspot.in/2012/06/how-to-calculate-record-size-of-hbase.html ) if it is right, then the overhead

Re: HBase 6x bigger than raw data

2014-01-27 Thread Ted Yu
Enabling compression (http://hbase.apache.org/book.html#compression) is separate from data block encoding (HBASE-4218). Cheers On Mon, Jan 27, 2014 at 2:59 PM, Tom Brown tombrow...@gmail.com wrote: Does enabling compression include prefix compression (HBASE-4218), or is there a separate
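
To illustrate why the two are separate switches, here is a toy sketch of the two ideas as independent steps. It is not HBase's actual encoder or file format: `prefix_encode` is a hypothetical stand-in for the idea behind prefix compression (store only the part of each key that differs from the previous one), and `zlib` stands in for whichever general-purpose block codec is configured. The keys are made up to resemble one row ID repeated across 80 cells.

```python
import zlib


def prefix_encode(sorted_keys):
    """Store each key as (shared-prefix length, suffix) relative to the previous key."""
    encoded, prev = [], b""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the original keys from (shared-prefix length, suffix) pairs."""
    keys, prev = [], b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Hypothetical keys: the same 8-byte row ID followed by a family/qualifier
# that differs per cell, in sorted order.
keys = [b"ID000001" + b"v:" + f"{i:02d}".encode() for i in range(80)]

encoded = prefix_encode(keys)
assert prefix_decode(encoded) == keys  # the encoding is lossless

plain_bytes = sum(len(k) for k in keys)
# Toy accounting: one byte to record each shared-prefix length.
encoded_bytes = sum(1 + len(suffix) for _, suffix in encoded)
# General-purpose compression applied to the unencoded keys, independently.
compressed_bytes = len(zlib.compress(b"".join(keys)))
print(plain_bytes, encoded_bytes, compressed_bytes)
```

The encoding step exploits the known structure of sorted keys, while the compression step works on arbitrary bytes; either can be applied without the other, which mirrors the two independent settings Ted describes.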

Re: HBase 6x bigger than raw data

2014-01-27 Thread Koert Kuipers
@hbase.apache.org Subject: Re: HBase 6x bigger than raw data Tom, Yes, you are right. According to this analysis ( http://prafull-blog.blogspot.in/2012/06/how-to-calculate-record-size-of-hbase.html ) if it is right, then the overhead is quite big if the cell value occupies a small

Re: HBase 6x bigger than raw data

2014-01-27 Thread Ted Yu
From: Nick Xie [nick.xie.had...@gmail.com] Sent: Monday, January 27, 2014 2:40 PM To: user@hbase.apache.org Subject: Re: HBase 6x bigger than raw data Tom, Yes, you are right. According to this analysis ( http://prafull-blog.blogspot.in/2012/06/how