I'm importing a set of data into HBase. Each line of the CSV file contains
82 entries: an 8-byte ID, followed by a 16-byte date, and then 80 numbers
of 4 bytes each.
The current HBase schema is: ID as row key, date as a 'date' family with
'value' qualifier, the rest is in another
Which HBase release are you using?
On Mon, Jan 27, 2014 at 2:12 PM, Nick Xie nick.xie.had...@gmail.com wrote:
I'm importing a set of data into HBase. The CSV file contains 82 entries
per line. Starting with 8 byte ID, followed by 16 byte date and the rest
are 80 numbers with 4 bytes each.
I believe each cell stores its own copy of the entire row key, column
qualifier, and timestamp. Could that account for the increase in size?
--Tom
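
The per-cell overhead described above can be estimated with a little arithmetic. The following is a rough sketch, not a measurement: it assumes the standard KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type, plus the row, family, qualifier and value bytes), and it ignores HFile block and index overhead. The family and qualifier lengths for the 80 numbers are hypothetical, since the original post does not give those names.

```python
# Sizes taken from the original post.
ROW_KEY_BYTES = 8
DATE_BYTES = 16
NUM_VALUES = 80
VALUE_BYTES = 4

# Fixed fields stored with every KeyValue: key length (4) + value length (4)
# + row length (2) + family length (1) + timestamp (8) + key type (1).
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row_len, family_len, qualifier_len, value_len):
    """Approximate serialized size of one cell, in bytes."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def raw_row_size():
    """Size of one input record as described in the post (binary sizes only)."""
    return ROW_KEY_BYTES + DATE_BYTES + NUM_VALUES * VALUE_BYTES


def stored_row_size(family_len, qualifier_len):
    """Size of one logical row once every field is its own cell.

    family_len and qualifier_len describe the 80 numeric columns and are
    assumptions; the date cell uses the 'date' family and 'value' qualifier
    named in the post.
    """
    date_cell = keyvalue_size(ROW_KEY_BYTES, len("date"), len("value"), DATE_BYTES)
    value_cell = keyvalue_size(ROW_KEY_BYTES, family_len, qualifier_len, VALUE_BYTES)
    return date_cell + NUM_VALUES * value_cell


if __name__ == "__main__":
    raw = raw_row_size()
    stored = stored_row_size(family_len=1, qualifier_len=2)  # hypothetical names
    print(raw, stored, round(stored / raw, 2))
```

With a hypothetical 1-byte family name and 2-byte qualifiers, this comes to 2853 bytes per row against 344 bytes of raw fields, roughly 8x before any compression or encoding. Most of that is the row key, timestamp and length fields repeated in each of the 81 cells.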
Hi Ted,
It is 0.92.1. Does the version matter?
Thanks,
Nick
On Mon, Jan 27, 2014 at 2:32 PM, Ted Yu yuzhih...@gmail.com wrote:
Which HBase release are you using ?
Tom,
Yes, you are right. According to this analysis (
http://prafull-blog.blogspot.in/2012/06/how-to-calculate-record-size-of-hbase.html),
if it is correct, the overhead is quite large when the cell value occupies
only a small portion of the stored cell.
In the analysis in that link, the overhead is actually 10x(the
, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
From: Nick Xie [nick.xie.had...@gmail.com]
Sent: Monday, January 27, 2014 2:40 PM
To: user@hbase.apache.org
Subject: Re: HBase 6x bigger than raw data
Tom,
Yes, you are right. According to this analysis (
http://prafull
To make better use of block cache, see:
HBASE-4218 Data Block Encoding of KeyValues (aka delta encoding / prefix
compression)
which is in 0.94 and above
To reduce size of HFiles, please see:
http://hbase.apache.org/book.html#compression
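
To see why prefix-style encoding helps with a schema like the one in this thread, here is a toy sketch of the general idea. It is not the encoder HBase ships under HBASE-4218; it only shows that consecutive sorted keys sharing a row key need to store very little beyond their differing suffix. The key layout (8-byte row ID, 1-byte family, 2-byte qualifier) and the sample row ID are hypothetical.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(keys):
    """Encode sorted keys as (shared_prefix_length, remaining_suffix) pairs."""
    encoded = []
    prev = b""
    for key in keys:
        shared = common_prefix_len(prev, key)
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the original keys from (shared_prefix_length, suffix) pairs."""
    keys = []
    prev = b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


# Hypothetical keys for one row: row ID + family "v" + qualifiers "00".."79".
row = b"00000001"
keys = [row + b"v" + f"{i:02d}".encode() for i in range(80)]

encoded = prefix_encode(keys)
raw_bytes = sum(len(k) for k in keys)
suffix_bytes = sum(len(suffix) for _, suffix in encoded)

if __name__ == "__main__":
    print(raw_bytes, suffix_bytes)
```

The suffix total leaves out the small cost of storing each shared-prefix length. General-purpose block compression works on the already-encoded bytes, so the two techniques can be combined.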
Does enabling compression include prefix compression (HBASE-4218), or is
there a separate switch for that?
--Tom
Enabling compression (http://hbase.apache.org/book.html#compression) is
separate from data block encoding (HBASE-4218).
Cheers
On Mon, Jan 27, 2014 at 2:59 PM, Tom Brown tombrow...@gmail.com wrote:
Does enabling compression include prefix compression (HBASE-4218), or is
there a separate