Image :)
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
From: Vladimir Rodionov vrodio...@carrieriq.com
To: user@hbase.apache.org,
Date: 02/27/2014
I am thinking of storing medium sized objects (~1M) using HBase. The
advantage of using HBase rather than HBase (storing pointers) + HDFS, in
my mind, is:
data locality. When I want to run analytics, I will access these objects
using HBase scan, and HBase stores KVs in a sequential manner. If I
What type of analytics are you going to do on medium sized objects (1M)?
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
From: Wei Tan [w...@us.ibm.com]
Sent: Wednesday, February
Yes, for sure you can use HBase for this. You can either:
1. store the different fields of a mail in different columns of a column
family, with the attachment as a binary array, also in a column; or
2. keep the whole message in columns in HBase and, if the attachments are
large enough, put them on HDFS with some reference to them in
The only other thing I'd add is that, by default, HBase caps the size of the
data per column at 10 MB (I think). You can change that with this setting:
hbase.client.keyvalue.maxsize
in hbase-site.xml
-1 means no cap. You can set another number as an appropriate cap for your
use case.
Ameya
On Tue,
Minor:
Value 0 also means no cap - see HTable#validatePut()
if (maxKeyValueSize > 0) {
...
if (kv.getLength() > maxKeyValueSize) {
throw new IllegalArgumentException("KeyValue size too large");
}
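To make the point about 0 and -1 concrete, here is a small standalone sketch of the same guard. It does not use the HBase client at all: the class name SizeCap and the method validate are hypothetical, and a plain int length stands in for kv.getLength(). It only illustrates that the size check is skipped whenever the configured cap is not positive.

```java
public class SizeCap {
    /** Mirrors the guard quoted above: the size check only runs when the cap is positive. */
    public static void validate(int valueLength, int maxKeyValueSize) {
        if (maxKeyValueSize > 0) {
            if (valueLength > maxKeyValueSize) {
                throw new IllegalArgumentException("KeyValue size too large");
            }
        }
    }

    public static void main(String[] args) {
        int fiveMb = 5 * 1024 * 1024;
        validate(fiveMb, 10 * 1024 * 1024); // under a 10 MB cap: accepted
        validate(fiveMb, 0);                // cap of 0: check skipped
        validate(fiveMb, -1);               // cap of -1: check skipped
        try {
            validate(fiveMb, 1024 * 1024);  // over a 1 MB cap: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

So with this logic, a 5 MB attachment passes under the default cap and under either "no cap" value, and is refused only when the cap is set below its size.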
On Tue, Feb 25, 2014 at 11:52 AM, Ameya Kanitkar
Usually, it is not advisable to store such large values in HBase (to avoid
excessive I/O during compaction).
Keep them in separate files in HDFS and store only references in HBase. To
overcome the NameNode's (NN) inherent limit on the number of files,
you can bundle several values into a single file (you will
I have come to the same conclusion as you suggest: (keep them in separate
files in HDFS and store only references in HBase).
I will try bundling several attachments into a single file...
And thanks a lot...
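The approach discussed above (several attachments bundled into one file, with only references kept in HBase) could be sketched roughly as follows. This is an illustration under stated assumptions, not anyone's actual implementation: it writes to a local file as a stand-in for HDFS, and the names AttachmentPack, Ref, pack and read are hypothetical. The (file, offset, length) triple is what would be stored in an HBase column in place of the attachment bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class AttachmentPack {
    /** Hypothetical reference that would be stored in HBase instead of the bytes. */
    public static final class Ref {
        public final String file;
        public final long offset;
        public final int length;

        Ref(String file, long offset, int length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Appends every blob to one container file and returns one reference per blob. */
    public static List<Ref> pack(Path container, List<byte[]> blobs) throws IOException {
        List<Ref> refs = new ArrayList<>();
        try (RandomAccessFile out = new RandomAccessFile(container.toFile(), "rw")) {
            out.seek(out.length()); // append after any existing content
            for (byte[] blob : blobs) {
                long offset = out.getFilePointer();
                out.write(blob);
                refs.add(new Ref(container.toString(), offset, blob.length));
            }
        }
        return refs;
    }

    /** Random-access read of a single blob using its reference. */
    public static byte[] read(Ref ref) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(ref.file, "r")) {
            byte[] buf = new byte[ref.length];
            in.seek(ref.offset);
            in.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        Path container = Files.createTempFile("attachments", ".pack");
        List<byte[]> blobs = new ArrayList<>();
        blobs.add("first attachment".getBytes(StandardCharsets.UTF_8));
        blobs.add("second attachment".getBytes(StandardCharsets.UTF_8));
        List<Ref> refs = pack(container, blobs);
        // Fetch only the second attachment, without reading the first.
        System.out.println(new String(read(refs.get(1)), StandardCharsets.UTF_8));
        Files.delete(container);
    }
}
```

Since only random access is needed, each read seeks straight to the stored offset, and the number of files stays small no matter how many attachments there are.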
On Wed, Feb 26, 2014 at 1:45 AM, Vladimir Rodionov
vrodio...@carrieriq.com wrote:
Usually, it is not
I have to use HBase and have mixed types of data.
Some items are 1-4 KB (mail headers) and others are
5 MB (attachments...).
We also need only random access to any data.
Is HBase feasible for storing this type of data?
What should my schema design be:
will I have to go with 2 different tables -