Re: Hbase row ingestion ..

2015-04-30 Thread Michael Segel
I wouldn’t call storing attributes in separate columns a ‘rigid schema’. You are correct that you could write your data as a CLOB/BLOB and store it in a single cell. The upside is that it's more efficient. The downside is that it's really an all-or-nothing fetch and then you need to write the
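To make the trade-off concrete, here is a minimal sketch in plain Python (no HBase client involved). It assumes JSON as the blob encoding and models a row as a dict; the helper names are hypothetical and not from the thread. Reading one attribute from the blob means decoding everything, while the per-column layout can return a single value directly.

```python
import json

def pack_row(attrs: dict) -> bytes:
    """Serialize all attributes into one value (the single-cell layout).
    JSON is an assumption; any serialization format would do."""
    return json.dumps(attrs, sort_keys=True).encode("utf-8")

def read_attr_from_blob(blob: bytes, name: str):
    """All-or-nothing: the whole blob is fetched and decoded
    before one attribute can be picked out."""
    return json.loads(blob.decode("utf-8"))[name]

def read_attr_from_columns(columns: dict, name: str) -> bytes:
    """Per-qualifier layout, modeled as {qualifier: value}:
    only the requested cell is touched."""
    return columns[name.encode("utf-8")]
```

The blob layout also pushes any filtering on attribute values into client code, since the store only sees one opaque value.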

Re: Hbase row ingestion ..

2015-04-30 Thread James Estes
Gautam, Michael makes a lot of good points, especially the importance of analyzing your use case when determining the row key design. We (Jive) did a talk at HBaseCon a couple of years back about our row key redesign, which vastly improved performance. It also talks a little about the write

Re: Hbase row ingestion ..

2015-04-30 Thread Michael Segel
Heh.. I just did a talk at BDTC in Boston… of course at the end of the last day… small audience. Bucketing is a bit different from just hashing the rowkey. If you are doing get(), then having 480 buckets isn’t a problem. Doing a range scan over the 480 buckets makes getting your sort ordered
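As a rough illustration of why range scans are the harder case for a bucketed table: each bucket is sorted only within itself, so a client has to scan every bucket and merge the results to recover global order. The sketch below models buckets as in-memory sorted lists of keys (bucket prefix already stripped); it is not HBase client code, and the function name is hypothetical.

```python
import heapq

def merged_range_scan(buckets, start: bytes, stop: bytes):
    """Scan [start, stop) in every bucket, then k-way merge.

    `buckets` is a list of lists, each sorted by original row key.
    A point get() would touch exactly one bucket; a range scan
    touches all of them.
    """
    per_bucket = [
        [key for key in rows if start <= key < stop]
        for rows in buckets
    ]
    # heapq.merge assumes each input is already sorted, which holds
    # because each bucket is sorted and filtering preserves order.
    return list(heapq.merge(*per_bucket))
```

With 480 buckets this means 480 partial scans per logical range scan, which is the cost the message is pointing at.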

Re: Hbase row ingestion ..

2015-04-30 Thread Michael Segel
Exactly! So you don’t need to know if your table is bucketed or not. You just put() or get()/scan() like any other table. On Apr 30, 2015, at 3:00 PM, Andrew Mains andrew.ma...@kontagent.com wrote: Thanks all again for the replies--this is a very interesting discussion :).

Re: Hbase row ingestion ..

2015-04-30 Thread Andrew Mains
Thanks all again for the replies--this is a very interesting discussion :). @Michael HBASE-12853 is definitely an interesting proposition for our (Upsight's) use case--we've done a moderate amount of work to make our reads over the bucketed table efficient using Hive. In particular, we added

Re: Hbase row ingestion ..

2015-04-30 Thread Gautam
Thanks guys for responding! Michael, I indeed should have elaborated on our current rowkey design. Re: hotspotting, we're doing exactly what you're suggesting, i.e. fanning out into buckets where the bucket location is a hash(message_unique_fields) (we use murmur3). So our write pattern is
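A minimal sketch of that fan-out, in plain Python. Two assumptions to flag: the standard library has no murmur3, so MD5 stands in as the hash here, and the bucket count, the `|` separator, and the function names are illustrative rather than the poster's actual layout.

```python
import hashlib

NUM_BUCKETS = 480  # illustrative bucket count, not a recommendation

def bucket_for(unique_fields: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a message's unique fields to a stable bucket number.
    MD5 is a stand-in for murmur3, which is not in the stdlib."""
    digest = hashlib.md5(unique_fields).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets

def bucketed_rowkey(unique_fields: bytes, original_key: bytes,
                    num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the original key with a zero-padded bucket id so that
    writes spread across regions instead of hitting one hot region."""
    bucket = bucket_for(unique_fields, num_buckets)
    return f"{bucket:03d}|".encode("ascii") + original_key
```

Because the bucket is derived from the message's own fields, the same message always lands in the same bucket, so a point lookup can recompute the prefix instead of probing every bucket.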

Re: Hbase row ingestion ..

2015-04-29 Thread Gautam
.. I'd like to add that we have a very fat rowkey. - Thanks. On Wed, Apr 29, 2015 at 5:30 PM, Gautam gautamkows...@gmail.com wrote: Hello, We've been fighting some ingestion perf issues on HBase and I have been looking at the write path in particular. Trying to optimize on write path

Hbase row ingestion ..

2015-04-29 Thread Gautam
Hello, We've been fighting some ingestion perf issues on HBase and I have been looking at the write path in particular. Trying to optimize on write path currently. We have around 40 column qualifiers (under a single CF) for each row. So I understand that each put(row) written into HBase would
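For a back-of-the-envelope feel for what 40 qualifiers cost on the write path, the sketch below estimates serialized size using the classic HBase KeyValue layout, in which every cell carries its own copy of the row key, family, qualifier, timestamp, and type. Treat it as an approximation: it ignores tags, block encoding, and compression, and the blob variant simply concatenates values under a hypothetical single qualifier.

```python
def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate bytes for one cell in the classic KeyValue layout."""
    return (
        4 + 4              # key length + value length (ints)
        + 2 + len(row)     # row length (short) + row key
        + 1 + len(family)  # family length (byte) + family
        + len(qualifier)
        + 8 + 1            # timestamp (long) + key type (byte)
        + len(value)
    )

def size_as_columns(row: bytes, family: bytes, cells: dict) -> int:
    """One KeyValue per qualifier: the row key is repeated in each."""
    return sum(keyvalue_size(row, family, q, v) for q, v in cells.items())

def size_as_blob(row: bytes, family: bytes, cells: dict,
                 qualifier: bytes = b"b") -> int:
    """One KeyValue holding all values concatenated (fixed schema assumed)."""
    return keyvalue_size(row, family, qualifier, b"".join(cells.values()))
```

The gap between the two numbers grows with the row key length, which is why a fat row key makes the per-qualifier layout noticeably more expensive to write.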

Re: Hbase row ingestion ..

2015-04-29 Thread Esteban Gutierrez
Hi Gautam, Your reasoning is correct and that will improve the write performance, especially if you always need to write all the qualifiers in a row (sort of a rigid schema). However, you should consider using qualifiers to some extent if the read pattern might include some conditional search,