I wouldn’t call storing attributes in separate columns a ‘rigid schema’.
You are correct that you could write your data as a CLOB/BLOB and store it in a
single cell.
The upside is that it's more efficient.
The downside is that it's really an all-or-nothing fetch, and then you need to
write the
Gautam,
Michael makes a lot of good points, especially the importance of analyzing your
use case when determining the row key design. We (Jive) gave a talk at HBaseCon a
couple of years back about our row key redesign to vastly improve
performance. It also talks a little about the write
Heh.. I just did a talk at BDTC in Boston… of course at the end of the last
day… small audience.
Bucketing is a bit different from just hashing the rowkey. If you are doing
get(), then having 480 buckets isn’t a problem.
Doing a range scan over the 480 buckets makes getting your sort ordered
Exactly!
So you don’t need to know whether your table is bucketed or not.
You just put() or get()/scan() like with any other table.
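
To illustrate the point about bucketing being invisible to the caller, here is a minimal in-memory sketch. It is not the HBase client API: `BucketedTable`, `bucket_of`, and the MD5-based hash are hypothetical stand-ins, and the 480 bucket count is simply the figure mentioned above. It shows why a get() touches one bucket while a scan has to visit every bucket and merge the sorted streams to recover global key order.

```python
import hashlib
import heapq

NUM_BUCKETS = 480  # figure mentioned in the thread; any fixed count works


def bucket_of(key: bytes) -> int:
    """Map a logical row key to a stable bucket (MD5 is only a stand-in hash)."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


class BucketedTable:
    """Toy stand-in for a bucketed table; callers never see the bucket."""

    def __init__(self):
        # One key/value store per bucket, mimicking one key range per prefix.
        self._buckets = [dict() for _ in range(NUM_BUCKETS)]

    def put(self, key: bytes, value: bytes) -> None:
        self._buckets[bucket_of(key)][key] = value

    def get(self, key: bytes):
        # A point read recomputes the bucket and touches exactly one store.
        return self._buckets[bucket_of(key)].get(key)

    def scan(self, start: bytes, stop: bytes):
        # A range scan runs once per bucket; the per-bucket results are each
        # sorted, so a k-way merge restores the global sort order.
        streams = [
            sorted((k, v) for k, v in bucket.items() if start <= k < stop)
            for bucket in self._buckets
        ]
        return heapq.merge(*streams, key=lambda kv: kv[0])
```

The merge step is where the cost of a bucketed range scan shows up: every scan fans out to all buckets, even when the requested range is small.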
On Apr 30, 2015, at 3:00 PM, Andrew Mains andrew.ma...@kontagent.com wrote:
Thanks all again for the replies--this is a very interesting discussion :).
@Michael HBASE-12853 is definitely an interesting proposition for our
(Upsight's) use case--we've done a moderate amount of work to make our
reads over the bucketed table efficient using Hive. In particular, we
added
Thanks Guys for responding!
Michael,
I indeed should have elaborated on our current rowkey design. Re:
hotspotting, we're doing exactly what you're suggesting, i.e. fanning out
into buckets where the bucket location is a hash(message_unique_fields)
(we use murmur3). So our write pattern is
.. I'd like to add that we have a very fat rowkey.
- Thanks.
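
A rough sketch of the fan-out described above, under stated assumptions: the bucket is derived from a hash of the fields that uniquely identify a message and is prepended to the rowkey. The thread uses murmur3; Python's standard library has no murmur3, so SHA-1 stands in here. The bucket count, the `|` and NUL separators, and the field values are all hypothetical, since the actual key layout is not given.

```python
import hashlib

NUM_BUCKETS = 480  # assumed; the poster's real bucket count is not stated


def make_rowkey(unique_fields, num_buckets=NUM_BUCKETS) -> bytes:
    """Build b'<bucket>|<fields>' where bucket = hash(fields) % num_buckets."""
    # Join the identifying fields into one byte string (NUL-separated here).
    payload = b"\x00".join(f.encode("utf-8") for f in unique_fields)
    # SHA-1 is a stand-in for murmur3; any stable hash spreads the writes.
    h = int.from_bytes(hashlib.sha1(payload).digest()[:4], "big")
    bucket = h % num_buckets
    # Zero-padded prefix keeps bucket prefixes fixed-width and sortable.
    return b"%03d|" % bucket + payload
```

Because the bucket is a pure function of the message fields, a reader who knows the fields can rebuild the full key for a get(). The payload also shows why such a rowkey gets fat: every identifying field is carried in the key.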
On Wed, Apr 29, 2015 at 5:30 PM, Gautam gautamkows...@gmail.com wrote:
Hello,
We've been fighting some ingestion perf issues on HBase and I have
been looking at the write path in particular. Trying to optimize on write
path currently.
We have around 40 column qualifiers (under a single CF) for each row. So I
understand that each put(row) written into HBase would
Hi Gautam,
Your reasoning is correct and that will improve the write performance,
especially if you always need to write all the qualifiers in a row (sort of
a rigid schema). However, you should consider using qualifiers to some
extent if the read pattern might include some conditional search,
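
For a sense of scale on the write-path question, here is a back-of-the-envelope sketch. It assumes the concern is that every cell carries its own copy of the rowkey, family, and qualifier, as in HBase's KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type). Tags, data block encoding, and compression are ignored, so real on-disk numbers will differ. The byte sizes in the usage note are invented for illustration.

```python
# Fixed per-cell bytes in a KeyValue: key length (4) + value length (4)
# + row length (2) + family length (1) + timestamp (8) + key type (1).
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row_len, family_len, qualifier_len, value_len):
    """Unencoded size in bytes of one cell, ignoring tags."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def size_separate_columns(row_len, family_len, qualifier_lens, value_lens):
    """One cell per qualifier: the rowkey is repeated in every cell."""
    return sum(
        keyvalue_size(row_len, family_len, q, v)
        for q, v in zip(qualifier_lens, value_lens)
    )


def size_single_cell(row_len, family_len, value_lens, qualifier_len=1):
    """All values packed into one cell: the rowkey is written once."""
    return keyvalue_size(row_len, family_len, qualifier_len, sum(value_lens))
```

With a hypothetical 100-byte rowkey, a 1-byte family, and 40 qualifiers of 8 bytes holding 10-byte values, `size_separate_columns(100, 1, [8] * 40, [10] * 40)` gives 5560 bytes against 522 bytes for `size_single_cell(100, 1, [10] * 40)`. The single-cell figure leaves out whatever framing the serialized blob itself needs, and it gives up per-qualifier reads and filters.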