Hi,

I would like to explain our HBase use case, our row key design and the
problems we are having, in case anyone can help us:

The first thing we noticed is that our data set is quite small compared to
other cases we have read about on the list and in forums. We have a table
containing 20 million keys, split automatically by HBase into 4 regions and
balanced across 3 region servers. We have designed our row key to keep
together the set of keys requested by our app. That is, when we request a
set of keys we expect them to be grouped together to improve data locality
and block cache efficiency.
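To make the key design concrete, it looks roughly like the sketch below
(the field names are invented just for illustration; the real key has more
parts). The shared group id goes first so that every key a single request
asks for sorts into the same key range:

import org.apache.hadoop.hbase.util.Bytes;

// Rough sketch of the composite key idea (names invented for
// illustration): the group id shared by a request goes first, so the
// ~500 keys of one request end up adjacent in the table.
public class RowKeys {
    public static byte[] rowKey(String groupId, String itemId) {
        return Bytes.toBytes(groupId + "#" + itemId);
    }
}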

The second thing we noticed, compared to other cases, is that we retrieve a
large number of keys per request (approximately 500). Thus, during our
peaks (3k requests per minute), we have a lot of requests going to
particular region servers, each asking for a lot of keys. That results in
poor response times (on the order of seconds). Currently we are using
multi gets.
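For reference, the gets are batched roughly like this (a minimal sketch
using the standard client API; the table name is just a placeholder):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class MultiGetExample {
    // Fetch a batch of row keys in one call; the client groups the
    // Gets by region server internally.
    public static Result[] fetch(List<byte[]> rowKeys) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("our_table"))) {
            List<Get> gets = new ArrayList<>();
            for (byte[] key : rowKeys) {
                gets.add(new Get(key));
            }
            return table.get(gets);
        }
    }
}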

We think an improvement would be to spread the keys (by introducing a
randomized component into them) across more region servers, so each region
server has to handle fewer keys and probably fewer requests. That way, the
multi gets would be spread over the region servers.
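What we have in mind is something like the following sketch (assuming a
fixed number of salt buckets; the value 16 is just an assumption to be
tuned). Since we always do point gets, the same salt can be recomputed on
the read path when building each Get:

import org.apache.hadoop.hbase.util.Bytes;

// Sketch of the salting idea: prepend a deterministic one-byte prefix
// derived from the original key, so rows scatter over BUCKETS key
// ranges instead of piling up on a few regions.
public class SaltedKeys {
    static final int BUCKETS = 16; // assumed value, to be tuned

    public static byte[] salted(byte[] originalKey) {
        int bucket = (Bytes.hashCode(originalKey) & 0x7fffffff) % BUCKETS;
        return Bytes.add(new byte[] { (byte) bucket }, originalKey);
    }
}

Because the salt is a pure function of the key, reads can build the same
salted key for each Get, so the multi gets keep working. The downside is
that keys that used to be adjacent end up in different regions, so we
would be trading block cache locality for load spreading.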

Our questions:

1. Is this design of asking for so many keys on each request correct (if
you need high performance)?
2. What about splitting across more region servers? Is it a good idea? How
could we accomplish this? We thought of applying some hashing, along the
lines of the pre-split sketch after the questions...
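Regarding question 2, what we were thinking of (as a sketch, assuming the
16-bucket salt above and a placeholder table and column family name) is to
create the table pre-split on the salt byte, so each bucket starts in its
own region:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class PreSplit {
    // Create the table with one region per salt bucket so the load
    // spreads immediately instead of waiting for natural splits.
    public static void create(int buckets) throws IOException {
        byte[][] splitKeys = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splitKeys[i - 1] = new byte[] { (byte) i };
        }
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            HTableDescriptor desc =
                new HTableDescriptor(TableName.valueOf("our_table_salted"));
            desc.addFamily(new HColumnDescriptor("d"));
            admin.createTable(desc, splitKeys);
        }
    }
}

With the salt, a single multi get of ~500 keys fans out across the salted
regions instead of hitting a single region, which is the spreading we are
after.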

Thanks in advance!
