Hi, I would like to explain our HBase use case, our row key design, and the problems we are having, in the hope that someone can help us:
The first thing we noticed is that our data set is small compared to other cases we have read about on the list and in forums. We have a table containing 20 million keys, split automatically by HBase into 4 regions and balanced across 3 region servers. We designed our row key so that the set of keys our app requests together is stored together. That is, when we request a set of keys, we expect them to be contiguous, which improves data locality and block cache efficiency.

The second thing we noticed, compared to other cases, is that we retrieve many keys per request (approx. 500). Thus, during our peaks (3k requests per minute), many requests go to a particular region server, each asking for many keys. This results in poor response times (on the order of seconds).

Currently we are using multi-gets. We think an improvement would be to spread the keys over more region servers by introducing a randomized component into the key, so that each region server has to handle fewer keys and probably fewer requests. That way the multi-gets would be spread over the region servers.

Our questions:
1. Is this design, asking for so many keys on each request, correct if you need high performance?
2. What about splitting across more region servers? Is it a good idea? How could we accomplish this? We thought of applying some hashing...

Thanks in advance!
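A minimal sketch of the hashing idea from question 2, assuming a deterministic one-byte salt prefix (a technique often called "salting"). The class `SaltedKeys`, the bucket count `NUM_BUCKETS`, and the `user42:item` key layout are hypothetical, and the code uses only the Java standard library rather than the HBase client API. Because the salt is derived from the key itself, a reader can recompute the full row key for each Get, and a request's keys can be grouped by the prefix range they would fall into.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SaltedKeys {
    // Hypothetical bucket count; would be chosen to match how many regions
    // the keys should be spread over.
    static final int NUM_BUCKETS = 8;

    // Deterministic salt: the same logical key always maps to the same bucket,
    // so a reader can rebuild the stored row key without a lookup.
    static int bucketOf(String logicalKey) {
        // hashCode() may be negative; floorMod keeps the result in [0, NUM_BUCKETS).
        return Math.floorMod(logicalKey.hashCode(), NUM_BUCKETS);
    }

    // Row key = one salt byte followed by the original key bytes.
    static byte[] saltedRowKey(String logicalKey) {
        byte[] key = logicalKey.getBytes(StandardCharsets.UTF_8);
        byte[] row = new byte[key.length + 1];
        row[0] = (byte) bucketOf(logicalKey);
        System.arraycopy(key, 0, row, 1, key.length);
        return row;
    }

    // Group one request's keys by salt bucket, i.e. by the prefix range
    // (and, with a matching pre-split, the region) each key would land in.
    static Map<Integer, List<String>> groupByBucket(List<String> keys) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String k : keys) {
            groups.computeIfAbsent(bucketOf(k), b -> new ArrayList<>()).add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Simulate one request of ~500 keys (hypothetical key layout).
        List<String> request = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            request.add("user42:item" + i);
        }
        groupByBucket(request).forEach((bucket, ks) ->
                System.out.println("bucket " + bucket + ": " + ks.size() + " keys"));
    }
}
```

For the salt to actually spread load, the table would also need to be pre-split so that each salt prefix starts its own region (HBase accepts split keys at table creation). The trade-off is that one request's keys are no longer contiguous, which gives up the locality and block cache benefit described above, and any range scan over logical keys turns into `NUM_BUCKETS` scans.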