Hello, I am designing an architecture for a website that shows analytics over a huge quantity of data. This data is stored in one HBase table and needs to be accessed in a semi-random manner. Typically, a big block of contiguous rowkeys will be read at once (say a few thousand rows) and some data displayed based on them. Which block within the table gets requested is the random aspect.
I am trying to figure out how fast I can expect HBase to be. Can I link the web page directly to HBase for these reads and expect real-time page loads (<1 sec), or do I need a distributed cache like Redis in front of it, so that if a user requests the same data over and over I don't waste time pulling it from HBase when it has already been loaded? In other words, generally speaking, are HBase and Redis/Memcached redundant, or is there a strong use case for using HBase as the on-disk storage and Redis or Memcached as an in-memory cache to improve performance?

Thanks, Scott
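Edit: for concreteness, the cache-aside pattern I have in mind looks roughly like the sketch below. The dicts are stand-ins for a Redis client and an HBase contiguous scan (in a real deployment these would be something like redis-py and happybase, and the cache entries would get a TTL); the key names and helper functions are just illustrative assumptions.

```python
# Cache-aside sketch: check the cache first, fall back to an HBase-style scan.
# NOTE: plain dicts stand in for Redis and for the HBase table; in practice
# these would be real clients (e.g. redis-py, happybase) -- assumed, not shown.

cache_stub = {}  # stands in for Redis: cache key -> cached block of rows
table_stub = {   # stands in for the HBase table: rowkey -> row data
    f"row{i:04d}": {"value": i} for i in range(10000)
}

def scan_table(start_row, stop_row):
    """Stand-in for a contiguous HBase scan over [start_row, stop_row)."""
    return {k: v for k, v in table_stub.items() if start_row <= k < stop_row}

def get_block(start_row, stop_row):
    """Cache-aside read: serve from cache on a hit, else scan and populate."""
    key = f"block:{start_row}:{stop_row}"
    if key in cache_stub:                  # cache hit: no HBase round trip
        return cache_stub[key], True
    block = scan_table(start_row, stop_row)
    cache_stub[key] = block                # populate cache (real code: SET with TTL)
    return block, False

block, hit = get_block("row0100", "row0200")
print(len(block), hit)   # 100 False  (first read: scan, cache miss)
block, hit = get_block("row0100", "row0200")
print(len(block), hit)   # 100 True   (repeat read: served from cache)
```

The point of the pattern is that repeated requests for the same rowkey block never touch HBase after the first read, at the cost of one extra lookup on a miss.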