On Wed, Dec 19, 2012 at 1:26 PM, David Arthur <[email protected]> wrote:
> Let's say you want to decompose a url into domain and path to include in
> your row key.
>
> You could of course just use the url as the key, but you will see
> hotspotting since most will start with "http".

Doesn't the original Bigtable paper [0] design around this problem by dropping the protocol and only storing the domain? *goes to check* Yes, it does.

Personally, I've never encountered an HBase schema design problem where salting really nailed it. It's an okay place to start with initial designs, especially if you don't know your data well. I'm a big fan of using the natural variance in the data itself to solve this problem. OpenTSDB does this quite well, IMHO. Plus, it's kind of a game or data puzzle -- how to use the data's nature to your advantage :)

Just my 2¢
-n

[0]: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
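
To make the row-key idea above concrete, here is a minimal sketch in Python. It assumes the key is built by dropping the protocol, reversing the hostname labels (the Bigtable paper's Webtable example uses keys such as com.cnn.www), and appending the path. The function name url_to_row_key and the choice to keep the query string are assumptions for illustration, not anything taken from HBase or the paper.

```python
from urllib.parse import urlsplit


def url_to_row_key(url: str) -> bytes:
    """Build a row key from a URL (hypothetical helper, for illustration).

    The scheme ("http", "https") is dropped so keys do not all share one
    prefix, and the hostname labels are reversed so pages from the same
    domain sort next to each other.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # "www.cnn.com" -> "com.cnn.www"
    reversed_host = ".".join(reversed(host.split(".")))
    # Assumption: an empty path is treated as "/" and the query is kept.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (reversed_host + path).encode("utf-8")


if __name__ == "__main__":
    for u in ("http://www.cnn.com/index.html", "https://maps.google.com"):
        print(url_to_row_key(u))
```

Keys built this way vary in their first bytes according to the domains actually present in the data, which is the "natural variance" approach rather than a salt prefix. Whether that spreads load well enough depends on how skewed the domains are.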
