On Fri, Mar 26, 2010 at 4:35 PM, Mike Malone <m...@simplegeo.com> wrote:
> With the random partitioner there's no need to suggest a token. The key
> space is statistically random so you should be able to just split 2^128 into
> equal sized segments and get fairly equal storage load. Your read / write
> load could get out of whack if you have hot spots and stuff, I guess. But
> for a large distributed data set I think that's unlikely.
> For order preserving partitioners it's harder. We've been thinking about
> this issue at SimpleGeo and were planning on implementing an algorithm that
> could determine the median row key statistically without having to inspect
> every key. Basically, it would pull a random sample of row keys (maybe from
> the Index file?) and then determine the median of that sample. Thoughts?

That's exactly what the bootstrap token calculation does for OPP,
after picking the most-loaded node to talk to.  You could expose that
over JMX, or generalize it to return, say, 100 evenly spaced tokens,
so the tool could estimate position to within 1%.

-Jonathan
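
A minimal Python sketch of the two ideas in this thread, for illustration
only. It is not Cassandra's bootstrap code; the function names and
parameters are hypothetical. The 2^128 ring size is the figure quoted
above, so pass a different ring_size if your partitioner's token range
differs. The sampling function assumes the row keys can be listed and
compared in the partitioner's order.

```python
import random


def even_tokens(node_count, ring_size=2**128):
    """Split the ring into node_count equal segments and return the
    starting token of each segment (random partitioner case)."""
    return [i * ring_size // node_count for i in range(node_count)]


def estimate_split_tokens(keys, splits=100, sample_size=10_000, seed=None):
    """Estimate evenly spaced split points from a random sample of row keys
    (order-preserving case). Returns splits - 1 boundary keys; with
    splits=2 the single boundary is the sample median."""
    rng = random.Random(seed)
    keys = list(keys)
    # Sample without replacement, then sort in key order.
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    # Take the key sitting at each i/splits quantile of the sorted sample.
    return [sample[(i * len(sample)) // splits] for i in range(1, splits)]
```

Calling estimate_split_tokens(keys, splits=2) gives an estimated median
key, and splits=100 gives boundaries spaced roughly 1% of the sampled data
apart. The accuracy depends on the sample size and on how representative
the sample is.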
