Talked to Stack. It's not a completely crazy idea. It could be implemented
as a tiny lib, to be used when row keys are randomized in some way by
application logic. In that case the randomization would take into account
how individual regionservers behave (with respect to write speed).

It would be very interesting to try implementing something like this on top
of asynchbase. Note that asynchbase helps to cope with periodic drop-offs in
regionserver write throughput, but it doesn't solve the problem of
individual RSs being slow. That can't be addressed in a generic way, but it
can be in some more specific cases (like when row keys are "randomized", as
explained above and in the earlier message). So, as far as I understand,
this should be addressed at a higher level.
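
To make the idea a bit more concrete, here is a rough sketch of the "biased
salting" part only. Everything in it is an assumption for illustration: the
class name BiasedSalter, the one-byte prefix, and the exponential smoothing
of the measured write rate are all made up, and it uses neither the HBaseWD
nor the asynchbase API. Each writer thread would call report() for the
prefix it owns after flushing a batch, and the code producing row keys would
call salt().

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical helper: picks a salt prefix with probability proportional
 * to the smoothed write rate reported for that prefix.
 */
public class BiasedSalter {
    private final double[] rate;  // smoothed records/sec, one slot per prefix
    private final double alpha;   // smoothing factor in (0, 1]; 1 = latest sample only
    private final Random random;

    public BiasedSalter(int prefixes, double alpha, Random random) {
        if (prefixes < 1 || prefixes > 256) {
            throw new IllegalArgumentException("a one-byte salt allows 1..256 prefixes");
        }
        if (alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.rate = new double[prefixes];
        Arrays.fill(this.rate, 1.0);  // equal weights until measurements arrive
        this.alpha = alpha;
        this.random = random;
    }

    /** Called by the thread owning 'prefix' after it has written a batch. */
    public synchronized void report(int prefix, long records, long elapsedNanos) {
        if (records <= 0 || elapsedNanos <= 0) {
            return;  // nothing usable to learn from this sample
        }
        double observed = records / (elapsedNanos / 1e9);
        rate[prefix] = alpha * observed + (1 - alpha) * rate[prefix];
    }

    /** Weighted random choice: faster prefixes are picked more often. */
    public synchronized int nextPrefix() {
        double total = 0;
        for (double r : rate) {
            total += r;
        }
        double x = random.nextDouble() * total;
        for (int i = 0; i < rate.length; i++) {
            x -= rate[i];
            if (x < 0) {
                return i;
            }
        }
        return rate.length - 1;  // guard against floating-point rounding
    }

    /** Returns a new key: one salt byte followed by the original key. */
    public byte[] salt(byte[] originalKey) {
        byte[] salted = new byte[originalKey.length + 1];
        salted[0] = (byte) nextPrefix();
        System.arraycopy(originalKey, 0, salted, 1, originalKey.length);
        return salted;
    }
}
```

A real version would probably also need a minimum share per prefix, so that
a prefix which was slow for a while keeps getting enough writes for its rate
to be re-measured, and readers would still have to scan all prefixes.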

Alex Baranau
------
Sematext :: http://blog.sematext.com

On Thu, May 17, 2012 at 10:23 AM, Alex Baranau <alex.barano...@gmail.com> wrote:

> Hi,
>
> 1.
> Not sure if you've seen the HBaseWD (https://github.com/sematext/HBaseWD)
> project. It implements the "salt keys with prefix" approach for writing
> monotonically increasing row key/timeseries data. Put simply, the idea
> is to add a random prefix to the row key so that writes end up on different
> region servers (avoiding a single-RS hotspot).
>
> 2.
> When writing data to HBase with salted or random keys (so that load is
> well distributed over the cluster), the write speed is limited by the
> slowest RS in the cluster (since any single region is served by one RS).
>
> Given 1 & 2 I got this crazy idea to:
> * write in multiple threads
> * assign each prefix (or interval of keys, in the case of completely
> random keys) to a particular thread, so that records with this prefix are
> always written by that thread
> * measure how well each thread performs (e.g. write speed)
> * based on each thread's performance, salt (or randomize) keys in a biased
> way, so that threads which perform better get more records to write
>
> Thus we would put less load on the RSs that are "slower", and the overall
> load would be more or less balanced, which should give max write
> performance for the cluster.
> I think this might only work if each thread writes to a relatively small
> subset of the RSs, though. Otherwise the threads will all perform more or
> less the same.
>
> Am I completely crazy for thinking about this? Does it make sense to you
> at all?
>
> Alex Baranau
> ------
> Sematext :: http://blog.sematext.com/
>
