Are you putting this data from a single host?  Is your sender
multi-threaded?

I note that 20 GB / 20 minutes is about 17 MB/s (under 20 MB/s), so you
aren't particularly stressing the network.  You would likely be stressing a
single-threaded client pretty severely.

What is your record size?  You may be bound by the number of records being
inserted rather than by the total data size.
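
To make the two numbers concrete, here is a rough sketch of the arithmetic.
It only assumes the 20 GB and 20 minutes from your message; the record sizes
(100 bytes, 1 KB, 10 KB) and the class and method names are hypothetical,
picked purely for illustration since I don't know your actual record size.

```java
public class IngestRate {
    /** Average throughput in MB/s, taking 1 GB = 1024 MB. */
    static double megabytesPerSecond(double gigabytes, double minutes) {
        return gigabytes * 1024.0 / (minutes * 60.0);
    }

    /** Average records per second for an assumed record size in bytes. */
    static double recordsPerSecond(double gigabytes, double minutes, double recordBytes) {
        double totalBytes = gigabytes * 1024.0 * 1024.0 * 1024.0;
        return totalBytes / recordBytes / (minutes * 60.0);
    }

    public static void main(String[] args) {
        // 20 GB in 20 minutes, as reported in the original message.
        System.out.printf("throughput: %.1f MB/s%n", megabytesPerSecond(20, 20));

        // Hypothetical record sizes; substitute the real one.
        for (int recordBytes : new int[] {100, 1_000, 10_000}) {
            System.out.printf("%d-byte records: %.0f records/s%n",
                    recordBytes, recordsPerSecond(20, 20, recordBytes));
        }
    }
}
```

The byte rate stays the same in every case, but the number of records per
second grows as the records get smaller, which is why the record size
matters here.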

On Thu, Mar 24, 2011 at 5:22 PM, Vivek Krishna <vivekris...@gmail.com> wrote:

> Data Size - 20 GB.  It took about an hour with default hbase setting and
> after varying several parameters, we were able to get this done in ~20
> minutes.  This is slow and we are trying to improve.
>
> We wrote a java client which would essentially `put` to hbase tables in
> batches.  Our fine-tuning parameters include,
> 1.  Disabling compaction
> 2.  Varying batch sizes of put (tried with 1000, 5000, 10000, 20000, 40000)
> 3.  Setting AutoFlush to on/off.
> 4.  Varying write buffer (in client) with 2 MB, 128 MB, 256 MB
> 5.  Changing regionserver.handler.count to 100
> 6.  Varying regionserver size from 128 to 256/512/1024.
> 7.  Increasing number of regions.
> 8.  Creating regions with keys pre-specified (so that clients hit the
> regions directly)
> 9.  Varying number of clients (from 30 clients to 100 clients)
>
> The above was tested on a 38 node cluster with 2 regions each.
>
> We did not try disabling WAL fearing loss of data.
>
> Are there any other parameters that we missed during the process?
>
>
> Viv
>
