On Mon, Mar 28, 2011 at 3:16 AM, Eran Kutner <e...@gigya.com> wrote:
> I started with a basic insert operation. Inserting rows with one
> column with 1KB of data each.
> Initially, when the table was empty I was getting around 300 inserts
> per second with 50 writing threads. Then, when the region split and a
> second server was added the rate suddenly jumped to 3000 inserts/sec
> per server, so ~6000 for the two servers. Over time as more servers
> were added the rate actually went down, and stabilized on around 2000
> inserts/sec per server.
>

What if you ran your client on more than one server?

Is each insert a single 1 KB cell?

Tell us more about your configs.  Are you using defaults?  If you
watch the logs during your upload, do you see much blocking?

> I also conducted a random column read test, where I read a different
> number of columns from randomly selected rows. First I tested reading
> only one specific column (the first in each row). It started at around
> 60 r/s per server and gradually (I assume as more data was loaded into
> the cache) increased to ~800 r/s per server.

You can check the regionserver log.  It emits a cache stats log line
every so often.  Check the cache hit rate percentage there.

> When reading 5 random
> columns from each row the rate dropped to around 400 rows/sec and when
> fetching full rows (each with 100 columns) the rate remained about the
> same, at 400 rows/sec per server.
>

100 columns per row makes each row about 100 KB, right?
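
As a rough back-of-envelope check on that figure, the sketch below
turns the reported row rates into bytes on the wire. It assumes each
column holds about 1 KB (as in the insert test) and ignores row keys,
column names and RPC overhead, so the real numbers would be somewhat
higher. The constant and function names are made up for illustration.

```python
# Back-of-envelope read throughput per server.
# Assumption: ~1 KB per column, as in the insert test; key and RPC
# overhead are ignored.
COLUMN_BYTES = 1024      # assumed size of one cell
COLUMNS_PER_ROW = 100    # full-row read test
ROWS_PER_SEC = 400       # reported per-server read rate


def row_bytes(columns, column_bytes=COLUMN_BYTES):
    """Approximate payload size of one row read, in bytes."""
    return columns * column_bytes


def throughput_mbit(rows_per_sec, columns, column_bytes=COLUMN_BYTES):
    """Approximate payload throughput in megabits per second."""
    return rows_per_sec * row_bytes(columns, column_bytes) * 8 / 1e6


full_row_mbit = throughput_mbit(ROWS_PER_SEC, COLUMNS_PER_ROW)
five_col_mbit = throughput_mbit(ROWS_PER_SEC, 5)

print(f"full rows : {row_bytes(COLUMNS_PER_ROW)} bytes/row, "
      f"{full_row_mbit:.2f} Mbit/s per server")
print(f"5 columns : {row_bytes(5)} bytes/row, "
      f"{five_col_mbit:.2f} Mbit/s per server")
```

Under these assumptions the full-row case moves roughly 20 times the
bytes of the 5-column case at the same rows/sec, which may suggest a
per-request cost rather than data volume is what limits the rate.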

> I'm not sure exactly what I should expect but I was hoping for much
> higher numbers. I read somewhere that for small data it is reasonable
> to expect 10K inserts per core per second. I know 1KB isn't small but
> these are 8 core machines and they are doing about 2K inserts. Also
> the read rate is very low considering all the data should fit in RAM.
> The interesting thing is that there doesn't seem to be any resource
> bottleneck. IO utilization on the servers is negligible and CPU is
> around 40-50% utilization. The client generating the load is not
> loaded either (around 5% CPU utilization). Client network was at 30%
> utilization when reading full rows. So the only reason for flat-lining
> is some sort of lock contention. Does this make sense?
>

This could be the case.  If you run jstack against a regionserver
during the reads, what do you see?  Are the server threads stuck
waiting to pass a synchronization point or waiting on a lock?

St.Ack
