What performance profile do you expect?
Where does it top out (i.e. how many ops/sec)?

Also note that each data item is replicated to three nodes (by HDFS), so in a 
6-machine cluster each machine ends up receiving roughly 50% of the total write 
volume (3 replicas spread across 6 machines). If you are looking for write 
performance you really need a larger cluster to amortize this replication cost 
across more machines.
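
To make that concrete, here is a back-of-the-envelope sketch of the replica-write 
load per node (it just assumes the HDFS default replication factor of 3 and evenly 
distributed writes; the 10,000 writes/sec figure is only an illustrative number):

    # Back-of-the-envelope replica-write load per node. Assumes the HDFS
    # default replication factor of 3 and evenly distributed writes.
    def replica_writes_per_node(num_nodes, replication=3, client_writes_per_sec=10000):
        total_replica_writes = client_writes_per_sec * replication
        per_node = total_replica_writes / num_nodes
        return per_node, per_node / float(client_writes_per_sec)

    for nodes in (6, 12, 24):
        per_node, share = replica_writes_per_node(nodes)
        print("%d nodes: %d replica writes/sec per node (%.0f%% of client volume)"
              % (nodes, per_node, share * 100))

With 6 nodes each machine sees 50% of the client write volume; at 24 nodes that 
drops to about 12%.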

The other issue to watch out for is whether your keys are generated such that a 
single RegionServer is hotspotted (you can check the per-RegionServer request 
counts on the master's web UI).
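
If the keys do turn out to be monotonically increasing (timestamps, sequence 
numbers, etc.), the usual fix is to salt or hash-prefix the row key so writes fan 
out across regions. A minimal sketch (the bucket count and key layout here are 
illustrative, not anything specific to your schema):

    import hashlib

    NUM_BUCKETS = 16  # illustrative; size this roughly to your region count

    def salted_key(raw_key):
        # A stable hash prefix spreads otherwise-sequential keys across
        # NUM_BUCKETS key ranges instead of hammering one RegionServer.
        bucket = int(hashlib.md5(raw_key.encode()).hexdigest(), 16) % NUM_BUCKETS
        return "%02d-%s" % (bucket, raw_key)

    print(salted_key("20130301-000001"))
    print(salted_key("20130301-000002"))

The tradeoff is that a scan over a contiguous logical range now needs one scan 
per bucket, so this mainly makes sense for write-heavy tables.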

-- Lars



________________________________
 From: Dan Crosta <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Friday, March 1, 2013 4:17 AM
Subject: HBase Thrift inserts bottlenecked somewhere -- but where?
 
We are using a 6-node HBase cluster with a Thrift Server on each of the 
RegionServer nodes, and we are trying to evaluate the maximum write throughput for 
our use case (which involves many processes sending mutateRowsTs commands). 
Somewhere between about 30 and 40 writer processes we cross a threshold where 
adding more writers yields only very limited gains in throughput, and I'm not 
sure why. CPU and disk on the DataNode/RegionServer/ThriftServer machines are not 
saturated, and neither is the NIC in those machines. I'm a little unsure where to 
look next.
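
For reference, each writer process does essentially the stock Thrift1 thing, along 
these lines. This is only a sketch, not our exact code; the host, table, and 
column names are placeholders, and the precise mutateRowsTs signature comes from 
the generated Python bindings and may differ slightly between HBase versions:

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hbase import Hbase                       # Thrift-generated bindings
    from hbase.ttypes import Mutation, BatchMutation

    # Each writer connects to the ThriftServer on one of the RegionServer nodes.
    transport = TTransport.TBufferedTransport(TSocket.TSocket("region-node-1", 9090))
    client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    # A batch of rows per call, each row with its own list of mutations.
    batch = [BatchMutation(row="row-%06d" % i,
                           mutations=[Mutation(column="cf:value", value="payload")])
             for i in range(100)]
    client.mutateRowsTs("mytable", batch, 1362130000000)  # explicit timestamp (ms)

    transport.close()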

A little more detail about our deployment:

* CDH 4.1.2
* DataNode/RegionServer/ThriftServer class: EC2 m1.xlarge
** RegionServer: 8GB heap
** ThriftServer: 1GB heap
** DataNode: 4GB heap
** EC2 ephemeral (i.e. local, not EBS) volumes used for HDFS

If there's any other information that I can provide, or any other configuration 
or system settings I should look at, I'd appreciate the pointers.

Thanks,
- Dan
