Did you try running 30-40 processes on one machine and another 30-40 processes on a second machine, to see if that doubles the throughput?
On Fri, Mar 1, 2013 at 10:46 AM, Varun Sharma <[email protected]> wrote:
> Hi,
>
> I don't know how many worker threads you have at the Thrift servers. Each
> thread gets dedicated to a single connection and only serves that
> connection; new connections get queued. Also, are you sure that you are
> not saturating the client side making the calls?
>
> Varun
>
>
> On Fri, Mar 1, 2013 at 9:33 AM, Jean-Daniel Cryans <[email protected]> wrote:
>
>> The primary unit of load distribution in HBase is the region, so make
>> sure you have more than one. This is well documented in the manual:
>> http://hbase.apache.org/book/perf.writing.html
>>
>> J-D
>>
>> On Fri, Mar 1, 2013 at 4:17 AM, Dan Crosta <[email protected]> wrote:
>> > We are using a 6-node HBase cluster with a Thrift server on each of
>> > the RegionServer nodes, and we are trying to evaluate maximum write
>> > throughput for our use case (which involves many processes sending
>> > mutateRowsTs commands). Somewhere between about 30 and 40 writer
>> > processes, we cross a threshold where adding more writers yields only
>> > very limited throughput gains, and I'm not sure why. The CPU and disk
>> > on the DataNode/RegionServer/ThriftServer machines are not saturated,
>> > nor is the NIC. I'm a little unsure where to look next.
>> >
>> > A little more detail about our deployment:
>> >
>> > * CDH 4.1.2
>> > * DataNode/RegionServer/ThriftServer class: EC2 m1.xlarge
>> > ** RegionServer: 8GB heap
>> > ** ThriftServer: 1GB heap
>> > ** DataNode: 4GB heap
>> > ** EC2 ephemeral (i.e. local, not EBS) volumes used for HDFS
>> >
>> > If there's any other information that I can provide, or any other
>> > configuration or system settings I should look at, I'd appreciate the
>> > pointers.
>> >
>> > Thanks,
>> > - Dan
>> >
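To make the client-saturation question above concrete, here is a minimal Python sketch of the kind of scaling test being suggested: spawn N writer processes, measure aggregate throughput, and see whether it keeps growing as N grows. The `do_writes` worker is just a placeholder; a real test would open a Thrift connection and issue mutateRowsTs calls in that loop.

```python
import multiprocessing
import time

def do_writes(n_ops, results):
    """Placeholder writer: a real test would open a Thrift connection
    here and issue mutateRowsTs calls in the loop below."""
    start = time.time()
    done = 0
    for _ in range(n_ops):
        done += 1  # stand-in for client.mutateRowsTs(...)
    results.put((done, time.time() - start))

def run_benchmark(n_procs, n_ops=100000):
    """Run n_procs writer processes and return aggregate ops/sec."""
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=do_writes, args=(n_ops, results))
             for _ in range(n_procs)]
    start = time.time()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    elapsed = time.time() - start
    total = sum(results.get()[0] for _ in procs)
    return total / elapsed

if __name__ == "__main__":
    # If aggregate throughput stops scaling as n_procs grows even with
    # this no-op workload, the bottleneck is on the client machine
    # (CPU, GIL per process is avoided here, NIC) rather than in HBase.
    for n in (10, 20, 40):
        print("%d procs: %.0f ops/sec" % (n, run_benchmark(n)))
```

Running the same harness against the real cluster from two client machines (per the suggestion above) would show whether throughput doubles when the writer processes are split across hosts.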
