Tim,

Here's the problem in a nutshell. With respect to hardware, you have 5.4K RPM drives? Six drives and eight cores? Small, slow drives, and still a ratio of less than one when you compare spindles to cores.
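To make the ratio concrete, here is a minimal Python sketch using the figures from Tim's original post (6 SATA drives and 2 quad-core CPUs per RegionServer). The function name is purely illustrative, and treating one spindle per core as the threshold is only the rule of thumb implied by the comment above, not a hard limit.

```python
# Figures from the original post: each RegionServer has 6 SATA drives
# and 2 quad-core Xeon E5630 CPUs.
DRIVES_PER_NODE = 6
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 4


def spindles_per_core(drives: int, sockets: int, cores_per_socket: int) -> float:
    """Return the number of physical drives available per CPU core."""
    return drives / (sockets * cores_per_socket)


ratio = spindles_per_core(DRIVES_PER_NODE, SOCKETS_PER_NODE, CORES_PER_SOCKET)
print(f"spindles per core: {ratio:.2f}")  # prints "spindles per core: 0.75"
```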

I appreciate that you want to maximize performance, but when it comes to tuning, you have to start before you get your hardware.

You are asking a question about tuning, but how can we tell whether the numbers are OK? Have you looked at your GCs and enabled MSLAB? We don't know. Network configuration? I mean that there's a lot missing, and fine-tuning a cluster is something you have to do on your own.

I guess I could say your numbers look fine to me for that config... But honestly, it would be a SWAG.

Sent from a remote device. Please excuse any typos...

Mike Segel

On Feb 1, 2012, at 7:09 AM, Tim Robertson <timrobertson...@gmail.com> wrote:

> Thanks Michael,
>
> It's a small cluster, but is the hardware so bad? We are particularly
> interested in relatively low load for random read write (2000
> transactions per second on <1k rows) but a decent full table scan
> speed, as we aim to mount Hive tables on HBase backed tables.
>
> Regarding tuning... not exactly sure which you would be interested in
> seeing. The config is all here:
> http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet%2Fmodules%2Fhadoop%2Ftemplates
>
> Cheers,
> Tim
>
> On Wed, Feb 1, 2012 at 1:56 PM, Michael Segel <michael_se...@hotmail.com>
> wrote:
>> No.
>> What tuning did you do?
>> Why such a small cluster?
>>
>> Sorry, but when you start off with a bad hardware configuration, you can get
>> Hadoop/HBase to work, but performance will always be sub-optimal.
>>
>> Sent from my iPhone
>>
>> On Feb 1, 2012, at 6:52 AM, "Tim Robertson" <timrobertson...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> We have a 3 node cluster (CDH3u2) with the following hardware:
>>>
>>> RegionServers (+DN + TT)
>>>   CPU: 2x Intel(R) Xeon(R) CPU E5630 @ 2.53GHz (quad)
>>>   Disks: 6x250G SATA 5.4K
>>>   Memory: 24GB
>>>
>>> Master (+ZK, JT, NN)
>>>   CPU: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz, 2x6MB (quad)
>>>   Disks: 2x500G SATA 7.2K
>>>   Memory: 8GB
>>>
>>> Memory wise, we have:
>>> Master:
>>>   NN: 1GB
>>>   JT: 1GB
>>>   HBase master: 6GB
>>>   ZK: 1GB
>>> RegionServers:
>>>   RegionServer: 6GB
>>>   TaskTracker: 1GB
>>>   11 Mappers @ 1GB each
>>>   7 Reducers @ 1GB each
>>>
>>> HDFS was empty, and I ran randomWrite and scan both with number
>>> clients of 50 (seemed to spawn 500 Mappers though...)
>>>
>>> randomWrite:
>>> 12/02/01 13:27:47 INFO mapred.JobClient: ROWS=52428500
>>> 12/02/01 13:27:47 INFO mapred.JobClient: ELAPSED_TIME=84504886
>>>
>>> scan:
>>> 12/02/01 13:42:52 INFO mapred.JobClient: ROWS=52428500
>>> 12/02/01 13:42:52 INFO mapred.JobClient: ELAPSED_TIME=8158664
>>>
>>> Would I be correct in thinking that this is way below what is to be
>>> expected of this hardware?
>>> We're setting up ganglia now to start debugging, but any suggestions
>>> on how to diagnose this would be greatly appreciated.
>>>
>>> Thanks!
>>> Tim
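
For anyone trying to read the counters and memory settings in the original post, the arithmetic can be sketched in Python as below. This is a rough sketch under stated assumptions, not a verdict on the cluster. It assumes that ELAPSED_TIME is the sum of per-map-task elapsed times in milliseconds rather than the job's wall-clock time, and that all 3 x 11 mapper slots stayed busy for the whole run. Neither is confirmed anywhere in the thread. The function and constant names are hypothetical, and the DataNode heap is left out because the post does not give it.

```python
# Counters copied from the PerformanceEvaluation output in the original post.
RUNS = {
    "randomWrite": {"rows": 52_428_500, "elapsed_ms": 84_504_886},
    "scan": {"rows": 52_428_500, "elapsed_ms": 8_158_664},
}

# ASSUMPTION: ELAPSED_TIME is the sum of per-map-task elapsed times in ms,
# not the wall-clock duration of the job.
# ASSUMPTION: all 3 nodes x 11 mapper slots stayed busy for the whole job.
ASSUMED_BUSY_MAP_SLOTS = 3 * 11

# Per-RegionServer maximum heap sizes from the post, in GB. The DataNode heap
# was not listed, so it is omitted rather than guessed.
PHYSICAL_MEMORY_GB = 24
HEAPS_GB = {
    "RegionServer": 6,
    "TaskTracker": 1,
    "mappers": 11 * 1,
    "reducers": 7 * 1,
}


def rows_per_task_second(rows: int, elapsed_ms: int) -> float:
    """Rows processed per second of summed task time."""
    return rows / (elapsed_ms / 1000.0)


def estimated_cluster_rate(rows: int, elapsed_ms: int, busy_slots: int) -> float:
    """Rough cluster-wide rows/s if busy_slots tasks always ran in parallel."""
    return rows_per_task_second(rows, elapsed_ms) * busy_slots


def committed_heap_gb(heaps: dict) -> int:
    """Total heap in GB if every listed JVM grew to its configured maximum."""
    return sum(heaps.values())


if __name__ == "__main__":
    for name, run in RUNS.items():
        per_task = rows_per_task_second(run["rows"], run["elapsed_ms"])
        cluster = estimated_cluster_rate(
            run["rows"], run["elapsed_ms"], ASSUMED_BUSY_MAP_SLOTS
        )
        print(f"{name}: {per_task:.0f} rows/s per task, ~{cluster:.0f} rows/s cluster")

    total = committed_heap_gb(HEAPS_GB)
    print(f"max heap per RegionServer node: {total} GB of {PHYSICAL_MEMORY_GB} GB")
```

One thing the heap sum does show without any assumption about the benchmark: the listed maximum heaps add up to 25 GB on a node with 24 GB of physical memory, before counting the DataNode or the operating system. That only matters if every slot is in use and every JVM grows to its limit, but it is the kind of detail worth checking before reading too much into the throughput figures.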