Tim,

Here's the problem in a nutshell.
With respect to hardware, you have 5.4K RPM drives, six of them per node, and 8 cores?
Small, slow drives, and still a ratio of less than one when you compare spindles
to cores.
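To put numbers on that, here's a quick back-of-the-envelope sketch. It assumes the 6 data drives and 2x quad-core CPUs per RegionServer from your original mail. It uses the standard approximation that average rotational latency is half a revolution. Seek time comes on top of that and isn't modeled here.

```python
# Back-of-the-envelope check of the RegionServer disk layout.
# Assumption: 6 data drives and 2 x quad-core CPUs per node, as listed in the thread.
DRIVES_PER_NODE = 6
CORES_PER_NODE = 2 * 4


def spindles_per_core(drives: int, cores: int) -> float:
    """Ratio of physical spindles to CPU cores; below 1.0 means cores outnumber disks."""
    return drives / cores


def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency = time for half a revolution, in milliseconds.
    Seek time is NOT included."""
    return 0.5 * 60_000 / rpm


if __name__ == "__main__":
    print(f"spindles per core: {spindles_per_core(DRIVES_PER_NODE, CORES_PER_NODE):.2f}")
    for rpm in (5_400, 7_200):
        print(f"{rpm} RPM: ~{avg_rotational_latency_ms(rpm):.2f} ms average rotational latency")
```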

I appreciate that you want to maximize performance, but when it comes to 
tuning, you have to start before you get your hardware. 

You are asking a question about tuning, but how can we tell whether the numbers
are OK?
Have you looked at your GC logs and enabled MSLAB? We don't know. What about your
network configuration?
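For the MSLAB part, here's a minimal sketch for checking whether it is switched on in a given hbase-site.xml. It assumes the usual Hadoop-style <configuration>/<property>/<name>/<value> layout and looks for the hbase.hregion.memstore.mslab.enabled property. The function names and the sample XML are illustrative only. If the property is absent, the release default applies, so check the defaults for your HBase version.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MSLAB_KEY = "hbase.hregion.memstore.mslab.enabled"


def read_property(xml_text: str, key: str) -> Optional[str]:
    """Return the <value> of the named <property> in a Hadoop-style config, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == key:
            return (prop.findtext("value") or "").strip()
    return None


def mslab_enabled(xml_text: str) -> Optional[bool]:
    """True/False if MSLAB is set explicitly; None means the release default applies."""
    value = read_property(xml_text, MSLAB_KEY)
    if value is None:
        return None
    return value.lower() == "true"


# Hypothetical sample config, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    print(mslab_enabled(SAMPLE))
```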

I mean that there's a lot missing, and fine-tuning a cluster is something you
have to do on your own. I guess I could say your numbers look fine to me for
that config... But honestly, it would be a SWAG.


Sent from a remote device. Please excuse any typos...

Mike Segel

On Feb 1, 2012, at 7:09 AM, Tim Robertson <timrobertson...@gmail.com> wrote:

> Thanks Michael,
> 
> It's a small cluster, but is the hardware really that bad? We are particularly
> interested in a relatively light random read/write load (2,000
> transactions per second on rows under 1 KB) but decent full-table scan
> speed, as we aim to mount Hive tables on HBase-backed tables.
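As a rough sanity check on that target, here is the load it implies per RegionServer and per data drive. It assumes requests spread evenly over the 3 RegionServers with 6 drives each. The per-drive figure is a worst case and a lower bound on disk operations, since it assumes every request misses the block cache and costs at least one disk access. Compare it against the random IOPS rating of your drives.

```python
# Rough load implied by the stated target of 2,000 transactions/second.
# Assumptions: even spread across nodes, every operation hits disk (no cache hits).
TARGET_TPS = 2_000
REGION_SERVERS = 3
DRIVES_PER_SERVER = 6


def per_server_tps(target: float, servers: int) -> float:
    """Operations per second each RegionServer must handle."""
    return target / servers


def worst_case_ops_per_drive(target: float, servers: int, drives: int) -> float:
    """Disk operations per second per drive if nothing is served from cache."""
    return target / (servers * drives)


if __name__ == "__main__":
    print(f"per RegionServer: {per_server_tps(TARGET_TPS, REGION_SERVERS):.0f} ops/s")
    print("per drive (worst case): "
          f"{worst_case_ops_per_drive(TARGET_TPS, REGION_SERVERS, DRIVES_PER_SERVER):.0f} ops/s")
```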
> 
> Regarding tuning, I'm not exactly sure which settings you would be interested in
> seeing. The config is all here:
> http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet%2Fmodules%2Fhadoop%2Ftemplates
> 
> Cheers,
> Tim
> 
> 
> 
> On Wed, Feb 1, 2012 at 1:56 PM, Michael Segel <michael_se...@hotmail.com> wrote:
>> No.
>> What tuning did you do?
>> Why such a small cluster?
>> 
>> Sorry, but when you start off with a bad hardware configuration, you can get 
>> Hadoop/HBase to work, but performance will always be sub-optimal.
>> 
>> 
>> 
>> Sent from my iPhone
>> 
>> On Feb 1, 2012, at 6:52 AM, "Tim Robertson" <timrobertson...@gmail.com> wrote:
>> 
>>> Hi all,
>>> 
>>> We have a 3-node cluster (CDH3u2) with the following hardware:
>>> 
>>> RegionServers (+DN + TT)
>>>  CPU: 2x Intel(R) Xeon(R) CPU E5630 @ 2.53GHz (quad)
>>>  Disks: 6x250G SATA 5.4K
>>>  Memory: 24GB
>>> 
>>> Master (+ZK, JT, NN)
>>>  CPU: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz, 2x6MB (quad)
>>>  Disks: 2x500G SATA 7.2K
>>>  Memory: 8GB
>>> 
>>> Memory wise, we have:
>>> Master:
>>>  NN: 1GB
>>>  JT: 1GB
>>>  HBase master: 6GB
>>>  ZK: 1GB
>>> RegionServers:
>>>  RegionServer: 6GB
>>>  TaskTracker: 1GB
>>>  11 Mappers @ 1GB each
>>>  7 Reducers @ 1GB each
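Adding up the heap sizes above gives a quick check on whether the configured maximums fit in physical RAM. This sketch assumes every map and reduce slot runs at once with its full 1GB heap. It does not count the DataNode heap (not listed above), JVM memory outside the heap, or OS page cache, so the real footprint would be higher than what it reports.

```python
# Sum of configured maximum heaps vs. physical RAM, using the figures listed above.
# Assumption: all task slots run concurrently at full heap; DataNode heap,
# non-heap JVM memory and OS page cache are NOT included.
REGION_SERVER_NODE = {
    "RegionServer": 6,
    "TaskTracker": 1,
    "Mappers (11 x 1GB)": 11 * 1,
    "Reducers (7 x 1GB)": 7 * 1,
}
MASTER_NODE = {"NameNode": 1, "JobTracker": 1, "HBase master": 6, "ZooKeeper": 1}


def headroom_gb(heaps: dict, physical_gb: int) -> int:
    """Physical RAM minus the summed heaps; negative means over-committed."""
    return physical_gb - sum(heaps.values())


if __name__ == "__main__":
    for name, heaps, ram in (("RegionServer node", REGION_SERVER_NODE, 24),
                             ("Master node", MASTER_NODE, 8)):
        print(f"{name}: {sum(heaps.values())} GB of heap on {ram} GB RAM, "
              f"headroom {headroom_gb(heaps, ram)} GB")
```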
>>> 
>>> HDFS was empty, and I ran randomWrite and scan, both with 50 clients
>>> (though it seemed to spawn 500 mappers...)
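The 500 mappers and the ROWS counter in the results are consistent with each other under two assumptions about HBase's PerformanceEvaluation tool. The first is that each client is split into 10 map tasks. The second is that each client gets 1,048,576 rows (1024 x 1024) by default, divided among its maps with integer division. Treat both as assumptions to check against the PerformanceEvaluation source in your build. Under them, the arithmetic works out like this:

```python
# Reconcile client count, mapper count and the ROWS counter.
# Assumptions (check against PerformanceEvaluation in your build):
#   - 10 map tasks per client
#   - 1,048,576 rows per client by default, split across maps with integer division
CLIENTS = 50
MAPS_PER_CLIENT = 10
ROWS_PER_CLIENT = 1024 * 1024

maps = CLIENTS * MAPS_PER_CLIENT
rows_per_map = ROWS_PER_CLIENT // MAPS_PER_CLIENT
expected_rows = maps * rows_per_map

if __name__ == "__main__":
    print(f"map tasks: {maps}")                       # 500
    print(f"rows per map: {rows_per_map}")            # 104857
    print(f"expected ROWS counter: {expected_rows}")  # 52428500
```

If the assumptions hold, the 500 mappers are expected behavior rather than a misconfiguration, and the ROWS=52428500 counter matches exactly.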
>>> 
>>> randomWrite:
>>> 12/02/01 13:27:47 INFO mapred.JobClient:     ROWS=52428500
>>> 12/02/01 13:27:47 INFO mapred.JobClient:     ELAPSED_TIME=84504886
>>> 
>>> scan:
>>> 12/02/01 13:42:52 INFO mapred.JobClient:     ROWS=52428500
>>> 12/02/01 13:42:52 INFO mapred.JobClient:     ELAPSED_TIME=8158664
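One way to turn those counters into throughput is to parse the JobClient lines and divide ROWS by ELAPSED_TIME. This sketch assumes ELAPSED_TIME is the sum of per-map wall-clock times in milliseconds, so the result is rows per second per map task. The cluster-wide figure additionally assumes all 33 map slots (3 nodes x 11) stayed busy for the whole job, which makes it an upper bound. The regex and function names are illustrative.

```python
import re

# Counter lines copied from the job output above.
RANDOM_WRITE_LOG = """\
12/02/01 13:27:47 INFO mapred.JobClient:     ROWS=52428500
12/02/01 13:27:47 INFO mapred.JobClient:     ELAPSED_TIME=84504886
"""
SCAN_LOG = """\
12/02/01 13:42:52 INFO mapred.JobClient:     ROWS=52428500
12/02/01 13:42:52 INFO mapred.JobClient:     ELAPSED_TIME=8158664
"""

COUNTER_RE = re.compile(r"\b(ROWS|ELAPSED_TIME)=(\d+)")
MAP_SLOTS = 3 * 11  # assumption: 3 TaskTrackers x 11 map slots, all busy


def parse_counters(log: str) -> dict:
    """Extract ROWS and ELAPSED_TIME counter values from JobClient output."""
    return {name: int(value) for name, value in COUNTER_RE.findall(log)}


def rows_per_task_second(counters: dict) -> float:
    """Rows per second per map task, assuming ELAPSED_TIME is summed task time in ms."""
    return counters["ROWS"] / (counters["ELAPSED_TIME"] / 1000)


if __name__ == "__main__":
    for name, log in (("randomWrite", RANDOM_WRITE_LOG), ("scan", SCAN_LOG)):
        per_task = rows_per_task_second(parse_counters(log))
        print(f"{name}: ~{per_task:,.0f} rows/s per task, "
              f"at most ~{per_task * MAP_SLOTS:,.0f} rows/s cluster-wide")
```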
>>> 
>>> Would I be correct in thinking that this is way below what is to be
>>> expected of this hardware?
>>> We're setting up Ganglia now to start debugging, but any suggestions
>>> on how to diagnose this would be greatly appreciated.
>>> 
>>> Thanks!
>>> Tim
> 
