Hi,

I'm a bit late to this thread but anyways... One other valuable profiling tool is Ganglia, which you can install on your cluster using a bootstrap action:

    elastic-mapreduce --create --alive --instance-type m1.xlarge --num-instances 5 \
      --bootstrap-action s3://elasticmapreduce/bootstrap-actions/install-ganglia

This tracks all the system-level stuff you'd expect (CPU, RAM, network, etc.) as well as a bunch of Hadoop-level metrics.

For more details see here:
http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/index.html?Ganglia_reports.html

Cheers,
Mat (SDE on EMR)

On 14 April 2011 03:18, Thomas Rewig <[email protected]> wrote:
> Hello
> right now I'm testing Mahout (taste) Jobs on AWS EMR.
> I wonder if anyone does have any experience with the best cluster size and
> the best EC2 instances. Are there any best practices for mahout (taste)
> jobs?
