Re: Are Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other?

2012-02-06 Thread Vitthal "Suhas" Gogate
I assume you have seen the following information on the Hadoop twiki: http://wiki.apache.org/hadoop/GangliaMetrics. So do you use GangliaContext31 in hadoop-metrics2.properties? We use Ganglia 3.2 with Hadoop 0.20.205 and it works fine (I remember gmetad sometimes going down due to a buffer overflow problem ...)
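For reference, a minimal Ganglia context configuration along the lines of the GangliaMetrics wiki page might look like the sketch below. The localhost:8649 gmond endpoint and the 10-second period are assumptions, not from the original mail, and the exact file name (hadoop-metrics.properties vs. hadoop-metrics2.properties) depends on your Hadoop version/distribution:

    # conf/hadoop-metrics.properties -- sketch; point "servers" at your gmond host:port
    dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
    dfs.period=10
    dfs.servers=localhost:8649
    mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
    mapred.period=10
    mapred.servers=localhost:8649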

Re: error for deploying hadoop on macbook pro

2011-10-03 Thread Vitthal "Suhas" Gogate
Sorry, a few more things: -- localhost did not work for me; I had to use the machine name returned by "hostname", e.g. horton-mac.local. -- Also change localhost to your machine name in the conf/slaves and conf/masters files. --Suhas
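A sketch of that change for a single-node setup (horton-mac.local is just the example name from the mail; use whatever "hostname" prints on your machine):

    $ hostname
    horton-mac.local
    $ echo horton-mac.local > conf/masters
    $ echo horton-mac.local > conf/slaves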

Re: error for deploying hadoop on macbook pro

2011-10-03 Thread Vitthal "Suhas" Gogate
The steps in the following document worked for me, except: -- JAVA_HOME needs to be set correctly in conf/hadoop-env.sh. -- By default on Mac OS X, sshd is not running, so you need to start it via "System Preferences/Sharing" and add the users who are allowed to ssh in. http://www.stanford.edu/cla
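A sketch of the JAVA_HOME change (the /usr/libexec/java_home helper is standard on recent Mac OS X releases, but the resolved JDK path is machine-specific, so treat this as an assumption):

    # conf/hadoop-env.sh
    export JAVA_HOME=$(/usr/libexec/java_home)

sshd can also be enabled from a terminal with "sudo systemsetup -setremotelogin on" instead of going through System Preferences/Sharing.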

Re: Monitoring Slow job.

2011-10-03 Thread Vitthal "Suhas" Gogate
I am not sure there is an easy way to get what you want on the command line. One option is to use the following command, which gives you a verbose job history where you can find the Submit, Launch, and Finish times (including the duration on the FinishTime line). I am using the hadoop-0.20.205.0 branch, so check if you have ...
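The command being referred to is presumably along these lines ("hadoop job -history" in 0.20.x; the output directory path below is only an example):

    $ hadoop job -history all /user/hadoop/wordcount-output
    # without "all" it prints the job-level summary (Submit/Launch/Finish times);
    # with "all" it also prints per-task details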

Re: incremental loads into hadoop

2011-10-02 Thread Vitthal "Suhas" Gogate
Agree with Bejoy, although to minimize processing latency you can still choose to write to HDFS more frequently, resulting in a larger number of smaller files on HDFS, rather than waiting to accumulate a large amount of data before writing. As you may end up with many smaller files, it may ...
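One possible way to consolidate the small files periodically is a Hadoop archive (the paths and archive name below are hypothetical):

    $ hadoop archive -archiveName 2011-10-02.har -p /data/incoming 2011-10-02 /data/consolidated
    $ hadoop fs -ls har:///data/consolidated/2011-10-02.har

Alternatively, "hadoop fs -getmerge <hdfs-dir> <local-file>" concatenates the small files into a single local file.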

Re: configuring different number of slaves for MR jobs

2011-09-27 Thread Vitthal "Suhas" Gogate
The slaves file is used only by control scripts like {start/stop}-dfs.sh and {start/stop}-mapred.sh to start the datanodes and tasktrackers on a specified set of slave machines. It cannot be used effectively to change the size of the cluster for each M/R job (unless you want to restart the task trackers ...)
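To illustrate: the hosts in conf/slaves only matter when the control scripts are run (hostnames below are hypothetical):

    $ cat conf/slaves
    slave1.example.com
    slave2.example.com
    $ bin/stop-mapred.sh && bin/start-mapred.sh   # restarts tasktrackers on exactly the hosts listed above

so editing the file has no effect on an individual M/R job unless the daemons are restarted.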