Can we run a job on some datanodes?

2011-09-21 Thread praveenesh kumar
Is there any way we can run a particular job in Hadoop on a subset of datanodes? My problem is that I don't want to use all the nodes to run some jobs; I am trying to make a job-completion-time vs. number-of-nodes graph for a particular job. One way to do it is to remove datanodes and then see how much time...

Re: Can we run a job on some datanodes?

2011-09-21 Thread Harsh J
Praveenesh, TaskTrackers run your jobs' tasks for you, not DataNodes directly. So you can statically control load on nodes by removing TaskTrackers from your cluster. I.e., if you run "service hadoop-0.20-tasktracker stop" or "hadoop-daemon.sh stop tasktracker" on the specific nodes, jobs won't...
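
A minimal sketch of that approach, assuming passwordless SSH to the workers and a hypothetical exclude-nodes.txt file listing the hosts that should sit out the job; the DataNode process on each host is left untouched, so HDFS blocks stay available:

  #!/bin/bash
  # Stop the TaskTracker daemon on every host listed in exclude-nodes.txt
  # so no map/reduce tasks get scheduled there. The DataNode on each host
  # keeps running, so no data is lost.
  while read -r host; do
    # Tarball installs ship hadoop-daemon.sh; packaged installs may use
    # "service hadoop-0.20-tasktracker stop" instead, as mentioned above.
    ssh "$host" "hadoop-daemon.sh stop tasktracker"
  done < exclude-nodes.txt

Bringing the nodes back is the same loop with "start tasktracker".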

Re: Can we run a job on some datanodes?

2011-09-21 Thread praveenesh kumar
Oh wow, I didn't know that. Actually, for me datanodes/tasktrackers are running on the same machines. I mentioned datanodes because if I delete those machines from the masters list, chances are the data will also be lost. So I don't want to do that... but now I guess by stopping tasktrackers individually I can...

Re: Can we run a job on some datanodes?

2011-09-21 Thread Harsh J
Praveenesh, Absolutely right. Just stop them individually :)

Re: Can we run a job on some datanodes?

2011-09-21 Thread Robert Evans
Praveen, If you are doing performance measurements, be aware that having more datanodes than tasktrackers will impact the performance as well (don't really know for sure how). It will not be the same performance as running on a cluster with just fewer nodes overall. Also, if you do shut off datanodes...
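
A hedged sketch of the measurement loop being discussed, assuming the same per-node stop command as above; nodes.txt, hadoop-examples.jar, and the /input path are hypothetical placeholders:

  #!/bin/bash
  # nodes.txt lists all worker hostnames. Each pass stops one more
  # TaskTracker, then times the same example job on the remaining nodes.
  mapfile -t NODES < nodes.txt
  for ((stopped=0; stopped<${#NODES[@]}; stopped++)); do
    if ((stopped > 0)); then
      ssh "${NODES[stopped-1]}" "hadoop-daemon.sh stop tasktracker"
    fi
    active=$(( ${#NODES[@]} - stopped ))
    start=$(date +%s)
    hadoop jar hadoop-examples.jar wordcount /input /output-$active
    end=$(date +%s)
    echo "$active tasktrackers: $((end - start)) seconds" >> results.txt
  done

Note the caveat above: the stopped nodes' DataNodes still hold blocks, so tasks on the remaining nodes may read more data over the network, which skews the comparison against a genuinely smaller cluster.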