Hi Tsuyoshi,

For which version of Hadoop is that? I think it's for 0.2x.x, right?
Because I'm not able to find this class in 1.0.x
Thanks,

JM

2012/12/8, Tsuyoshi OZAWA <ozawa.tsuyo...@gmail.com>:
> Hi Hiroyuki,
>
> Lately I've changed scheduler for improving hadoop, so I may help you.
>
> RMContainerAllocator#handleEvent decides MapTasks to allocated containers.
> You can implement semi-strict(best effort allocation) mode by hacking
> there. Note that, however, allocation of containers is done
> by ResourceManager. MRAppMaster can not control where to allocate
> containers, but where to allocate MapTasks.
>
> If you have any question, please ask me.
>
> Thanks,
> Tsuyoshi
>
>
> On Sat, Dec 8, 2012 at 4:51 AM, Jean-Marc Spaggiari
> <jean-m...@spaggiari.org> wrote:
>
>> Hi Hiroyuki,
>>
>> Have you made any progress on that?
>>
>> I'm also looking at a way to assign specific Map tasks to specific
>> nodes (I want the Map to run where the data is).
>>
>> JM
>>
>> 2012/12/1, Michael Segel <michael_se...@hotmail.com>:
>> > I haven't thought about reducers but in terms of mappers you need to
>> > override the data locality so that it thinks that the node where you
>> > want to send the data exists.
>> > Again, not really recommended since it will kill performance unless the
>> > compute time is at least an order of magnitude greater than the time it
>> > takes to transfer the data.
>> >
>> > Really, really don't recommend it....
>> >
>> > We did it as a hack, just to see if we could do it and get better
>> > overall performance for a specific job.
>> >
>> >
>> > On Dec 1, 2012, at 6:27 AM, Harsh J <ha...@cloudera.com> wrote:
>> >
>> >> Yes, scheduling is done on a Tasktracker heartbeat basis, so it is
>> >> certainly possible to do absolutely strict scheduling (although be
>> >> aware of the condition of failing/unavailable tasktrackers).
>> >>
>> >> Mohit's suggestion is somewhat like what you desire (delay scheduling
>> >> in fair scheduler config) - but setting it to very high values is bad
>> >> to do (for jobs that don't need this).
>> >>
>> >> On Sat, Dec 1, 2012 at 4:11 PM, Hiroyuki Yamada <mogwa...@gmail.com>
>> >> wrote:
>> >>> Thank you all for the comments.
>> >>>
>> >>>> you ought to make sure your scheduler also does non-strict
>> >>>> scheduling of data local tasks for jobs
>> >>>> that don't require such strictness
>> >>>
>> >>> I just want to make sure of one thing.
>> >>> If I write my own scheduler, is it possible to do "strict" scheduling?
>> >>>
>> >>> Thanks
>> >>>
>> >>> On Thu, Nov 29, 2012 at 1:56 PM, Mohit Anchlia
>> >>> <mohitanch...@gmail.com> wrote:
>> >>>> Look at locality delay parameter
>> >>>>
>> >>>> Sent from my iPhone
>> >>>>
>> >>>> On Nov 28, 2012, at 8:44 PM, Harsh J <ha...@cloudera.com> wrote:
>> >>>>
>> >>>>> None of the current schedulers are "strict" in the sense of "do not
>> >>>>> schedule the task if such a tasktracker is not available". That has
>> >>>>> never been a requirement for Map/Reduce programs and nor should be.
>> >>>>>
>> >>>>> I feel if you want some code to run individually on all nodes for
>> >>>>> whatever reason, you may as well ssh into each one and start it
>> >>>>> manually with appropriate host-based parameters, etc.. and then
>> >>>>> aggregate their results.
>> >>>>>
>> >>>>> Note that even if you get down to writing a scheduler for this
>> >>>>> (which I don't think is a good idea, but anyway), you ought to make
>> >>>>> sure your scheduler also does non-strict scheduling of data local
>> >>>>> tasks for jobs that don't require such strictness - in order for
>> >>>>> them to complete quickly rather than wait around for scheduling in
>> >>>>> a fixed manner.
>> >>>>>
>> >>>>> On Thu, Nov 29, 2012 at 6:00 AM, Hiroyuki Yamada
>> >>>>> <mogwa...@gmail.com> wrote:
>> >>>>>> Thank you all for the comments and advice.
>> >>>>>>
>> >>>>>> I know it is not recommended to assign mapper locations by myself.
>> >>>>>> But there needs to be one mapper running in each node in some
>> >>>>>> cases, so I need a strict way to do it.
>> >>>>>>
>> >>>>>> So, locations are taken care of by JobTracker(scheduler), but it
>> >>>>>> is not strict.
>> >>>>>> And, the only way to do it strictly is making my own scheduler,
>> >>>>>> right?
>> >>>>>>
>> >>>>>> I have checked the source and I am not sure where to modify to do it.
>> >>>>>> What I understand is FairScheduler and others are for scheduling
>> >>>>>> multiple jobs. Is this right?
>> >>>>>> What I want to do is scheduling tasks in one job.
>> >>>>>> Can this be achieved by FairScheduler and others?
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Hiroyuki
>> >>>>>>
>> >>>>>> On Thu, Nov 29, 2012 at 12:46 AM, Michael Segel
>> >>>>>> <michael_se...@hotmail.com> wrote:
>> >>>>>>> Mappers? Uhm... yes you can do it.
>> >>>>>>> Yes it is non-trivial.
>> >>>>>>> Yes, it is not recommended.
>> >>>>>>>
>> >>>>>>> I think we talk a bit about this in an InfoQ article written by
>> >>>>>>> Boris Lublinsky.
>> >>>>>>>
>> >>>>>>> It's kind of wild when your entire cluster map goes red in
>> >>>>>>> ganglia... :-)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Nov 28, 2012, at 2:41 AM, Harsh J <ha...@cloudera.com> wrote:
>> >>>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> Mapper scheduling is indeed influenced by the getLocations()
>> >>>>>>> returned results of the InputSplit.
>> >>>>>>>
>> >>>>>>> The map task itself does not care about deserializing the
>> >>>>>>> location information, as it is of no use to it. The location
>> >>>>>>> information is vital to the scheduler (or in 0.20.2, the
>> >>>>>>> JobTracker), where it is sent to directly when a job is
>> >>>>>>> submitted. The locations are used pretty well here.
>> >>>>>>>
>> >>>>>>> You should be able to control (or rather, influence) mapper
>> >>>>>>> placement by working with the InputSplits, but not strictly so,
>> >>>>>>> because in the end it's up to your MR scheduler to do data local
>> >>>>>>> or non data local assignments.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Wed, Nov 28, 2012 at 11:39 AM, Hiroyuki Yamada
>> >>>>>>> <mogwa...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>> Hi Harsh,
>> >>>>>>>>
>> >>>>>>>> Thank you for the information.
>> >>>>>>>> I understand the current circumstances.
>> >>>>>>>>
>> >>>>>>>> How about for mappers?
>> >>>>>>>> As far as I tested, location information in InputSplit is
>> >>>>>>>> ignored in 0.20.2,
>> >>>>>>>> so there seems no easy way for assigning mappers to specific
>> >>>>>>>> nodes.
>> >>>>>>>> (I checked the source earlier and noticed that
>> >>>>>>>> location information is not restored when deserializing the
>> >>>>>>>> InputSplit instance.)
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> Hiroyuki
>> >>>>>>>>
>> >>>>>>>> On Wed, Nov 28, 2012 at 2:08 PM, Harsh J <ha...@cloudera.com>
>> >>>>>>>> wrote:
>> >>>>>>>>> This is not supported/available currently even in MR2, but take
>> >>>>>>>>> a look at
>> >>>>>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-199.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Wed, Nov 28, 2012 at 9:34 AM, Hiroyuki Yamada
>> >>>>>>>>> <mogwa...@gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> Hi,
>> >>>>>>>>>>
>> >>>>>>>>>> I am wondering how I can assign reduce tasks to specific
>> >>>>>>>>>> nodes.
>> >>>>>>>>>> What I want to do is, for example, assigning reducer which
>> >>>>>>>>>> produces part-00000 to node xxx000,
>> >>>>>>>>>> and part-00001 to node xxx001 and so on.
>> >>>>>>>>>>
>> >>>>>>>>>> I think it's about task assignment scheduling but
>> >>>>>>>>>> I am not sure where to customize to achieve this.
>> >>>>>>>>>> Is this done by writing some extensions ?
>> >>>>>>>>>> or any easier way to do this ?
>> >>>>>>>>>>
>> >>>>>>>>>> Regards,
>> >>>>>>>>>> Hiroyuki
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Harsh J
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Harsh J
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Harsh J
>> >>
>> >>
>> >> --
>> >> Harsh J
>> >>
>> >
>> >
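
For anyone following the thread, here is a rough sketch of the difference between "strict" and best-effort (non-strict) placement when assignment happens on tracker heartbeats, as described above. It is a toy simulation only and does not use any Hadoop API: the class LocalityAssignmentSketch, the assign method, and the task and node names are all hypothetical, and it assumes each heartbeat offers exactly one free slot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalityAssignmentSketch {

    /**
     * Simulates heartbeat-driven assignment (hypothetical model, not Hadoop code).
     * Each heartbeat offers one free slot on the named host.
     * Returns task -> host for every task that was placed.
     */
    public static Map<String, String> assign(Map<String, String> preferredHostByTask,
                                             List<String> heartbeats,
                                             boolean strict) {
        List<String> pending = new ArrayList<>(preferredHostByTask.keySet());
        Map<String, String> placed = new LinkedHashMap<>();

        for (String host : heartbeats) {
            if (pending.isEmpty()) {
                break;
            }
            String chosen = null;
            // First choice: a pending task whose preferred host sent this heartbeat.
            for (String task : pending) {
                if (host.equals(preferredHostByTask.get(task))) {
                    chosen = task;
                    break;
                }
            }
            // Non-strict mode falls back to any pending task;
            // strict mode leaves the slot idle instead.
            if (chosen == null && !strict) {
                chosen = pending.get(0);
            }
            if (chosen != null) {
                pending.remove(chosen);
                placed.put(chosen, host);
            }
        }
        return placed;
    }

    public static void main(String[] args) {
        Map<String, String> preferred = new LinkedHashMap<>();
        preferred.put("map-0", "node-a");
        preferred.put("map-1", "node-b");
        preferred.put("map-2", "node-c");
        // node-c never reports in, e.g. because its tracker is unavailable.
        List<String> heartbeats = Arrays.asList("node-b", "node-a", "node-a");

        System.out.println("strict:      " + assign(preferred, heartbeats, true));
        System.out.println("best effort: " + assign(preferred, heartbeats, false));
    }
}
```

In this toy run the strict mode never places map-2, because node-c never sends a heartbeat, while the best-effort mode runs it on node-a. That is the failing/unavailable tracker caveat mentioned in the thread: a strict policy can leave a task waiting indefinitely.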