As far as I know, the scheduler in YARN only schedules jobs, not the containers inside each job, so I don't believe it is relevant. Also, I haven't used or set those two parameters, and I haven't picked any particular scheduler for my research (Fair, FIFO, or Capacity). Please correct me if I am wrong. P.S. I currently have no interest in running several jobs concurrently; my case is much simpler: a single job whose container allocation I would like to be more balanced...

Or
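For reference, the two parameters Aaron mentioned are Fair Scheduler settings on the ResourceManager; they control how many containers may be assigned per node heartbeat, which can affect how spread out allocations are. A rough yarn-site.xml sketch (the values here are illustrative, not recommendations):

```xml
<!-- yarn-site.xml on the ResourceManager; only relevant with the Fair Scheduler -->
<property>
  <!-- Allow assigning more than one container per node heartbeat -->
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>true</value>
</property>
<property>
  <!-- Cap the number of containers assigned in one heartbeat; -1 means unlimited -->
  <name>yarn.scheduler.fair.max.assign</name>
  <value>3</value>
</property>
```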
On Wed, Jan 9, 2019 at 19:11 Aaron Eng <a...@mapr.com> wrote:

> Have you checked the yarn.scheduler.fair.assignmultiple
> and yarn.scheduler.fair.max.assign parameters in the ResourceManager
> configuration?
>
> On Wed, Jan 9, 2019 at 9:49 AM Or Raz <r...@post.bgu.ac.il> wrote:
>
>> How can I change/suggest a different allocation of containers to tasks in
>> Hadoop? This concerns a native Hadoop (2.9.1) cluster on AWS.
>>
>> I am running a native Hadoop cluster (2.9.1) on AWS (on EC2, not EMR),
>> and I want the scheduling/allocation of containers (mappers/reducers) to
>> be more balanced than it currently is. It seems the RM assigns the
>> mappers in a bin-packing way (where the data resides), while the
>> reducers look more balanced. My setup has three machines with
>> replication factor three (all the data is on every machine), and I run
>> my jobs with mapreduce.job.reduce.slowstart.completedmaps=0 to start the
>> shuffle as early as possible (it is vital that all the containers work
>> concurrently; it is a must condition). Also, given the EC2 instances I
>> chose and my YARN settings, I can run at most 93 containers (31 per
>> machine).
>>
>> For example, if I want nine reducers, then (93-9-1=83) 83 containers are
>> left for the mappers, and one is for the AM. I have played with the
>> input split size (mapreduce.input.fileinputformat.split.minsize,
>> mapreduce.input.fileinputformat.split.maxsize) to find the right balance
>> where all of the machines have the same "work" in the map phase. But it
>> seems the first 31 mappers are allocated on one machine, the next 31 on
>> the second, and the last 31 on the third.
>> Thus, I can try 87 mappers: 31 on Machine #1, 31 on Machine #2, and 25
>> on Machine #3, with the rest left for the reducers; since Machine #1 and
>> Machine #2 are fully occupied, the reducers have to be placed on
>> Machine #3. This way I get an almost balanced allocation of mappers at
>> the expense of an unbalanced reducer allocation. And this is not what I
>> want...
>>
>> # of mappers = input_size / split_size [bytes]
>>
>> split_size = max(mapreduce.input.fileinputformat.split.minsize,
>>     min(mapreduce.input.fileinputformat.split.maxsize, dfs.blocksize))
>>
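The split-size rule quoted at the end of the question can be sketched numerically. The sizes below are made up for illustration, and this ignores Hadoop's extra detail that the final split of a file may be slightly larger than split_size:

```python
import math

def split_size(min_size, max_size, block_size):
    # The FileInputFormat rule quoted in the thread:
    # splitSize = max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def num_mappers(input_size, split):
    # Roughly one mapper per input split (assumes splittable input)
    return math.ceil(input_size / split)

# Hypothetical numbers: 10 GiB of input, the default 128 MiB block size,
# and split.maxsize lowered to 64 MiB to force more, smaller splits.
GiB, MiB = 1 << 30, 1 << 20
s = split_size(1, 64 * MiB, 128 * MiB)
print(s // MiB)                   # 64
print(num_mappers(10 * GiB, s))   # 160
```

Lowering split.maxsize (or raising split.minsize) is the usual lever for tuning the mapper count, but note it only changes how many mappers there are, not which node the RM places them on.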