As far as I know, the scheduler in YARN only schedules jobs, not the containers inside each job, so I don't believe it is relevant. Also, I haven't used or set those two parameters, and I haven't picked any particular scheduler for my research (Fair, FIFO, or Capacity). Please correct me if I am wrong. P.S. I currently have no interest in running several jobs concurrently; my case is much simpler: a single job whose container allocation I would like to be more balanced...

Or
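For reference, the two parameters Aaron mentioned are Fair Scheduler settings on the ResourceManager; they control how many containers may be assigned per node heartbeat, which can affect how spread out allocations are. A rough yarn-site.xml sketch (the values here are illustrative, not recommendations):

```xml
<!-- yarn-site.xml on the ResourceManager; only relevant with the Fair Scheduler -->
<property>
  <!-- Allow assigning more than one container per node heartbeat -->
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>true</value>
</property>
<property>
  <!-- Cap the number of containers assigned in one heartbeat; -1 means unlimited -->
  <name>yarn.scheduler.fair.max.assign</name>
  <value>3</value>
</property>
```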
On Wed, Jan 9, 2019 at 19:11 Aaron Eng <a...@mapr.com> wrote:

> Have you checked the yarn.scheduler.fair.assignmultiple
> and yarn.scheduler.fair.max.assign parameters in the ResourceManager
> configuration?
>
> On Wed, Jan 9, 2019 at 9:49 AM Or Raz <r...@post.bgu.ac.il> wrote:
>
>> How can I change/suggest a different allocation of containers to tasks in
>> Hadoop? This concerns a native Hadoop (2.9.1) cluster on AWS.
>>
>> I am running a native Hadoop cluster (2.9.1) on AWS (on EC2, not EMR),
>> and I want the scheduling/allocation of containers (mappers/reducers) to
>> be more balanced than it currently is. It seems the RM assigns the
>> mappers in a bin-packing way (where the data resides), while the
>> reducers look more balanced. My setup has three machines with
>> replication factor three (all the data is on every machine), and I run
>> my jobs with mapreduce.job.reduce.slowstart.completedmaps=0 to start the
>> shuffle as early as possible (it is vital that all the containers work
>> concurrently; it is a must condition). Also, given the EC2 instances I
>> chose and my YARN settings, I can run at most 93 containers (31 per
>> machine).
>>
>> For example, if I want nine reducers, then (93-9-1=83) 83 containers are
>> left for the mappers, and one is for the AM. I have played with the
>> input split size (mapreduce.input.fileinputformat.split.minsize,
>> mapreduce.input.fileinputformat.split.maxsize) to find the right balance
>> where all of the machines have the same "work" in the map phase. But it
>> seems the first 31 mappers are allocated on one machine, the next 31 on
>> the second, and the last 31 on the third.
>> Thus, I can try 87 mappers: 31 on Machine #1, 31 on Machine #2, and 25
>> on Machine #3, with the rest left for the reducers; since Machine #1 and
>> Machine #2 are fully occupied, the reducers have to be placed on
>> Machine #3. This way I get an almost balanced allocation of mappers at
>> the expense of an unbalanced reducer allocation. And this is not what I
>> want...
>>
>> # of mappers = input_size / split_size [bytes]
>>
>> split_size = max(mapreduce.input.fileinputformat.split.minsize,
>>     min(mapreduce.input.fileinputformat.split.maxsize, dfs.blocksize))
>>
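The split-size rule quoted at the end of the question can be sketched numerically. The sizes below are made up for illustration, and this ignores Hadoop's extra detail that the final split of a file may be slightly larger than split_size:

```python
import math

def split_size(min_size, max_size, block_size):
    # The FileInputFormat rule quoted in the thread:
    # splitSize = max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def num_mappers(input_size, split):
    # Roughly one mapper per input split (assumes splittable input)
    return math.ceil(input_size / split)

# Hypothetical numbers: 10 GiB of input, the default 128 MiB block size,
# and split.maxsize lowered to 64 MiB to force more, smaller splits.
GiB, MiB = 1 << 30, 1 << 20
s = split_size(1, 64 * MiB, 128 * MiB)
print(s // MiB)                   # 64
print(num_mappers(10 * GiB, s))   # 160
```

Lowering split.maxsize (or raising split.minsize) is the usual lever for tuning the mapper count, but note it only changes how many mappers there are, not which node the RM places them on.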