We have 4 nodes, 4 large tasks (~30GB each), and about 25 small tasks (~2GB each). The tasks can be started in any order. Each node has ~50GB available for YARN. If we start all 4 large tasks first, they are correctly scheduled across all 4 nodes. But if we start all the small tasks first, they all go to the first cluster node and leave no free capacity on it. When we then try to start the 4 large tasks, only the remaining 3 nodes have resources available, so one of the large tasks cannot be started.
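For context, the two-queue setup discussed in the thread below looks roughly like this in capacity-scheduler.xml (a sketch only; the actual attached configs are not reproduced here, and the values are the 70/30 split described below):

<!-- Sketch of the two-queue setup described in this thread (not the actual
     attachment). Capacity values are percentages of *cluster* resources,
     not per-node limits. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>long,short</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.long.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.short.capacity</name>
  <value>30</value>
</property>

With 4 nodes of ~50GB each (~200GB total), the "short" queue's 30% works out to ~60GB cluster-wide, so all 25 small tasks (~50GB) fit within the queue's share even if they all land on node1.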
BR,
Rafal.

2016-11-10 9:54 GMT+01:00 Bibinchundatt <bibin.chund...@huawei.com>:

> Hi Rafal!
>
> Is there a way to force yarn to use the configured thresholds (70% and
> 30%) per node?
>
> -Currently we can't specify a threshold per node.
>
> As per your initial mail, YARN memory per node is ~50GB, meaning all
> nodes' resources are the same. Is there any use case specifically for
> per-node allocation based on percentage?
>
> From: Rafał Radecki [mailto:radecki.ra...@gmail.com]
> Sent: 10 November 2016 14:59
> To: Ravi Prakash
> Cc: user
> Subject: Re: Yarn 2.7.3 - capacity scheduler container allocation to nodes?
>
> Hi Ravi.
>
> I did not specify labels this time ;) I just created two queues, as is
> visible in the configuration.
>
> Overall the queues work, but the allocation of jobs is different than I
> expected, as I wrote at the beginning.
>
> BR,
> Rafal.
>
> 2016-11-10 2:48 GMT+01:00 Ravi Prakash <ravihad...@gmail.com>:
>
> Hi Rafal!
>
> Have you been able to launch the job successfully first without
> configuring node-labels? Do you really need node-labels? How much total
> memory do you have on the cluster? Node labels are usually for specifying
> special capabilities of the nodes (e.g. some nodes could have GPUs and
> your application could request to be run on only the nodes which have
> GPUs).
>
> HTH
> Ravi
>
> On Wed, Nov 9, 2016 at 5:37 AM, Rafał Radecki <radecki.ra...@gmail.com>
> wrote:
>
> Hi All.
>
> I have a 4-node cluster on which I run YARN. I created 2 queues, "long"
> and "short", the first with 70% resource allocation and the second with
> 30%. Both queues are configured on all available nodes by default.
>
> My memory for YARN per node is ~50GB. Initially I thought that when I run
> tasks in the "short" queue, YARN would allocate them across all nodes
> using 30% of the memory on every node. So for example, if I run 20 tasks
> of 2GB each (40GB in total) in the "short" queue:
>
> - the first ~7 will be scheduled on node1 (14GB total; 30% of the 50GB
>   available on this node for the "short" queue -> 15GB)
> - the next ~7 tasks will be scheduled on node2
> - the ~6 remaining tasks will be scheduled on node3
> - YARN on node4 will not use any resources assigned to the "short" queue.
>
> But this seems not to be the case. At the moment I see that all tasks are
> started on node1 and the other nodes have no tasks started.
>
> I attached my yarn-site.xml and capacity-scheduler.xml.
>
> Is there a way to force YARN to use the thresholds configured above (70%
> and 30%) per node and not per cluster as a whole? I would like a
> configuration in which, on every node, 30% is always available for the
> "short" queue and 70% for the "long" queue, and resources that are free
> in one queue are not used by other queues. Is it possible?
>
> BR,
> Rafal.
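On the last question above (keeping one queue's free resources away from the other): per-node shares can't be enforced, but the Capacity Scheduler's maximum-capacity setting caps a queue's elastic growth cluster-wide. A sketch, assuming the same queue names as above:

<!-- Sketch: cap each queue at its configured share so idle capacity is not
     borrowed across queues. These are still cluster-wide caps, not per-node
     guarantees. -->
<property>
  <name>yarn.scheduler.capacity.root.long.maximum-capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.short.maximum-capacity</name>
  <value>30</value>
</property>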