Re: Non data-local scheduling

2013-10-03 Thread Sandy Ryza
Hi Andre, Try setting yarn.scheduler.capacity.node-locality-delay to a number between 0 and 1. This will turn on delay scheduling - here's the doc on how this works: For applications that request containers on particular nodes, the number of scheduling opportunities since the last container assi

Re: Non data-local scheduling

2013-10-03 Thread André Hacker
Thanks, but I can't set this to a fraction, it wants to see an integer. My documentation is slightly different: "Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers. Typically this should be set to number of racks in the cluster, th

Re: Non data-local scheduling

2013-10-03 Thread Chris Mawata
Try playing with the block size vs. split size. If the blocks are very large and the splits small, then multiple splits correspond to the same block, and if there are more splits than replicas you get rack-local processing. On 10/3/2013 12:57 PM, André Hacker wrote: Hi, I have a 25 node cluster

Re: Non data-local scheduling

2013-10-03 Thread Sandy Ryza
Ah, I was going off the Fair Scheduler equivalent, didn't realize they were different. In that case you might try setting it to something like half the nodes in the cluster. Nodes are constantly heartbeating to the Resource Manager. When a node heartbeats, the scheduler checks to see whether the

RE: Non data-local scheduling

2013-10-05 Thread John Lilley
[...] Sent: Thursday, October 03, 2013 12:32 PM To: user@hadoop.apache.org Subject: Re: Non data-local scheduling Ah, I was going off the Fair Scheduler equivalent, didn't realize they were different. In that case you might try setting it to something like half the nodes in the cluster.

Re: Non data-local scheduling

2013-10-07 Thread Arun C Murthy
It's a cluster-wide setting and scheduler-specific. For CS, please set yarn.scheduler.capacity.node-locality-delay to the number of machines you have in your rack (typically 20 or 40). Looks like the doc in capacity-scheduler.xml is broken, would you mind opening a jira to fix it and add it to the http:/
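
The idea discussed in this thread (count missed scheduling opportunities, then relax from node-local to rack-local once an integer threshold is reached) can be illustrated with a small sketch. This is a toy model, not the CapacityScheduler's actual code: the Request class, the try_assign function, and the exact comparison against the threshold are assumptions made for illustration. Only the notion of a node-locality-delay expressed as a number of missed opportunities comes from the messages above.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical container request with locality preferences."""
    preferred_nodes: set
    preferred_racks: set
    missed_opportunities: int = 0  # heartbeats skipped while waiting for locality


def try_assign(request, node, rack, node_locality_delay):
    """Handle one heartbeat from `node` (located in `rack`).

    Returns "node-local", "rack-local", or None if the opportunity is skipped.
    `node_locality_delay` is an integer count of missed opportunities,
    mirroring the integer-valued setting discussed in the thread.
    """
    if node in request.preferred_nodes:
        request.missed_opportunities = 0
        return "node-local"
    # Assumption: rack-local placement is allowed once the number of
    # missed opportunities has reached the configured delay.
    if (rack in request.preferred_racks
            and request.missed_opportunities >= node_locality_delay):
        request.missed_opportunities = 0
        return "rack-local"
    request.missed_opportunities += 1
    return None


if __name__ == "__main__":
    req = Request(preferred_nodes={"n1"}, preferred_racks={"r1"})
    # Heartbeats arrive from n2, which shares rack r1 with the preferred node.
    for _ in range(3):
        print(try_assign(req, "n2", "r1", node_locality_delay=2))
```

In this model a larger delay makes the request wait through more heartbeats for its preferred node before accepting a rack-local container, and a delay of 0 accepts a rack-local container at the first opportunity.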