Hi,

We use Hadoop 2.8.5 on EMR for a MapReduce job that reads data from S3. The job has 13K mappers, and the cluster is 200 r5.xlarge machines.
The cluster is _extremely_ underutilized. We've gone through all the configuration values that could cause this problem, and everything looks fine. There are no failing jobs and no decommissioned nodes. The ResourceManager log is full of these messages:

2019-06-12 08:43:05,307 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1560249941099_0004 on node: ip-XX-XXX-XX-XXX.us-west-2.compute.internal:8041
2019-06-12 08:43:05,308 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Skipping scheduling since node ip-XX-XXX-XX-XXX.us-west-2.compute.internal:8041 is reserved by application appattempt_1560249941099_0004_000001

I suspect this is the cause. These messages appear for every node in the cluster.

Can you please help us figure this out?

Thanks
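
P.S. For anyone who wants to check how widespread the reservation messages are, here is a minimal sketch that tallies the "Skipping scheduling" lines per application attempt and node. It assumes the log line format quoted above; the function name `count_reserved_nodes` and the regular expression are illustrative and not part of Hadoop.

```python
import re
from collections import Counter

# Assumption: lines follow the "Skipping scheduling since node ... is reserved
# by application ..." format quoted in the post above.
RESERVED_RE = re.compile(
    r"Skipping scheduling since node (?P<node>\S+) "
    r"is reserved by application (?P<attempt>\S+)"
)


def count_reserved_nodes(lines):
    """Count skip messages per (application attempt, node) pair.

    `lines` is any iterable of log lines, e.g. an open file object.
    """
    counts = Counter()
    for line in lines:
        match = RESERVED_RE.search(line)
        if match:
            counts[(match.group("attempt"), match.group("node"))] += 1
    return counts
```

It could be run against a copy of the ResourceManager log, for example `count_reserved_nodes(open("resourcemanager.log"))`, where the file name is a placeholder. A large number of distinct nodes under a single attempt would match what we are seeing.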