Hi,

We use Hadoop 2.8.5 on EMR for a MapReduce job that reads data from S3.
The job has 13K mappers, and the cluster is 200 r5.xlarge machines.

The cluster is _extremely_ underutilized. We've gone through all the
configuration values that could cause this problem, and everything looks
fine.

No failing jobs, no decommissioned nodes.

The log of the Resource Manager is full of these messages:

2019-06-12 08:43:05,307 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
(ResourceManager Event Processor): Trying to fulfill reservation for
application application_1560249941099_0004 on node:
ip-XX-XXX-XX-XXX.us-west-2.compute.internal:8041
2019-06-12 08:43:05,308 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
(ResourceManager Event Processor): Skipping scheduling since node
ip-XX-XXX-XX-XXX.us-west-2.compute.internal:8041 is reserved by application
appattempt_1560249941099_0004_000001

I suspect that this is the cause.

These messages appear for every node in the cluster.
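
To put a number on "every node", here is a minimal sketch in Python that counts the "Skipping scheduling" messages per node and lists the application attempts holding the reservations. It assumes the message wording is exactly as quoted above; the function names are only illustrative, and the regex tolerates the line wrapping introduced by this email.

```python
import re
from collections import Counter

# Pattern for the "Skipping scheduling" message quoted above.
# \s+ is used between words so that wrapped lines still match.
SKIP_RE = re.compile(
    r"Skipping\s+scheduling\s+since\s+node\s+(\S+)\s+"
    r"is\s+reserved\s+by\s+application\s+(\S+)"
)


def count_reserved_nodes(log_text):
    """Return a Counter mapping node address -> number of skip messages."""
    counts = Counter()
    for match in SKIP_RE.finditer(log_text):
        counts[match.group(1)] += 1
    return counts


def reserving_attempts(log_text):
    """Return the set of application attempts named in the skip messages."""
    return {match.group(2) for match in SKIP_RE.finditer(log_text)}
```

Passing the contents of the ResourceManager log to count_reserved_nodes and comparing the number of keys against the 200 nodes in the cluster would show whether every node really is reserved, and reserving_attempts would show whether a single application attempt holds all of the reservations.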

Can you please help us figure this out?

Thanks
