Leak in RM Capacity scheduler leading to OOM

Sharad Agarwal Wed, 23 Mar 2016 05:21:08 -0700

A dump of an 8 GB heap shows about 18 million instances of
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto.


The counts are similar for ApplicationAttempt and ContainerId. All of them
seem to be linked via
org.apache.hadoop.yarn.proto.YarnProtos$ContainerStatusProto, whose count
is also about 18 million.

On further debugging, looking at the CapacityScheduler code:

It seems to add duplicate UpdatedContainerInfo entries for completed
containers. The same dump shows about 0.5 million UpdatedContainerInfo
objects.

This issue only surfaces when the scheduler thread cannot drain the
UpdatedContainerInfo objects fast enough, which happens only in a big cluster.
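
To illustrate the suspected mechanism, here is a minimal sketch of a queue
that is filled faster than it is drained. It is a toy model and not the
Hadoop code: the class UpdateBacklogSketch, the UpdateBatch type and all
parameters are hypothetical names made up for this example. The value 36 is
only the rough ratio implied by the figures above (about 18 million
ContainerStatusProto objects over about 0.5 million UpdatedContainerInfo
objects).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model only. Nothing here is taken from the Hadoop source tree.
class UpdateBacklogSketch {

    // Hypothetical stand-in for one queued update holding completed-container statuses.
    static final class UpdateBatch {
        final List<String> completedContainerIds;

        UpdateBatch(List<String> completedContainerIds) {
            this.completedContainerIds = completedContainerIds;
        }
    }

    // Each round, `heartbeatsPerRound` producers enqueue one batch each, and the
    // consumer then removes at most `drainedPerRound` batches.
    // Returns the number of status entries still held by the queue at the end.
    static long retainedStatuses(int rounds, int heartbeatsPerRound,
                                 int drainedPerRound, int statusesPerBatch) {
        Queue<UpdateBatch> pending = new ArrayDeque<>();
        for (int r = 0; r < rounds; r++) {
            for (int h = 0; h < heartbeatsPerRound; h++) {
                List<String> ids = new ArrayList<>();
                for (int c = 0; c < statusesPerBatch; c++) {
                    // The same ids are re-reported every round: the assumed duplication.
                    ids.add("container_" + h + "_" + c);
                }
                pending.add(new UpdateBatch(ids));
            }
            for (int d = 0; d < drainedPerRound && !pending.isEmpty(); d++) {
                pending.poll();
            }
        }
        long retained = 0;
        for (UpdateBatch batch : pending) {
            retained += batch.completedContainerIds.size();
        }
        return retained;
    }

    public static void main(String[] args) {
        // Consumer keeps up: nothing accumulates.
        System.out.println(retainedStatuses(1000, 3, 3, 36));
        // Consumer falls behind by one batch per round: the backlog grows linearly.
        System.out.println(retainedStatuses(1000, 3, 2, 36));
    }
}
```

In this model the retained memory stays flat as long as the consumer keeps
up, and grows without bound as soon as it falls behind, which would match
the problem showing up only on a large cluster.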

Has anyone else noticed the same? We are running Hadoop 2.6.0.

Sharad
