Unable to stop containers when keep-containers-across-application-attempt is enabled

Bharath Kumara Subramanian Mon, 30 Nov 2020 23:45:39 -0800

Hi,

I am trying to keep the containers from previous application attempts
alive so that the processing containers stay intact when the AM restarts.

I am doing this with the keep-containers-across-application-attempt
flag. In my use case, however, the new AM still needs to stop a processing
container from the previous attempt when certain metadata changes (e.g. work
assignments).

When the new AM tries to stop the existing container, the NMClientAsync
throws the following exception:

org.apache.hadoop.yarn.exceptions.YarnException: Container container_1606797336059_0004_01_000002 is neither started nor scheduled to start
    at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45) ~[hadoop-yarn-common-2.7.1.jar:?]
    at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl.stopContainerAsync(NMClientAsyncImpl.java:235) ~[hadoop-yarn-client-2.7.1.jar:?]

I tried fetching the container's status through the NMClient; that call
succeeds and reports the container as running.

My guess is that the NMClient is unaware of this container because it did
not start it in the first place: the list of containers it tracks does not
include containers from previous attempts, so there is no way to stop them.
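
To make that guess concrete, here is a toy Java sketch of the behaviour I
suspect. It does not use any Hadoop classes: ToyNodeManager and
ToyAsyncClient are hypothetical stand-ins, and the two behaviours marked as
assumptions in the comments are my reading of the symptoms, not something I
have confirmed in the NMClientAsyncImpl source. The real client may differ in
detail.

```java
import java.util.HashSet;
import java.util.Set;

public class TrackedClientSketch {

    /** Hypothetical stand-in for a NodeManager: knows every container running on the node. */
    static class ToyNodeManager {
        private final Set<String> running = new HashSet<>();

        void launch(String containerId) { running.add(containerId); }

        void kill(String containerId) { running.remove(containerId); }

        String status(String containerId) {
            return running.contains(containerId) ? "RUNNING" : "NOT_FOUND";
        }
    }

    /** Hypothetical stand-in for a client that only tracks containers it started itself. */
    static class ToyAsyncClient {
        private final ToyNodeManager nm;
        private final Set<String> startedByThisClient = new HashSet<>();

        ToyAsyncClient(ToyNodeManager nm) { this.nm = nm; }

        void start(String containerId) {
            startedByThisClient.add(containerId);
            nm.launch(containerId);
        }

        // Assumption: status queries go to the node without a local lookup.
        String status(String containerId) { return nm.status(containerId); }

        // Assumption: stop is rejected locally when the id is not in this client's table.
        void stop(String containerId) {
            if (!startedByThisClient.remove(containerId)) {
                throw new IllegalStateException("Container " + containerId
                        + " is neither started nor scheduled to start");
            }
            nm.kill(containerId);
        }
    }

    public static void main(String[] args) {
        String id = "container_1606797336059_0004_01_000002";
        ToyNodeManager nm = new ToyNodeManager();

        // Attempt 1 starts the container; the AM then restarts and the container is kept.
        new ToyAsyncClient(nm).start(id);

        // Attempt 2 gets a fresh client whose local table is empty.
        ToyAsyncClient attempt2 = new ToyAsyncClient(nm);
        System.out.println("status: " + attempt2.status(id));
        try {
            attempt2.stop(id);
        } catch (IllegalStateException e) {
            System.out.println("stop failed: " + e.getMessage());
        }
    }
}
```

In this model the second client can see that the container is running but
refuses to stop it, which matches what I observe: the status call works and
the stop call fails with the "neither started nor scheduled to start" message.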

Any help is appreciated.

Thanks,
Bharath
