[ https://issues.apache.org/jira/browse/YARN-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Suma Shivaprasad updated YARN-8710:
-----------------------------------
    Environment:     (was: Container retries are currently set to a default of -1 in AbstractProviderService.buildContainerRetry. If this is not overridden via the service spec with a finite value for yarn.service.container-failure.retry.max, this causes infinite NM retries for the container under the ALWAYS/ON_FAILURE restart policy. Ideally it should retry a finite number of times on the same NM, and subsequently the Service AM can retry on another node. We can set this to a default value of 3.)

> Service AM should set a finite limit on NM container max retries
> -----------------------------------------------------------------
>
>                 Key: YARN-8710
>                 URL: https://issues.apache.org/jira/browse/YARN-8710
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Suma Shivaprasad
>            Assignee: Suma Shivaprasad
>            Priority: Major
>
> Container retries are currently set to a default of -1 in
> AbstractProviderService.buildContainerRetry. If this is not overridden via
> the service spec with a finite value for
> yarn.service.container-failure.retry.max, this causes infinite NM retries
> for the container under the ALWAYS/ON_FAILURE restart policy. Ideally it
> should retry a finite number of times on the same NM, and subsequently the
> Service AM can retry on another node.
> We can set this to a default value of 3.
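
The proposed behavior can be illustrated with a minimal, self-contained sketch. This is not the actual Hadoop code: the class RetryLimitSketch and the method resolveMaxRetries are hypothetical names, and a plain Map stands in for the service configuration. Only the property name and the default of 3 come from the issue text. The sketch also assumes that an explicitly configured value (including -1) is still honored, which the issue does not state.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a finite default for NM container retries. */
public class RetryLimitSketch {
    // Property name quoted in the issue description.
    static final String RETRY_MAX_KEY = "yarn.service.container-failure.retry.max";
    // Finite default proposed in the issue (the current default is -1, i.e. unlimited).
    static final int DEFAULT_RETRY_MAX = 3;

    /**
     * Returns the configured retry limit, or the finite default when the
     * property is not set. Assumption: an explicit value is used as-is.
     */
    static int resolveMaxRetries(Map<String, String> props) {
        String raw = props.get(RETRY_MAX_KEY);
        if (raw == null) {
            return DEFAULT_RETRY_MAX;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Property absent: falls back to the finite default.
        System.out.println(resolveMaxRetries(props));
        // Property set in the (stand-in) service spec: the override wins.
        props.put(RETRY_MAX_KEY, "5");
        System.out.println(resolveMaxRetries(props));
    }
}
```

With a finite limit in place, retries on the same NM stop once the limit is reached, which leaves the Service AM free to retry the container on another node as the description suggests.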