[ https://issues.apache.org/jira/browse/GOBBLIN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chen Guo resolved GOBBLIN-998.
------------------------------
    Resolution: Fixed

> ExecutionStatus should be reset to PENDING before a job retries
> ---------------------------------------------------------------
>
>                 Key: GOBBLIN-998
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-998
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Chen Guo
>            Priority: Critical
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> In modifyStateIfRetryRequired of KafkaJobStatusMonitor, when the state is
> FAILED and currentAttempts < maxAttempts, the ExecutionStatus is set to
> RUNNING.
> However, due to the check-in from
> GOBBLIN-974 ([https://github.com/apache/incubator-gobblin/blob/9f50a2563cc257039da44018663b6b9e119fb499/gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/KafkaJobStatusMonitor.java#L159]),
> the currentAttempts update carried by a lower-order event (such as
> ORCHESTRATED) cannot be consumed to update the jobState file. As a result,
> DagManagerThread retries failed jobs indefinitely when it calls
> pollAndAdvanceDag.
>
> The solution is to set ExecutionStatus to PENDING instead of RUNNING.
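
The failure mode described above can be reproduced with a small, self-contained model. This is only a sketch under stated assumptions, not the Gobblin code: the class RetryStatusSketch, the four-value Status ordering, and the helpers apply, resetForRetry and countRetries are all hypothetical, and the model assumes that only the ORCHESTRATED event carries the new attempt count.

```java
final class RetryStatusSketch {
    // Assumed ordering, lowest first. Not the real Gobblin ExecutionStatus enum.
    enum Status { PENDING, ORCHESTRATED, RUNNING, FAILED }

    static final class JobState {
        Status status;
        int currentAttempts;
        final int maxAttempts;

        JobState(Status status, int currentAttempts, int maxAttempts) {
            this.status = status;
            this.currentAttempts = currentAttempts;
            this.maxAttempts = maxAttempts;
        }
    }

    // Hypothetical stand-in for the ordering guard: an event whose status ranks
    // below the stored status is dropped, together with any attempt count it carries.
    static boolean apply(JobState state, Status eventStatus, Integer attempts) {
        if (eventStatus.ordinal() < state.status.ordinal()) {
            return false;
        }
        state.status = eventStatus;
        if (attempts != null) {
            state.currentAttempts = attempts;
        }
        return true;
    }

    // Hypothetical stand-in for the retry check: a FAILED job with attempts left
    // is reset to the given status so that it can be resubmitted.
    static void resetForRetry(JobState state, Status resetTo) {
        if (state.status == Status.FAILED && state.currentAttempts < state.maxAttempts) {
            state.status = resetTo;
        }
    }

    // Counts how many retries are granted to a job that always fails, capped at roundLimit.
    static int countRetries(Status resetTo, int roundLimit) {
        JobState state = new JobState(Status.FAILED, 1, 2);
        int attempt = 1; // attempt number known to the submitter
        int retries = 0;
        for (int round = 0; round < roundLimit; round++) {
            resetForRetry(state, resetTo);
            if (state.status == Status.FAILED) {
                break; // attempts exhausted, no further retry
            }
            retries++;
            attempt++;
            apply(state, Status.ORCHESTRATED, attempt); // assumed to carry the attempt count
            apply(state, Status.RUNNING, null);
            apply(state, Status.FAILED, null);
        }
        return retries;
    }

    public static void main(String[] args) {
        System.out.println("reset to PENDING: " + countRetries(Status.PENDING, 5) + " retries");
        System.out.println("reset to RUNNING: " + countRetries(Status.RUNNING, 5) + " retries");
    }
}
```

In this model, resetting to RUNNING causes the ORCHESTRATED event to be dropped, so currentAttempts never advances and every round grants another retry until the cap is hit. Resetting to PENDING lets the ORCHESTRATED event through, the attempt count reaches maxAttempts, and the job is retried exactly once.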