Satish Subhashrao Saley created OOZIE-2650:
----------------------------------------------

             Summary: Retry coord start on database exceptions
                 Key: OOZIE-2650
                 URL: https://issues.apache.org/jira/browse/OOZIE-2650
             Project: Oozie
          Issue Type: Bug
            Reporter: Satish Subhashrao Saley


{code:title=CoordActionStartXCommand.java}
    updateList.add(new UpdateEntry<WorkflowJobQuery>(
            WorkflowJobQuery.UPDATE_WORKFLOW_PARENT_MODIFIED, wfJob));
    updateList.add(new UpdateEntry<CoordActionQuery>(
            CoordActionQuery.UPDATE_COORD_ACTION_FOR_START, coordAction));
    try {
        executor.executeBatchInsertUpdateDelete(insertList, updateList, null);
        queue(new CoordActionNotificationXCommand(coordAction), 100);
        if (EventHandlerService.isEnabled()) {
            generateEvent(coordAction, user, appName, wfJob.getStartTime());
        }
    }
    catch (JPAExecutorException je) {
        throw new CommandException(je);
    }
    ......... ....... ........
    finally {
        if (makeFail == true) { // No DB exception occurs
            .... .... ....
            queue(new CoordActionReadyXCommand(coordAction.getJobId()));
        }
    }
{code}

If a database error occurs while starting a coord action, we currently fail the coord action. We should retry instead.

CoordActionStartXCommand submits the workflow, and the workflow gets linked to the coord action if the submission succeeds. But if the coord action update then fails because of a database exception, the recovery service should be able to recover it.
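
A minimal sketch of the retry idea, not the actual Oozie change: instead of failing on the first database error, the update is attempted a bounded number of times with a short wait in between. The names `RetrySketch`, `TransientDbException`, and `retryOnDbError`, as well as the attempt count and wait time, are hypothetical and only stand in for whatever Oozie would use (for example its own JPAExecutorException handling).

```java
import java.util.concurrent.Callable;

public class RetrySketch {

    /** Hypothetical stand-in for a transient database failure; not an Oozie class. */
    public static class TransientDbException extends Exception {
        public TransientDbException(String msg) {
            super(msg);
        }
    }

    /**
     * Runs op up to maxAttempts times, sleeping waitMs between failed attempts.
     * Only TransientDbException triggers a retry; the last one is rethrown
     * if every attempt fails. Any other exception propagates immediately.
     */
    public static <T> T retryOnDbError(Callable<T> op, int maxAttempts, long waitMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TransientDbException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (TransientDbException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(waitMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated DB update: fails twice, then succeeds on the third call.
        String result = retryOnDbError(() -> {
            if (++calls[0] < 3) {
                throw new TransientDbException("connection lost");
            }
            return "updated";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

If the retries are exhausted, the exception still surfaces to the caller, which is the point where a recovery mechanism would need to pick the action up again.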