Benoy Antony created ZEPPELIN-2355:
--------------------------------------

Summary: Fix race conditions while cancelling a paragraph
Key: ZEPPELIN-2355
URL: https://issues.apache.org/jira/browse/ZEPPELIN-2355
Project: Zeppelin
Issue Type: Bug
Reporter: Benoy Antony
Assignee: Benoy Antony

I ran into a few issues while testing the cancel functionality for a Livy paragraph. The tests were performed on a real YARN cluster with Livy running in cluster mode. On a real cluster, it takes some time to launch the application and start executing paragraphs.

The current cancel function has a few concurrency issues. The visible symptom is that the user can keep cancelling at first, yet the paragraph still runs to completion.

{code}
@Override
public void cancel(InterpreterContext context) {
  if (livyVersion.isCancelSupported()) {
    String paraId = context.getParagraphId();
    Integer stmtId = paragraphId2StmtIdMap.get(paraId);
    try {
      if (stmtId != null) {
        cancelStatement(stmtId);
      }
    } catch (LivyException e) {
      LOGGER.error("Fail to cancel statement " + stmtId + " for paragraph " + paraId, e);
    } finally {
      paragraphId2StmtIdMap.remove(paraId);
    }
  } else {
    LOGGER.warn("cancel is not supported for this version of livy: " + livyVersion);
  }
}
{code}

Issue 1: The variable livyVersion is set in initLivySession(). The thread executing cancel may not see that value and may therefore throw a NullPointerException.

Issue 2: The cancel is a no-op if the statement id is not yet available. A significant time (possibly up to a minute) may pass before the statement id is available. The user needs to keep cancelling until the statement id is available, and has no real way to tell when that is.

Issue 3: A SQL paragraph cannot be cancelled. This can be fixed by changing LivySparkSQLInterpreter to invoke cancel on the underlying LivySparkInterpreter.
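
A minimal sketch of one way the three issues could be approached, offered only as an illustration and not as the actual patch. All names here (CancelSketch, StatementCanceller, SqlSketch, onSessionInitialised, onStatementSubmitted) are hypothetical stand-ins, not Zeppelin or Livy APIs. It assumes the version check can be reduced to a single flag and that the interpreting thread can call a hook once Livy returns the statement id. The flag is volatile so the cancel thread sees the value written during session init (Issue 1), a cancel that arrives before the statement id is remembered and applied once the id is known (Issue 2), and the SQL side simply delegates (Issue 3).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch; none of these names are Zeppelin's actual API. */
class CancelSketch {

    /** Stand-in for the REST call that cancels a Livy statement. */
    interface StatementCanceller {
        void cancelStatement(int stmtId);
    }

    private final StatementCanceller canceller;

    // volatile: a write by the session-init thread is visible to the cancel thread.
    // null means "session not initialised yet" (assumption for this sketch).
    private volatile Boolean cancelSupported;

    private final ConcurrentMap<String, Integer> paragraphId2StmtId = new ConcurrentHashMap<>();
    // Paragraphs for which cancel was requested before a statement id existed.
    private final Set<String> pendingCancels = ConcurrentHashMap.newKeySet();

    CancelSketch(StatementCanceller canceller) {
        this.canceller = canceller;
    }

    void onSessionInitialised(boolean supported) {
        this.cancelSupported = supported;
    }

    /** Called once Livy has returned a statement id; returns true if a pending cancel was applied. */
    boolean onStatementSubmitted(String paraId, int stmtId) {
        paragraphId2StmtId.put(paraId, stmtId);
        if (pendingCancels.remove(paraId)) {
            cancelIfStillMapped(paraId);
            return true;
        }
        return false;
    }

    void cancel(String paraId) {
        Boolean supported = cancelSupported;          // read the volatile field once
        if (supported != null && !supported) {
            return;                                   // known to be unsupported
        }
        pendingCancels.add(paraId);                   // record the request first
        // If the id is already known, apply the cancel now. The atomic remove
        // ensures only one of the two threads performs it.
        if (paragraphId2StmtId.containsKey(paraId) && pendingCancels.remove(paraId)) {
            cancelIfStillMapped(paraId);
        }
    }

    private void cancelIfStillMapped(String paraId) {
        Integer stmtId = paragraphId2StmtId.remove(paraId);
        if (stmtId != null && Boolean.TRUE.equals(cancelSupported)) {
            canceller.cancelStatement(stmtId);
        }
    }

    /** Stand-in for the SQL interpreter: cancel is forwarded to the Spark side. */
    static class SqlSketch {
        private final CancelSketch spark;

        SqlSketch(CancelSketch spark) {
            this.spark = spark;
        }

        void cancel(String paraId) {
            spark.cancel(paraId);
        }
    }

    public static void main(String[] args) {
        CancelSketch spark = new CancelSketch(id -> System.out.println("cancel statement " + id));
        SqlSketch sql = new SqlSketch(spark);
        sql.cancel("paragraph-1");                     // arrives before session and statement exist
        spark.onSessionInitialised(true);
        spark.onStatementSubmitted("paragraph-1", 0);  // prints "cancel statement 0"
    }
}
```

The request is recorded before the map is checked, while the submitting thread stores the id before checking for pending requests, so at least one side notices the other. A real fix would also need to clear a pending request when the paragraph finishes, so that a stale cancel does not hit a later run of the same paragraph.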