Marcelo Vanzin created SPARK-11131:
--------------------------------------

             Summary: Worker registration protocol is racy
                 Key: SPARK-11131
                 URL: https://issues.apache.org/jira/browse/SPARK-11131
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.6.0
            Reporter: Marcelo Vanzin
            Priority: Minor

I ran into this while making changes to the new RPC framework. Because the Worker registration protocol is based on sending unrelated messages between Master and Worker, it's possible for another message (e.g. one caused by an app trying to allocate workers) to arrive at the Worker before it knows the Master has registered it. This triggers the following code:

{code}
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  if (masterUrl != activeMasterUrl) {
    logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
{code}

This may or may not be made worse by SPARK-11098.

A simple workaround is to use an {{ask}} instead of a {{send}} for these messages. That should at least narrow the race.

Note this is more of a problem in {{local-cluster}} mode, used a lot by unit tests, where Master and Worker instances are coming up as part of the app itself.
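
To make the ordering problem concrete, here is a minimal toy model of the race and of the {{ask}} workaround described in this issue. It is only a sketch: {{ToyWorker}}, {{RegistrationRaceDemo}}, {{registerWithAsk}}, the simplified message classes and the master URL are all hypothetical names invented for illustration, and none of this is Spark's actual Worker code or RPC API. The ask variant is modeled with a plain Scala {{Future}} that the worker waits on before it handles anything else.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

// Toy model only: these are NOT Spark's real classes or RPC API.
sealed trait Message
case class RegisteredWorker(masterUrl: String) extends Message
case class LaunchExecutor(masterUrl: String, appId: String, execId: Int) extends Message

class ToyWorker {
  // Stays empty until the worker learns that the master registered it.
  private var activeMasterUrl: String = ""
  val launched = scala.collection.mutable.ArrayBuffer.empty[Int]
  val rejected = scala.collection.mutable.ArrayBuffer.empty[Int]

  // One-way message handler, mirroring the check quoted in the issue.
  def receive(msg: Message): Unit = msg match {
    case RegisteredWorker(url) => activeMasterUrl = url
    case LaunchExecutor(url, _, execId) =>
      if (url != activeMasterUrl) rejected += execId // "Invalid Master" branch
      else launched += execId
  }

  // Ask-style registration: block on the reply to our own request.
  def registerWithAsk(reply: Future[RegisteredWorker]): Unit =
    receive(Await.result(reply, 1.second))
}

object RegistrationRaceDemo {
  val masterUrl = "spark://master:7077" // hypothetical URL

  // send-style: the confirmation is just another one-way message,
  // so a launch request can be delivered ahead of it.
  def sendStyle(): ToyWorker = {
    val worker = new ToyWorker
    Seq[Message](LaunchExecutor(masterUrl, "app-1", 0), RegisteredWorker(masterUrl))
      .foreach(worker.receive)
    worker
  }

  // ask-style: the confirmation is the reply to the worker's own request,
  // and the worker applies it before handling other messages.
  def askStyle(): ToyWorker = {
    val worker = new ToyWorker
    val reply = Promise[RegisteredWorker]()
    reply.success(RegisteredWorker(masterUrl)) // master answers the ask
    worker.registerWithAsk(reply.future)       // worker waits for that answer
    worker.receive(LaunchExecutor(masterUrl, "app-1", 0))
    worker
  }

  def main(args: Array[String]): Unit = {
    val s = sendStyle()
    val a = askStyle()
    println(s"send: launched=${s.launched.size} rejected=${s.rejected.size}")
    println(s"ask:  launched=${a.launched.size} rejected=${a.rejected.size}")
  }
}
```

In the send-style run the executor request is rejected because {{activeMasterUrl}} is still unset; in the ask-style run it is accepted. The toy fixes the delivery order by hand, which a real RPC layer does not guarantee, so this illustrates why {{ask}} should narrow the race rather than showing that it removes it.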