Hi there,

I have run into a problem: my application fails with "Application has been
killed. Reason: All masters are unresponsive! Giving up." I checked the
network I/O and found that it is sometimes very high while my app is running.
Please refer to the attached picture for more info.

I also read
http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html
and set SPARK_LOCAL_IP in spark-env.sh on every node of my Spark cluster.
However, that did not solve the problem. I am not sure whether this parameter
is set correctly. My settings look like this:

On node1: export SPARK_LOCAL_IP={node1's IP}
On node2: export SPARK_LOCAL_IP={node2's IP}
......

BTW, I guess that Akka retries 3 times when communicating between master and
slave. Is it possible to increase the number of Akka retries?

And apart from increasing the network bandwidth, is there another way to solve
this problem?
Thanks in advance for any ideas.

--------------------------------

 

Thanks & best regards!
San.Luo