Currently the number of retries is hardcoded. You may want to open a JIRA to make the retry count configurable.
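Until such a change lands, here is a rough sketch of what making it configurable could look like. The property names below are invented for illustration and are not real Spark settings; the defaults keep today's hardcoded values:

    import scala.concurrent.duration._
    import org.apache.spark.SparkConf

    // Sketch only: today REGISTRATION_TIMEOUT and REGISTRATION_RETRIES are
    // hardcoded constants in org.apache.spark.deploy.client.AppClient.
    // A JIRA could propose reading them from SparkConf instead, e.g.:
    class AppClientSketch(conf: SparkConf) {
      val REGISTRATION_TIMEOUT =
        conf.getInt("spark.deploy.client.registrationTimeoutSeconds", 20).seconds
      val REGISTRATION_RETRIES =
        conf.getInt("spark.deploy.client.registrationRetries", 3)
    }

Something like that would let users bump the retry count in spark-defaults.conf without rebuilding Spark.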
Cheers

On Thu, Jul 2, 2015 at 8:35 PM, <luohui20...@sina.com> wrote:
> Hi there,
>
> I checked the source code and found that in
> org.apache.spark.deploy.client.AppClient the following values are defined
> (line 52):
>
> val REGISTRATION_TIMEOUT = 20.seconds
> val REGISTRATION_RETRIES = 3
>
> As far as I know, if I want to increase the number of retries, must I
> modify this value, rebuild the entire Spark project, and then redeploy the
> Spark cluster with my modified version?
>
> Or is there a better way to solve this issue?
>
> Thanks.
>
> --------------------------------
> Thanks&Best regards!
> San.Luo
>
> ----- Original Message -----
> From: <luohui20...@sina.com>
> To: "user" <user@spark.apache.org>
> Subject: All masters are unresponsive issue
> Date: 2015-07-02 17:31
>
> Hi there:
>
> I got a problem: "Application has been killed. Reason: All masters are
> unresponsive! Giving up." I checked the network I/O and found that it is
> sometimes really high when running my app. Please refer to the attached
> picture for more info.
>
> I also checked
> http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html
> and set SPARK_LOCAL_IP in every node's spark-env.sh of my Spark cluster.
> However, it did not help solve this problem.
>
> I am not sure if this parameter is correctly set; my setting is like this:
>
> On node1:
> export SPARK_LOCAL_IP={node1's IP}
>
> On node2:
> export SPARK_LOCAL_IP={node2's IP}
>
> ......
>
> BTW, I guess that Akka will retry 3 times when communicating between
> master and slave; is it possible to increase the Akka retries?
>
> And apart from expanding the network bandwidth, is there another way to
> solve this problem?
>
> Thanks for any coming ideas.
>
> --------------------------------
> Thanks&Best regards!
> San.Luo
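For reference, the "All masters are unresponsive! Giving up." message in the quoted thread is produced once those two constants are exhausted: the client tries to register with every master, waits REGISTRATION_TIMEOUT, and repeats up to REGISTRATION_RETRIES times. A simplified sketch of that shape (not the actual AppClient code; tryRegister is a made-up stand-in for the real registration call):

    import scala.concurrent.duration._

    object RegistrationSketch {
      val REGISTRATION_TIMEOUT = 20.seconds
      val REGISTRATION_RETRIES = 3

      // tryRegister stands in for sending RegisterApplication to every master
      // and checking whether any of them answered.
      def registerWithMaster(tryRegister: () => Boolean): Unit = {
        var attempt = 0
        var registered = false
        while (!registered && attempt < REGISTRATION_RETRIES) {
          attempt += 1
          registered = tryRegister()
          if (!registered) Thread.sleep(REGISTRATION_TIMEOUT.toMillis) // wait before the next round
        }
        if (!registered) {
          // after roughly 3 x 20 seconds without a response the application is killed
          println("Application has been killed. Reason: All masters are unresponsive! Giving up.")
        }
      }
    }

So with the defaults, about a minute without a master response is enough to hit this error, which is why either making the retry count configurable or reducing the network load helps.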