I have Python Jupyter notebooks set up to create a Spark context by default, and sometimes context creation fails with the following error:
18/04/30 18:03:27 WARN Utils: Service 'sparkDriver' could not bind on port 0. Attempting port 1.
18/04/30 18:03:27 ERROR SparkContext: Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 100 retries! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
I have tracked it down to two settings that may cause this in Spark 2.0.2 (client mode, standalone cluster, running in Kubernetes):
spark.driver.port - we don't set it, so it should be random
spark.ui.port - we set spark.ui.enabled=false so it should not try to bind to this port.
In short, I do not know which one Spark gets confused about, and from reading the Spark code it is not clear how spark.ui.port could cause this, even though the error message lists it as a possible cause.
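One hedged observation: "Cannot assign requested address" normally means the address being bound is not assigned to a local interface, whereas a busy port would be reported as "Address already in use". So the hostname the driver resolves may be worth checking alongside the two port settings. The sketch below uses only the Python standard library and does not reproduce Spark's own address selection; `can_bind` is a hypothetical helper name, and the assumption is that the driver binds to whatever the pod's hostname resolves to.

```python
import socket


def can_bind(host, port=0):
    """Try to bind a TCP socket to host:port.

    Returns (True, bound_port) on success, or (False, message) on failure.
    Port 0 asks the OS for any free port, like Spark's default driver port.
    """
    try:
        # Resolve the name to an IPv4 address with a plain lookup.
        addr = socket.gethostbyname(host)
    except socket.error as exc:
        return False, "cannot resolve %s: %s" % (host, exc)

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((addr, port))
        return True, sock.getsockname()[1]
    except socket.error as exc:
        # An address that is not local to this host/pod typically fails here.
        return False, "cannot bind %s (%s): %s" % (host, addr, exc)
    finally:
        sock.close()


if __name__ == "__main__":
    # Compare loopback with whatever this host's own name resolves to.
    for name in ("127.0.0.1", socket.gethostname()):
        print(name, can_bind(name))
```

If the second check fails inside the notebook pod while loopback succeeds, that would point at hostname resolution rather than at spark.driver.port or spark.ui.port.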
Question 1: have you seen this before?
Question 2: how do I trace the Spark driver process? It seems that I can only set the log level (sc.setLogLevel) after the Spark context is created, but I need tracing before the Spark context is created.
I created a log4j.properties file in the spark/conf directory and set it to TRACE, but that only gets picked up when I run a Scala Jupyter notebook, not when I run a Python Jupyter notebook. I haven't been able to find out how to turn on the same level of tracing for a Spark driver process started via a Python Jupyter notebook.
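One possible approach, sketched here under assumptions rather than verified on this setup: PySpark reads the `PYSPARK_SUBMIT_ARGS` environment variable when it launches the driver JVM, so a `--driver-java-options` entry pointing log4j at the properties file can be placed there before the context exists. The helper name `driver_log4j_submit_args` is hypothetical, and the file path is an assumption based on the conf directory shown in the launch command.

```python
import os


def driver_log4j_submit_args(log4j_path):
    """Build a PYSPARK_SUBMIT_ARGS value that passes a log4j config to the driver JVM.

    The value has to end with 'pyspark-shell', which PySpark expects as the
    primary resource when it starts the JVM gateway.
    """
    java_opts = "-Dlog4j.configuration=file:" + log4j_path
    return '--driver-java-options "%s" pyspark-shell' % java_opts


# Assumed location of the TRACE-level properties file; adjust as needed.
LOG4J_PATH = "/usr/local/spark/conf/log4j.properties"

# This only has an effect if it is set before the driver JVM is launched,
# that is, before the SparkContext is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = driver_log4j_submit_args(LOG4J_PATH)
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

Because the notebooks here create the context by default, setting the variable in a cell is likely too late; the same value would probably have to go into the kernel's environment, for example the `env` section of the Jupyter kernel spec.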
Some things I looked at:
`SPARK_PRINT_LAUNCH_COMMAND=1 /usr/local/spark-2.0.2-bin-hadoop2.7/bin/pyspark`
Spark Command: python2.7
========================================
Python 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/* -Xmx1g org.apache.spark.deploy.SparkSubmit --name PySparkShell pyspark-shell
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
Regards,
Mihai Iacob
DSX Local - Security, IBM Analytics