I have some Python code that consistently ends up in this state:

    ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
    Traceback (most recent call last):
      File "/home/ubuntu/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
        self.socket.connect((self.address, self.port))
      File "/usr/lib/python2.7/socket.py", line 224, in meth
        return getattr(self._sock,name)(*args)
    error: [Errno 111] Connection refused

(the same py4j error is logged twice), followed by:

    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
      File "/home/ubuntu/spark/python/pyspark/sql/dataframe.py", line 280, in collect
        port = self._jdf.collectToPython()
      File "/home/ubuntu/spark/python/pyspark/traceback_utils.py", line 78, in __exit__
        self._context._jsc.setCallSite(None)
      File "/home/ubuntu/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 811, in __call__
      File "/home/ubuntu/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
      File "/home/ubuntu/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
      File "/home/ubuntu/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
      File "/home/ubuntu/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
    py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server
This happens even though I start pyspark with these options:

    ./pyspark --master local[4] --executor-memory 14g --driver-memory 14g \
      --packages com.databricks:spark-csv_2.11:1.4.0 \
      --spark.deploy.recoveryMode=FILESYSTEM

and have this in my conf/spark-env.sh file:

    SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/user/recovery"

How can I get HA (high availability) to work in Spark?

Thanks,
Imran
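For reference, `--spark.deploy.recoveryMode=FILESYSTEM` is not a flag the pyspark/spark-submit launcher recognizes; arbitrary Spark properties are normally passed with `--conf key=value`. A sketch of the equivalent invocation in that form (the recovery directory below just mirrors the one already set in spark-env.sh):

```shell
# Same settings expressed via --conf, the standard way to pass
# arbitrary Spark properties to pyspark/spark-submit.
./pyspark --master local[4] \
  --executor-memory 14g --driver-memory 14g \
  --packages com.databricks:spark-csv_2.11:1.4.0 \
  --conf spark.deploy.recoveryMode=FILESYSTEM \
  --conf spark.deploy.recoveryDirectory=/user/recovery
```

Note that, as I understand it, `spark.deploy.recoveryMode` is read by the standalone Master daemon, so it may have no effect when running under `--master local[4]`.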