I am having a heck of a time setting up my development environment. I used
pip to install pyspark. I also downloaded Spark from Apache.
My Eclipse PyDev interpreter is configured as a Python 3 virtualenv.
I have a simple unit test that loads a small DataFrame. df.show() generates
the following error:

2018-04-04 17:13:56 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException:
Error from python worker:
  Traceback (most recent call last):
    File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site.py", line 67, in <module>
      import os
    File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/os.py", line 409
      yield from walk(new_path, topdown, onerror, followlinks)
               ^
  SyntaxError: invalid syntax

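One possible lead, which I have not confirmed: `yield from` is Python 3 syntax, so a SyntaxError on that line suggests the worker process was started by a Python 2 interpreter that then read the virtualenv's 3.6 standard library. Below is a minimal sketch of pinning the worker interpreter to the one running the test. It assumes the PYSPARK_PYTHON environment variable is what selects the worker's Python and that it is set before the SparkContext is created. The helper name pin_worker_python is made up for illustration and is not part of pyspark.

```python
import os
import sys


def pin_worker_python(env=None):
    """Set PYSPARK_PYTHON to the interpreter running this process.

    `pin_worker_python` is a hypothetical helper name, not part of pyspark.
    """
    if env is None:
        env = os.environ
    env["PYSPARK_PYTHON"] = sys.executable
    return env["PYSPARK_PYTHON"]


# Assumption: this runs before SparkContext(conf=conf) in setUpClass.
worker_python = pin_worker_python()
print("driver python :", sys.version_info[:2])
print("PYSPARK_PYTHON:", worker_python)
```

If this turns out to be the cause, setting PYSPARK_PYTHON in the PyDev run configuration's environment should have the same effect without touching the test code.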
My unittest class is derived from:

    class PySparkTestCase(unittest.TestCase):

        @classmethod
        def setUpClass(cls):
            conf = SparkConf().setMaster("local[2]") \
                .setAppName(cls.__name__) #\
            #    .set("spark.authenticate.secret", "111111")
            cls.sparkContext = SparkContext(conf=conf)
            sc_values[cls.__name__] = cls.sparkContext
            cls.sqlContext = SQLContext(cls.sparkContext)
            print("aedwip:", SparkContext)

        @classmethod
        def tearDownClass(cls):
            print("....calling stop tearDownClas, the content of sc_values=",
                  sc_values)
            sc_values.clear()
            cls.sparkContext.stop()

This looks similar to the PySparkTestCase class in
https://github.com/apache/spark/blob/master/python/pyspark/tests.py
Any suggestions would be greatly appreciated.
Andy
My downloaded version is spark-2.3.0-bin-hadoop2.7.
My virtualenv version is:
(spark-2.3.0) $ pip show pySpark
Name: pyspark
Version: 2.3.0
Summary: Apache Spark Python API
Home-page: https://github.com/apache/spark/tree/master/python
Author: Spark Developers
Author-email: [email protected]
License: http://www.apache.org/licenses/LICENSE-2.0
Location: /Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site-packages
Requires: py4j
(spark-2.3.0) $
(spark-2.3.0) $ python --version
Python 3.6.1
(spark-2.3.0) $