It looks like PySpark can't start a JVM in the backend. How did you set
up Java and Spark on your machine? Some suggestions that may help solve
your issue:
1. Use OpenJDK instead of Apple's JDK, since Spark was developed using
OpenJDK, not Apple's JDK. You can use Homebrew to install OpenJDK. (I
don't see any reason to use Apple's JDK unless you are using the
latest Mac; see my question below.)
2. Download and deploy the Spark tarball directly from Spark's website
and run Spark's examples from the command line to test your environment
before integrating with PyCharm.
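One quick way to check the Java side before involving Spark at all is
sketched below in Python. It assumes that Spark's launcher runs
$JAVA_HOME/bin/java when JAVA_HOME is set and otherwise falls back to
the java found on PATH. find_java and java_version are helper names made
up for this example; they are not part of PySpark.

```python
import os
import shutil
import subprocess


def find_java(java_home=None):
    """Return the java executable Spark would most likely launch, or None.

    Assumption: with JAVA_HOME set, the launcher uses $JAVA_HOME/bin/java;
    without it, the first `java` on PATH is used.
    """
    if java_home is None:
        java_home = os.environ.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        # A wrong JAVA_HOME is not silently replaced by the PATH lookup.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
        return None
    return shutil.which("java")


def java_version(java_path):
    """Run `java -version` and return its text (the JVM prints it on stderr)."""
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("No usable java executable found; check JAVA_HOME and PATH")
    else:
        print("java executable:", java)
        print(java_version(java))
```

If find_java() returns None, or the reported version is not one your
Spark release supports (the Spark 3.1.x docs list Java 8 and 11), the
gateway error is more likely a Java setup problem than a PyCharm one.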
My question to the group: Has anyone had any luck with Apple's JDK
when running Spark or other applications (performance-wise)? Is this the
one with native libs for the M1 chipset?
-- ND
On 8/17/21 1:56 AM, karan alang wrote:
Hello Experts,
I'm trying to run spark-submit on my MacBook Pro (from the command line
or using PyCharm), and it gives the following error:
Exception: Java gateway process exited before sending its port number
I've tried setting environment variables in the program (based on
recommendations from people on the internet), but the problem remains.
Any pointers on how to resolve this issue?
# explicitly setting environment variables
os.environ["JAVA_HOME"] = "/Library/Java/JavaVirtualMachines/applejdk-11.0.7.10.1.jdk/Contents/Home"
os.environ["PYTHONPATH"] = "/usr/local/Cellar/apache-spark/3.1.2/libexec//python/lib/py4j-0.10.4-src.zip:/usr/local/Cellar/apache-spark/3.1.2/libexec//python/:"
os.environ["PYSPARK_SUBMIT_ARGS"]="--master local[2] pyspark-shell"
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/karanalang/Documents/Technology/StructuredStreamin_Udemy/Spark-Streaming-In-Python-master/00-HelloSparkSQL/HelloSparkSQL.py", line 12, in <module>
    spark = SparkSession.builder.master("local[*]").getOrCreate()
  File "/Users/karanalang/.conda/envs/PythonLeetcode/lib/python3.9/site-packages/pyspark/sql/session.py", line 228, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/Users/karanalang/.conda/envs/PythonLeetcode/lib/python3.9/site-packages/pyspark/context.py", line 384, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/Users/karanalang/.conda/envs/PythonLeetcode/lib/python3.9/site-packages/pyspark/context.py", line 144, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/Users/karanalang/.conda/envs/PythonLeetcode/lib/python3.9/site-packages/pyspark/context.py", line 331, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/Users/karanalang/.conda/envs/PythonLeetcode/lib/python3.9/site-packages/pyspark/java_gateway.py", line 108, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number