
Have you set the Python environment variables correctly?


You can print the environment variables within your PySpark script to
verify this:

import os
print("PYTHONPATH:", os.environ.get("PYTHONPATH"))
print("PYSPARK_PYTHON:", os.environ.get("PYSPARK_PYTHON"))

You can set these in your .bashrc or in $SPARK_HOME/conf/spark-env.sh
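
If editing those files is not convenient, a standalone script can also set the variable itself before Spark starts. The following is only a minimal sketch under some assumptions: WORKER_PYTHON is a hypothetical path (use wherever Python 3.7 is actually installed on every node), the helper names are made up for illustration, and it only helps when it runs before the SparkContext/SparkSession is created, so it will not change an already running pyspark shell.

```python
import os
import sys

# Hypothetical path: wherever a Python 3.7 interpreter lives on EVERY YARN node.
WORKER_PYTHON = "/usr/bin/python3.7"


def pin_worker_python(python_path, environ=None):
    """Set PYSPARK_PYTHON so that executors launch the given interpreter.

    This has to run before the SparkContext/SparkSession is created.
    """
    env = os.environ if environ is None else environ
    env["PYSPARK_PYTHON"] = python_path
    return env


def major_minor(version_info=None):
    """Return 'X.Y' for a version tuple (defaults to the running interpreter)."""
    info = sys.version_info if version_info is None else version_info
    return "{}.{}".format(info[0], info[1])


if __name__ == "__main__":
    pin_worker_python(WORKER_PYTHON)
    print("Driver Python:", major_minor())
    print("PYSPARK_PYTHON:", os.environ["PYSPARK_PYTHON"])
    # Create the session only after this point, for example:
    #   from pyspark.sql import SparkSession
    #   spark = SparkSession.builder.getOrCreate()
```

The driver version printed here should match the major.minor version of the interpreter that PYSPARK_PYTHON points to; the traceback in your mail suggests the executors are falling back to the system Python 3.8 instead.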


Mich Talebzadeh,
Solutions Architect & Engineer
United Kingdom

Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


On Tue, 5 Sept 2023 at 06:12, Harry Jamison
<harryjamiso...@yahoo.com.invalid> wrote:

> That did not paste well, let me try again.
> I am using Python 3.7 and Spark 2.4.7.
> I am trying to figure out why my job is using the wrong Python version.
> This is how it starts up; the logs confirm that I am using Python 3.7.
> But I later see an error message showing it is trying to use 3.8, and I am
> not sure where it is picking that up.
> SPARK_HOME = /usr/local/lib/python3.7/dist-packages/pyspark
> Here is my command
> sudo --preserve-env -u spark pyspark --deploy-mode client --jars
> /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/jars/phoenix5-spark-shaded-
> --verbose --py-files pullhttp/base_http_pull.py --master yarn
> Python 3.7.17 (default, Jun  6 2023, 20:10:10)
> [GCC 9.4.0] on linux
> And when I try to run als.fit on my training data I get this
> >>> model = als.fit(training)
> [Stage 0:>                                                          (0 + 1) / 1]
> 23/09/04 21:42:10 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, datanode1, executor 2): org.apache.spark.SparkException: Error from python worker:
>   Traceback (most recent call last):
>     File "/usr/lib/python3.8/runpy.py", line 185, in _run_module_as_main
>       mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
>     File "/usr/lib/python3.8/runpy.py", line 111, in _get_module_details
>       __import__(pkg_name)
>     File "<frozen importlib._bootstrap>", line 991, in _find_and_load
>     File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
>     File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
>     File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
>     File "<frozen zipimport>", line 259, in load_module
>     File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/__init__.py", line 51, in <module>
> ....
>     File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
>     File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
>   TypeError: an integer is required (got type bytes)
> /yarn/nm/usercache/spark/filecache/1130/__spark_libs__3536427065776590449.zip/spark-core_2.11-2.4.7.jar:/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/py4j-0.10.7-src.zip:/usr/local/lib/python3.7/dist-packages/pyspark/python/::/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/__pyfiles__:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/py4j-0.10.7-src.zip
> org.apache.spark.SparkException: No port number in pyspark.daemon's stdout
> On Monday, September 4, 2023 at 10:08:56 PM PDT, Harry Jamison
> <harryjamiso...@yahoo.com.invalid> wrote:
