That did not paste well; let me try again.

I am using Python 3.7 and Spark 2.4.7, and I am trying to figure out why my job is using the wrong Python version.

This is how it is starting up. The logs confirm that I am using Python 3.7, but later I see an error message showing it is trying to use 3.8, and I am not sure where it is picking that up.

SPARK_HOME = /usr/local/lib/python3.7/dist-packages/pyspark
Here is my command:

sudo --preserve-env -u spark pyspark --deploy-mode client --jars /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/jars/phoenix5-spark-shaded-6.0.0.7.1.7.0-551.jar --verbose --py-files pullhttp/base_http_pull.py --master yarn
Python 3.7.17 (default, Jun  6 2023, 20:10:10) 
[GCC 9.4.0] on linux
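
From reading the docs, my understanding is that the executors pick their interpreter from PYSPARK_PYTHON (or spark.pyspark.python / spark.executorEnv.PYSPARK_PYTHON), not from whatever interpreter the driver shell happens to be running. So my next attempt is to pin it explicitly, something like this (assuming python3.7 lives at /usr/bin/python3.7 on every node):

export PYSPARK_PYTHON=/usr/bin/python3.7
sudo --preserve-env -u spark pyspark --deploy-mode client --master yarn \
  --conf spark.executorEnv.PYSPARK_PYTHON=/usr/bin/python3.7 \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/usr/bin/python3.7 \
  --jars /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/jars/phoenix5-spark-shaded-6.0.0.7.1.7.0-551.jar \
  --verbose --py-files pullhttp/base_http_pull.py

Is that the right way to control which Python the YARN containers launch, or does something else (a cluster-wide default, for example) take precedence?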


And when I try to run als.fit on my training data, I get this:
>>> model = als.fit(training)
[Stage 0:>                                                          (0 + 1) / 1]
23/09/04 21:42:10 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, datanode1, executor 2): org.apache.spark.SparkException: Error from python worker:
  Traceback (most recent call last):
    File "/usr/lib/python3.8/runpy.py", line 185, in _run_module_as_main
      mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
    File "/usr/lib/python3.8/runpy.py", line 111, in _get_module_details
      __import__(pkg_name)
    File "<frozen importlib._bootstrap>", line 991, in _find_and_load
    File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
    File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
    File "<frozen zipimport>", line 259, in load_module
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/__init__.py", line 51, in <module>

....

    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
  TypeError: an integer is required (got type bytes)

PYTHONPATH was:
  /yarn/nm/usercache/spark/filecache/1130/__spark_libs__3536427065776590449.zip/spark-core_2.11-2.4.7.jar:/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/py4j-0.10.7-src.zip:/usr/local/lib/python3.7/dist-packages/pyspark/python/::/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/__pyfiles__:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/py4j-0.10.7-src.zip

org.apache.spark.SparkException: No port number in pyspark.daemon's stdout
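
For what it is worth, once a worker does come up I was planning to double-check which interpreter the executors actually launch with something like this from the same pyspark shell (just mapping sys.version over a one-element RDD so it runs on an executor):

>>> import sys
>>> sc.parallelize([0], 1).map(lambda _: sys.version).collect()

Right now the worker dies before it gets that far, so I have not been able to confirm it that way.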
