I am using Python 3.7 and Spark 2.4.7, and I am trying to figure out why my job is picking up the wrong Python version. The startup logs confirm that the driver is using Python 3.7, but I later see an error message showing the executors are trying to use 3.8, and I am not sure where that is coming from.
```
SPARK_HOME=/usr/local/lib/python3.7/dist-packages/pyspark
```

Here is my command:

```
sudo --preserve-env -u spark pyspark --deploy-mode client --jars /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/jars/phoenix5-spark-shaded-6.0.0.7.1.7.0-551.jar --verbose --py-files pullhttp/base_http_pull.py --master yarn
```

The REPL banner shows the expected interpreter:

```
Python 3.7.17 (default, Jun  6 2023, 20:10:10) [GCC 9.4.0] on linux
```

But when I try to run `als.fit` on my training data, I get this:

```
>>> model = als.fit(training)
[Stage 0:>                                                          (0 + 1) / 1]
23/09/04 21:42:10 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, datanode1, executor 2): org.apache.spark.SparkException:
Error from python worker:
  Traceback (most recent call last):
    File "/usr/lib/python3.8/runpy.py", line 185, in _run_module_as_main
      mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
    File "/usr/lib/python3.8/runpy.py", line 111, in _get_module_details
      __import__(pkg_name)
    File "<frozen importlib._bootstrap>", line 991, in _find_and_load
    File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
    File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
    File "<frozen zipimport>", line 259, in load_module
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/__init__.py", line 51, in <module>
    ....
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
    File "/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
  TypeError: an integer is required (got type bytes)
PYTHONPATH was:
  /yarn/nm/usercache/spark/filecache/1130/__spark_libs__3536427065776590449.zip/spark-core_2.11-2.4.7.jar:/usr/local/lib/python3.7/dist-packages/pyspark/python/lib/py4j-0.10.7-src.zip:/usr/local/lib/python3.7/dist-packages/pyspark/python/::/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/__pyfiles__:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/pyspark.zip:/yarn/nm/usercache/spark/appcache/application_1693107150164_0198/container_e03_1693107150164_0198_01_000003/py4j-0.10.7-src.zip
org.apache.spark.SparkException: No port number in pyspark.daemon's stdout
```

The traceback shows the worker running under `/usr/lib/python3.8/`, even though the driver is on 3.7.
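My understanding is that the worker interpreter is chosen by the `PYSPARK_PYTHON` environment variable (or the `spark.pyspark.python` conf), so I would expect something like the following to force 3.7 on the executors. This is a sketch of what I believe should work, not something from my logs; the path `/usr/bin/python3.7` is an assumption and would need to exist on every YARN NodeManager host, not just the driver:

```shell
# Sketch: pin both driver and worker interpreters to 3.7.
# /usr/bin/python3.7 is an assumed path -- adjust to wherever
# Python 3.7 actually lives on the cluster nodes.
export PYSPARK_PYTHON=/usr/bin/python3.7
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3.7

# --preserve-env keeps the exports visible to the spark user;
# the --conf entries repeat the same pin on the executor side.
sudo --preserve-env -u spark pyspark \
  --master yarn \
  --deploy-mode client \
  --conf spark.pyspark.python=/usr/bin/python3.7 \
  --conf spark.executorEnv.PYSPARK_PYTHON=/usr/bin/python3.7 \
  --jars /opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/jars/phoenix5-spark-shaded-6.0.0.7.1.7.0-551.jar \
  --py-files pullhttp/base_http_pull.py
```

Is this the right way to pin the executor Python, or is something on the NodeManager hosts (e.g. a system default `python3` pointing at 3.8) overriding it?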