Tim Hughes created SPARK-34349:
----------------------------------

             Summary: No python3 in docker images 
                 Key: SPARK-34349
                 URL: https://issues.apache.org/jira/browse/SPARK-34349
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.0.1
            Reporter: Tim Hughes


The spark-py container image doesn't receive the instruction to use python3 and defaults to Python 2.7.
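
For reference, forcing the interpreter explicitly through environment variables might work around this, though I have not verified that the image's entrypoint honours {{PYSPARK_PYTHON}}; the {{spark.executorEnv.*}} and {{spark.kubernetes.driverEnv.*}} settings below are the standard configs for passing such variables and are only a sketch of a possible workaround:
{code:python}
from pyspark import SparkConf

# Untested workaround sketch: point both driver and executors at python3
# explicitly instead of relying only on spark.kubernetes.pyspark.pythonVersion.
# Whether the spark-py image's entrypoint respects these variables is an
# assumption, not something confirmed here.
conf = SparkConf()
conf.set("spark.kubernetes.pyspark.pythonVersion", "3")
conf.set("spark.executorEnv.PYSPARK_PYTHON", "python3")
conf.set("spark.kubernetes.driverEnv.PYSPARK_PYTHON", "python3")
{code}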

 

The worker container image was built using the following commands:
{code:bash}
mkdir ./tmp
wget -qO- https://www.mirrorservice.org/sites/ftp.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz | tar -C ./tmp/ -xzf -
cd ./tmp/spark-3.0.1-bin-hadoop3.2/
./bin/docker-image-tool.sh -r docker.io/timhughes -t spark-3.0.1-bin-hadoop3.2 -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
docker push docker.io/timhughes/spark-py:spark-3.0.1-bin-hadoop3.2
{code}
 

This is the code I am using to initialize the workers:

 
{code:python}
import os

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

# Create Spark config for our Kubernetes based cluster manager
sparkConf = SparkConf()
sparkConf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
sparkConf.setAppName("spark")
sparkConf.set("spark.kubernetes.container.image", "docker.io/timhughes/spark-py:spark-3.0.1-bin-hadoop3.2")
sparkConf.set("spark.kubernetes.namespace", "spark")
sparkConf.set("spark.executor.instances", "2")
sparkConf.set("spark.executor.cores", "1")
sparkConf.set("spark.driver.memory", "1024m")
sparkConf.set("spark.executor.memory", "1024m")
sparkConf.set("spark.kubernetes.pyspark.pythonVersion", "3")
sparkConf.set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
sparkConf.set("spark.kubernetes.authenticate.serviceAccountName", "spark")
sparkConf.set("spark.driver.port", "29413")
sparkConf.set("spark.driver.host", "my-notebook-deployment.spark.svc.cluster.local")

# Initialize our Spark cluster; this will actually
# generate the worker nodes.
spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
sc = spark.sparkContext
{code}
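
A quick way to confirm which interpreter the executors actually get (a sketch, assuming the session above comes up) is to compare {{sys.version}} on the driver with what a trivial job reports back from the containers:
{code:python}
import sys

# Interpreter the driver (this notebook) is running on.
print("driver python:", sys.version)

# Interpreter inside the spark-py executor containers; this is where 2.7
# shows up even though spark.kubernetes.pyspark.pythonVersion is set to "3".
print("executor python:",
      sc.parallelize([0], 1).map(lambda _: __import__("sys").version).first())
{code}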
 


