[ https://issues.apache.org/jira/browse/YARN-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jack Zhu updated YARN-9549:
---------------------------
    Attachment: yarn-site.xml

> Not able to run pyspark in docker driver container on Yarn3
> -----------------------------------------------------------
>
>                 Key: YARN-9549
>                 URL: https://issues.apache.org/jira/browse/YARN-9549
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.1.2
>         Environment: Hadoop 3.1.1.3.1.0.0-78
> spark version 2.3.2.3.1.0.0-78
> Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_211
> Server: Docker Engine - Community Version: 18.09.6
>            Reporter: Jack Zhu
>            Priority: Critical
>         Attachments: Dockerfile, test.py, yarn-site.xml
>
>
> I followed
> [https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/DockerContainers.html]
> to build a Spark Docker image for running pyspark. I could not find good
> documentation describing how to spark-submit a pyspark job to a Hadoop 3
> cluster, so I used the command below to launch a simple Python job:
> PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.5 spark-submit --master yarn
> --deploy-mode cluster --num-executors 3 --executor-memory 1g --conf
> spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker --conf
> spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8
> --conf
> spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8
> --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker ./test.py
>
> test.py simply collects the hostname from each executor and checks whether
> the Python job is running in a container.
> I found that the driver always runs directly on the host, not in a
> container. As a result, the Python version in the Docker image has to be
> kept consistent with the one on the NodeManager host, which makes it
> meaningless to use Docker to package all the dependencies.
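For comparison, the Spark "Running on YARN" documentation spells the application-master environment prefix `spark.yarn.appMasterEnv.` (lower-case `a`), while the command above uses `spark.yarn.AppMasterEnv.`. Whether that difference explains the driver running on the host is not established in this report. The following is only a minimal sketch, assuming the documented spelling, that assembles an equivalent argument list; `build_submit_args` is a hypothetical helper, and the image name and script path are taken from the command above.

```python
import shlex

# Prefixes as spelled in the Spark-on-YARN documentation for
# per-container environment variables.
AM_ENV_PREFIX = "spark.yarn.appMasterEnv."
EXECUTOR_ENV_PREFIX = "spark.executorEnv."


def build_submit_args(image, script, num_executors=3, executor_memory="1g"):
    """Hypothetical helper: return a spark-submit argv for a Docker-on-YARN job."""
    runtime_env = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    args = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--num-executors", str(num_executors),
        "--executor-memory", executor_memory,
    ]
    # Set the same runtime variables for the application master (which hosts
    # the driver in cluster mode) and for the executors.
    for prefix in (AM_ENV_PREFIX, EXECUTOR_ENV_PREFIX):
        for name, value in runtime_env.items():
            args += ["--conf", f"{prefix}{name}={value}"]
    args.append(script)
    return args


if __name__ == "__main__":
    # Print a shell-ready command line instead of executing it.
    argv = build_submit_args("local/spark:v1.0.8", "./test.py")
    print(" ".join(shlex.quote(a) for a in argv))
```

The `PYSPARK_DRIVER_PYTHON` setting from the original command is left out of the sketch; it would be passed through the shell environment as before.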
>
> The Spark job itself runs successfully; its stdout is below:
> Log Type: stdout
> Log Upload Time: Tue May 14 02:07:06 +0000 2019
> Log Length: 141
> host.test.com
> False ============>going to print all the container names. [True, True, True,
> True, True, True, True, True, True]
> Please see the attached Dockerfile and test.py.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
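The attached test.py is not reproduced in this message. The sketch below is not that script; it only illustrates one way a job could produce output of the shape shown above (a driver hostname, a driver flag, and one flag per executor task). The container check is a heuristic that assumes Docker leaves either a `/.dockerenv` file or a `docker` entry in `/proc/1/cgroup`, which may not hold for every runtime or cgroup layout. The function names and the default of nine tasks are assumptions made for illustration.

```python
import os
import socket


def looks_like_container(cgroup_text, dockerenv_exists):
    """Heuristic: True if either common Docker marker is present."""
    return dockerenv_exists or "docker" in cgroup_text


def running_in_container():
    """Apply the heuristic to the current process (Linux paths assumed)."""
    try:
        with open("/proc/1/cgroup") as fh:
            cgroup_text = fh.read()
    except OSError:
        # File missing or unreadable: fall back to the /.dockerenv check only.
        cgroup_text = ""
    return looks_like_container(cgroup_text, os.path.exists("/.dockerenv"))


def probe(_):
    """Runs inside a task: report the local hostname and container flag."""
    return socket.gethostname(), running_in_container()


def collect_probes(sc, num_tasks=9):
    """Run one probe per partition; sc is an existing pyspark SparkContext."""
    return sc.parallelize(range(num_tasks), num_tasks).map(probe).collect()
```

On the driver, printing `socket.gethostname()` and `running_in_container()` before calling `collect_probes(sc)` would give the driver-side values followed by the executor-side ones.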