[ 
https://issues.apache.org/jira/browse/YARN-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Zhu updated YARN-9549:
---------------------------
    Attachment: yarn-site.xml

> Not able to run pyspark in docker driver container on Yarn3
> -----------------------------------------------------------
>
>                 Key: YARN-9549
>                 URL: https://issues.apache.org/jira/browse/YARN-9549
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.1.2
>         Environment: Hadoop 3.1.1.3.1.0.0-78
> spark version 2.3.2.3.1.0.0-78
> Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_211
> Server: Docker Engine - Community Version:          18.09.6
>            Reporter: Jack Zhu
>            Priority: Critical
>         Attachments: Dockerfile, test.py, yarn-site.xml
>
>
> I followed 
> [https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/DockerContainers.html]
>  to build a Spark Docker image for running pyspark. There is no good document 
> describing how to submit a pyspark job to a Hadoop 3 cluster with spark-submit, 
> so I used the command below to launch my simple Python job:
> PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.5 spark-submit --master yarn 
> --deploy-mode cluster --num-executors 3 --executor-memory 1g --conf 
> spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker --conf 
> spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8 
> --conf 
> spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1.0.8
>  --conf spark.yarn.AppMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker ./test.py
>  
> test.py simply collects the hostname from each executor and checks whether 
> the Python job runs in a container or not.
> I found that the driver always runs directly on the host, never in the 
> container. As a result, we have to keep the Python version in the Docker 
> image consistent with the one on the NodeManager, which defeats the purpose 
> of using Docker to package all the dependencies.
>  
> The Spark job itself runs successfully; below is the stdout:
> Log Type: stdout
> Log Upload Time: Tue May 14 02:07:06 +0000 2019
> Log Length: 141
> host.test.com
> False ============>going to print all the container names. [True, True, True, 
> True, True, True, True, True, True]
> Please see the attached Dockerfile and test.py.
>  
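For reference, a container check of the kind the description attributes to test.py could be sketched as follows. This is a minimal illustration, not the attached script; the use of /.dockerenv and /proc/1/cgroup as container heuristics is an assumption on my part:

```python
import os
import socket

def in_docker_container():
    """Heuristic check for running inside a Docker container.

    Docker creates /.dockerenv at the container root, and the cgroup
    entries of PID 1 usually mention 'docker' when containerized.
    Neither signal is guaranteed, so this is best-effort only.
    """
    if os.path.exists("/.dockerenv"):
        return True
    try:
        with open("/proc/1/cgroup") as f:
            return "docker" in f.read()
    except OSError:
        # /proc may be unavailable (e.g. non-Linux hosts)
        return False

if __name__ == "__main__":
    # Mirrors the reported output: hostname first, then the container flag.
    print(socket.gethostname())
    print(in_docker_container())
```

Running this on the driver host versus inside an executor container is what exposes the reported behavior: the executors print True while the driver prints False.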



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
