[ https://issues.apache.org/jira/browse/SPARK-11874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557217#comment-15557217 ]
holdenk commented on SPARK-11874:
---------------------------------

I think this is not intended to be supported, although I'm not completely sure, since the title of the JIRA seems a bit different from the description. Is it possible that some of the ongoing work on virtualenv support might meet your needs, should it move forward?

> DistributedCache for PySpark
> ----------------------------
>
>                 Key: SPARK-11874
>                 URL: https://issues.apache.org/jira/browse/SPARK-11874
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.4.1
>            Reporter: Ranjana Rajendran
>
> I have access only to the workbench of a cluster. All the nodes have only
> Python 2.6, but I want to use PySpark from an IPython notebook with
> Python 2.7. I created a Python 2.7 virtual environment as follows:
>
>     conda create -n py27 python=2.7 anaconda
>     source activate py27
>
> I installed all the required modules in py27, created a zip of the
> environment, and uploaded it to HDFS:
>
>     zip -r py27.zip py27
>     hadoop fs -put py27.zip
>
> Then I set:
>
>     export PYSPARK_DRIVER_PYTHON=ipython
>     export PYSPARK_DRIVER_PYTHON_OPTS=notebook
>     export PYSPARK_PYTHON=./py27/bin/python
>     export PYTHONPATH=/opt/spark/python/lib/py4j-0.8.2.1-src.zip:/opt/spark/python/:PYSPARK_DRIVER_PYTHON=ipython
>
> and launched PySpark as follows:
>
>     /opt/spark/bin/pyspark --verbose --name iPythondemo \
>       --conf spark.yarn.executor.memoryOverhead=2048 \
>       --conf spark.eventLog.dir=${spark_event_log_dir}$USER/ \
>       --master yarn --deploy-mode client \
>       --archives hdfs:///user/alti_ranjana/py27.zip#py27 \
>       --executor-memory 8G --executor-cores 2 \
>       --queue default --num-executors 48 $spark_opts_extra
>
> When I try to run a job in client mode, i.e. making use of the executors
> running on all the nodes, I get an error stating that the file
> ./py27/bin/python does not exist.
> I also tried launching pyspark with the argument --file py27.zip#py27
> and got this error:
>
>     Exception in thread "main" java.lang.IllegalArgumentException:
>     pyspark does not support any application options.
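One plausible cause of the "./py27/bin/python does not exist" error (an assumption on my part, not confirmed in this thread): `zip -r py27.zip py27` run from the environment's parent directory stores a top-level `py27/` folder inside the archive. YARN unpacks the archive into a directory named by the `#py27` alias, so the interpreter ends up at `./py27/py27/bin/python` in the executor's working directory, one level deeper than `PYSPARK_PYTHON` expects. A minimal sketch of a layout that keeps the two paths consistent, using the paths and environment name from the report (the `~/envs/py27` location is illustrative):

```shell
# Repackage the environment *contents* so bin/python sits at the archive root
# (illustrative; run where the py27 environment lives, e.g. ~/envs/py27):
#   cd ~/envs/py27 && zip -r ../py27.zip . && cd ..
#   hadoop fs -put py27.zip /user/alti_ranjana/py27.zip

# Executor-side interpreter path is relative to the YARN container working
# directory, where the archive is unpacked under its "#py27" alias:
export PYSPARK_PYTHON=./py27/bin/python
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

# --archives (not --files) both ships the zip and unpacks it on every
# executor; the launch itself is commented out here since it needs a cluster:
#   /opt/spark/bin/pyspark --master yarn --deploy-mode client \
#       --archives hdfs:///user/alti_ranjana/py27.zip#py27

echo "PYSPARK_PYTHON=${PYSPARK_PYTHON}"
```

As for the `IllegalArgumentException`: `--file` is not a spark-submit flag (the flag is `--files`), so spark-submit most likely parses `py27.zip#py27` as an application argument, and the pyspark launcher rejects application options outright, which would explain the "pyspark does not support any application options" message.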
> Am I doing this the right way? Is there something wrong in the way I am
> doing this, or is this a known issue? Does PySpark support
> DistributedCache-style distribution of zip files?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org