[ https://issues.apache.org/jira/browse/SPARK-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-19097.
-------------------------------
    Resolution: Duplicate

I don't see value in opening a bunch of JIRAs on the same theme. These look like near duplicates, and they depend on this functionality being supported in the first place, which, according to the parent issue, it apparently is not.

> virtualenv example failed with conda due to ImportError: No module named ruamel.yaml.comments
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19097
>                 URL: https://issues.apache.org/jira/browse/SPARK-19097
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>            Reporter: Yesha Vora
>
> Spark version: 2
> Steps:
> * install conda on all nodes under python2.7 (pip install conda)
> * create /tmp/requirements1.txt containing "numpy" (echo "numpy" > /tmp/requirements1.txt)
> * run the kmeans.py application in yarn-client mode:
> {code}
> spark-submit --master yarn --deploy-mode client \
>   --conf "spark.pyspark.virtualenv.enabled=true" \
>   --conf "spark.pyspark.virtualenv.type=conda" \
>   --conf "spark.pyspark.virtualenv.requirements=/tmp/requirements1.txt" \
>   --conf "spark.pyspark.virtualenv.bin.path=/usr/bin/conda" \
>   --jars /usr/hadoop-client/lib/hadoop-lzo.jar \
>   kmeans.py /tmp/in/kmeans_data.txt 3
> {code}
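The kmeans.py script itself is not attached to the issue. A minimal PySpark sketch consistent with the arguments above (an input path and k), using MLlib K-means and the numpy dependency pulled in by requirements1.txt, might look like the following; this is a hypothetical reconstruction, not the reporter's actual script:

{code:title=kmeans.py (hypothetical sketch)}
import sys

import numpy as np
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

if __name__ == "__main__":
    # Usage: kmeans.py <input_path> <k>
    sc = SparkContext(appName="PythonKMeans")
    path, k = sys.argv[1], int(sys.argv[2])

    # The map runs Python code on the executors, so each executor must
    # first set up the conda virtualenv (PythonWorkerFactory.setupVirtualEnv),
    # which is the step that fails in the log below.
    data = sc.textFile(path).map(
        lambda line: np.array([float(x) for x in line.split()]))

    model = KMeans.train(data, k, maxIterations=10)
    print("cluster centers: %s" % model.clusterCenters)
    sc.stop()
{code}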
> {code:title=app log}
> 17/01/06 01:39:25 DEBUG PythonWorkerFactory: user.home=/home/yarn
> 17/01/06 01:39:25 DEBUG PythonWorkerFactory: Running command:/usr/bin/conda create --prefix /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1483592608863_0017/container_1483592608863_0017_01_000003/virtualenv_application_1483592608863_0017_0 --file requirements1.txt -y
> Traceback (most recent call last):
>   File "/usr/bin/conda", line 11, in <module>
>     load_entry_point('conda==4.2.7', 'console_scripts', 'conda')()
>   File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 561, in load_entry_point
>     return get_distribution(dist).load_entry_point(group, name)
>   File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2631, in load_entry_point
>     return ep.load()
>   File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2291, in load
>     return self.resolve()
>   File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2297, in resolve
>     module = __import__(self.module_name, fromlist=['__name__'], level=0)
>   File "/usr/lib/python2.7/site-packages/conda/cli/__init__.py", line 8, in <module>
>     from .main import main  # NOQA
>   File "/usr/lib/python2.7/site-packages/conda/cli/main.py", line 46, in <module>
>     from ..base.context import context
>   File "/usr/lib/python2.7/site-packages/conda/base/context.py", line 18, in <module>
>     from ..common.configuration import (Configuration, MapParameter, PrimitiveParameter,
>   File "/usr/lib/python2.7/site-packages/conda/common/configuration.py", line 40, in <module>
>     from ruamel.yaml.comments import CommentedSeq, CommentedMap  # pragma: no cover
> ImportError: No module named ruamel.yaml.comments
> 17/01/06 01:39:26 WARN BlockManager: Putting block rdd_3_0 failed due to an exception
> 17/01/06 01:39:26 WARN BlockManager: Block rdd_3_0 could not be removed as it was not found on disk or in memory
> 17/01/06 01:39:26 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.RuntimeException: Fail to run command: /usr/bin/conda create --prefix /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1483592608863_0017/container_1483592608863_0017_01_000003/virtualenv_application_1483592608863_0017_0 --file requirements1.txt -y
> 	at org.apache.spark.api.python.PythonWorkerFactory.execCommand(PythonWorkerFactory.scala:142)
> 	at org.apache.spark.api.python.PythonWorkerFactory.setupVirtualEnv(PythonWorkerFactory.scala:124)
> 	at org.apache.spark.api.python.PythonWorkerFactory.<init>(PythonWorkerFactory.scala:70)
> 	at org.apache.spark.SparkEnv$$anonfun$createPythonWorker$1.apply(SparkEnv.scala:117)
> 	at org.apache.spark.SparkEnv$$anonfun$createPythonWorker$1.apply(SparkEnv.scala:117)
> 	at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:194)
> 	at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:80)
> 	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:116)
> 	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
> 	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
> 	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
> 	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
> 	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
> 	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
> 	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
> 	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
> 	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
> 	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> {code}
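The root cause visible in the traceback is the conda binary itself, not Spark: a conda installed with "pip install conda" (step 1 above) lacks its ruamel.yaml dependency, so every conda invocation dies with this ImportError before it can create the virtualenv. Installing the missing package on each node (pip install ruamel.yaml), or installing conda from the official installer rather than via pip, should at least let the create command run; per the resolution above, the virtualenv feature itself is still unsupported. A small pre-flight check along these lines, a hypothetical helper and not part of Spark, would surface the broken binary before the job is submitted:

{code:title=check_conda.py (hypothetical sketch)}
import subprocess

def conda_is_usable(conda_bin="/usr/bin/conda"):
    """Return True if `conda --version` runs cleanly on this node.

    conda_bin should match spark.pyspark.virtualenv.bin.path.
    """
    try:
        subprocess.check_output([conda_bin, "--version"],
                                stderr=subprocess.STDOUT)
        return True
    except (OSError, subprocess.CalledProcessError):
        # OSError: the binary is missing or not executable.
        # CalledProcessError: conda's entry point raised (e.g. the
        # ImportError in the log above) and exited non-zero.
        return False

if __name__ == "__main__":
    if not conda_is_usable():
        print("conda is broken on this node; reinstall it from the "
              "official installer instead of `pip install conda`")
{code}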