Please try export PYSPARK_PYTHON=<path_to_python_2.7> On Wed, Mar 16, 2016 at 3:00 PM, ram kumar <ramkumarro...@gmail.com> wrote:
> Hi, > > I get the following error when running a job as pyspark, > > {{{ > An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.collectAndServe. > : org.apache.spark.SparkException: Job aborted due to stage failure: Task > 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage > 0.0 (TID 6, ): java.io.IOException: Cannot run program "python2.7": > error=2, No such file or directory > at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) > at > org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:160) > at > org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86) > at > org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62) > at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135) > at > org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101) > at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:63) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: error=2, No such file or directory > at java.lang.UNIXProcess.forkAndExec(Native Method) > at java.lang.UNIXProcess.<init>(UNIXProcess.java:248) > at java.lang.ProcessImpl.start(ProcessImpl.java:134) > at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) > ... 36 more > > }}} > > python2.7 couldn't found. But i m using vertual env python 2.7 > {{{ > [ram@test-work workspace]$ python > Python 2.7.8 (default, Mar 15 2016, 04:37:00) > [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> > }}} > > Can anyone help me with this? > Thanks > -- Best Regards Jeff Zhang