Weird Python errors like this generally mean you have different versions of Python on the nodes of your cluster. Can you check that?
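One way to compare what each node reports: a minimal sketch, not an official procedure. The `find_mismatches` helper and the hostnames below are hypothetical, and the pyspark one-liner in the comment assumes a live `sc` (SparkContext):

```python
# In a pyspark shell you could collect the interpreter version each worker
# actually uses with something like (assumes an active SparkContext `sc`):
#   sc.parallelize(range(100)).map(lambda _: __import__('sys').version).distinct().collect()
# and then compare the results against the driver's version:

def find_mismatches(versions_by_host, expected):
    """Return the hosts whose reported Python version differs from `expected`.

    `versions_by_host` maps hostname -> "major.minor" string (e.g. "2.6");
    `expected` is the driver's version in the same form.
    """
    return sorted(host for host, ver in versions_by_host.items() if ver != expected)

# Hypothetical example: vps01 and vps03 run Python 2.6 while the driver runs 2.7.
print(find_mismatches({"vps01": "2.6", "vps02": "2.7", "vps03": "2.6"}, "2.7"))
# -> ['vps01', 'vps03']
```

If the versions differ, pointing every node at the same interpreter (e.g. via the `PYSPARK_PYTHON` environment variable) is the usual fix.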
On Tue, Mar 3, 2015 at 4:21 PM, subscripti...@prismalytics.io <subscripti...@prismalytics.io> wrote:
> Hi Friends:
>
> We noticed that the following happens in 'pyspark' when running in distributed
> Standalone Mode (MASTER=spark://vps00:7077), but not in Local Mode
> (MASTER=local[n]).
>
> See the following, particularly what is highlighted in Red (again, the
> problem only happens in Standalone Mode).
> Any ideas? Thank you in advance! =:)
>
> >>> rdd = sc.textFile('file:///etc/hosts')
> >>> rdd.first()
>
> Traceback (most recent call last):
>   File "<input>", line 1, in <module>
>   File "/usr/lib/spark/python/pyspark/rdd.py", line 1129, in first
>     rs = self.take(1)
>   File "/usr/lib/spark/python/pyspark/rdd.py", line 1111, in take
>     res = self.context.runJob(self, takeUpToNumLeft, p, True)
>   File "/usr/lib/spark/python/pyspark/context.py", line 818, in runJob
>     it = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, javaPartitions, allowLocal)
>   File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>     self.target_id, self.name)
>   File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>     format(target_id, '.', name), value)
> Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0
> (TID 7, vps03): org.apache.spark.api.python.PythonException:
> Traceback (most recent call last):
>   File "/usr/lib/spark/python/pyspark/worker.py", line 107, in main
>     process()
>   File "/usr/lib/spark/python/pyspark/worker.py", line 98, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/usr/lib/spark/python/pyspark/serializers.py", line 227, in dump_stream
>     vs = list(itertools.islice(iterator, batch))
>   File "/usr/lib/spark/python/pyspark/rdd.py", line 1106, in takeUpToNumLeft
>     while taken < left:    <--- See around line 1106 of this file in the CDH5 Spark distribution.
> ImportError: No module named iter
>
> >>> # But iter() exists as a built-in (not as a module) ...
> >>> iter(range(10))
> <listiterator object at 0x423ff10>
> >>>
>
> cluster$ rpm -qa | grep -i spark
> [ ... ]
> spark-python-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch
> spark-core-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch
> spark-worker-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch
> spark-master-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch
>
> Thank you!
> Team Prismalytics

--
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org