Weird Python errors like this generally mean you have different
versions of Python on the nodes of your cluster. Can you check that?
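
One quick way to check, without logging into every box, is to run a
tiny job from the same pyspark shell and collect sys.version from the
executors. This is just a sketch (the 100 partitions are arbitrary,
only there to spread tasks across the workers):

    import sys
    # Each task reports the interpreter version it runs under;
    # distinct() collapses duplicates, so more than one entry in
    # the result means the nodes run different Pythons.
    versions = (sc.parallelize(range(100), 100)
                  .map(lambda _: sys.version)
                  .distinct()
                  .collect())
    print(versions)

Compare that against sys.version in the driver shell itself. Note that
if the mismatch is bad enough, this job can die with the same
ImportError before reporting anything; in that case fall back to
running `python -V` on each node directly. If the versions do differ,
pointing PYSPARK_PYTHON (in conf/spark-env.sh) at the same interpreter
on every node is the usual fix.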

On Tue, Mar 3, 2015 at 4:21 PM, subscripti...@prismalytics.io
<subscripti...@prismalytics.io> wrote:
> Hi Friends:
>
> We noticed that the following happens in 'pyspark' when running in
> distributed Standalone Mode (MASTER=spark://vps00:7077), but not in
> Local Mode (MASTER=local[n]).
>
> See the following, particularly the ImportError at the end of the
> traceback (again, the problem only happens in Standalone Mode).
> Any ideas? Thank you in advance! =:)
>
>>>>
>>>> rdd = sc.textFile('file:///etc/hosts')
>>>> rdd.first()
>
> Traceback (most recent call last):
>   File "<input>", line 1, in <module>
>   File "/usr/lib/spark/python/pyspark/rdd.py", line 1129, in first
>     rs = self.take(1)
>   File "/usr/lib/spark/python/pyspark/rdd.py", line 1111, in take
>     res = self.context.runJob(self, takeUpToNumLeft, p, True)
>   File "/usr/lib/spark/python/pyspark/context.py", line 818, in runJob
>     it = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd,
> javaPartitions, allowLocal)
>   File
> "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line
> 538, in __call__
>     self.target_id, self.name)
>   File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
> line 300, in get_return_value
>     format(target_id, '.', name), value)
> Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0
> (TID 7, vps03): org.apache.spark.api.python.PythonException: Traceback (most
> recent call last):
>   File "/usr/lib/spark/python/pyspark/worker.py", line 107, in main
>     process()
>   File "/usr/lib/spark/python/pyspark/worker.py", line 98, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/usr/lib/spark/python/pyspark/serializers.py", line 227, in
> dump_stream
>     vs = list(itertools.islice(iterator, batch))
>   File "/usr/lib/spark/python/pyspark/rdd.py", line 1106, in takeUpToNumLeft
>     while taken < left:
> <--- See around line 1106 of this file in the CDH5 Spark Distribution.
> ImportError: No module named iter
>
>>>> # But iter() exists as a built-in (not as a module) ...
>>>> iter(range(10))
> <listiterator object at 0x423ff10>
>>>>
>
> cluster$ rpm -qa | grep -i spark
> [ ... ]
> spark-python-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch
> spark-core-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch
> spark-worker-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch
> spark-master-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch
>
>
> Thank you!
> Team Prismalytics



-- 
Marcelo
