Specifically, the error I see when I try to operate on an RDD created by the
sc.parallelize method is:

org.apache.spark.SparkException: Job aborted due to stage failure:
Serialized task 12:12 was 12062263 bytes which exceeds spark.akka.frameSize
(10485760 bytes). Consider using broadcast variables for large values.
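
In case it helps, a minimal sketch of the two workarounds this error usually
points at: raising spark.akka.frameSize (given in MB in Spark 1.x) or
parallelizing with more slices so each task carries less data. The values 100
and 1000 below are only placeholders, not tuned recommendations, and `data`
stands for the local collection being parallelized:

from pyspark import SparkConf, SparkContext

# Accept larger serialized tasks (value in MB; the default is 10).
conf = SparkConf().set("spark.akka.frameSize", "100")
sc = SparkContext(conf=conf)

# Or split the local collection across more partitions, so the data
# embedded in each serialized task stays under the frame-size limit.
rdd = sc.parallelize(data, numSlices=1000)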

On Sun, Sep 14, 2014 at 2:20 AM, Chengi Liu <chengi.liu...@gmail.com> wrote:

> Hi,
>    I am trying to create an RDD out of a large matrix... the error from
> sc.parallelize suggests using broadcast variables.
> But when I do
>
> sc.broadcast(data)
> I get this error:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/common/usg/spark/1.0.2/python/pyspark/context.py", line 370,
> in broadcast
>     pickled = pickleSer.dumps(value)
>   File "/usr/common/usg/spark/1.0.2/python/pyspark/serializers.py", line
> 279, in dumps
>     def dumps(self, obj): return cPickle.dumps(obj, 2)
> SystemError: error return without exception set
> Help?
>
>
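
A rough sketch of one way around the cPickle failure, assuming `data` is a
Python list of rows: broadcast the matrix in smaller pieces so that no single
cPickle.dumps call has to serialize the whole object at once. The chunk size
of 1000 rows is only illustrative:

chunk = 1000  # illustrative chunk size, not a tuned value
parts = [sc.broadcast(data[i:i + chunk])
         for i in range(0, len(data), chunk)]
# Inside tasks, each piece is available as parts[k].value; the Broadcast
# handles themselves are small and pickle without trouble.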
