Specifically, the error I see when I try to operate on an RDD created by the sc.parallelize method is: org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 12:12 was 12062263 bytes which exceeds spark.akka.frameSize (10485760 bytes). Consider using broadcast variables for large values.
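For what it's worth, here is a minimal sketch of two mitigations (not tested against your exact setup): raising spark.akka.frameSize (the value is in MB; the default is 10 MB in Spark 1.x), and/or splitting the data over more partitions so each serialized task stays under the frame size. The matrix `data` below is just a stand-in for yours:

    from pyspark import SparkConf, SparkContext

    # Raise the Akka frame size (value in MB; default 10 MB in Spark 1.x)
    # so larger serialized tasks can be shipped to executors.
    conf = (SparkConf()
            .setAppName("large-matrix")
            .set("spark.akka.frameSize", "128"))
    sc = SparkContext(conf=conf)

    # Alternatively (or additionally), spread the rows over many more
    # partitions so each serialized task stays well under the frame size.
    # `data` is a stand-in for the large matrix from the original post.
    data = [[float(i + j) for j in range(100)] for i in range(10000)]
    rdd = sc.parallelize(data, numSlices=1000)
    print(rdd.count())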
On Sun, Sep 14, 2014 at 2:20 AM, Chengi Liu <chengi.liu...@gmail.com> wrote:
> Hi,
>   I am trying to create an rdd out of a large matrix.... sc.parallelize
> suggests to use broadcast. But when I do
>
>   sc.broadcast(data)
>
> I get this error:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/common/usg/spark/1.0.2/python/pyspark/context.py", line 370, in broadcast
>     pickled = pickleSer.dumps(value)
>   File "/usr/common/usg/spark/1.0.2/python/pyspark/serializers.py", line 279, in dumps
>     def dumps(self, obj): return cPickle.dumps(obj, 2)
> SystemError: error return without exception set
>
> Help?
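As for the broadcast traceback above: cPickle in Python 2 can fail with exactly this SystemError when asked to pickle a very large single object. One possible workaround, a sketch with illustrative names (assuming `sc` and the matrix `data` from the original post already exist, and `chunk_size` is hypothetical), is to avoid pickling the whole matrix at once by splitting it into row chunks before parallelizing:

    # Workaround sketch (illustrative, not from the thread): split the
    # matrix into chunks of rows, parallelize the chunks so each partition
    # pickles a modest object, then flatten back to one row per record.
    chunk_size = 1000
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    rdd = sc.parallelize(chunks, numSlices=len(chunks)).flatMap(lambda rows: rows)
    print(rdd.count())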