It’s true that it can’t. You can try to use the CloudPickle library instead, which is what we use within PySpark to serialize functions (see python/pyspark/cloudpickle.py). However, I’m also curious: why do you need an RDD of functions?
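
As a rough sketch of the kind of workaround I mean (untested, and the names regs, n, and some_input below are just placeholders based on your snippet), you could cloudpickle each function to a byte string yourself, parallelize the byte strings, and unpickle them on the workers:

    from pyspark import SparkContext
    from pyspark import cloudpickle   # the copy bundled with PySpark (python/pyspark/cloudpickle.py)
    import pickle

    from myfile import regs           # your function from myfile.py

    sc = SparkContext("local", "myapp", pyFiles=["myfile.py"])

    n = 10            # placeholder for self.n
    some_input = 1.0  # placeholder argument for regs

    # cloudpickle each function to a plain byte string; byte strings go
    # through the default cPickle-based serializer without any trouble.
    pickled_regs = [cloudpickle.dumps(f) for f in [regs] * n]
    regsRDD = sc.parallelize(pickled_regs)

    # Unpickle on the workers right before calling each function.
    # cloudpickle output is ordinary pickle data, so pickle.loads works here.
    results = regsRDD.map(lambda p: pickle.loads(p)(some_input)).collect()

Depending on the version you're running, it may also be possible to pass a CloudPickle-based serializer to SparkContext directly, but I'd have to check whether that's exposed in 0.9.1.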
Matei

On Jun 15, 2014, at 4:49 PM, madeleine <madeleine.ud...@gmail.com> wrote:

> It seems that the default serializer used by pyspark can't serialize a list of functions.
> I've seen some posts about trying to fix this by using dill to serialize rather than pickle.
> Does anyone know what the status of that project is, or whether there's another easy workaround?
>
> I've pasted a sample error message below. Here, regs is a function defined in another file myfile.py that has been included on all workers via the pyFiles argument to SparkContext: sc = SparkContext("local", "myapp",pyFiles=["myfile.py"]).
>
>   File "runfile.py", line 45, in __init__
>     regsRDD = sc.parallelize([regs]*self.n)
>   File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/context.py", line 223, in parallelize
>     serializer.dump_stream(c, tempFile)
>   File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/serializers.py", line 182, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/serializers.py", line 118, in dump_stream
>     self._write_with_length(obj, stream)
>   File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/serializers.py", line 128, in _write_with_length
>     serialized = self.dumps(obj)
>   File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/serializers.py", line 270, in dumps
>     def dumps(self, obj): return cPickle.dumps(obj, 2)
> cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-serializer-can-t-handle-functions-tp7650.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.