The problem seems to be that unpicklable RDD objects are being pulled into
function closures. In your failing doctests, it looks like the rdd created
through sc.parallelize is being pulled into the map lambda’s function closure.
I opened a new Dill bug with a small test case that reproduces this
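A minimal plain-Python sketch of the general pitfall, not the PySpark doctest itself: `FakeRDD`, `make_bad_mapper` and `make_good_mapper` are hypothetical names, and a `threading.Lock` stands in for whatever makes a real RDD unpicklable. The sketch only inspects free-variable closures via `__closure__` and `co_freevars`. It does not model how dill or cloudpickle decide what else to serialize (for example, a function's globals).

```python
import pickle
import threading


class FakeRDD:
    """Hypothetical stand-in for an RDD: it holds a lock, so it cannot be pickled."""

    def __init__(self, data):
        self.data = data
        self._lock = threading.Lock()  # lock objects are not picklable


def closure_names(func):
    """Map each free variable a function captures to the value in its cell."""
    cells = func.__closure__ or ()
    return {name: cell.cell_contents
            for name, cell in zip(func.__code__.co_freevars, cells)}


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


def make_bad_mapper(rdd):
    # The lambda names `rdd`, so the whole object ends up in its closure.
    return lambda x: x + len(rdd.data)


def make_good_mapper(rdd):
    # Copy out only the plain value the lambda needs; `rdd` is not captured.
    n = len(rdd.data)
    return lambda x: x + n


rdd = FakeRDD([1, 2, 3])
bad = make_bad_mapper(rdd)
good = make_good_mapper(rdd)

for name, fn in [("bad", bad), ("good", good)]:
    captured = closure_names(fn)
    print(name, list(captured), all(is_picklable(v) for v in captured.values()))
# bad ['rdd'] False
# good ['n'] True
```

Both mappers compute the same result; the difference is only in what each closure drags along when a serializer follows its references.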
Hey,
On Mon, Jun 23, 2014 at 5:27 PM, Mark Baker wrote:
> Thanks for the context, Josh.
>
> I've gone ahead and created a new test case and just opened a new issue:
>
> https://github.com/uqfoundation/dill/issues/49
So that one's dealt with; it was a sys.prefix issue with me using a
virtualenv a
On Thu, Jun 19, 2014 at 3:56 PM, Josh Rosen wrote:
Thanks for helping with the Dill integration; I made some early attempts,
but had to set them aside when I got busy with some other work.
Just to bring everyone up to speed regarding context:
There are some objects that PySpark’s `cloudpickle` library doesn’t serialize
properly, such as ope
Hi. As part of my attempt to port PySpark to Python 3, I've
re-applied, with modifications, Josh's old commit for using Dill with
PySpark (as Dill already supports Python 3). Alas, I ran into an odd
problem that I could use some help with.
Josh's old commit:
https://github.com/JoshRosen/incubator