Re: Problems with Pyspark + Dill tests

2014-06-25 Thread Josh Rosen
The problem seems to be that unpicklable RDD objects are being pulled into function closures. In your failing doctests, it looks like the rdd created through sc.parallelize is being pulled into the map lambda’s function closure. I opened a new Dill bug with a small test case that reproduces this
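To illustrate the kind of capture described above, here is a minimal sketch that uses only the standard library. It is not PySpark or Dill code: `FakeRDD`, `make_mapper` and `unpicklable_captures` are hypothetical names invented for this example, and `FakeRDD` merely stands in for an object that refuses to be pickled. The sketch shows how a lambda that mentions such an object ends up carrying it in its closure, which is what a closure-serializing pickler would then try (and fail) to serialize.

```python
import pickle


class FakeRDD:
    """Hypothetical stand-in for an object that refuses to be pickled."""

    def __init__(self, data):
        self.data = data

    def __reduce__(self):
        # Assumption for illustration: pickling this object is an error.
        raise pickle.PicklingError("FakeRDD cannot be pickled")


def make_mapper():
    rdd = FakeRDD([1, 2, 3])
    # The lambda references `rdd`, so `rdd` becomes part of its closure.
    return lambda x: x + len(rdd.data)


def unpicklable_captures(func):
    """Return the names of closure variables that fail to pickle."""
    bad = []
    cells = func.__closure__ or ()
    for name, cell in zip(func.__code__.co_freevars, cells):
        try:
            pickle.dumps(cell.cell_contents)
        except Exception:
            bad.append(name)
    return bad


mapper = make_mapper()
bad_names = unpicklable_captures(mapper)
print(bad_names)
```

A check like this can help locate which captured name drags the unpicklable object along; copying the needed value into a plain local (for example `n = len(rdd.data)`) before defining the lambda keeps the object itself out of the closure.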

Re: Problems with Pyspark + Dill tests

2014-06-25 Thread Mark Baker
Hey,
On Mon, Jun 23, 2014 at 5:27 PM, Mark Baker wrote:
> Thanks for the context, Josh.
>
> I've gone ahead and created a new test case and just opened a new issue;
>
> https://github.com/uqfoundation/dill/issues/49
So that one's dealt with; it was a sys.prefix issue with me using a virtualenv a

Re: Problems with Pyspark + Dill tests

2014-06-23 Thread Mark Baker
On Thu, Jun 19, 2014 at 3:56 PM, Josh Rosen wrote:
> Thanks for helping with the Dill integration; I had some early first
> attempts, but had to set them aside when I got busy with some other work.
>
> Just to bring everyone up to speed regarding context:
> There are some objects that PySpark’s `

Re: Problems with Pyspark + Dill tests

2014-06-19 Thread Josh Rosen
Thanks for helping with the Dill integration; I had some early first attempts, but had to set them aside when I got busy with some other work. Just to bring everyone up to speed regarding context: There are some objects that PySpark’s `cloudpickle` library doesn’t serialize properly, such as ope

Problems with Pyspark + Dill tests

2014-06-19 Thread Mark Baker
Hi. As part of my attempt to port Pyspark to Python 3, I've re-applied, with modifications, Josh's old commit for using Dill with Pyspark (as Dill already supports Python 3). Alas, I ran into an odd problem that I could use some help with. Josh's old commit; https://github.com/JoshRosen/incubator