Thanks Davies and Eric. I followed Davies' instructions and it works
wonderfully.

I would add that you can also load these scripts in the pyspark shell:

pyspark --py-files support.py

where support.py is your script containing your class as Davies described.
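
For anyone wondering why moving the class into a module helps, here is a rough sketch using only the standard pickle module (no Spark involved). It assumes a hypothetical support.py containing the same test class, written to a temporary directory for the demo. Putting that directory on sys.path is only a stand-in for what --py-files arranges on the workers, not a description of how Spark does it internally.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Write a stand-in for support.py (hypothetical contents) into a temp directory.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "support.py"), "w") as f:
    f.write(
        "class test(object):\n"
        "    def __init__(self, a, b):\n"
        "        self.total = a + b\n"
    )

# Make the module importable; this only imitates the effect of --py-files.
sys.path.insert(0, module_dir)
support = importlib.import_module("support")

obj = support.test(True, False)
payload = pickle.dumps(obj)

# pickle stores the class by reference (module name plus class name),
# not the class body, so the receiving side must be able to import it.
assert b"support" in payload and b"test" in payload

restored = pickle.loads(payload)
print(type(restored).__module__, restored.total)
```

A class defined directly in the driver script lives in '__main__', and the '__main__' of a worker process is a different script, so the same lookup by name fails there.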




Best,

Guillaume Guy

* +1 919 - 972 - 8750*

On Wed, Feb 18, 2015 at 11:48 PM, Davies Liu <dav...@databricks.com> wrote:

> Currently, PySpark cannot pickle a class object defined in the current
> script ('__main__'). The workaround is to put the implementation
> of the class into a separate module, then use "bin/spark-submit
> --py-files xxx.py" to deploy it.
>
> in xxx.py:
>
> class test(object):
>   def __init__(self, a, b):
>     self.total = a + b
>
> in job.py:
>
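> from pyspark import SparkContext
> from xxx import test
> sc = SparkContext()  # spark-submit does not predefine sc as the shell does
> a = sc.parallelize([(True,False),(False,False)])
> # tuple-unpacking lambdas are Python 2 only, so unpack explicitly
> a.map(lambda p: test(*p))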
>
> run it by:
>
> bin/spark-submit --py-files xxx.py job.py
>
>
> On Wed, Feb 18, 2015 at 1:48 PM, Guillaume Guy
> <guillaume.c....@gmail.com> wrote:
> > Hi,
> >
> > This is a duplicate of the Stack Overflow question here. I hope to generate
> > more interest on this mailing list.
> >
> >
> > The problem:
> >
> > I am running into some attribute lookup problems when trying to instantiate a
> > class within my RDD.
> >
> > My workflow is quite standard:
> >
> > 1- Start with an RDD
> >
> > 2- Take each element of the RDD and instantiate an object for each
> >
> > 3- Reduce (I will write a method that will define the reduce operation later
> > on)
> >
> > Here is #2:
> >
> > class test(object):
> >     def __init__(self, a, b):
> >         self.total = a + b
> >
> > a = sc.parallelize([(True,False),(False,False)])
> > a.map(lambda p: test(*p))
> >
> > Here is the error I get:
> >
> > PicklingError: Can't pickle <class '__main__.test'>: attribute lookup
> > __main__.test failed
> >
> > I'd like to know if there is any way around it. Please answer with a
> > working example that achieves the intended result (i.e. creating an RDD of
> > objects of class "test").
> >
> > Thanks in advance!
> >
> > Related question:
> >
> > https://groups.google.com/forum/#!topic/edx-code/9xzRJFyQwn
> >
> >
> > GG
> >
>
