Hi Tobias,

One hack you can try is:

rdd.mapPartitions(iter => {
  val x = new X()
  iter.map(row => x.doSomethingWith(row)) ++ { x.shutdown(); Iterator.empty }
})

Best,
Xiangrui

On Thu, May 29, 2014 at 11:38 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Hi,
>
> I want to use an object x in my RDD processing as follows:
>
> val x = new X()
> rdd.map(row => x.doSomethingWith(row))
> println(rdd.count())
> x.shutdown()
>
> Now the problem is that X is non-serializable, so while this works
> locally, it does not work in cluster setup. I thought I could do
>
> rdd.mapPartitions(iter => {
>   val x = new X()
>   val result = iter.map(row => x.doSomethingWith(row))
>   x.shutdown()
>   result
> })
>
> to create an instance of X locally, but obviously x.shutdown() is
> called before the first row is processed.
>
> How can I specify these node-local setup/teardown functions or how do
> I deal in general with non-serializable classes?
>
> Thanks
> Tobias
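
A note on why the `++` hack works: the argument of `Iterator.++` is passed by name, so the block containing `x.shutdown()` is only evaluated once the mapped iterator has been fully consumed. Below is a minimal sketch of that behaviour on a plain Scala `Iterator`, without Spark. The class `X` here is a hypothetical stand-in for the non-serializable class in the question, and `TeardownSketch`, `processPartition`, and the `closed` flag are names made up for illustration.

```scala
// Hypothetical stand-in for the non-serializable X from the thread.
class X {
  var closed = false

  def doSomethingWith(row: Int): Int = {
    require(!closed, "X used after shutdown")
    row * 2
  }

  def shutdown(): Unit = { closed = true }
}

object TeardownSketch {
  // Same shape as the function passed to rdd.mapPartitions above,
  // but x is passed in so the caller can inspect it afterwards.
  def processPartition(iter: Iterator[Int], x: X): Iterator[Int] =
    // The right-hand side of ++ is by-name: it runs only after the
    // left-hand iterator is exhausted.
    iter.map(row => x.doSomethingWith(row)) ++ { x.shutdown(); Iterator.empty }

  def main(args: Array[String]): Unit = {
    val x = new X()
    val out = processPartition(Iterator(1, 2, 3), x)
    assert(!x.closed)                    // nothing consumed yet, so no shutdown
    assert(out.toList == List(2, 4, 6))  // rows are processed while x is open
    assert(x.closed)                     // shutdown ran after the last row
  }
}
```

One caveat with this approach: the teardown block only runs if the iterator is consumed to the end. If `doSomethingWith` throws, or the consumer stops early, `x.shutdown()` is never reached, so it is best treated as a hack rather than a guaranteed cleanup hook.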