Hi Nathan,

It sounds like what you're asking for has already been filed as
https://issues.apache.org/jira/browse/SPARK-664

Does that ticket match what you're proposing?

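To make sure we're talking about the same thing, here is a rough sketch in plain Python (not Spark code, and not how Spark is actually implemented) of the difference as I read it from the quoted message below: holding every task's copy of the array before combining, versus folding each copy into a running total as it arrives. All function names are hypothetical, and the size estimate assumes "1.3m" means roughly 1.3 MB per copy.

```python
from typing import Iterable, Iterator, List, Tuple


def estimated_client_mb(num_tasks: int, mb_per_copy: float) -> float:
    """Memory needed if every task's copy is held at once (assumed sizes)."""
    return num_tasks * mb_per_copy


def task_partials(num_tasks: int, size: int) -> Iterator[List[int]]:
    """Yield one hypothetical per-task array at a time (stand-in for task results)."""
    for task_id in range(num_tasks):
        yield [task_id] * size


def merge_all_at_once(partials: Iterable[List[int]]) -> Tuple[List[int], int]:
    """Collect every partial first, then combine; returns (total, copies held)."""
    held = list(partials)  # every copy is resident in memory here
    total = [0] * len(held[0])
    for partial in held:
        for i, value in enumerate(partial):
            total[i] += value
    return total, len(held)


def merge_incrementally(partials: Iterable[List[int]], size: int) -> Tuple[List[int], int]:
    """Fold each partial into the total as it arrives; returns (total, peak copies held)."""
    total = [0] * size
    peak_held = 0
    for partial in partials:
        peak_held = max(peak_held, 1)  # only the current copy is resident
        for i, value in enumerate(partial):
            total[i] += value
    return total, peak_held


if __name__ == "__main__":
    # 400 tasks at an assumed ~1.3 MB each is in the "half a gig" range.
    print(round(estimated_client_mb(400, 1.3)), "MB if all copies are held at once")
    print(merge_all_at_once(task_partials(4, 3)))
    print(merge_incrementally(task_partials(4, 3), 3))
```

Both functions produce the same combined array; the only difference is how many copies sit in memory while combining.
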
Andrew

On Fri, Nov 21, 2014 at 12:29 PM, Nathan Kronenfeld <nkronenf...@oculusinfo.com> wrote:

> We've done this with reduce - that definitely works.
>
> I've reworked the logic to use accumulators because, when it works, it's
> 5-10x faster
>
> On Fri, Nov 21, 2014 at 4:44 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> This sounds more like a use case for reduce? or fold? it sounds like
>> you're kind of cobbling together the same function on accumulators,
>> when reduce/fold are simpler and have the behavior you suggest.
>>
>> On Fri, Nov 21, 2014 at 5:46 AM, Nathan Kronenfeld
>> <nkronenf...@oculusinfo.com> wrote:
>> > I think I understand what is going on here, but I was hoping someone could
>> > confirm (or explain reality if I don't) what I'm seeing.
>> >
>> > We are collecting data using a rather sizable accumulator - essentially, an
>> > array of tens of thousands of entries. All told, about 1.3m of data.
>> >
>> > If I understand things correctly, it looks to me like, when our job is done,
>> > a copy of this array is retrieved from each individual task, all at once,
>> > for combination on the client - which means, with 400 tasks to the job, each
>> > collection is using up half a gig of memory on the client.
>> >
>> > Is this true? If so, does anyone know a way to get accumulators to
>> > accumulate as results collect, rather than all at once at the end, so we
>> > only have to hold a few in memory at a time, rather than all 400?
>> >
>> > Thanks,
>> > -Nathan
>> >
>> >
>> > --
>> > Nathan Kronenfeld
>> > Senior Visualization Developer
>> > Oculus Info Inc
>> > 2 Berkeley Street, Suite 600,
>> > Toronto, Ontario M5A 4J5
>> > Phone: +1-416-203-3003 x 238
>> > Email: nkronenf...@oculusinfo.com
>> >
>
>
> --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone: +1-416-203-3003 x 238
> Email: nkronenf...@oculusinfo.com
>