Hi Nathan,

It sounds like what you're asking for has already been filed as SPARK-664:
https://issues.apache.org/jira/browse/SPARK-664

Does that ticket match what you're proposing?
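
For reference, here is a minimal sketch of the difference being asked about, in
plain Python with no Spark API involved. It is an illustration under stated
assumptions, not how Spark implements accumulators: task_result, merge,
merge_all_at_once and merge_incrementally are hypothetical names, ENTRY_COUNT
is shrunk to keep it quick, and the element-wise sum is only an assumed combine
step. The 400 tasks and roughly 1.3 MB per array are the figures from the
thread.

```python
from functools import reduce

ENTRY_COUNT = 8    # hypothetical; the thread describes tens of thousands of entries
TASK_COUNT = 400   # number of tasks mentioned in the thread


def task_result(task_id):
    """Stand-in for the array a single task would send back to the client."""
    return [task_id] * ENTRY_COUNT


def merge(total, partial):
    """Element-wise sum, used here as an assumed combine step."""
    return [a + b for a, b in zip(total, partial)]


def merge_all_at_once(num_tasks):
    # Every partial array is materialized before any merging starts,
    # so num_tasks arrays are held in memory at the same time.
    partials = [task_result(t) for t in range(num_tasks)]
    return reduce(merge, partials, [0] * ENTRY_COUNT)


def merge_incrementally(num_tasks):
    # Each partial array is folded into the running total as it arrives,
    # so only the total and one partial are held at a time.
    total = [0] * ENTRY_COUNT
    for t in range(num_tasks):
        total = merge(total, task_result(t))
    return total


# Rough peak for the all-at-once case: 400 arrays of about 1.3 MB each,
# which comes to roughly half a gigabyte.
peak_mb = TASK_COUNT * 1.3
```

Both functions produce the same totals; they differ only in how many partial
arrays are alive at once, which is the behavior the question is about.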

Andrew

On Fri, Nov 21, 2014 at 12:29 PM, Nathan Kronenfeld <
nkronenf...@oculusinfo.com> wrote:

> We've done this with reduce - that definitely works.
>
> I've reworked the logic to use accumulators because, when it works, it's
> 5-10x faster.
>
> On Fri, Nov 21, 2014 at 4:44 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> This sounds more like a use case for reduce or fold. It sounds like
>> you're kind of cobbling together the same function on accumulators,
>> when reduce/fold are simpler and have the behavior you suggest.
>>
>> On Fri, Nov 21, 2014 at 5:46 AM, Nathan Kronenfeld
>> <nkronenf...@oculusinfo.com> wrote:
>> > I think I understand what is going on here, but I was hoping someone could
>> > confirm (or explain reality if I don't) what I'm seeing.
>> >
>> > We are collecting data using a rather sizable accumulator - essentially, an
>> > array of tens of thousands of entries.  All told, about 1.3 MB of data.
>> >
>> > If I understand things correctly, it looks to me like, when our job is done,
>> > a copy of this array is retrieved from each individual task, all at once,
>> > for combination on the client - which means, with 400 tasks to the job, each
>> > collection is using up half a gig of memory on the client.
>> >
>> > Is this true?  If so, does anyone know a way to get accumulators to
>> > accumulate as results collect, rather than all at once at the end, so we
>> > only have to hold a few in memory at a time, rather than all 400?
>> >
>> > Thanks,
>> >               -Nathan
>> >
>> >
>> > --
>> > Nathan Kronenfeld
>> > Senior Visualization Developer
>> > Oculus Info Inc
>> > 2 Berkeley Street, Suite 600,
>> > Toronto, Ontario M5A 4J5
>> > Phone:  +1-416-203-3003 x 238
>> > Email:  nkronenf...@oculusinfo.com
>>
