We've done this with reduce; that definitely works.

I've reworked the logic to use accumulators because, when it works, it's
5-10x faster.
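To illustrate the memory concern in the quoted question: with 400 tasks each returning a copy of a ~1.3 MB array, holding every copy before combining needs roughly 400 x 1.3 MB = 520 MB ("half a gig"). Folding each partial into a running total as it arrives needs only about two copies at once. This is a minimal sketch in plain Python with no Spark dependency. `task_partial`, `add_arrays`, `partials`, and the tiny `SIZE` are hypothetical stand-ins. It shows the reduce/fold idea only, not how Spark itself merges accumulator updates.

```python
from functools import reduce
from typing import Iterator, List

SIZE = 4  # hypothetical stand-in for "tens of thousands of entries"


def task_partial(task_id: int) -> List[float]:
    """Hypothetical per-task result: one partial array per task."""
    return [float(task_id)] * SIZE


def add_arrays(a: List[float], b: List[float]) -> List[float]:
    """Element-wise merge: the associative op a reduce/fold would use."""
    return [x + y for x, y in zip(a, b)]


def partials(num_tasks: int) -> Iterator[List[float]]:
    # A generator yields one partial at a time, so each can be merged and dropped.
    for t in range(num_tasks):
        yield task_partial(t)


def combine_all_at_once(num_tasks: int) -> List[float]:
    # Materializes every partial first: all copies are alive together.
    held = list(partials(num_tasks))
    return reduce(add_arrays, held, [0.0] * SIZE)


def combine_incrementally(num_tasks: int) -> List[float]:
    # Fold over the generator: only the running total and the current partial
    # need to be in memory at any moment.
    return reduce(add_arrays, partials(num_tasks), [0.0] * SIZE)
```

Both functions produce the same totals. Only the second keeps peak memory independent of the number of tasks.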

On Fri, Nov 21, 2014 at 4:44 AM, Sean Owen <so...@cloudera.com> wrote:

> This sounds more like a use case for reduce, or fold? It sounds like
> you're cobbling together the same function out of accumulators,
> when reduce/fold are simpler and have the behavior you suggest.
>
> On Fri, Nov 21, 2014 at 5:46 AM, Nathan Kronenfeld
> <nkronenf...@oculusinfo.com> wrote:
> > I think I understand what is going on here, but I was hoping someone could
> > confirm (or explain reality if I don't) what I'm seeing.
> >
> > We are collecting data using a rather sizable accumulator: essentially, an
> > array of tens of thousands of entries. All told, about 1.3 MB of data.
> >
> > If I understand things correctly, it looks to me like, when our job is done,
> > a copy of this array is retrieved from each individual task, all at once,
> > for combination on the client. That means, with 400 tasks in the job, each
> > collection uses up half a gig of memory on the client.
> >
> > Is this true? If so, does anyone know a way to get accumulators to
> > accumulate as results come in, rather than all at once at the end, so we
> > only have to hold a few in memory at a time, rather than all 400?
> >
> > Thanks,
> >               -Nathan
> >
> >
> > --
> > Nathan Kronenfeld
> > Senior Visualization Developer
> > Oculus Info Inc
> > 2 Berkeley Street, Suite 600,
> > Toronto, Ontario M5A 4J5
> > Phone:  +1-416-203-3003 x 238
> > Email:  nkronenf...@oculusinfo.com
>
