That too, even better. Isn't that already done? It could be done in one
place but not another. IIRC there were also cases where it was a lot easier
to pass an object around internally, and mutability solved the performance
issue without much risk, since the object was only used internally. You can
(nay, must) always copy such objects before returning them.
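
A minimal sketch, in Java, of the threshold idea from Ted's message quoted
below: compare each new score against the score it must beat, and only
construct an object when the score actually makes the cut. The names
TopKSketch, ScoredItem, offer and topK are hypothetical, chosen for
illustration; this is not Mahout's actual code. The allocations counter is
there only to show how few objects get built.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-k collector; an illustration only, not a Mahout class. */
final class TopKSketch {

  /** Hypothetical immutable result type (assumed for this sketch). */
  static final class ScoredItem {
    final long itemId;
    final double score;

    ScoredItem(long itemId, double score) {
      this.itemId = itemId;
      this.score = score;
    }
  }

  private final int k;
  // Min-heap: the head is the weakest of the current top k.
  private final PriorityQueue<ScoredItem> heap;
  // Score a candidate must exceed once the heap holds k items.
  private double threshold = Double.NEGATIVE_INFINITY;
  // Counts constructed ScoredItem objects, for demonstration only.
  int allocations = 0;

  TopKSketch(int k) {
    this.k = k;
    this.heap = new PriorityQueue<>(
        k, Comparator.comparingDouble((ScoredItem s) -> s.score));
  }

  /** Takes primitives, so a rejected candidate costs no allocation. */
  void offer(long itemId, double score) {
    if (heap.size() >= k) {
      if (score <= threshold) {
        return; // cheap rejection: nothing is constructed
      }
      heap.poll(); // evict the current weakest item
    }
    heap.add(new ScoredItem(itemId, score));
    allocations++;
    if (heap.size() >= k) {
      threshold = heap.peek().score;
    }
  }

  /** Returns a fresh list, best first; the internal heap is never exposed. */
  List<ScoredItem> topK() {
    List<ScoredItem> out = new ArrayList<>(heap);
    out.sort(Comparator.comparingDouble((ScoredItem s) -> s.score).reversed());
    return out;
  }
}
```

Once the heap is full, most candidates fail the primitive comparison and are
dropped without constructing anything. Because topK() hands back a new list
of immutable items, callers cannot disturb the internal state, which is the
same concern as copying a mutable object before returning it.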



On Wed, Mar 6, 2013 at 4:01 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> I would recommend against a mutable object on maintenance grounds.
>
> Better is to keep the threshold that a new score must meet and only
> construct the object on need.  That cuts the allocation down to negligible
> levels.
>
> On Wed, Mar 6, 2013 at 6:11 AM, Sean Owen <sro...@gmail.com> wrote:
>
> > OK, that's reasonable on 35 machines. (You can turn up to 70 reducers,
> > probably, as most machines can handle 2 reducers at once).
> > I think the recommendation step loads one whole matrix into memory.
> > You're not running out of memory but if you're turning up the heap size
> > to accommodate, you might be hitting swapping, yes. I think (?) the
> > conventional wisdom is to turn off swap for Hadoop.
> >
> > Sebastian, yes, that is probably a good optimization; I've had good
> > results reusing a mutable object in this context.
> >
> >
> > On Wed, Mar 6, 2013 at 10:54 AM, Josh Devins <h...@joshdevins.com> wrote:
> >
> > > The factorization at 2-hours is kind of a non-issue (certainly fast
> > > enough). It was run with (if I recall correctly) 30 reducers across a
> > > 35 node cluster, with 10 iterations.
> > >
> > > I was a bit shocked at how long the recommendation step took and will
> > > throw some timing debug in to see where the problem lies exactly. There
> > > were no other jobs running on the cluster during these attempts, but
> > > it's certainly possible that something is swapping or the like. I'll be
> > > looking more closely today before I start to consider other options for
> > > calculating the recommendations.
> > >
> > >
> >
>
