I will up the ante with the timeout and report back -- thanks, all, for the
suggestions.

Hey, Sebastian -- Here are the arguments I am using:
--input matrix --output ALS --numFeatures 25 --numIterations 10 --lambda 0.065
When the mapper loads the matrix into memory it only loads the actual
non-zero data, correct?

Hey Ted -- I messed up on the sparsity.  Turns out there are only 70M
non-zero elements.
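
For a rough sense of scale, here is a back-of-the-envelope sketch in Java. It is only an estimate under stated assumptions, not a measurement of what the job actually allocates: it assumes 8-byte doubles, ignores all JVM and collection overhead, and treats each non-zero as a 4-byte int index plus an 8-byte double. The row count of 1,000,000 is purely hypothetical, since the real matrix dimensions are not given in this thread; the 25 features and 70M non-zeros are the figures above. The class and method names are made up for illustration.

```java
// Rough memory estimate; names and the row count are hypothetical.
class AlsMemoryEstimate {

    // Bytes for a dense (rows x numFeatures) matrix of 8-byte doubles,
    // i.e. the feature matrix each mapper holds, ignoring JVM overhead.
    static long denseFeatureMatrixBytes(long rows, int numFeatures) {
        return rows * numFeatures * 8L;
    }

    // Bytes for the non-zero entries alone, assuming a minimal encoding of
    // one 4-byte int index plus one 8-byte double value per entry.
    static long sparseEntriesBytes(long nonZeros) {
        return nonZeros * (4L + 8L);
    }

    public static void main(String[] args) {
        int numFeatures = 25;                // from --numFeatures
        long nonZeros = 70_000_000L;         // corrected non-zero count
        long hypotheticalRows = 1_000_000L;  // placeholder, not the real data

        long mb = 1024L * 1024L;
        System.out.printf("feature matrix:   %d MB%n",
                denseFeatureMatrixBytes(hypotheticalRows, numFeatures) / mb);
        System.out.printf("non-zero entries: %d MB%n",
                sparseEntriesBytes(nonZeros) / mb);
    }
}
```

Swapping in the real user or item count for `hypotheticalRows` gives a lower bound to compare against the mapper heap size; actual usage will be higher because of object and hash-map overhead.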

Oh, and, I only have binary data -- I wasn't sure of the implications with
ALS-WR on binary data -- I couldn't find anything to suggest otherwise.
I am using data of the format user,item,1
I have read about probabilistic factorization -- which works with binary
data -- and perhaps naively, thought ALS-WR was similar so what-the-heck :-)

I'd love nothing more than to share the data, however, I'd probably get in
some trouble :-)
Perhaps I could generate a matrix with a similar distribution? -- I'll have
to check on that and see if it is ok #bureaucracy

Stay tuned...

On Thu, Feb 2, 2012 at 1:47 AM, Sebastian Schelter <s...@apache.org> wrote:

> Nicholas,
>
> can you give us the detailed arguments you start the job with? I'd
> especially be interested in the number of features (--numFeatures) you
> use. Do you use the job with implicit feedback data
> (--implicitFeedback=true)?
>
> The memory requirements of the job are the following:
>
> In each iteration either the item-features matrix (items x features) or
> the user-features matrix (users x features) is loaded into the memory of
> each mapper. Then the original user-item matrix (or its transpose) is
> read row-wise by the mappers and they recompute the features via
> AlternatingLeastSquaresSolver/ImplicitFeedbackAlternatingLeastSquaresSolver.
>
> --sebastian
>
>
> On 02.02.2012 09:53, Sean Owen wrote:
> > I have seen this happen in "normal" operation when the sorting on the
> > mapper is taking a long long time, because the output is large. You can
> > tell it to increase the timeout.  If this is what is happening, you won't
> > have a chance to update a counter as a keep-alive ping, but yes that is
> > generally right otherwise. If this is the case it's that a mapper is
> > outputting a whole lot of info, perhaps 'too much'. I don't know for sure,
> > just another guess for the pile.
> >
> > On Thu, Feb 2, 2012 at 1:44 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> >> Status reporting happens automatically when output is generated.  In a long
> >> computation, it is good form to occasionally update a counter or otherwise
> >> indicate that the computation is still progressing.
> >>
> >> On Wed, Feb 1, 2012 at 5:23 PM, Nicholas Kolegraff
> >> <nickkolegr...@gmail.com> wrote:
> >>
> >>> Do you know if it should still report status in the midst of a complex
> >>> task?  Seems questionable that it wouldn't just send a friendly hello?
> >>>
> >>>
> >>
> >
>
>
