I will up the ante with the timeout and report back -- thanks all for the suggestions.
Hey, Sebastian -- Here are the arguments I am using:

--input matrix --output ALS --numFeatures 25 --numIterations 10 --lambda 0.065

When the mapper loads the matrix into memory it only loads the actual non-zero data, correct?

Hey Ted -- I messed up on the sparsity. Turns out there are only 70M non-zero elements.

Oh, and, I only have binary data -- I wasn't sure of the implications with ALS-WR on binary data -- I couldn't find anything to suggest otherwise. I am using data of the format user,item,1
I have read about probabilistic factorization -- which works with binary data -- and, perhaps naively, thought ALS-WR was similar so what-the-heck :-)

I'd love nothing more than to share the data; however, I'd probably get in some trouble :-)
Perhaps I could generate a matrix with a similar distribution? -- I'll have to check on that and see if it is ok #bureaucracy

Stay tuned...

On Thu, Feb 2, 2012 at 1:47 AM, Sebastian Schelter <s...@apache.org> wrote:

> Nicholas,
>
> can you give us the detailed arguments you start the job with? I'd
> especially be interested in the number of features (--numFeatures) you
> use. Do you use the job with implicit feedback data
> (--implicitFeedback=true)?
>
> The memory requirements of the job are the following:
>
> In each iteration either the item-features matrix (items x features) or
> the user-features matrix (users x features) is loaded into the memory of
> each mapper. Then the original user-item matrix (or its transpose) is
> read row-wise by the mappers and they recompute the features via
>
> AlternatingLeastSquaresSolver/ImplicitFeedbackAlternatingLeastSquaresSolver.
>
> --sebastian
>
>
> On 02.02.2012 09:53, Sean Owen wrote:
> > I have seen this happen in "normal" operation when the sorting on the
> > mapper is taking a long long time, because the output is large. You can
> > tell it to increase the timeout. If this is what is happening, you won't
> > have a chance to update a counter as a keep-alive ping, but yes that is
> > generally right otherwise. If this is the case it's that a mapper is
> > outputting a whole lot of info, perhaps 'too much'. I don't know for sure,
> > just another guess for the pile.
> >
> > On Thu, Feb 2, 2012 at 1:44 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> >> Status reporting happens automatically when output is generated. In a long
> >> computation, it is good form to occasionally update a counter or otherwise
> >> indicate that the computation is still progressing.
> >>
> >> On Wed, Feb 1, 2012 at 5:23 PM, Nicholas Kolegraff
> >> <nickkolegr...@gmail.com> wrote:
> >>
> >>> Do you know if it should still report status in the midst of a complex
> >>> task? Seems questionable that it wouldn't just send a friendly hello?
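
For a rough sense of the memory requirement Sebastian describes (one full feature matrix held in each mapper), here is a back-of-the-envelope sketch. It assumes the matrix is stored as dense 8-byte doubles and ignores JVM object overhead, so treat the result as a lower bound rather than what the job actually allocates. The class name, method names, and the row count of 10 million are hypothetical; only numFeatures = 25 comes from the arguments in this thread.

```java
public class FeatureMatrixMemory {

    /** Raw bytes for a dense rows x numFeatures matrix of 8-byte doubles. */
    public static long denseBytes(long rows, int numFeatures) {
        return rows * numFeatures * (long) Double.BYTES;
    }

    /** Converts a byte count to mebibytes for easier reading. */
    public static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        int numFeatures = 25;                 // from the job arguments in the thread
        long hypotheticalRows = 10_000_000L;  // assumption: not a figure from the thread

        long bytes = denseBytes(hypotheticalRows, numFeatures);
        System.out.printf("%d rows x %d features: %d bytes (%.0f MiB) per mapper%n",
                hypotheticalRows, numFeatures, bytes, toMiB(bytes));
    }
}
```

With the hypothetical 10 million rows this comes to 2,000,000,000 bytes of raw values per mapper, before any overhead. Plugging in the real user and item counts should show whether the larger of the two feature matrices fits in the mapper heap.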