Re: Parallel ALS-WR on very large matrix -- crashing (I think)

Sebastian Schelter Thu, 02 Feb 2012 08:40:29 -0800

Hmm, are you sure that the mappers have enough memory? You can set that
via -Dmapred.child.java.opts=-Xmx[some number]m


--sebastian
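
Here is a rough way to pick that number. It assumes the dominant cost is the dense feature matrix each mapper holds, so it multiplies rows (users or items) by features by 8 bytes per double. The helper names, the 2x overhead factor, and the 10 million rows in the example are illustrative assumptions, not figures from this thread or from Mahout. Real JVM overhead may be higher.

```python
import math

BYTES_PER_DOUBLE = 8


def dense_feature_matrix_bytes(num_rows, num_features):
    """Raw size of a dense num_rows x num_features matrix of doubles.

    Ignores JVM object headers and collection overhead, so the real
    footprint inside a mapper will be larger.
    """
    return num_rows * num_features * BYTES_PER_DOUBLE


def suggested_heap_mb(num_rows, num_features, overhead_factor=2.0):
    """Hypothetical rule of thumb: raw matrix size times a safety factor.

    overhead_factor is an assumption meant to cover JVM overhead and the
    rest of the task's working memory; tune it for your own cluster.
    """
    raw_mb = dense_feature_matrix_bytes(num_rows, num_features) / (1024 * 1024)
    return math.ceil(raw_mb * overhead_factor)


def child_java_opts(heap_mb):
    """Format the property mentioned above with a concrete heap size."""
    return f"-Dmapred.child.java.opts=-Xmx{heap_mb}m"


if __name__ == "__main__":
    # Illustrative only: 10 million rows (users or items), 25 vs. 5 features.
    rows = 10_000_000
    for k in (25, 5):
        mb = suggested_heap_mb(rows, k)
        print(k, dense_feature_matrix_bytes(rows, k), child_java_opts(mb))
```

The estimate depends only on the number of rows and features, not on the number of non-zeros. Dropping --numFeatures from 25 to 5 therefore shrinks it fivefold.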

On 02.02.2012 17:37, Nicholas Kolegraff wrote:
> Sounds good. Thanks, Sebastian.
> 
> The interesting thing is that I once sampled the matrix down to about
> 10% of the non-zeros, and it worked without a problem.
> 
> On Thu, Feb 2, 2012 at 8:31 AM, Sebastian Schelter <s...@apache.org> wrote:
> 
>> Your parameters look good, except that if you have binary data, you
>> should set --implicitFeedback=true. You could also set numFeatures to a
>> very small value (like 5) just to see if that helps.
>>
>> The mappers load one of the feature matrices into memory, and those
>> matrices are dense (#items x #features entries or #users x #features
>> entries). Are you sure that the mappers have enough memory for that?
>>
>> It's really strange that you have problems with such small data. I
>> tested this with the Netflix dataset (> 100M non-zeros) on a few
>> machines and it worked quite well.
>>
>> --sebastian
>>
>>
>>
>> On 02.02.2012 17:25, Nicholas Kolegraff wrote:
>>> I will raise the timeout and report back. Thanks, all, for the
>>> suggestions.
>>>
>>> Hey Sebastian, here are the arguments I am using:
>>> --input matrix --output ALS --numFeatures 25 --numIterations 10 --lambda 0.065
>>> When the mapper loads the matrix into memory, it only loads the actual
>>> non-zero data, correct?
>>>
>>> Hey Ted, I got the sparsity wrong. It turns out there are only 70M
>>> non-zero elements.
>>>
>>> Oh, and I only have binary data. I wasn't sure of the implications of
>>> ALS-WR on binary data, and I couldn't find anything suggesting it
>>> wouldn't work. I am using data of the format user,item,1.
>>> I have read about probabilistic factorization, which works with binary
>>> data, and, perhaps naively, thought ALS-WR was similar, so what the heck
>>> :-)
>>>
>>> I'd love nothing more than to share the data; however, I'd probably get
>>> in some trouble :-)
>>> Perhaps I could generate a matrix with a similar distribution? I'll have
>>> to check whether that is OK #bureaucracy
>>>
>>> Stay tuned...
>>>
>>> On Thu, Feb 2, 2012 at 1:47 AM, Sebastian Schelter <s...@apache.org> wrote:
>>>
>>>> Nicholas,
>>>>
>>>> can you give us the exact arguments you start the job with? I'd
>>>> especially be interested in the number of features (--numFeatures) you
>>>> use. Are you running the job on implicit feedback data
>>>> (--implicitFeedback=true)?
>>>>
>>>> The memory requirements of the job are the following:
>>>>
>>>> In each iteration, either the item-features matrix (items x features) or
>>>> the user-features matrix (users x features) is loaded into the memory of
>>>> each mapper. Then the original user-item matrix (or its transpose) is
>>>> read row-wise by the mappers and they recompute the features via
>>>> AlternatingLeastSquaresSolver/ImplicitFeedbackAlternatingLeastSquaresSolver.
>>>>
>>>> --sebastian
>>>>
>>>>
>>>> On 02.02.2012 09:53, Sean Owen wrote:
>>>>> I have seen this happen in "normal" operation when the sort on the
>>>>> mapper side takes a very long time because the output is large. You
>>>>> can tell Hadoop to increase the timeout. If this is what is happening,
>>>>> you won't get a chance to update a counter as a keep-alive ping, but
>>>>> otherwise, yes, that is generally right. If this is the case, a mapper
>>>>> is outputting a whole lot of data, perhaps 'too much'. I don't know for
>>>>> sure; it's just another guess for the pile.
>>>>>
>>>>> On Thu, Feb 2, 2012 at 1:44 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>
>>>>>> Status reporting happens automatically when output is generated. In a
>>>>>> long computation, it is good form to occasionally update a counter or
>>>>>> otherwise indicate that the computation is still progressing.
>>>>>>
>>>>>> On Wed, Feb 1, 2012 at 5:23 PM, Nicholas Kolegraff <nickkolegr...@gmail.com> wrote:
>>>>>>
>>>>>>> Do you know whether it should still report status in the middle of a
>>>>>>> complex task? It seems questionable that it wouldn't just send a
>>>>>>> friendly hello.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
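
The row-wise recomputation that Sebastian describes in his 1:47 AM message can be sketched in plain Python. als_wr_row is the weighted-lambda-regularization update from the ALS-WR paper (Zhou et al.), in which lambda is scaled by the number of ratings in the row. implicit_row is the confidence-weighted update from Hu, Koren and Volinsky's paper on implicit feedback. Both are sketches of the published techniques, not Mahout's AlternatingLeastSquaresSolver or ImplicitFeedbackAlternatingLeastSquaresSolver. The function names, the dict-of-ratings input, and the alpha confidence parameter are assumptions for illustration. Both functions take the whole dense item_features list, and the implicit version also sums over every item. This shows why the full feature matrix has to fit in each mapper's memory.

```python
def solve(a, b):
    """Solve the k x k system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copied so the caller's lists are untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def als_wr_row(ratings, item_features, lam):
    """Explicit ALS-WR update for one user row.

    Solves (Y_u^T Y_u + lam * n_u * I) x = Y_u^T r_u, where Y_u holds the
    feature vectors of the n_u items this user rated.
    ratings: dict mapping item index -> rating (the row's non-zeros).
    item_features: dense list of per-item feature vectors held in memory.
    """
    k = len(item_features[0])
    a = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for item, rating in ratings.items():
        y = item_features[item]
        for p in range(k):
            b[p] += rating * y[p]
            for q in range(k):
                a[p][q] += y[p] * y[q]
    for p in range(k):
        a[p][p] += lam * len(ratings)  # weighted-lambda regularization
    return solve(a, b)


def implicit_row(observed, item_features, lam, alpha):
    """Implicit-feedback update for one user row (Hu/Koren/Volinsky style).

    Every item counts: unobserved items have preference 0 and confidence 1;
    observed items have preference 1 and confidence 1 + alpha * strength.
    Solves (Y^T C_u Y + lam * I) x = Y^T C_u p_u.
    """
    k = len(item_features[0])
    a = [[0.0] * k for _ in range(k)]
    for y in item_features:  # Y^T Y over ALL items, not just observed ones
        for p in range(k):
            for q in range(k):
                a[p][q] += y[p] * y[q]
    b = [0.0] * k
    for item, strength in observed.items():
        confidence = 1.0 + alpha * strength
        y = item_features[item]
        for p in range(k):
            b[p] += confidence * y[p]  # preference p_ui = 1
            for q in range(k):
                a[p][q] += (confidence - 1.0) * y[p] * y[q]
    for p in range(k):
        a[p][p] += lam
    return solve(a, b)


if __name__ == "__main__":
    features = [[1.0, 1.0], [1.0, -1.0]]
    print(als_wr_row({0: 3.0, 1: 1.0}, features, lam=0.0))  # → [2.0, 1.0]
```

With user,item,1 data, the explicit update only ever sees ratings of 1, so it gets no negative signal. implicit_row instead treats unobserved items as low-confidence zeros, which is the usual argument for setting --implicitFeedback=true on binary data.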