Just checked top on a worker node during the 20% job (4 000 000 users in my
case): the java process uses 2800 MB of resident memory.
Good news for me: both the U and M iterations passed on the 20% sample.

Can I use the current M (computed on 20% of users, 15 iterations) to
process the remainder (the other 80% of users)? That is:

Fix M, recompute U2 (single U iteration, do not rewrite M)
Fix M, recompute U3 (single U iteration, do not rewrite M)
…
Fix M, recompute Un (single U iteration, do not rewrite M)
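
To make the idea concrete, here is the per-user solve I have in mind once M
is fixed. This is only a plain-Java sketch of the regularized least-squares
update (ALS-WR style, with a lambda * n_i ridge term), not the actual Mahout
code; the class and method names are mine:

// Sketch: with M fixed, each user's feature vector is the solution of an
// independent regularized least-squares problem
//   u_i = (A^T A + lambda * n_i * I)^{-1} A^T r_i
// where A holds the rows of M for the items user i rated, r_i the ratings
// and n_i the number of ratings. No other users are involved, so every
// block U1, U2, ... Un lives in the feature space defined by M.
public class FixedMUserSolver {

  // itemFeatures: n x k rows of M for the items this user rated,
  // ratings: the n corresponding ratings, lambda: regularization weight
  static double[] solveUser(double[][] itemFeatures, double[] ratings,
                            double lambda) {
    int n = itemFeatures.length;
    int k = itemFeatures[0].length;
    // build the augmented k x (k+1) system [A^T A + lambda*n*I | A^T r]
    double[][] s = new double[k][k + 1];
    for (int i = 0; i < n; i++) {
      for (int p = 0; p < k; p++) {
        for (int q = 0; q < k; q++) {
          s[p][q] += itemFeatures[i][p] * itemFeatures[i][q];
        }
        s[p][k] += itemFeatures[i][p] * ratings[i];
      }
    }
    for (int p = 0; p < k; p++) {
      s[p][p] += lambda * n;
    }
    return solve(s, k);
  }

  // Gaussian elimination with partial pivoting on the k x (k+1) system
  static double[] solve(double[][] s, int k) {
    for (int c = 0; c < k; c++) {
      int pivot = c;
      for (int r = c + 1; r < k; r++) {
        if (Math.abs(s[r][c]) > Math.abs(s[pivot][c])) pivot = r;
      }
      double[] t = s[c]; s[c] = s[pivot]; s[pivot] = t;
      for (int r = c + 1; r < k; r++) {
        double f = s[r][c] / s[c][c];
        for (int q = c; q <= k; q++) s[r][q] -= f * s[c][q];
      }
    }
    double[] u = new double[k];          // back substitution
    for (int p = k - 1; p >= 0; p--) {
      double acc = s[p][k];
      for (int q = p + 1; q < k; q++) acc -= s[p][q] * u[q];
      u[p] = acc / s[p][p];
    }
    return u;
  }
}

If this is right, every u_i depends only on the shared M and that user's own
ratings, so users from U1, U2, ... Un should live in the same feature space
and be comparable via their features.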

Pavel



On 19.11.12 at 12:50, "Sebastian Schelter" <s...@apache.org> wrote:

>That's huge. It means you need to fit a dense 20M x 20 matrix into the
>RAM of the mappers that recompute U. This will require a few gigabytes...
>
>If that doesn't work for you, you could try to rewrite the job to use
>reduce-side joins to recompute the factors; this would, however, be a
>much slower implementation.
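>
>(For scale, spelling out the arithmetic: 20 000 000 users x 20 features
>x 8 bytes per double is 3.2e9 bytes, i.e. roughly 3 GB for the factor
>matrix alone, before any JVM overhead.)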
>
>
>On 19.11.2012 09:36, Abramov Pavel wrote:
>> About 20 000 000 users and 150 000 items, with 0.03% non-zeros. 20
>> features required.
>> 
>> Pavel
>> 
>> On 19.11.12 at 12:31, "Sebastian Schelter" <s...@apache.org> wrote:
>> 
>>> You need to give much more memory than 200 MB to your mappers. What are
>>> the dimensions of your input in terms of users and items?
>>>
>>> --sebastian
>>>
>>> On 19.11.2012 09:28, Abramov Pavel wrote:
>>>> Thanks for your replies.
>>>>
>>>> 1) 
>>>>> Can you describe your failure or give us a stack trace?
>>>>
>>>>
>>>> Here is the job log:
>>>>
>>>> 12/11/19 09:54:07 INFO als.ParallelALSFactorizationJob: Recomputing U
>>>> (iteration 0/15)
>>>> …
>>>> 12/11/19 10:03:31 INFO mapred.JobClient: Job complete:
>>>> job_201211150152_1671
>>>> 12/11/19 10:03:31 INFO als.ParallelALSFactorizationJob: Recomputing M
>>>> (iteration 0/15)
>>>> …
>>>> 12/11/19 10:10:04 INFO mapred.JobClient: Task Id :
>>>> attempt_201211150152_<*ALL*>, Status : FAILED
>>>> …
>>>> 12/11/19 10:40:40 INFO mapred.JobClient:     Failed map tasks=1
>>>>
>>>>
>>>>
>>>> All of these mappers (recomputing M, 1st iteration) fail with a
>>>> "Java heap space" error.
>>>>
>>>> Here is Hadoop job memory config:
>>>>
>>>> mapred.map.child.java.opts = -Xmx5024m -XX:-UseGCOverheadLimit
>>>> mapred.child.java.opts = -Xmx200m
>>>> mapred.job.reuse.jvm.num.tasks = -1
>>>>
>>>>
>>>> mapred.cluster.reduce.memory.mb = -1
>>>> mapred.cluster.map.memory.mb = -1
>>>> mapred.cluster.max.reduce.memory.mb = -1
>>>> mapred.job.reduce.memory.mb = -1
>>>> mapred.job.map.memory.mb = -1
>>>> mapred.cluster.max.map.memory.mb = -1
>>>>
>>>> Are any tweaks possible? Is mapred.map.child.java.opts OK?
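>>>>
>>>> (One thing I am not sure about: mapred.map.child.java.opts only
>>>> exists in newer Hadoop versions; if the cluster does not recognize
>>>> it, the mappers fall back to mapred.child.java.opts and get only
>>>> 200 MB, which would explain the OOM. A safe variant would be to
>>>> raise the generic setting as well:
>>>>
>>>> mapred.child.java.opts = -Xmx5024m -XX:-UseGCOverheadLimit
>>>>
>>>> so the map JVMs really get the 5 GB heap.)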
>>>>
>>>> 2) As far as I understand, ALS cannot load the U matrix into RAM
>>>> (20m users), while M is OK (150k items). Can I split the input
>>>> matrix R (keep all items, split by user) into R1, R2, ... Rn, then
>>>> compute M and U1 on R1 (many iterations, then fix M), then compute
>>>> U2, U3, ... Un using the existing M (0.5 iteration, do not recompute
>>>> M)? I want to do this to avoid memory issues (train on a part of the
>>>> data).
>>>> My question is: will all the users from U1, U2, ... Un "exist" in
>>>> the same feature space? Can I then compare users from U1 with users
>>>> from U2 using their features?
>>>> Is any tweak possible here?
>>>>
>>>> 3) How can I calculate the maximum matrix size for a given item
>>>> count and memory limit? For example, my matrix has 20m users and I
>>>> want to factorize it using 20 features: 20m * 20 * 8 bytes = 3.2 GB.
>>>> On the one hand I want to avoid "Java heap space" errors, on the
>>>> other hand I want to give my model the maximum training data. I
>>>> understand that minor changes to ParallelALS are needed.
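>>>>
>>>> (My back-of-envelope rule: max rows ~ heapBytes / (numFeatures * 8).
>>>> With a 4 GB mapper heap and 20 features that is about 25m users,
>>>> before JVM overhead. Please correct me if this is wrong.)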
>>>>
>>>> Have a nice day!
>>>>
>>>>
>>>> Regards, 
>>>> Pavel
>>>
>> 
>
