Just checked top on a worker node during the 20% job (4 000 000 users in my case): the java process uses 2800 MB (resident memory). Good news for me: both the U and M iterations passed on the 20% sample.
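A rough back-of-envelope check of why a resident size in that range is plausible for the 20% sample (illustrative Python, my own arithmetic; Mahout's actual in-memory representation and JVM object overhead will differ):

```python
def dense_matrix_mb(rows, cols, bytes_per_entry=8):
    """Size of a dense rows x cols matrix of doubles, in MB."""
    return rows * cols * bytes_per_entry / (1024 * 1024)

# 20% sample: 4M users, 150k items, 20 features
u_mb = dense_matrix_mb(4_000_000, 20)  # user-factor matrix U: ~610 MB raw
m_mb = dense_matrix_mb(150_000, 20)    # item-factor matrix M: ~23 MB raw
```

The raw doubles alone are well under 2800 MB; the rest would be JVM per-object overhead, the rating data itself, and GC headroom.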
Can I use the current M (computed on 20% of users, 15 iterations) to process the remainder (80% of users)?

Fix M, recompute U2 (single U iteration, do not rewrite M)
Fix M, recompute U3 (single U iteration, do not rewrite M)
…
Fix M, recompute Un (single U iteration, do not rewrite M)?

Pavel

On 19.11.12 12:50, "Sebastian Schelter" <s...@apache.org> wrote:

>That's huge. It means you need to fit a dense 20M x 20 matrix into the
>RAM of the mappers that recompute U. This will require a few gigabytes...
>
>If that doesn't work for you, you could try to rewrite the job to use
>reduce-side joins to recompute the factors; this would however be a much
>slower implementation.
>
>
>On 19.11.2012 09:36, Abramov Pavel wrote:
>> About 20 000 000 users and 150 000 items, 0.03% non-zeros. 20 features
>> required.
>>
>> Pavel
>>
>> On 19.11.12 12:31, "Sebastian Schelter" <s...@apache.org> wrote:
>>
>>> You need to give much more memory than 200 MB to your mappers. What are
>>> the dimensions of your input in terms of users and items?
>>>
>>> --sebastian
>>>
>>> On 19.11.2012 09:28, Abramov Pavel wrote:
>>>> Thanks for your replies.
>>>>
>>>> 1)
>>>>> Can you describe your failure or give us a stack trace?
>>>>
>>>> Here is the job log:
>>>>
>>>> 12/11/19 09:54:07 INFO als.ParallelALSFactorizationJob: Recomputing U
>>>> (iteration 0/15)
>>>> …
>>>> 12/11/19 10:03:31 INFO mapred.JobClient: Job complete:
>>>> job_201211150152_1671
>>>> 12/11/19 10:03:31 INFO als.ParallelALSFactorizationJob: Recomputing M
>>>> (iteration 0/15)
>>>> …
>>>> 12/11/19 10:10:04 INFO mapred.JobClient: Task Id :
>>>> attempt_201211150152_<*ALL*>, Status : FAILED
>>>> …
>>>> 12/11/19 10:40:40 INFO mapred.JobClient: Failed map tasks=1
>>>>
>>>>
>>>> All of these mappers (Recomputing M on the 1st iteration) fail with a
>>>> "Java heap space" error.
>>>>
>>>> Here is the Hadoop job memory config:
>>>>
>>>> mapred.map.child.java.opts = -Xmx5024m -XX:-UseGCOverheadLimit
>>>> mapred.child.java.opts = -Xmx200m
>>>> mapred.job.reuse.jvm.num.tasks = -1
>>>>
>>>> mapred.cluster.reduce.memory.mb = -1
>>>> mapred.cluster.map.memory.mb = -1
>>>> mapred.cluster.max.reduce.memory.mb = -1
>>>> mapred.job.reduce.memory.mb = -1
>>>> mapred.job.map.memory.mb = -1
>>>> mapred.cluster.max.map.memory.mb = -1
>>>>
>>>> Are any tweaks possible? Is mapred.map.child.java.opts ok?
>>>>
>>>> 2) As far as I understand, ALS cannot load the U matrix into RAM (20M
>>>> users), while M is fine (150k items). Can I split the input matrix R
>>>> (keeping all items, splitting by user) into R1, R2, …, Rn, then
>>>> compute M and U1 on R1 (many iterations, then fix M), then compute
>>>> U2, U3, …, Un using the existing M (half an iteration each, without
>>>> recomputing M)? I want to do this to avoid memory issues (train on a
>>>> part of the data).
>>>> My question is: will all the users from U1, U2, …, Un "exist" in the
>>>> same feature space? Can I then compare users from U1 with users from
>>>> U2 using their features? Are any tweaks possible here?
>>>>
>>>> 3) How do I calculate the maximum matrix size for a given item count
>>>> and memory limit? For example, my matrix has 20M users and I want to
>>>> factorize it using 20 features: 20M * 20 * 8 bytes = 3.2 GB. On the
>>>> one hand I want to avoid "Java heap space"; on the other hand I want
>>>> to give my model the maximum amount of training data. I understand
>>>> that minor changes to ParallelALS would be needed.
>>>>
>>>> Have a nice day!
>>>>
>>>> Regards,
>>>> Pavel
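The "fix M, recompute Un" idea can be sketched outside Hadoop to see why the partitions stay comparable. This is an illustrative numpy sketch of one fixed-M half-iteration (the function name, regularization weighting, and `lam` value are my assumptions, not Mahout's actual mapper code):

```python
import numpy as np

def recompute_user_factors(R_part, M, lam=0.1):
    """One fixed-M ALS half-step: solve a small regularized least-squares
    problem per user, using only that user's ratings and the shared M.
    R_part: (users_in_partition x items) rating matrix, zeros = unrated.
    M: (items x k) fixed item-factor matrix."""
    k = M.shape[1]
    U = np.zeros((R_part.shape[0], k))
    for u in range(R_part.shape[0]):
        rated = R_part[u].nonzero()[0]      # items this user rated
        if rated.size == 0:
            continue                        # no data: leave zero vector
        M_u = M[rated]                      # factors of the rated items
        A = M_u.T @ M_u + lam * rated.size * np.eye(k)
        b = M_u.T @ R_part[u, rated]
        U[u] = np.linalg.solve(A, b)
    return U
```

Because each user's vector depends only on that user's own ratings and the shared fixed M, splitting R by user and recomputing U1, U2, …, Un partition by partition gives exactly the same result as one big U half-step, and all the resulting user vectors live in M's feature space and are directly comparable.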