To make things worse, the script bin/start-daemon.sh emits two heap-size settings instead of one:

    java blah blah blah -Xmx500m -Xmx1000m blah blah blah

where both numbers come from "somewhere". Shell environment variables are a lousy data store.
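For what it's worth, the duplicate flags are mostly a readability and trust problem: on HotSpot the last -Xmx on the command line normally wins, though that is JVM-specific. A tiny check (HeapCheck is a hypothetical class, not part of Mahout or Hadoop) makes the effective limit visible:

// Run as: java -Xmx500m -Xmx1000m HeapCheck
// On HotSpot the later flag usually takes effect, so this reports roughly 1000m.
public class HeapCheck {
    public static void main(String[] args) {
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("effective max heap: " + maxMb + " MB");
    }
}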
On Fri, Apr 8, 2011 at 6:57 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> I don't think it is going to remedy his condition. He is having OOM in the
> driver, and hadoop-env.sh controls heap for the tasktracker and such (not
> even child task memory). He needs more memory in the frontend, which is
> indeed the bottleneck for that right now.
>
> apologies for brevity.
>
> Sent from my android.
> -Dmitriy
>
> On Apr 8, 2011 4:06 AM, "Danny Bickson" <danny.bick...@gmail.com> wrote:
>> Now try to increase the heap size in the file conf/hadoop-env.sh.
>> For example:
>>
>> HADOOP_HEAPSIZE=4000
>>
>> - Danny Bickson
>>
>> On Thu, Apr 7, 2011 at 10:32 PM, Wei Li <wei.le...@gmail.com> wrote:
>>
>>> Hi Danny and All:
>>>
>>> I have increased the JVM memory, the mapred.child.java.opts, but it
>>> still fails after 2 or 3 passes through the corpus.
>>>
>>> The matrix dimension is about 600,000 * 600,000, and the error is as
>>> follows:
>>>
>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>   at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:434)
>>>   at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
>>>   at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:134)
>>>   at org.apache.mahout.math.RandomAccessSparseVector.assign(RandomAccessSparseVector.java:106)
>>>   at org.apache.mahout.math.SparseRowMatrix.assignRow(SparseRowMatrix.java:148)
>>>   at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:134)
>>>   at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:177)
>>>   at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:110)
>>>   at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver$DistributedLanczosSolverJob.run(DistributedLanczosSolver.java:253)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>   at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.main(DistributedLanczosSolver.java:259)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>
>>> Best
>>> Wei
>>>
>>> On Thu, Apr 7, 2011 at 7:59 AM, Wei Li <wei.le...@gmail.com> wrote:
>>>
>>>> Hi All:
>>>>
>>>> Sorry for the misunderstanding, the dimension is about 600,000 * 600,000 :)
>>>>
>>>> Best
>>>> Wei
>>>>
>>>> On Wed, Apr 6, 2011 at 6:53 PM, Danny Bickson <danny.bick...@gmail.com> wrote:
>>>>
>>>>> Hi.
>>>>> Do you mean 60 million by 60 million? I guess this may be potentially
>>>>> rather big for Mahout.
>>>>> Another option you have is to try GraphLab: see
>>>>> http://bickson.blogspot.com/2011/04/yahoo-kdd-cup-using-graphlab.html
>>>>> I will be happy to give you support in case you would like to try
>>>>> GraphLab.
>>>>>
>>>>> Best,
>>>>>
>>>>> DB
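The stack trace above bottoms out in LanczosSolver.solve storing basis rows via SparseRowMatrix.assignRow, i.e. the Lanczos basis is accumulated in the frontend JVM, which matches Dmitriy's point that the driver heap, not the task heap, is what runs out. A minimal back-of-envelope sketch of how big that basis gets (the dimension comes from this thread; the rank of 200 is a made-up placeholder, and treating the basis vectors as dense doubles is only a lower-bound style estimate since the sparse-map structures carry extra overhead per entry):

// Rough estimate of driver-side memory for the Lanczos basis alone.
public class LanczosHeapEstimate {
    public static void main(String[] args) {
        long n = 600000L;   // matrix dimension from this thread
        long rank = 200L;   // hypothetical requested rank
        long bytes = n * rank * 8L;   // 8 bytes per double, ignoring object overhead
        System.out.println("basis alone: ~" + (bytes >> 20) + " MB");
        // prints roughly 915 MB -- already past a modest default client heap,
        // which is why the -Xmx has to reach the JVM that runs the solver's main()
    }
}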
>>>>>
>>>>> On Wed, Apr 6, 2011 at 2:13 AM, Wei Li <wei.le...@gmail.com> wrote:
>>>>>
>>>>>> Hi Danny:
>>>>>>
>>>>>> I have transformed the csv data into the DistributedRowMatrix
>>>>>> format, but it still failed due to the memory problem after 2 or 3
>>>>>> iterations.
>>>>>>
>>>>>> My matrix dimension is about 60w * 60w (600,000 * 600,000); is it
>>>>>> possible to do the SVD decomposition at this scale using Mahout?
>>>>>>
>>>>>> Best
>>>>>> Wei
>>>>>>
>>>>>> On Sat, Mar 26, 2011 at 1:43 AM, Danny Bickson <danny.bick...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Wei,
>>>>>>> You must verify that you use a SPARSE matrix and not a dense one, or
>>>>>>> else you will surely get out of memory.
>>>>>>> Take a look at this example on how to prepare the input:
>>>>>>> http://bickson.blogspot.com/2011/02/mahout-svd-matrix-factorization.html
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Danny Bickson
>>>>>>>
>>>>>>> On Fri, Mar 25, 2011 at 1:33 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Wei,
>>>>>>>>
>>>>>>>> 1) I think DenseMatrix is a RAM-only representation. Naturally, you
>>>>>>>> get OOM because it all has to fit in memory. If you want to run a
>>>>>>>> RAM-only SVD computation, you perhaps don't need Mahout. If you want
>>>>>>>> to run distributed SVD computations, you need to prepare your data in
>>>>>>>> what is called DistributedRowMatrix format. This is a sequence file
>>>>>>>> with keys being whatever key you need to identify your rows, and
>>>>>>>> values being VectorWritable wrapping either of the vector
>>>>>>>> implementations found in Mahout (dense, sparse sequential, sparse random).
>>>>>>>> 2) Once you've prepared your data in DRM format, you can run either of
>>>>>>>> the SVD algorithms found in Mahout. It can be the Lanczos solver
>>>>>>>> ('mahout svd ...') or, on the trunk, you can also find a stochastic
>>>>>>>> SVD method ('mahout ssvd ...'), which is issue MAHOUT-593 I mentioned earlier.
>>>>>>>>
>>>>>>>> Either way, I am not sure why you want DenseMatrix unless you want to
>>>>>>>> use the RAM-only Colt SVD solver -- but you certainly don't have to
>>>>>>>> focus on the Mahout implementation of one if you just want a RAM solver.
>>>>>>>>
>>>>>>>> -d
>>>>>>>>
>>>>>>>> On Fri, Mar 25, 2011 at 3:25 AM, Wei Li <wei.le...@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > Actually, I would like to perform spectral clustering on a large-scale
>>>>>>>> > sparse matrix, but it failed due to the OutOfMemory error when creating
>>>>>>>> > the DenseMatrix for the SVD decomposition.
>>>>>>>> >
>>>>>>>> > Best
>>>>>>>> > Wei
>>>>>>>> >
>>>>>>>> > On Fri, Mar 25, 2011 at 4:05 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >> SSVD != Lanczos. If you do PCA or LSI it is perhaps what you need; it
>>>>>>>> >> can take on these things. Well, at least some of my branches can, if
>>>>>>>> >> not the official patch.
>>>>>>>> >>
>>>>>>>> >> -d
>>>>>>>> >>
>>>>>>>> >> On Thu, Mar 24, 2011 at 11:09 PM, Wei Li <wei.le...@gmail.com> wrote:
>>>>>>>> >> >
>>>>>>>> >> > thanks for your reply
>>>>>>>> >> >
>>>>>>>> >> > my matrix is not very dense, a sparse matrix.
>>>>>>>> >> >
>>>>>>>> >> > I have tried the svd of Mahout, but failed due to the OutOfMemory
>>>>>>>> >> > error.
>>>>>>>> >> >
>>>>>>>> >> > Best
>>>>>>>> >> > Wei
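To make Dmitriy's description of the DistributedRowMatrix input concrete: it is a sequence file of (row key, VectorWritable) pairs. Below is a minimal sketch of writing one sparse row; the output path, the integer row-key scheme, and the single example cell are placeholders, while the classes are the ones already named in this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class WriteDrm {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("drm/matrix.seq");           // hypothetical output path
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, IntWritable.class, VectorWritable.class);
    try {
      int n = 600000;                                // matrix dimension
      // one sparse row; set only the non-zero cells you actually have
      Vector row = new RandomAccessSparseVector(n);
      row.setQuick(42, 3.14);                        // placeholder cell
      writer.append(new IntWritable(0), new VectorWritable(row));
      // ... append the remaining rows the same way
    } finally {
      writer.close();
    }
  }
}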
>>>>>>>> >> >
>>>>>>>> >> > On Fri, Mar 25, 2011 at 2:03 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> You can certainly try to write it out into a DRM (distributed row
>>>>>>>> >> >> matrix) and run stochastic SVD on hadoop (off the trunk now); see
>>>>>>>> >> >> MAHOUT-593. This is suitable if you have a good decay of singular
>>>>>>>> >> >> values (but if you don't, it probably just means you have so much
>>>>>>>> >> >> noise that it masks the problem you are trying to solve in your data).
>>>>>>>> >> >>
>>>>>>>> >> >> The currently committed solution is not the most efficient yet, but
>>>>>>>> >> >> it should be quite capable.
>>>>>>>> >> >>
>>>>>>>> >> >> If you do, let me know how it went.
>>>>>>>> >> >>
>>>>>>>> >> >> thanks.
>>>>>>>> >> >> -d
>>>>>>>> >> >>
>>>>>>>> >> >> On Thu, Mar 24, 2011 at 10:59 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>>>>>> >> >> > Are you sure your matrix is dense?
>>>>>>>> >> >> >
>>>>>>>> >> >> > On Thu, Mar 24, 2011 at 9:59 PM, Wei Li <wei.le...@gmail.com> wrote:
>>>>>>>> >> >> >> Hi All:
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> Is it possible to compute the SVD factorization for a 600,000 *
>>>>>>>> >> >> >> 600,000 matrix using Mahout?
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> I have got the OutOfMemory error when creating the DenseMatrix.
>>>>>>>> >> >> >>
>>>>>>>> >> >> >> Best
>>>>>>>> >> >> >> Wei
>>>>>>>> >> >
>>>>>>>> >
>>>>>>
>>>>
>>

--
Lance Norskog
goks...@gmail.com
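For completeness, the arithmetic behind the original OutOfMemoryError: a fully dense in-memory 600,000 * 600,000 matrix cannot fit in any heap, which is why the advice in this thread converges on a sparse DistributedRowMatrix plus a distributed solver rather than a bigger -Xmx. A quick sanity check (cell values only, ignoring all object overhead):

// Size of a dense 600,000 x 600,000 matrix of doubles.
public class DenseSizeCheck {
    public static void main(String[] args) {
        long n = 600000L;
        long bytes = n * n * 8L;                       // 2.88e12 bytes
        System.out.println(bytes / (1L << 30) + " GiB just for the cells");
        // ~2682 GiB (about 2.6 TiB) -- far beyond what any heap setting can buy,
        // so a dense representation is ruled out before tuning even starts.
    }
}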