On Aug 13, 2011, at 2:11 PM, Dmitriy Lyubimov wrote:

> NP.
>
> Thanks for testing it out.
>
> I would appreciate it if you could let me know how it goes with
> non-full-rank decomposition, and perhaps at larger scale.
>
Sure thing.

> One thing to keep in mind is that it projects the input into an m x (k+p)
> _dense_ matrix, assuming that k+p is much smaller than the number of
> non-zero elements in a sparse row vector. If that is not the case, you
> would actually create more computation, not less, with a random
> projection. One person tried to use it with m in the millions, but the
> rows were so sparse that there were only a handful (~10 on average) of
> non-zero items per row (somewhat typical for user ratings), yet he
> actually tried to compute hundreds of singular values, which of course
> created more intermediate work than something like Lanczos probably
> would. That's not a good application of this method.

So this is a bit surprising: in my situation, k would be relatively low (< 20). Since I am working with text data, I suspect that the rows are pretty sparse, although I have not instrumented the row non-zero element distributions yet. Based on your notes, I was planning to set k + p = 500 (or less, depending on the width of the matrix) so that I would get reasonably good singular vectors. I guess I will do some more tuning.

> Another thing is that you also need good singular value decay in your
> data; otherwise this method can be surprisingly far from the true
> vectors (in my experiments).

I am not too sure offhand whether this is true for my dataset.

> -d
>
>
> On Sat, Aug 13, 2011 at 1:48 PM, Eshwaran Vijaya Kumar <
> [email protected]> wrote:
>
>> Dmitriy,
>> That sounds great. I eagerly await the patch.
>> Thanks
>> Esh
>>
>> On Aug 13, 2011, at 1:37 PM, Dmitriy Lyubimov wrote:
>>
>>> Ok, I got u0 working.
>>>
>>> The problem, of course, is that the job called BBt has to be coerced
>>> to have 1 reducer. (That's fine: no mapper will yield more than one
>>> upper-triangular matrix of (k+p) x (k+p) geometry, so even if you end
>>> up having thousands of them, one reducer will sum them up just fine.)
>>>
>>> It worked before, apparently, because the configuration holds 1
>>> reducer by default if not set explicitly; I am not quite sure whether
>>> it's something in the Hadoop MR client or a Mahout change that now
>>> precludes it from working.
>>>
>>> Anyway, I have a patch (really a one-liner), and an example equivalent
>>> to yours worked fine for me with 3 reducers.
>>>
>>> Also, the tests request 3 reducers too, but the reason it works in the
>>> tests and not in distributed mapred is that local mapred doesn't
>>> support multiple reducers. I investigated this issue before, and
>>> apparently there were a couple of patches floating around, but for
>>> some reason those changes did not take hold in cdh3u0.
>>>
>>> I will publish the patch in a jira shortly and will commit it
>>> Sunday-ish.
>>>
>>> Thanks.
>>> -d
>>>
>>>
>>> On Fri, Aug 5, 2011 at 7:06 PM, Eshwaran Vijaya Kumar <
>>> [email protected]> wrote:
>>>
>>>> OK. To add more info to this, I tried setting the number of reducers
>>>> to 1, and now I don't get that particular error. The singular values
>>>> and the left and right singular vectors appear to be correct
>>>> (verified using Matlab).
>>>>
>>>> On Aug 5, 2011, at 1:55 PM, Eshwaran Vijaya Kumar wrote:
>>>>
>>>>> All,
>>>>> I am trying to test Stochastic SVD and am facing some errors where
>>>>> it would be great if someone could clarify what is going on. I am
>>>>> trying to feed the solver a DistributedRowMatrix with the exact same
>>>>> parameters that the test in LocalSSVDSolverSparseSequentialTest
>>>>> uses, i.e., generate a 1000 x 100 DRM with SequentialSparseVectors
>>>>> and then ask for blockHeight 251, p (oversampling) = 60,
>>>>> k (rank) = 40.
>>>>> I get the following error:
>>>>>
>>>>> Exception in thread "main" java.io.IOException: Unexpected overrun in upper triangular matrix files
>>>>>     at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.loadUpperTriangularMatrix(SSVDSolver.java:471)
>>>>>     at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:268)
>>>>>     at com.mozilla.SSVDCli.run(SSVDCli.java:89)
>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>>>     at com.mozilla.SSVDCli.main(SSVDCli.java:129)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>>>>
>>>>> Also, I am using CDH3 with Mahout recompiled to work with CDH3 jars.
>>>>>
>>>>> Thanks
>>>>> Esh
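[Editor's note] Dmitriy's point about sparse rows earlier in the thread can be made concrete with some back-of-the-envelope arithmetic: projecting a sparse row with `nnz` non-zeros into a dense row of length k+p costs roughly nnz*(k+p) multiply-adds and inflates per-row storage by (k+p)/nnz. This is a hypothetical sketch, not Mahout code; the numbers mirror the ratings example and parameter choices discussed above.

```java
// Back-of-the-envelope cost of the random projection step Y = A * Omega:
// each sparse row with `nnz` non-zeros becomes a *dense* row of length k+p,
// so when k+p greatly exceeds nnz, the projection creates more work and
// storage than it saves. All figures below are illustrative.
public class ProjectionCost {

    /** Multiply-adds to project one sparse row into a dense (k+p)-vector. */
    static long projectedRowFlops(long nnz, long kp) {
        return nnz * kp;
    }

    /** Stored values per row after projection, relative to before. */
    static double rowInflation(long nnz, long kp) {
        return (double) kp / nnz;
    }

    public static void main(String[] args) {
        long nnz = 10;      // ~10 non-zeros per row, as in the ratings example
        long kpBig = 500;   // k+p = 500, the setting discussed in the thread
        long kpSmall = 80;  // e.g. k = 20 with modest oversampling p = 60

        System.out.println("k+p=500: x" + rowInflation(nnz, kpBig)
            + " row inflation, " + projectedRowFlops(nnz, kpBig) + " madds/row");
        System.out.println("k+p=80:  x" + rowInflation(nnz, kpSmall)
            + " row inflation, " + projectedRowFlops(nnz, kpSmall) + " madds/row");
    }
}
```

With ~10 non-zeros per row, k+p = 500 turns each row into a dense vector 50x larger than the original, which is why a much smaller k+p (or a method like Lanczos) can be the better fit for very sparse data.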
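[Editor's note] The reason a single reducer suffices for the BBt job, as Dmitriy describes, is that each mapper emits at most one small packed upper-triangular (k+p) x (k+p) matrix and the reduce step is just an element-wise sum. The sketch below illustrates that reduce step; the class and method names are illustrative, not Mahout's actual API.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative reduce step for the BBt job: every mapper contributes at most
// one upper-triangular (k+p) x (k+p) matrix, stored packed (row-major, upper
// triangle only), and a single reducer sums the packed arrays element-wise.
public class UpperTriangularSum {

    /** Packed length of the upper triangle of an n x n matrix: n(n+1)/2. */
    static int packedLength(int n) {
        return n * (n + 1) / 2;
    }

    /** Element-wise sum of packed upper-triangular partials (the reduce step). */
    static double[] reduce(List<double[]> partials) {
        if (partials.isEmpty()) {
            return new double[0];
        }
        double[] acc = new double[partials.get(0).length];
        for (double[] p : partials) {
            for (int i = 0; i < acc.length; i++) {
                acc[i] += p[i];
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        // Tiny stand-in for k+p = 3; real runs use much larger geometry.
        double[] fromMapper1 = {1, 1, 1, 1, 1, 1}; // packed 3x3 upper triangle
        double[] fromMapper2 = {2, 2, 2, 2, 2, 2};
        double[] summed = reduce(Arrays.asList(fromMapper1, fromMapper2));
        System.out.println(Arrays.toString(summed));
        // prints [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
    }
}
```

Because the sum is associative and each partial is only (k+p)(k+p+1)/2 doubles, even thousands of mapper outputs funnel through one reducer without trouble.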
