I've run into this before. EigenValueDecomposition creates a Java array with 2*k*n elements. Java arrays are indexed with a native int, so 2*k*n cannot exceed Integer.MAX_VALUE.
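For a rough sense of what is feasible, here is a small sketch (my own helper, not part of the MLlib API) that estimates the largest k you can request for a given n, assuming the workspace is about 2*k*n doubles as described above:

object MaxKSketch {
  // Rough estimate, not Spark API: the largest k for which an
  // ~2*k*n-element workspace array still fits in a Java array
  // (array length must stay below Integer.MAX_VALUE).
  def maxFeasibleK(n: Long): Long = Int.MaxValue.toLong / (2L * n)

  def main(args: Array[String]): Unit = {
    val n = 191077L  // number of columns of the matrix in the question
    println(s"Largest feasible k for n = $n is roughly ${maxFeasibleK(n)}")  // ~5619
  }
}

So under that assumption, for a 49865 x 191077 matrix you can only request a few thousand singular values through this code path, regardless of how much memory the cluster has.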
The array is created here: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala#L84 If you removed the requirement that 2*k*n < Integer.MAX_VALUE, it would instead fail with java.lang.NegativeArraySizeException. More on this issue here: https://issues.apache.org/jira/browse/SPARK-5656

On Tue, Sep 19, 2017 at 9:49 AM, Alexander Ovcharenko <shurik....@gmail.com> wrote:
> Hello guys,
>
> While trying to compute SVD using the computeSVD() function, I am getting the
> following warning, followed by an exception:
> 17/09/14 12:29:02 WARN RowMatrix: computing svd with k=49865 and n=191077,
> please check necessity
> IllegalArgumentException: u'requirement failed: k = 49865 and/or n =
> 191077 are too large to compute an eigendecomposition'
>
> When I try to compute the first 3000 singular values, I'm getting several of the
> following warnings every second:
> 17/09/14 13:43:38 WARN TaskSetManager: Stage 4802 contains a task of very
> large size (135 KB). The maximum recommended task size is 100 KB.
>
> The matrix size is 49865 x 191077 and all the singular values are needed.
>
> Is there a way to lift that limit and be able to compute whatever number
> of singular values?
>
> Thank you.