We use a dense array to store the covariance matrix on the driver node, so its length is limited by the integer range. That caps the entry count at roughly 65536 * 65536 (actually about half of that, since the matrix is symmetric and only one triangle needs to be stored). -Xiangrui
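A short sketch of the arithmetic behind that cap (my own illustration, not code from Spark): a JVM array is indexed by `Int`, so its length is at most 2^31 - 1. A full dense n x n matrix needs n * n entries, while a packed symmetric triangle needs n * (n + 1) / 2, which is what pushes the column limit up to about 65535.

```scala
// Illustrative arithmetic only; object and method names are hypothetical.
object CovLimit {
  // Maximum JVM array length: Int.MaxValue = 2^31 - 1
  val maxArrayLen: Long = Int.MaxValue

  // Largest n with n * n <= maxArrayLen (full dense n x n matrix)
  def maxColsDense: Int = math.sqrt(maxArrayLen.toDouble).toInt

  // Largest n with n * (n + 1) / 2 <= maxArrayLen (packed triangle)
  def maxColsPacked: Int = {
    var n = 0
    while ((n + 1).toLong * (n + 2) / 2 <= maxArrayLen) n += 1
    n
  }

  def main(args: Array[String]): Unit = {
    println(s"dense cap:  $maxColsDense")  // 46340
    println(s"packed cap: $maxColsPacked") // 65535
  }
}
```

So storing only a triangle of the symmetric matrix raises the limit from about 46k columns to about 65k, matching the "actually half" remark above.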
On Wed, May 13, 2015 at 1:57 AM, Sebastian Alfers <sebastian.alf...@googlemail.com> wrote:
> Hello,
>
> In order to compute a huge dataset, the number of columns allowed when
> calculating the covariance matrix is limited:
>
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L129
>
> What is the reason behind this limitation, and can it be extended?
>
> Greetings
>
> Sebastian