[ https://issues.apache.org/jira/browse/SPARK-21058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-21058.
-------------------------------
    Resolution: Not A Problem

If there's a specific optimization for large, sparse matrices to discuss, I can reopen this.

> potential SVD optimization
> --------------------------
>
>                 Key: SPARK-21058
>                 URL: https://issues.apache.org/jira/browse/SPARK-21058
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.1.1
>            Reporter: Vincent
>
> In the current implementation, computeSVD computes the SVD of a matrix A by first forming the Gramian matrix A^T * A and then running SVD on that Gramian. We found that the Gramian computation is the hot spot of the overall SVD computation. While SVD on the Gramian benefits the skinny-matrix case, for a non-skinny matrix it can become a large overhead. So, is it possible to offer another option that computes the SVD on the original matrix instead of the Gramian? The path could be chosen from the ratio between numCols and numRows, or simply from a user-provided setting.
> We have observed a substantial gain on a toy dataset by running SVD on the original matrix instead of the Gramian. If the proposal is acceptable, we will start working on a patch and gather more performance data.
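As a rough illustration of the dispatch the description proposes, here is a minimal Scala sketch against the MLlib RowMatrix API. The 0.5 ratio threshold, the `adaptiveSVD` wrapper, and the `directSVD` placeholder are illustrative assumptions, not anything that exists in Spark; only `RowMatrix.computeSVD` is the real, Gramian-based method.

{code:scala}
import org.apache.spark.mllib.linalg.{Matrix, SingularValueDecomposition}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object SvdDispatchSketch {

  // Hypothetical placeholder: an SVD path that avoids forming A^T * A,
  // which is what this ticket proposes to add.
  def directSVD(mat: RowMatrix, k: Int): SingularValueDecomposition[RowMatrix, Matrix] =
    throw new NotImplementedError("direct SVD on the original matrix is the proposed addition")

  // Pick a path from the shape of the matrix; the 0.5 threshold is arbitrary
  // and stands in for the numCols/numRows heuristic (or a user setting).
  def adaptiveSVD(mat: RowMatrix, k: Int): SingularValueDecomposition[RowMatrix, Matrix] = {
    val ratio = mat.numCols().toDouble / mat.numRows().toDouble
    if (ratio < 0.5) {
      // Skinny matrix: the Gramian A^T * A is only numCols x numCols,
      // so the existing Gramian-based computeSVD is the cheap option.
      mat.computeSVD(k, computeU = true)
    } else {
      // Non-skinny matrix: forming the Gramian dominates the runtime,
      // so fall back to the proposed direct factorization.
      directSVD(mat, k)
    }
  }
}
{code}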