If you just want to find the top eigenvalue / eigenvector, you can do something like the Lanczos method. There is a description of a MapReduce-based algorithm in Section 4.2 of [1].
[1] http://www.cs.cmu.edu/~ukang/papers/HeigenPAKDD2011.pdf

On Thu, Aug 7, 2014 at 10:54 AM, Li Pu <l...@twitter.com.invalid> wrote:

> @Miles, the latest SVD implementation in mllib is partially distributed.
> Matrix-vector multiplication is computed among all workers, but the right
> singular vectors are all stored in the driver. If your symmetric matrix is
> n x n and you want the first k eigenvalues, you will need to fit n x k
> doubles in the driver's memory. Behind the scenes, it calls ARPACK to
> compute the eigendecomposition of A^T A. You can look into the source code
> for the details.
>
> @Sean, the SVD++ implementation in graphx is not the canonical definition
> of SVD. It doesn't have the orthogonality that SVD holds. But we might want
> to use graphx as the underlying matrix representation for mllib.SVD to
> address the problem of skewed entry distribution.
>
>
> On Thu, Aug 7, 2014 at 10:51 AM, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>
>> Reza Zadeh has contributed the distributed implementation of
>> (Tall/Skinny) SVD
>> (http://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html),
>> which is in MLlib (Spark 1.0), and a distributed sparse SVD is coming in
>> Spark 1.1 (https://issues.apache.org/jira/browse/SPARK-1782). If your
>> data is sparse (which it often is in social networks), you may have better
>> luck with this.
>>
>> I haven't tried the GraphX implementation, but those algorithms are often
>> well suited for power-law distributed graphs, as you might see in social
>> networks.
>>
>> FWIW, I believe you need to square the elements of the sigma matrix from
>> the SVD to get the eigenvalues.
>>
>>
>> On Thu, Aug 7, 2014 at 10:20 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> (-incubator, +user)
>>>
>>> If your matrix is symmetric (and real, I presume), and if my linear
>>> algebra isn't too rusty, then its SVD is its eigendecomposition.
>>> The SingularValueDecomposition object you get back has U and V, both
>>> of which have columns that are the eigenvectors.
>>>
>>> There are a few SVDs in the Spark code. The one in mllib is not
>>> distributed (right?) and is probably not an efficient means of
>>> computing eigenvectors if you really just want a decomposition of a
>>> symmetric matrix.
>>>
>>> The one I see in graphx is distributed? I haven't used it, though.
>>> Maybe it could be part of a solution.
>>>
>>>
>>> On Thu, Aug 7, 2014 at 2:21 PM, yaochunnan <yaochun...@gmail.com> wrote:
>>>
>>> > Our lab needs to do some simulation on online social networks. We
>>> > need to handle a 5000 x 5000 adjacency matrix, namely, to get its
>>> > largest eigenvalue and the corresponding eigenvector. Matlab can be
>>> > used, but it is time-consuming. Is Spark effective in linear algebra
>>> > calculations and transformations? Later we will have a
>>> > 5000000 x 5000000 matrix to process. It seems urgent that we find
>>> > some distributed computation platform.
>>> >
>>> > I see SVD has been implemented and I can get eigenvalues of a matrix
>>> > through this API. But when I want to get both eigenvalues and
>>> > eigenvectors, or at least the largest eigenvalue and the
>>> > corresponding eigenvector, it seems that current Spark doesn't have
>>> > such an API. Is it possible for me to write eigenvalue decomposition
>>> > from scratch? What should I do? Thanks a lot!
>>> >
>>> > Miles Yao
>>> >
>>> > ________________________________
>>> > View this message in context: How can I implement eigenvalue
>>> > decomposition in Spark?
>>> > Sent from the Apache Spark User List mailing list archive at
>>> > Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>>
>
>
> --
> Li
> @vrilleup
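[Editorial note, not part of the original thread.] The Lanczos suggestion in the first reply boils down to iterated matrix-vector products, which is exactly the primitive a distributed solver parallelizes. A minimal pure-Python sketch of the simpler power-iteration variant (not the Lanczos algorithm itself, and not Spark code) for finding the top eigenpair of a small symmetric matrix:

```python
import math
import random

def power_iteration(A, iters=200, seed=0):
    """Estimate the dominant eigenvalue/eigenvector of a square matrix A
    (given as a list of row lists) by repeated matrix-vector products."""
    n = len(A)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]  # random starting vector
    for _ in range(iters):
        # w = A v, then renormalize so the vector doesn't blow up or vanish
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v gives the eigenvalue estimate
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * Av[i] for i in range(n))
    return lam, v

# e.g. [[2, 1], [1, 2]] has top eigenpair (3, [1, 1] / sqrt(2))
lam, v = power_iteration([[2.0, 1.0], [1.0, 2.0]])
```

For the 5000000 x 5000000 case in the question, only the A*v step needs to be distributed (e.g. as a map over the matrix rows), which is the structure of the HEigen algorithm referenced as [1].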
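[Editorial note, not part of the original thread.] The relationship the replies rely on (mllib's routine decomposes A^T A behind the scenes, and sigma must be squared to get back to eigenvalues of A^T A) can be checked by hand on a hypothetical 2x2 symmetric example, using nothing but the quadratic formula:

```python
import math

def eig2_sym(a, b, d):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]],
    via the characteristic polynomial x^2 - tr*x + det = 0."""
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

A = [[2.0, 1.0], [1.0, 2.0]]
lam1, lam2 = eig2_sym(A[0][0], A[0][1], A[1][1])   # eigenvalues of A

# A^T A (equals A @ A here, since A is symmetric)
AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]
mu1, mu2 = eig2_sym(AtA[0][0], AtA[0][1], AtA[1][1])  # eigenvalues of A^T A

# singular values of A = sqrt(eigenvalues of A^T A) = |eigenvalues of A|
sigma = (math.sqrt(mu1), math.sqrt(mu2))
```

Here lam1, lam2 come out as 3 and 1, mu1, mu2 as 9 and 1 (their squares), and sigma as 3 and 1. So for a real symmetric matrix the singular values are the absolute eigenvalues, and squaring sigma recovers the eigenvalues of A^T A; note that the SVD loses eigenvalue signs, which matters for adjacency matrices that are not positive semidefinite.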