Hi all,

Relatively new to Spark and Scala here.

I'm trying to solve a simple linear system A*x = b, where A is a sparse matrix and x and b are dense vectors. Standard fare, really. But the solutions I've found in examples and tests don't seem efficient; I think my use case falls between the cracks (or I'm missing something). It's a barely parallel problem over medium-sized data: N = 10k-100k elements, with A a sparse NxN matrix (density 0.001-0.003). Currently these systems can be solved on single-processor machines. Ideally, I'm looking for a reasonably fast method that can handle larger data (N will grow with time) and scales with more cores / machines.

I've looked at many examples on the web and in the archives; apologies if I missed something. I've already tried LinearRegressionWithSGD as well as an SVD approach, and both were slow and required arbitrary tuning (step size, k principal vectors, etc.). (This wasn't unexpected, but it gave me a baseline. Both approaches do more work than is necessary in this case.)

I also hand-rolled a simple conjugate gradient: https://gist.github.com/cmcbride/7b8551400da179f24345 (all comments / headkicks / improvements are welcome). This is one of several approaches I tried; it uses an IndexedRowMatrix with sparse vectors for the rows and a hand-rolled matrix-vector multiplication. I'm being a little pedantic in using tuples to ensure proper ordering of the output.

In any case, all of these are way slower (10x-100x or more) than the single-threaded Scala conjugate gradient solution currently in use. I'd obviously like to improve on that.

Thanks!
Cameron
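For concreteness, here's a minimal sketch of the kind of distributed matrix-vector product I mean (this is a simplified illustration, not the exact code from the gist): broadcast the dense vector x, take the dot product of each sparse row against it, and emit (rowIndex, value) tuples so the ordering of the output is explicit. It assumes the rows of the IndexedRowMatrix are stored as SparseVector.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.SparseVector
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// Sketch of a distributed A*x for an IndexedRowMatrix with sparse rows.
// Broadcasts x to every executor; each row contributes one (index, dot) pair.
def spMV(sc: SparkContext, a: IndexedRowMatrix, x: Array[Double]): Array[(Long, Double)] = {
  val xBc = sc.broadcast(x)
  a.rows.map { case IndexedRow(i, v: SparseVector) =>
    // Dot product over the nonzeros of this row only.
    var dot = 0.0
    var k = 0
    while (k < v.indices.length) {
      dot += v.values(k) * xBc.value(v.indices(k))
      k += 1
    }
    (i, dot)
  }.collect().sortBy(_._1)
}
```

The collect-and-sort at the end is what I meant by being pedantic about ordering; keeping the result as an RDD of (index, value) pairs instead would avoid pulling everything to the driver on each CG iteration, at the cost of a join against the search-direction vector.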
I'm trying to solve a simple linear system of A*x = b where A is a sparse matrix, and x and b are dense vectors. Standard fare, really. But the solutions I've found from examples and tests don't seem efficient; I think the use case I need is falling between the cracks (or I'm missing something). We have a barely parallel problem of medium sized data: N = 10k-100k elements and the A matrix is a sparse NxN matrix (density of 0.001-0.003). Currently, these can be solved on single processor machines. Ideally, I'm looking for a reasonably fast method that can handle larger data (N will grow with time), and scales with more cores / machines. I've looked at many examples on the web and archives, and apologies if I missed something. I've already tried the LinearRegressionWithSGD as well as an SVD approach, and both were slow and required arbitrary tuning (stepsize, k principle vectors, etc). (This wasn't unexpected, but gave me a baseline. Both approaches do more work than is necessary in this case.) I hand rolled a simple conjugate gradient as well: https://gist.github.com/cmcbride/7b8551400da179f24345 (all comments / headkicks / improvements are welcome) This is one of several approaches I tried, but utilizes an IndexedRowMatrix that uses sparse vectors for the rows, and a hand rolled Matrix-vector multiplication. I'm being a little pedantic using tuples to ensure proper ordering of the output. In any case, all of these are waaay slower (10x-100x slower or more) than a single threaded scala solution (conjugate gradient approach) currently in use. I'd obviously like to improve that. Thanks! Cameron