I want to multiply two large matrices (loaded from CSV files) using Spark and Scala and save the output. I use the following code:
```scala
// Parse file1 into sparse row vectors and wrap them in a RowMatrix
val rows = file1.coalesce(1, false).map { x =>
  val line = x.split(delimiter).map(_.toDouble)
  Vectors.sparse(
    line.length,
    line.zipWithIndex.map(e => (e._2, e._1)).filter(_._2 != 0.0))
}
val rmat = new RowMatrix(rows)

// Parse file2 into dense row vectors
val dm = file2.coalesce(1, false).map { x =>
  val line = x.split(delimiter).map(_.toDouble)
  Vectors.dense(line)
}

// Collect file2 to the driver and build a local matrix
// (Matrices.dense expects column-major values, hence the transpose)
val ma = dm.map(_.toArray).take(dm.count.toInt)
val localMat = Matrices.dense(
  dm.count.toInt,
  dm.take(1)(0).size,
  transpose(ma).flatten)

// Multiply the two matrices and save the result
val s = rmat.multiply(localMat).rows
s.map(x => x.toArray.mkString(delimiter)).saveAsTextFile(OutputPath)

def transpose(m: Array[Array[Double]]): Array[Array[Double]] = {
  (for (c <- m(0).indices) yield m.map(_(c))).toArray
}
```

Saving takes a long time and the output file is very large. What is an optimized way to multiply two large matrices and save the output to a text file?
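One alternative I have seen suggested is to avoid collecting the second matrix to the driver at all, and instead convert both inputs to `BlockMatrix` so the multiplication itself is distributed. A minimal sketch of that approach (untested; it assumes `file1`, `file2`, `delimiter`, and `OutputPath` are defined as above, and that both files contain dense CSV rows):

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.rdd.RDD

// Parse each CSV line into an IndexedRow so the row order is preserved
// when converting to a distributed matrix.
def toIndexedRows(rdd: RDD[String], delimiter: String): RDD[IndexedRow] =
  rdd.zipWithIndex.map { case (line, idx) =>
    IndexedRow(idx, Vectors.dense(line.split(delimiter).map(_.toDouble)))
  }

val aBlocks = new IndexedRowMatrix(toIndexedRows(file1, delimiter)).toBlockMatrix()
val bBlocks = new IndexedRowMatrix(toIndexedRows(file2, delimiter)).toBlockMatrix()

// Block-wise distributed multiplication: no driver-side collect of file2,
// so neither matrix has to fit in driver memory.
val product = aBlocks.multiply(bBlocks)

product.toIndexedRowMatrix().rows
  .sortBy(_.index)
  .map(_.vector.toArray.mkString(delimiter))
  .saveAsTextFile(OutputPath)
```

Would this be the right direction, or is there a cheaper way given that one of the matrices fits on the driver?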