Yes, you're right that this is largely on purpose: the API is left to Breeze, in the main. As you can see, the Spark objects have already sprouted a few basic operations anyway; there's a slippery-slope problem here. Why not addition, why not dot products, why not determinants, and so on?
What about declaring a few simple implicit conversions between the MLlib and Breeze Vector classes? If you import them, you should be able to write much of the source code just as you imagine it, as if the Breeze methods were available on the Vector class in MLlib.

On Tue, Aug 25, 2015 at 3:35 PM, Kristina Rogale Plazonic <kpl...@gmail.com> wrote:
> Well, yes, the hack below works (that's all I have time for), but it is not
> satisfactory - it is not safe, it is verbose and very cumbersome to use, it
> does not deal with the SparseVector case separately, and it is not complete
> either.
>
> My question is: out of hundreds of users on this list, someone must have
> come up with a better solution - please?
>
> import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.linalg.{Vector => SparkVector}
>
> def toBreeze(v: SparkVector) = BV(v.toArray)
>
> def fromBreeze(bv: BV[Double]) = Vectors.dense(bv.toArray)
>
> def add(v1: SparkVector, v2: SparkVector) =
>   fromBreeze(toBreeze(v1) + toBreeze(v2))
>
> def subtract(v1: SparkVector, v2: SparkVector) =
>   fromBreeze(toBreeze(v1) - toBreeze(v2))
>
> def scalarMultiply(a: Double, v: SparkVector) =
>   fromBreeze(a * toBreeze(v))
>
> On Tue, Aug 25, 2015 at 9:41 AM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>
>> From what I have understood, you probably need to convert your vector to
>> Breeze and do your operations there. Check
>> stackoverflow.com/questions/28232829/addition-of-two-rddmllib-linalg-vectors
>>
>> On Aug 25, 2015 7:06 PM, "Kristina Rogale Plazonic" <kpl...@gmail.com>
>> wrote:
>>>
>>> Hi all,
>>>
>>> I'm still not clear what is the best (or, ANY) way to add/subtract two
>>> org.apache.spark.mllib.Vector objects in Scala.
>>>
>>> OK, I understand there was a conscious Spark decision not to support
>>> linear algebra operations in Scala and to leave it to the user to choose
>>> a linear algebra library.
>>>
>>> But, for any newcomer from R or Python, where you don't think twice
>>> about adding two vectors, it is such a productivity shot in the foot to
>>> have to write your own + operation. I mean, there is support in Spark
>>> for the p-norm of a Vector and for sqdist between two Vectors, but not
>>> for +/-? As I said, I'm a newcomer to linear algebra in Scala and am
>>> not familiar with Breeze or apache.commons - I am willing to learn, but
>>> would really benefit from guidance from more experienced users. I am
>>> also not used to optimizing low-level code and am sure that any hack I
>>> do will be just horrible.
>>>
>>> So, please, could somebody point me to a blog post, documentation, or
>>> just patches for this really basic functionality? What do you do to get
>>> around it? Am I the only one to have this problem? (And, would it
>>> really be so onerous to add +/- to Spark? After all, even the
>>> org.apache.spark.sql.Column class does have +, -, *, /.)
>>>
>>> My little use case is to generate some toy data for KMeans: I need to
>>> translate a Gaussian blob to another center (for both streaming and
>>> non-streaming KMeans).
>>>
>>> Many thanks! (I am REALLY embarrassed to ask such a simple question...)
>>>
>>> Kristina
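For what it's worth, the implicit-conversion suggestion at the top of the thread could be sketched roughly as below. This is a minimal sketch, not MLlib's own API: the `VectorConversions` object and the `asBreeze`/`asSpark` method names are made up for illustration, and it only special-cases dense vs. sparse vectors on the Spark side.

```scala
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector => SparkVector, Vectors}

// Hypothetical helper object; import VectorConversions._ to get the
// Breeze operators "for free" on MLlib vectors via the wrappers below.
object VectorConversions {

  implicit class RichSparkVector(val v: SparkVector) extends AnyVal {
    // Convert an MLlib vector to a Breeze vector, preserving sparsity.
    def asBreeze: BV[Double] = v match {
      case s: SparseVector => new BSV[Double](s.indices, s.values, s.size)
      case d: DenseVector  => new BDV[Double](d.values)
    }
  }

  implicit class RichBreezeVector(val bv: BV[Double]) extends AnyVal {
    // Convert a Breeze vector back to an MLlib vector.
    def asSpark: SparkVector = bv match {
      case s: BSV[Double] => Vectors.sparse(s.length, s.activeIterator.toSeq)
      case _              => Vectors.dense(bv.toArray)
    }
  }
}
```

With that in scope, vector arithmetic reads almost like R or Python: `(v1.asBreeze + v2.asBreeze).asSpark`, or `(2.0 * v.asBreeze).asSpark` for scaling, with the round-trip conversions hidden behind the extension methods.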