Yes, you're right that this is largely on purpose: the API is left to Breeze, in the main. As you can see, the Spark objects have already sprouted a few basic operations anyway; there's a slippery-slope problem here. Why not addition, why not dot products, why not determinants, and so on?
What about declaring a few simple implicit conversions between the MLlib and Breeze Vector classes? If you import them, you should be able to write much of the source code just as you imagine it, as if the Breeze methods were available on the Vector class in MLlib.

On Tue, Aug 25, 2015 at 3:35 PM, Kristina Rogale Plazonic <kpl...@gmail.com> wrote:
> Well, yes, the hack below works (that's all I have time for), but it is not
> satisfactory - it is not safe, it is verbose and very cumbersome to use, it
> does not deal with the SparseVector case separately, and it is not complete
> either.
>
> My question is: out of hundreds of users on this list, someone must have
> come up with a better solution - please?
>
> import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.linalg.{Vector => SparkVector}
>
> def toBreeze(v: SparkVector) = BV(v.toArray)
>
> def fromBreeze(bv: BV[Double]) = Vectors.dense(bv.toArray)
>
> def add(v1: SparkVector, v2: SparkVector) =
>   fromBreeze(toBreeze(v1) + toBreeze(v2))
>
> def subtract(v1: SparkVector, v2: SparkVector) =
>   fromBreeze(toBreeze(v1) - toBreeze(v2))
>
> def scalarMultiply(a: Double, v: SparkVector) =
>   fromBreeze(a * toBreeze(v))
>
> On Tue, Aug 25, 2015 at 9:41 AM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>
>> From what I have understood, you probably need to convert your vector to
>> Breeze and do your operations there. Check
>> stackoverflow.com/questions/28232829/addition-of-two-rddmllib-linalg-vectors
>>
>> On Aug 25, 2015 7:06 PM, "Kristina Rogale Plazonic" <kpl...@gmail.com>
>> wrote:
>>>
>>> Hi all,
>>>
>>> I'm still not clear what is the best (or, ANY) way to add/subtract two
>>> org.apache.spark.mllib.Vector objects in Scala.
>>>
>>> OK, I understand there was a conscious Spark decision not to support
>>> linear algebra operations in Scala and to leave it to the user to choose
>>> a linear algebra library.
>>>
>>> But, for any newcomer from R or Python, where you don't think twice
>>> about adding two vectors, it is such a productivity shot in the foot to
>>> have to write your own + operation. I mean, there is support in Spark
>>> for the p-norm of a Vector and for sqdist between two Vectors, but not
>>> for +/-? As I said, I'm a newcomer to linear algebra in Scala and am
>>> not familiar with Breeze or apache.commons - I am willing to learn, but
>>> would really benefit from guidance from more experienced users. I am
>>> also not used to optimizing low-level code and am sure that any hack I
>>> do will be just horrible.
>>>
>>> So, please, could somebody point me to a blog post, documentation, or
>>> just patches for this really basic functionality? What do you do to get
>>> around it? Am I the only one to have this problem? (And, would it
>>> really be so onerous to add +/- to Spark? After all, even the
>>> org.apache.spark.sql.Column class does have +, -, *, /.)
>>>
>>> My little use case is to generate some toy data for KMeans: I need to
>>> translate a Gaussian blob to another center (for both streaming and
>>> non-streaming KMeans).
>>>
>>> Many thanks! (I am REALLY embarrassed to ask such a simple question...)
>>>
>>> Kristina
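For what it's worth, the implicit-conversion suggestion at the top of the thread could be sketched roughly as below. This is a minimal sketch, not MLlib's own API: the `VectorConversions` object and the `asBreeze`/`asSpark` method names are made up for illustration, and it only special-cases dense vs. sparse vectors on the Spark side.

```scala
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector => SparkVector, Vectors}

// Hypothetical helper object; import VectorConversions._ to get the
// Breeze operators "for free" on MLlib vectors via the wrappers below.
object VectorConversions {

  implicit class RichSparkVector(val v: SparkVector) extends AnyVal {
    // Convert an MLlib vector to a Breeze vector, preserving sparsity.
    def asBreeze: BV[Double] = v match {
      case s: SparseVector => new BSV[Double](s.indices, s.values, s.size)
      case d: DenseVector  => new BDV[Double](d.values)
    }
  }

  implicit class RichBreezeVector(val bv: BV[Double]) extends AnyVal {
    // Convert a Breeze vector back to an MLlib vector.
    def asSpark: SparkVector = bv match {
      case s: BSV[Double] => Vectors.sparse(s.length, s.activeIterator.toSeq)
      case _              => Vectors.dense(bv.toArray)
    }
  }
}
```

With that in scope, vector arithmetic reads almost like R or Python: `(v1.asBreeze + v2.asBreeze).asSpark`, or `(2.0 * v.asBreeze).asSpark` for scaling, with the round-trip conversions hidden behind the extension methods.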