Or you can use the factory method `Vectors.sparse`:

val sv = Vectors.sparse(numProducts, productIds.map(x => (x, 1.0)))

where numProducts should be the largest product id plus one.
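
For the sample row quoted below, this could look roughly like the following sketch. It assumes spark-mllib is on the classpath (for example, pasted into spark-shell); `parseRow` is a hypothetical helper and not part of MLlib. For illustration, numProducts is taken from this single row, whereas in practice it would come from the largest product id in the whole dataset.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical helper (not an MLlib API): split one whitespace-separated
// row into (userId, productIds), assuming the first token is the user id.
def parseRow(line: String): (Long, Seq[Int]) = {
  val tokens = line.trim.split("\\s+")
  (tokens.head.toLong, tokens.tail.map(_.toInt).toSeq)
}

val row = "2622572 145447 1620 13421 28565 285556 293 4553 67261 130 " +
  "3646 1671 18806 183576 3286 51715 57671 57476"
val (userId, productIds) = parseRow(row)

// Assumption for this sketch only: size the vector from this one row.
// Over a full dataset, use the largest product id anywhere, plus one.
val numProducts = productIds.max + 1

// distinct guards against an id that is repeated within a row;
// every id that is present gets the value 1.0.
val sv: Vector =
  Vectors.sparse(numProducts, productIds.distinct.map(id => (id, 1.0)))
```

The ids do not need to be sorted first when passing (index, value) pairs, since this overload of Vectors.sparse orders them by index itself.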

Best,
Xiangrui

On Mon, Sep 15, 2014 at 12:46 PM, Chris Gore <cdg...@cdgore.com> wrote:
> Hi Sameer,
>
> MLLib uses Breeze's vector format under the hood.  You can use that.
> http://www.scalanlp.org/api/breeze/index.html#breeze.linalg.SparseVector
>
> For example:
>
> import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
>
> val numClasses = classes.distinct.count.toInt
>
> val userWithClassesAsSparseVector = rows.map(x => (x.userID, new
> BSV[Double](x.classIDs.sortWith(_ < _),
> Seq.fill(x.classIDs.length)(1.0).toArray,
> numClasses).asInstanceOf[BV[Double]]))
>
> Chris
>
> On Sep 15, 2014, at 11:28 AM, Sameer Tilak <ssti...@live.com> wrote:
>
> Hi All,
> I have transformed the data into the following format: the first column is
> the user id, and all the other columns are class ids. For each user, only the
> class ids that appear in that row have value 1; all others are 0. I need to
> create a sparse vector from this. Is there an API for creating a sparse
> vector that directly supports this format?
>
> User id    Product class ids
>
> 2622572 145447 1620 13421 28565 285556 293 4553 67261 130 3646 1671 18806
> 183576 3286 51715 57671 57476
>
>
