I'm already using the SparseVector class.

~200 labels
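
For context, here is a minimal sketch of what that looks like with the
Spark 1.0 MLlib API. The feature count and label mirror the numbers in
this thread; the indices, values, and the RDD name `data` are purely
illustrative.

    import org.apache.spark.mllib.classification.NaiveBayes
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // One training example: a 2M-dimensional feature space, but only
    // the non-zero (index, value) pairs are stored, so a point with
    // 20-50 non-zeros costs a few hundred bytes instead of the ~16 MB
    // a dense array of 2 million doubles would take.
    val numFeatures = 2000000
    val example = LabeledPoint(
      label = 3.0,                     // one of the ~200 classes
      features = Vectors.sparse(
        numFeatures,
        Array(7, 120345, 1999998),     // indices of non-zero entries
        Array(1.0, 2.0, 1.0)))         // the corresponding values

    // Training over an RDD[LabeledPoint] named `data`:
    // val model = NaiveBayes.train(data, lambda = 1.0)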


On Sun, Apr 27, 2014 at 12:26 AM, Xiangrui Meng <men...@gmail.com> wrote:

> How many labels does your dataset have? -Xiangrui
>
> On Sat, Apr 26, 2014 at 6:03 PM, DB Tsai <dbt...@stanford.edu> wrote:
> > Which version of mllib are you using? For Spark 1.0, mllib will
> > support sparse feature vectors, which will improve performance a
> > lot when computing distances between points and centroids (a toy
> > sketch of that idea follows the quoted thread below).
> >
> > Sincerely,
> >
> > DB Tsai
> > -------------------------------------------------------
> > My Blog: https://www.dbtsai.com
> > LinkedIn: https://www.linkedin.com/in/dbtsai
> >
> >
> > On Sat, Apr 26, 2014 at 5:49 AM, John King <usedforprinting...@gmail.com> wrote:
> >> I'm just wondering: are the SparseVector calculations really taking
> >> the sparsity into account, or just converting to dense?
> >>
> >>
> >> On Fri, Apr 25, 2014 at 10:06 PM, John King <usedforprinting...@gmail.com> wrote:
> >>>
> >>> I've been trying to use the Naive Bayes classifier. Each example in
> >>> the dataset has about 2 million features, only about 20-50 of which
> >>> are non-zero, so the vectors are very sparse. I keep running out of
> >>> memory, though, even for about 1,000 examples on 30 GB of RAM, while
> >>> the entire dataset is 4 million examples. I would also like to note
> >>> that I'm using the sparse vector class.
> >>
> >>
>
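
To illustrate the point DB Tsai makes above: with a sparse
representation, a dot product against a dense centroid only visits the
non-zero entries, and a squared distance expands as
||x - c||^2 = ||x||^2 - 2*(x . c) + ||c||^2, where both norms can be
precomputed. A toy sketch in plain Scala (not the actual MLlib
internals):

    def sparseDot(indices: Array[Int], values: Array[Double],
                  centroid: Array[Double]): Double = {
      // Only the 20-50 stored entries are touched, not all 2 million.
      var sum = 0.0
      var i = 0
      while (i < indices.length) {
        sum += values(i) * centroid(indices(i))
        i += 1
      }
      sum
    }

    // Each distance then costs O(nnz) given the precomputed norms.
    def sqDist(indices: Array[Int], values: Array[Double],
               xNormSq: Double, centroid: Array[Double],
               cNormSq: Double): Double =
      xNormSq - 2.0 * sparseDot(indices, values, centroid) + cNormSq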
