Chirag,

There isn't a fully baked answer to your needs, but there are components that can help. For instance, the OnlineSummarizer can estimate a particular quantile for you. Filling it by iterating over the vector is easy enough:

    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.stats.OnlineSummarizer;

    Vector v;   // original data
    OnlineSummarizer s = new OnlineSummarizer();
    // feed every element, zeros included, into the summarizer
    for (Vector.Element e : v.all()) {
      s.add(e.get());
    }
    // pick any cutoff you like
    double cutoff = s.quantile(0.99);

Then you can use this cutoff to copy only the items you need:

    Vector r = new RandomAccessSparseVector(v.size());
    for (Vector.Element e : v.all()) {
      double vi = e.get();
      // keep only the values above the cutoff
      if (vi > cutoff) {
        r.set(e.index(), vi);
      }
    }

Note that if you want a sparse result, you have to perform a selective copy: even if you set elements of a DenseVector to zero, it still uses the same amount of storage. Only by copying selectively into a new vector of the right type do you get the desired effect.

On Sun, Mar 2, 2014 at 10:31 AM, Chirag Lakhani <clakh...@zaloni.com> wrote:

> Hi,
>
> I was wondering if there is a simple way to sparsify a vector in Mahout. I
> basically have an n-dimensional vector (currently a DenseVector) and I want
> to develop a method that sparsifies it by keeping only the largest s values
> of the vector and setting the rest to 0. Is there a simple solution to
> this given all that is included in the Vector class or do I need to create
> my own method?
>
> Chirag
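
A footnote on the reply above: a quantile cutoff keeps roughly a fixed fraction of the entries, while the question asks for exactly the largest s values. A minimal sketch of that exact variant follows, in plain Java with no Mahout dependency. It swaps the quantile estimate for a size-s min-heap. The class name TopS, the method sparsifyTopS, and the Map used as a stand-in for a sparse vector are all hypothetical names chosen for illustration, not part of Mahout.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical helper, not a Mahout class: keeps the s largest values of a dense array.
final class TopS {

    // Returns index -> value for the s largest entries; every other entry is implicitly zero.
    static Map<Integer, Double> sparsifyTopS(double[] data, int s) {
        Map<Integer, Double> sparse = new TreeMap<>();
        if (s <= 0) {
            return sparse;
        }
        // Min-heap of indices ordered by value: the root is the smallest value kept so far.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Double.compare(data[a], data[b]));
        for (int i = 0; i < data.length; i++) {
            if (heap.size() < s) {
                heap.add(i);
            } else if (data[i] > data[heap.peek()]) {
                // New value beats the smallest kept value, so replace it.
                heap.poll();
                heap.add(i);
            }
        }
        // Selective copy: only the surviving indices are stored.
        for (int i : heap) {
            sparse.put(i, data[i]);
        }
        return sparse;
    }

    public static void main(String[] args) {
        double[] data = {0.5, 3.0, -1.0, 7.0, 2.0};
        System.out.println(sparsifyTopS(data, 2)); // prints {1=3.0, 3=7.0}
    }
}
```

With Mahout, the final loop would presumably call r.set(i, data[i]) on a RandomAccessSparseVector, as in the selective copy shown in the reply. The sketch ranks by signed value, matching the vi > cutoff test there. Ranking by magnitude would mean comparing Math.abs(data[i]) instead.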