Sparse Vectors

2010-11-20 Thread Mike Perry
t the seq2sparse option). Could someone point out where in the code it actually constructs the sparse vectors? It seems to me that one of the methods in DictionaryVectorizer generates the vectors, but I couldn't find where exactly. Many thanks, guys! Mike

Re: Sparse Vectors

2010-11-20 Thread Ted Dunning
ut I can't say for sure. > Also, SparseVectorsFromSequenceFiles is used to convert the vectors to > sparse format (I know about the seq2sparse option). Could someone point out > where in the code it actually constructs the sparse vectors? it seems to > me > that one of the me

Re: Sparse Vectors

2010-11-21 Thread Mike Perry
> sparse format (I know about the seq2sparse option). Could someone point > out > > where in the code it actually constructs the sparse vectors? it seems to > > me > > that one of the methods in DictionaryVectorizer generates the vectors but > I > > couldn't > > find where exactly. > > > > Look for VectorWritable. >

Re: Sparse Vectors

2010-11-21 Thread Drew Farris
quence >> > files in sparse vector representation? my impression is that it doesn't >> but >> > I want to verify that. >> > >> >> Should be sparse, but I can't say for sure. >> >> >> > Also, SparseVectorsFromSequenceFiles is used to

Re: Sparse Vectors

2010-11-22 Thread Mike Perry
t > >> but > >> > I want to verify that. > >> > > >> > >> Should be sparse, but I can't say for sure. > >> > >> > >> > Also, SparseVectorsFromSequenceFiles is used to convert the vectors to > >

Re: Similarity between sparse vectors

2011-07-15 Thread Sean Owen
This is simply Euclidean distance squared. Take the square root if you need the simple Euclidean distance. On Fri, Jul 15, 2011 at 12:36 PM, marco turchi wrote: > Dear All, > I'm a newcomer to Mahout and I'm trying to compute the cosine similarity > between two sparse vectors.
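
To make the distinction in this reply concrete, here is a minimal sketch in plain Java of squared Euclidean distance, its square root, and the cosine similarity the question asked about. It assumes sparse vectors stored as index-to-value maps; the class `SparseDistanceSketch` and its methods are hypothetical names for illustration and are not part of Mahout's API. Returning 0 for a zero vector in `cosine` is also an assumed convention.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: sparse vectors as index -> value maps, not Mahout's Vector API.
class SparseDistanceSketch {

    // Sum of squared differences; indices missing from a map count as 0.
    static double squaredEuclidean(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            double diff = e.getValue() - b.getOrDefault(e.getKey(), 0.0);
            sum += diff * diff;
        }
        // Add the entries that appear only in b.
        for (Map.Entry<Integer, Double> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                sum += e.getValue() * e.getValue();
            }
        }
        return sum;
    }

    // Plain Euclidean distance is the square root of the squared distance.
    static double euclidean(Map<Integer, Double> a, Map<Integer, Double> b) {
        return Math.sqrt(squaredEuclidean(a, b));
    }

    // Dot product: only indices present in both maps contribute.
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            sum += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        return sum;
    }

    // Cosine similarity; returns 0 when either vector is all zeros (assumed convention).
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double normA = Math.sqrt(dot(a, a));
        double normB = Math.sqrt(dot(b, b));
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot(a, b) / (normA * normB);
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = new HashMap<>();
        a.put(0, 3.0);
        a.put(5, 4.0);
        Map<Integer, Double> b = new HashMap<>();
        b.put(0, 3.0);
        System.out.println(squaredEuclidean(a, b));
        System.out.println(euclidean(a, b));
        System.out.println(cosine(a, b));
    }
}
```

Cosine similarity is a different quantity from either distance, so a value that looks like a squared distance cannot be turned into a cosine by taking a square root alone.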

Re: Similarity between sparse vectors

2011-07-15 Thread marco turchi
Hi, thanks a lot. I also have another problem ( :-) ). As I wrote in the previous email, I'm using the RandomAccessSparseVector representation to store sparse vectors. I need to sum some of them together, so I use the method plus, but it seems that it requires the same vector cardinality. I se

Re: Similarity between sparse vectors

2011-07-15 Thread Sean Owen
gests, the implementation you use is for sparse vectors, meaning dimensions without value have no representation. It would be a pretty poor sparse implementation if these were not true. So, no, the cardinality has no direct effect on memory. On Fri, Jul 15, 2011 at 1:00 PM, marco turchi wrote:

Re: Similarity between sparse vectors

2011-07-15 Thread marco turchi
ial > size" of a list. If you're dealing with vectors that have a > potentially unbounded maximum dimension, use Integer.MAX_VALUE. > > As the name suggests, the implementation you use is for sparse > vectors, meaning dimensions without value have no representation. It >
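
The point made in this thread, that cardinality has no direct effect on memory in a sparse representation, can be illustrated with a small toy class. This is a sketch under stated assumptions: `ToySparseVector` is a hypothetical class written for this example and is not Mahout's RandomAccessSparseVector; the cardinality check in `plus` only mirrors the behaviour reported earlier in the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy class for illustration; not Mahout's RandomAccessSparseVector.
class ToySparseVector {
    private final int cardinality;                           // declared dimension count
    private final Map<Integer, Double> values = new HashMap<>(); // non-zero entries only

    ToySparseVector(int cardinality) {
        this.cardinality = cardinality;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= cardinality) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    // Zero values are not stored, so they take no space.
    void set(int index, double value) {
        checkIndex(index);
        if (value == 0.0) {
            values.remove(index);
        } else {
            values.put(index, value);
        }
    }

    // Dimensions without a stored value read as 0.
    double get(int index) {
        checkIndex(index);
        return values.getOrDefault(index, 0.0);
    }

    // Memory use tracks this count, not the cardinality.
    int storedEntries() {
        return values.size();
    }

    // Element-wise sum; both vectors must declare the same cardinality.
    ToySparseVector plus(ToySparseVector other) {
        if (cardinality != other.cardinality) {
            throw new IllegalArgumentException("cardinality mismatch");
        }
        ToySparseVector result = new ToySparseVector(cardinality);
        result.values.putAll(values);
        for (Map.Entry<Integer, Double> e : other.values.entrySet()) {
            result.set(e.getKey(), result.get(e.getKey()) + e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        ToySparseVector a = new ToySparseVector(Integer.MAX_VALUE);
        a.set(7, 1.5);
        a.set(2_000_000_000, 2.0);
        ToySparseVector b = new ToySparseVector(Integer.MAX_VALUE);
        b.set(7, 0.5);
        System.out.println(a.plus(b).storedEntries());
    }
}
```

Declaring every vector with Integer.MAX_VALUE, as suggested above, gives all vectors the same cardinality so that they can be summed, while storage still grows only with the number of non-zero entries.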