Per o.a.m.utils.vectors.lucene.TFDFMapper, which is called from o.a.m.utils.vectors.lucene.Driver, the vectors created are instances of RandomAccessSparseVector, so the output is sparse.
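
To illustrate what "sparse" means for these vectors, here is a minimal sketch in plain Java. It is not Mahout's RandomAccessSparseVector: the class name SparseVectorSketch and its methods are hypothetical, made up for this example. It only shows the general idea of keeping the non-zero entries in a hash map keyed by index, so storage grows with the number of terms in a document rather than with the size of the dictionary.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a hash-backed sparse vector.
// This is NOT Mahout's RandomAccessSparseVector; all names here are hypothetical.
class SparseVectorSketch {
    private final int cardinality;                  // logical length of the vector
    private final Map<Integer, Double> values = new HashMap<>();

    SparseVectorSketch(int cardinality) {
        this.cardinality = cardinality;
    }

    // Store a value; zeros are removed so only non-zero entries take up space.
    void set(int index, double value) {
        checkIndex(index);
        if (value == 0.0) {
            values.remove(index);
        } else {
            values.put(index, value);
        }
    }

    // Indices that were never set read back as 0.0.
    double get(int index) {
        checkIndex(index);
        return values.getOrDefault(index, 0.0);
    }

    // Number of entries actually held in memory.
    int storedEntries() {
        return values.size();
    }

    int cardinality() {
        return cardinality;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= cardinality) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}
```

A term-frequency vector over a 100,000-term dictionary built this way holds only as many entries as the document has distinct terms, which is the property the question is about.
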

On Sun, Nov 21, 2010 at 9:28 AM, Mike Perry <mikeperrycan...@gmail.com> wrote:

> Thanks Ted for the answer.
>
> "Should be sparse, but I can't say for sure."
>
> Could anybody confirm? in the quickstart-kmeans.sh script there's a step to
> convert the data to SequenceFile format (seqdirectory) and then
> a second step to convert the SequenceFiles to sparse vector format
> (seq2sparse). That's why I'm asking.
>
>
> On Sat, Nov 20, 2010 at 3:45 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> On Sat, Nov 20, 2010 at 8:47 AM, Mike Perry <mikeperrycan...@gmail.com> wrote:
>>
>> > Hello all,
>> >
>> > Does the script to convert a Lucene index to Mahout vectors write sequence
>> > files in sparse vector representation? my impression is that it doesn't but
>> > I want to verify that.
>>
>> Should be sparse, but I can't say for sure.
>>
>> > Also, SparseVectorsFromSequenceFiles is used to convert the vectors to
>> > sparse format (I know about the seq2sparse option). Could someone point out
>> > where in the code it actually constructs the sparse vectors? it seems to me
>> > that one of the methods in DictionaryVectorizer generates the vectors but I
>> > couldn't find where exactly.
>>
>> Look for VectorWritable.