Thank you, Karl. Apologies to all for the spam.

On Fri, Mar 1, 2013 at 11:10 AM, Karl Wright <[email protected]> wrote:
> I think you want the Mahout list. This is the ManifoldCF list.
>
> Karl
>
> On Fri, Mar 1, 2013 at 5:51 AM, Colum Foley <[email protected]> wrote:
>> Hi,
>>
>> I am trying to store Mahout RandomAccessSparseVector using
>> elephant-bird and pig. The data is of the form
>> key(text),value(RandomAccessSparseVector). when I run pig describe it
>> presents the following:
>>
>> pair: {key: int,val: (cardinality: int,entries: {entry: (index: int,value: double)})}
>>
>> My problem is that when I try to store tuples using elephant-bird's
>> SequenceFileStorage as follows:
>>
>> store clusteredOut into 'logsvectors.dat' using
>> com.twitter.elephantbird.pig.store.SequenceFileStorage (
>> '-c com.twitter.elephantbird.pig.util.TextConverter',
>> '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter -- -sparse'
>> );
>>
>> It runs successfully but when I examine the resulting Sequencefile all
>> the vectors are empty.
>>
>> On the other hand, if I run the following instead:
>>
>> store clusteredOut into 'logsvectors.dat' using
>> com.twitter.elephantbird.pig.store.SequenceFileStorage ();
>>
>> ie do not specify the types of the key or value.
>>
>> The vectors are non-empty but are of type text..and this causes my
>> clustering algorithm to fail(as they are expecting VectorWritable).
>>
>> So my problem is that I need to output in VectorFileFormat, but when I
>> do the resulting vectors are empty.
>>
>> Anyone else have experience with this issue?
>>
>> Many thanks,
>> Colum
