I think createDictionaryChunks is the first thing that runs inside
createTermFrequencyVectors. It takes its input from
DocumentProcessor.tokenizeDocuments, which outputs <Text, StringTuple>
pairs, so I suspect you would need <Text, StringTuple> as the input. See
Hi all. I've got some documents described by binary features with
integer ids, and I want to read them into sparse Mahout vectors to do
tf-idf weighting and clustering. I do not want to paste them back
together and run a Lucene tokenizer. What's the clean way to do this?
I'm thinking that I