The TF-IDF job is where document-frequency pruning is applied. That is why your TF vectors look fine and the terms only disappear from the TF-IDF vectors. Try increasing --maxDFPercent to 100.
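Here is a minimal Python sketch of what a max-DF-percent threshold does. It assumes a term is kept only if its document frequency is at most maxDFPercent percent of the number of documents. The helper `prune_by_max_df` and the exact `<=` comparison are hypothetical illustrations, not Mahout's actual implementation. It shows why a threshold of 100 keeps every term.

```python
from collections import Counter


def prune_by_max_df(docs, max_df_percent):
    """Hypothetical helper (not Mahout code): drop terms whose document
    frequency exceeds max_df_percent percent of the documents.

    docs: list of dicts mapping term -> term frequency, one per document.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in doc)
    # Assumed rule: keep a term if it appears in at most this many documents.
    max_df = n_docs * max_df_percent / 100.0
    return [
        {term: tf for term, tf in doc.items() if df[term] <= max_df}
        for doc in docs
    ]


docs = [{"the": 3, "cat": 1}, {"the": 2, "dog": 1}, {"the": 1, "cat": 2}]
print(prune_by_max_df(docs, 90))   # "the" is in 3 of 3 docs (> 2.7), so it is dropped
print(prune_by_max_df(docs, 100))  # threshold of 3 docs, so every term is kept
```

At 90, any term present in more than 90% of the documents is removed from every TF-IDF vector, even though it still appears in the TF vectors. At 100 the filter can never fire.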
On Wed, Aug 1, 2012 at 11:22 AM, Abramov Pavel <p.abra...@rambler-co.ru> wrote:

> Hello!
>
> I have trouble running the "seq2sparse" example with TF-IDF weights. My TF
> vectors are OK, while the TF-IDF vectors are 10 times smaller. It looks like
> seq2sparse cuts my terms during the TF-IDF step: Document1 has 20 terms in
> its TF vector but only 2 terms in its TF-IDF vector. What is wrong? I have
> spent 2 days looking for the answer and tuning seq2sparse parameters ((
>
> Thanks in advance!
>
> mahout seq2sparse -ow \
> -chunk 512 \
> --maxDFPercent 90 \
> --maxNGramSize 1 \
> --numReducers 128 \
> --minSupport 150 \
> -i --- \
> -o --- \
> -wt tfidf \
> --namedVector \
> -a org.apache.lucene.analysis.WhitespaceAnalyzer
>
> Pavel