The TF-IDF job is where document-frequency pruning is applied. Try
increasing --maxDFPercent to 100%.
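To illustrate why raising the limit helps, here is a toy sketch of document-frequency pruning. It is not Mahout's code. It assumes that a term is dropped when it appears in more than maxDFPercent percent of all documents. Under that assumption, a value of 100 can never prune anything, because no term can occur in more than 100% of the documents. The function name prune_df and the three-line corpus are hypothetical. Whitespace splitting only roughly mimics WhitespaceAnalyzer.

```shell
#!/bin/sh
# Toy illustration of document-frequency (DF) pruning, ASSUMING the rule
# "drop a term whose DF exceeds max_df_percent% of all documents".
# Hypothetical helper, not part of Mahout.

# Usage: prune_df MAX_DF_PERCENT < docs.txt
# Reads one document per line, splits on whitespace, and prints the
# surviving terms (sorted, one per line).
prune_df() {
  awk -v max="$1" '
    {
      n_docs++
      split("", seen)             # reset the per-document "seen" set
      for (i = 1; i <= NF; i++) {
        if (!($i in seen)) {      # count each term at most once per document
          seen[$i] = 1
          df[$i]++
        }
      }
    }
    END {
      for (t in df) {
        # keep the term only if its DF is at most max% of the corpus
        if (100 * df[t] / n_docs <= max) print t
      }
    }
  ' | sort
}

# "the" occurs in 3 of 3 documents (100%), so a 90% limit prunes it.
# Prints: cat, dog, ran, sat (one per line)
printf '%s\n' "the cat sat" "the dog ran" "the cat ran" | prune_df 90

# With a 100% limit nothing is pruned.
# Prints: cat, dog, ran, sat, the (one per line)
printf '%s\n' "the cat sat" "the dog ran" "the cat ran" | prune_df 100
```

On a real corpus, a large share of a document's terms may be common across the collection. In that case a DF limit can strip most of them from the TF-IDF vectors while the TF vectors stay intact.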

On Wed, Aug 1, 2012 at 11:22 AM, Abramov Pavel <p.abra...@rambler-co.ru> wrote:

> Hello!
>
> I am having trouble running the "seq2sparse" example with TF-IDF weights. My TF
> vectors are OK, but my TF-IDF vectors are 10 times smaller. It looks like
> seq2sparse drops terms during the TF-IDF step: Document1 has 20 terms in the
> TF vectors but only 2 terms in the TF-IDF vectors. What is wrong? I have
> spent 2 days searching for the answer and tuning seq2sparse parameters ((
>
> Thanks in advance!
>
> mahout seq2sparse -ow \
> -chunk 512 \
> --maxDFPercent 90 \
> --maxNGramSize 1 \
> --numReducers 128 \
> --minSupport 150 \
> -i --- \
> -o --- \
> -wt tfidf \
> --namedVector \
> -a org.apache.lucene.analysis.WhitespaceAnalyzer
>
> Pavel
>
>
