Unstemming is pretty simple.  Just build an unstemming dictionary by
recording which word forms have led to each stemmed form, along with how
often each form occurs.

When unstemming in the context of a document, pick the most popular
(corpus-wide) version that actually appears in the document.
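
To make that concrete, here is a rough Python sketch of the idea, not a
definitive implementation.  The names toy_stem, build_unstem_dict and unstem
are made up for illustration, and toy_stem is only a stand-in for whatever
real stemmer you use (a Lucene analyzer, in Pat's case).  Falling back to
the bare stem when no known form appears in the document is my assumption.

```python
from collections import Counter, defaultdict


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: strips a few common suffixes.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_unstem_dict(corpus_docs, stem=toy_stem):
    # Map each stem to a Counter of the surface forms that produced it,
    # counted over the whole corpus.
    table = defaultdict(Counter)
    for doc in corpus_docs:
        for word in doc:
            table[stem(word)][word] += 1
    return table


def unstem(stemmed, doc_words, table):
    # Keep only the forms that actually occur in this document, then pick
    # the one with the highest corpus-wide count.
    present = set(doc_words)
    candidates = [
        (count, form)
        for form, count in table.get(stemmed, {}).items()
        if form in present
    ]
    if not candidates:
        # Assumption: with nothing better to show, return the stem itself.
        return stemmed
    return max(candidates, key=lambda pair: pair[0])[1]


corpus = [
    ["clusters", "clustering", "clusters"],
    ["clustering", "clustered"],
    ["clusters", "cluster"],
]
table = build_unstem_dict(corpus)
print(unstem("cluster", ["clustering", "clustered"], table))
```

In this toy corpus "clusters" is the most common form overall, but the
document passed to unstem only contains "clustering" and "clustered", so
the more frequent of those two is the one that gets displayed.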

On Fri, Aug 3, 2012 at 9:23 AM, Pat Ferrel <pat.fer...@gmail.com> wrote:

> We do what Ted describes by tossing frequently used terms with the IDF
> max, tossing stop words and stemming with a lucene analyzer. The stemming
> makes the tags less readable for sure but without it the near duplicate
> terms make for a strange looking tag list. With or without stemming the top
> TFIDF terms work rather well for tags.
>
