Unstemming is pretty simple. Just build an unstemming dictionary by recording which word forms have led to each stemmed form. Include frequencies.
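
A minimal sketch of such a dictionary in Python, together with the document-aware lookup. Everything named here is an assumption for illustration: `toy_stem` is a hypothetical stand-in for whatever stemmer the pipeline really uses (for example a Lucene analyzer), and the function names and the fallback for stems with no form in the document are mine, not part of any library.

```python
from collections import Counter, defaultdict


def toy_stem(word):
    """Hypothetical stand-in stemmer: strips a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_unstem_dict(corpus_tokens, stem=toy_stem):
    """Map each stem to a Counter of the surface forms that produced it."""
    forms = defaultdict(Counter)
    for token in corpus_tokens:
        forms[stem(token)][token] += 1
    return forms


def unstem(stemmed, doc_tokens, forms):
    """Return the corpus-wide most frequent form that also occurs in the document."""
    in_doc = set(doc_tokens)
    candidates = forms.get(stemmed, Counter())
    # most_common() yields forms in descending corpus frequency.
    for form, _count in candidates.most_common():
        if form in in_doc:
            return form
    # Assumed fallback (not from the message): use the most frequent form
    # overall, or the stem itself if the stem was never seen.
    return candidates.most_common(1)[0][0] if candidates else stemmed
```

Building the dictionary over the whole corpus once and then calling `unstem` per document keeps the lookup cheap; the tag list stays readable because each stem maps back to a form the reader can actually find in that document.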
When unstemming in the context of a document, pick the most popular (corpus-wide) version that actually appears in the document.

On Fri, Aug 3, 2012 at 9:23 AM, Pat Ferrel <pat.fer...@gmail.com> wrote:

> We do what Ted describes by tossing frequently used terms with the IDF
> max, tossing stop words and stemming with a lucene analyzer. The stemming
> makes the tags less readable for sure but without it the near duplicate
> terms make for a strange looking tag list. With or without stemming the top
> TFIDF terms work rather well for tags.