Hello,
the current trunk version includes the Porter and Snowball stemmers. We
didn't develop them ourselves
but redistribute them as part of OpenNLP.
It would be nice to add more stemmers. If you need a particular one,
please point it out, and we might be able to redistribute it as well,
or maybe just implement it.
We don't have stoplists, but I think it will be easy to change that. We
could probably use the ones from Snowball.
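Until a stoplist exists in OpenNLP, filtering can be done in a few lines of plain Java. The following is only a sketch: the class name StopwordFilter and the tiny word list are made up for illustration and are not part of OpenNLP or Snowball. In practice the set would be loaded from a real stopword file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not an OpenNLP class.
class StopwordFilter {
    private final Set<String> stopwords = new HashSet<>();

    StopwordFilter(Set<String> words) {
        // Store lowercased copies so that lookups are case-insensitive.
        for (String w : words) {
            stopwords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    // Returns the tokens that are not on the stoplist, in their original order.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t.toLowerCase(Locale.ROOT))) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy stoplist; an assumption made only for this example.
        StopwordFilter f = new StopwordFilter(new HashSet<>(Arrays.asList("the", "is", "a")));
        // Prints [stemmer, tool]
        System.out.println(f.filter(Arrays.asList("The", "stemmer", "is", "a", "tool")));
    }
}
```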
There is no language modeling yet; it would be nice to get a contribution
there. Maybe you are interested in implementing it?
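To make the request concrete, here is a rough sketch of what a bigram counter could look like in plain Java. The class BigramModel and its method names are hypothetical, not an existing OpenNLP API. It also assumes that "relative frequency" means the count of a bigram divided by the count of all bigrams starting with the same first word, and that tokens contain no spaces; both are assumptions, and other definitions are possible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration; not part of OpenNLP.
class BigramModel {
    // Absolute counts per bigram, keyed as "first second" (tokens assumed space-free).
    private final Map<String, Integer> bigramCounts = new HashMap<>();
    // Number of bigrams that start with a given word.
    private final Map<String, Integer> firstWordCounts = new HashMap<>();

    // Adds all adjacent token pairs of one tokenized text to the counts.
    void add(String[] tokens) {
        for (int i = 0; i + 1 < tokens.length; i++) {
            bigramCounts.merge(tokens[i] + " " + tokens[i + 1], 1, Integer::sum);
            firstWordCounts.merge(tokens[i], 1, Integer::sum);
        }
    }

    // How often the bigram was seen across all added texts.
    int absoluteFrequency(String first, String second) {
        return bigramCounts.getOrDefault(first + " " + second, 0);
    }

    // Bigram count divided by the number of bigrams starting with `first`;
    // 0.0 if `first` was never seen in first position.
    double relativeFrequency(String first, String second) {
        int total = firstWordCounts.getOrDefault(first, 0);
        return total == 0 ? 0.0 : (double) absoluteFrequency(first, second) / total;
    }

    public static void main(String[] args) {
        BigramModel model = new BigramModel();
        // Counts are accumulated over several texts.
        model.add(new String[] {"the", "cat", "sat"});
        model.add(new String[] {"the", "dog", "sat"});
        System.out.println(model.absoluteFrequency("the", "cat"));
        System.out.println(model.relativeFrequency("the", "cat"));
    }
}
```

No smoothing is applied here, so unseen bigrams get a frequency of zero; a real contribution would probably need to handle that.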
Anyway, it would be nice if you could open two Jira issues to request
stopword lists and the language model.
Jörn
On 02/23/2014 02:35 PM, Martin Wunderlich wrote:
Hi all,
I recently started working with OpenNLP for a project in the area of
text classification with neural networks. So far, OpenNLP is a great
library and very useful.
There are just three things that I haven't been able to find, but
maybe they do exist:
- language models: e.g. to create a bigram language model with
relative and absolute frequencies from several texts
- stemming: to reduce different word forms in inflected languages to a
canonical root form
- stoplist: to remove certain words (e.g. from the language model)
that are deemed irrelevant
Do these functions exist in OpenNLP? If not, can you recommend another
library to complement these functions?
Kind regards,
Martin