Hello,

the current trunk version includes the Porter and Snowball stemmers. We didn't develop them ourselves
but redistribute them as part of OpenNLP.
It would be nice to add more stemmers. If you need a particular one, please point it out; we might be able to redistribute it as well, or maybe just implement it.

We don't have stoplists, but I think that would be easy to change. We could probably use the ones from Snowball.
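
Until stoplists are available, filtering can be done outside the library. The following is only a minimal sketch in plain Java, not an OpenNLP API: the class name StoplistSketch, the method removeStopwords, and the tiny example stoplist are hypothetical, and it assumes the stoplist entries are stored in lower case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP.
public class StoplistSketch {

    // Returns the tokens that are not in the stoplist.
    // Assumption: stoplist entries are lower case, so tokens are
    // lower-cased only for the lookup and kept unchanged otherwise.
    static List<String> removeStopwords(List<String> tokens, Set<String> stoplist) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stoplist.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative stoplist only; a real one would be loaded from a file.
        Set<String> stoplist = new HashSet<>(Arrays.asList("the", "is", "on"));
        List<String> tokens = Arrays.asList("The", "cat", "is", "on", "the", "mat");
        System.out.println(removeStopwords(tokens, stoplist));
    }
}
```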

There is no language modeling; it would be nice to get a contribution there. Maybe you are interested in implementing it?
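
As a starting point, a bigram model with absolute and relative frequencies can be sketched in plain Java. This is only an illustration under stated assumptions, not an existing OpenNLP class: the names BigramSketch, countBigrams and relativeFrequencies are hypothetical, the input is assumed to be already tokenized, and relative frequency is taken here to mean a bigram's count divided by the total number of bigrams.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of OpenNLP.
public class BigramSketch {

    // Absolute frequencies: how often each "w1 w2" pair of adjacent tokens occurs.
    static Map<String, Integer> countBigrams(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String bigram = tokens.get(i) + " " + tokens.get(i + 1);
            counts.merge(bigram, 1, Integer::sum);
        }
        return counts;
    }

    // Relative frequencies: each count divided by the total number of bigrams.
    static Map<String, Double> relativeFrequencies(Map<String, Integer> counts) {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        Map<String, Double> relative = new HashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            relative.put(entry.getKey(), (double) entry.getValue() / total);
        }
        return relative;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "sat", "on", "the", "cat");
        Map<String, Integer> counts = countBigrams(tokens);
        System.out.println(counts);
        System.out.println(relativeFrequencies(counts));
    }
}
```

To build one model from several texts, the counts of each text could be merged into a single map before computing the relative frequencies.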

Anyway, it would be nice if you could open two Jira issues to request stopword lists and the language model.

Jörn

On 02/23/2014 02:35 PM, Martin Wunderlich wrote:
Hi all,

I recently started working with OpenNLP for a project in the area of text classification with neural networks. So far, OpenNLP is a great library and very useful. There are just three things that I haven't been able to find, but maybe they do exist:
- language models: e.g. to create a bigram language model with relative and absolute frequencies from several texts
- stemming: to reduce different word forms in inflected languages to a canonical root form
- stoplist: to remove certain words (e.g. from the language model) that are deemed irrelevant

Do these functions exist in OpenNLP? If not, can you recommend another library to complement these functions?

Kind regards,

Martin


