On 14-02-23 08:35 AM, Martin Wunderlich wrote:
Hi all,

I recently started working with OpenNLP for a project in the area of text classification with neural networks. So far, OpenNLP has been a great and very useful library. There are just three things that I haven't been able to find, though maybe they do exist:

- language models: e.g. to create a bigram language model with relative and absolute frequencies from several texts
- stemming: to reduce different word forms in inflected languages to a canonical root form
- stoplist: to remove certain words (e.g. from the language model) that are deemed irrelevant
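For the bigram language model, a minimal hand-rolled sketch in Java might look like this. It assumes naive tokenization on non-letter characters and a caller-supplied stoplist. The class `BigramModel` and its methods are hypothetical illustrations, not an OpenNLP API. "Relative frequency" is taken here to mean a bigram's share of all bigrams counted, which is only one of several possible definitions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical bigram counter with a stoplist (illustrative sketch, not an OpenNLP class). */
class BigramModel {
    private final Set<String> stopwords;
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    BigramModel(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    /** Lowercases, splits on non-letters, drops stopwords, then counts adjacent pairs within this text. */
    void addText(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !stopwords.contains(t)) {
                tokens.add(t);
            }
        }
        for (int i = 0; i + 1 < tokens.size(); i++) {
            counts.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
            total++;
        }
    }

    /** Absolute frequency: raw count of the bigram (w1, w2). */
    int absoluteFrequency(String w1, String w2) {
        return counts.getOrDefault(w1 + " " + w2, 0);
    }

    /** Relative frequency: count of (w1, w2) divided by the total number of bigrams seen. */
    double relativeFrequency(String w1, String w2) {
        return total == 0 ? 0.0 : (double) absoluteFrequency(w1, w2) / total;
    }
}
```

Because stopwords are removed before pairing, with "the" on the stoplist the phrase "on the mat" contributes the bigram "on mat"; if that is undesirable, one could count on the unfiltered tokens and drop stopword bigrams afterwards. Bigrams are never counted across the boundary between two texts.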

Do these functions exist in OpenNLP? If not, can you recommend another library to complement these functions?
Lucene's analyzers-common module [1] has stemming algorithms and stoplists for many languages (for example, see [2] and [3]). It might be a good starting point.
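Lucene's language analyzers chain a tokenizer, lowercasing, stopword removal and a stemmer (the EnglishAnalyzer in [2] uses, as far as I know, a Porter stemmer). The plain-Java sketch below only illustrates that pipeline: `ToyAnalyzer`, its tiny stoplist and its crude suffix-stripping rules are hypothetical and much weaker than the real Lucene components.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical toy pipeline: tokenize, lowercase, remove stopwords, strip suffixes. */
class ToyAnalyzer {
    // Hypothetical, tiny English stoplist; Lucene ships much fuller per-language lists.
    static final Set<String> STOPWORDS = Set.of("a", "an", "and", "are", "is", "of", "the", "to");
    // Suffixes tried longest first; a crude stand-in for a real stemmer such as Porter's.
    private static final String[] SUFFIXES = {"ing", "ed", "es", "s"};

    /** Strips the first matching suffix, keeping at least three characters of stem. */
    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    /** Splits on non-letters, lowercases, drops stopwords and stems what remains. */
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !STOPWORDS.contains(t)) {
                out.add(stem(t));
            }
        }
        return out;
    }
}
```

For instance, `ToyAnalyzer.analyze("The classifiers are learning labels")` yields `[classifier, learn, label]`. For real work, the Lucene analyzers in [1] handle far more cases and many more languages.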

Hope this helps,

Alexandre

[1] http://lucene.apache.org/core/4_6_1/analyzers-common/index.html
[2] http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/en/EnglishAnalyzer.html
[3] http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/fr/FrenchAnalyzer.html

Kind regards,

Martin


