Re: Stemming, Stoplists and Language Models?

Alexandre Patry Sun, 23 Feb 2014 12:49:24 -0800

On 14-02-23 08:35 AM, Martin Wunderlich wrote:

Hi all,
I recently started working with OpenNLP for a project in the area of text classification with neural networks. So far, OpenNLP is a great library and very useful.
There are just three things that I haven't been able to find, but maybe they do exist:
- language models: e.g. to create a bigram language model with relative and absolute frequencies from several texts
- stemming: to reduce different word forms in inflected languages to a canonical root form
- stoplist: to remove certain words (e.g. from the language model) that are deemed irrelevant
Do these functions exist in OpenNLP? If not, can you recommend another library to complement these functions?

Lucene's analyzers-common [1] has stemming algorithms and stoplists for many languages (for example, look at [2] and [3]). It might be a good starting point.
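
For the bigram counts mentioned in the question, here is a minimal sketch in plain Java (standard library only, Java 8 or later). It is not OpenNLP or Lucene API: the class name BigramCounts, the three-word stoplist and the letters-only tokenization are all assumptions made for illustration. It counts bigrams across several texts (absolute frequencies) and divides each count by the total number of bigrams (relative frequencies).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class BigramCounts {
    // Hypothetical stoplist, for illustration only; a real one would be much longer.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "a", "of"));

    // Lowercase the text, split on runs of non-letters, and drop stopwords.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !STOPWORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Absolute frequencies: how often each pair of adjacent tokens occurs.
    // Bigrams never span two texts.
    static Map<String, Integer> absoluteCounts(List<String> texts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String text : texts) {
            List<String> tokens = tokenize(text);
            for (int i = 0; i + 1 < tokens.size(); i++) {
                counts.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Relative frequencies: each count divided by the total number of bigrams.
    static Map<String, Double> relativeFrequencies(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> relative = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            relative.put(e.getKey(), (double) e.getValue() / total);
        }
        return relative;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("the cat sat on the mat", "a cat sat");
        Map<String, Integer> counts = absoluteCounts(texts);
        System.out.println(counts);
        System.out.println(relativeFrequencies(counts));
    }
}
```

Note that in this sketch stopwords are removed before the bigrams are formed, so words that were separated only by a stopword become adjacent. If that is not what you want, form the bigrams first and filter afterwards.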


Hope this helps,

Alexandre

[1] http://lucene.apache.org/core/4_6_1/analyzers-common/index.html

[2] http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/en/EnglishAnalyzer.html

[3] http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/fr/FrenchAnalyzer.html


Kind regards,

Martin
