Hi Jörn,

here are the two Jira tickets, as promised (one for stop lists and one for
language models):

https://issues.apache.org/jira/browse/OPENNLP-659 (for this one, I wasn't sure 
which component it should be assigned to)
https://issues.apache.org/jira/browse/OPENNLP-660

HTH.

Cheers, 

Martin


On 23.02.2014 at 15:24, Jörn Kottmann <[email protected]> wrote:

> Hello,
> 
> the current trunk version includes the Porter and Snowball stemmers. We
> didn't develop them ourselves but redistribute them as part of OpenNLP.
> It would be nice to add more stemmers. If you need a certain one, please
> point it out, and we might be able to redistribute it as well, or maybe just
> implement it.
> 
> We don't have stoplists, but I think it will be easy to change that. We could
> probably use the ones from Snowball.
> 
> There is no language modeling; it would be nice to get a contribution there.
> Maybe you are interested in implementing it?
> 
> Anyway, it would be nice if you could open two Jira issues to request stopword
> lists and the language model.
> 
> Jörn
> 
> On 02/23/2014 02:35 PM, Martin Wunderlich wrote:
>> Hi all,
>> 
>> I recently started working with OpenNLP for a project in the area of text 
>> classification with neural networks. So far, OpenNLP is a great library and 
>> very useful.
>> There are just three things that I haven't been able to find, but maybe they 
>> do exist:
>> - language models: e.g. to create a bigram language model with relative and 
>> absolute frequencies from several texts
>> - stemming: to reduce different word forms in inflected languages to a 
>> canonical root form
>> - stoplist: to remove certain words (e.g. from the language model) that are 
>> deemed irrelevant
>> 
>> Do these functions exist in OpenNLP? If not, can you recommend another 
>> library to complement these functions?
>> 
>> Kind regards,
>> 
>> Martin
>> 
>> 
> 
