Hi Jörg, here are the two Jira tickets, as promised (one for stop lists and one for language models):
https://issues.apache.org/jira/browse/OPENNLP-659
(for this one, I wasn't sure which component it should be assigned to)
https://issues.apache.org/jira/browse/OPENNLP-660

HTH.

Cheers,

Martin

On 23.02.2014 at 15:24, Jörn Kottmann <[email protected]> wrote:

> Hello,
>
> the current trunk version includes the Porter and Snowball stemmers. We
> didn't develop them ourselves but redistribute them as part of OpenNLP.
> It would be nice to add more stemmers. If you need a particular one, it
> would be nice if you could point it out, and we might be able to
> redistribute it as well. Or maybe just implement it.
>
> We don't have stoplists, but I think it will be easy to change that. We
> could probably use the ones from Snowball.
>
> There is no language modeling; it would be nice to get a contribution
> there. Maybe you are interested in implementing it?
>
> Anyway, it would be nice if you could open two Jira issues to request
> stopword lists and the language model.
>
> Jörn
>
> On 02/23/2014 02:35 PM, Martin Wunderlich wrote:
>> Hi all,
>>
>> I recently started working with OpenNLP for a project in the area of
>> text classification with neural networks. So far, OpenNLP is a great
>> library and very useful.
>> There are just three things that I haven't been able to find, but maybe
>> they do exist:
>> - language models: e.g. to create a bigram language model with relative
>>   and absolute frequencies from several texts
>> - stemming: to reduce different word forms in inflected languages to a
>>   canonical root form
>> - stoplist: to remove certain words (e.g. from the language model) that
>>   are deemed irrelevant
>>
>> Do these functions exist in OpenNLP? If not, can you recommend another
>> library to complement these functions?
>>
>> Kind regards,
>>
>> Martin
