2014-02-23 15:24 GMT+01:00 Jörn Kottmann <[email protected]>:

> Hello,
>
> The current trunk version includes the Porter and Snowball stemmers. We
> didn't develop them ourselves but redistribute them as part of OpenNLP.
> It would be nice to add more stemmers; if you need a particular one,
> please point it out and we might be able to redistribute it as well, or
> maybe just implement it.
>
> We don't have stoplists, but I think that will be easy to change. We
> could probably use the ones from Snowball.
>
> There is no language modeling; it would be nice to get a contribution
> there.

I have implemented a very simple set of NLP tools at [1], including implementations for n-gram [2] and language modeling [3] tasks. I'd be happy to donate it to Apache OpenNLP if the community is interested.

> Maybe you are interested in implementing it?
>
> Anyway, it would be nice if you could open two Jira issues to request
> stopword lists and the language model.

Regards,
Tommaso

[1]: https://github.com/tteofili/nlp-utils
[2]: https://github.com/tteofili/nlp-utils/blob/master/src/main/java/com/github/tteofili/nlputils/ngram/NGramUtils.java
[3]: https://github.com/tteofili/nlp-utils/tree/master/src/main/java/com/github/tteofili/nlputils/languagemodel

>
> Jörn
>
>
> On 02/23/2014 02:35 PM, Martin Wunderlich wrote:
>
>> Hi all,
>>
>> I recently started working with OpenNLP for a project in the area of text
>> classification with neural networks. So far, OpenNLP is a great library and
>> very useful.
>> There are just three things that I haven't been able to find, but maybe
>> they do exist:
>> - language models: e.g. to create a bigram language model with relative
>>   and absolute frequencies from several texts
>> - stemming: to reduce different word forms in inflected languages to a
>>   canonical root form
>> - stoplist: to remove certain words (e.g. from the language model) that
>>   are deemed irrelevant
>>
>> Do these functions exist in OpenNLP? If not, can you recommend another
>> library to complement these functions?
>>
>> Kind regards,
>>
>> Martin
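Martin's first and third requests (a bigram model with absolute and relative frequencies built from several texts, with stoplist words kept out of the model) can be sketched in plain Java without relying on any OpenNLP or nlp-utils API. The sketch below is a hypothetical illustration, not the code behind [2] or [3]: the class name `BigramModel`, its methods and the three-word stoplist are invented for the example, tokenization is a naive split on anything that is not a letter or digit, and dropping every bigram that contains a stopword (rather than deleting stopwords before pairing, which would create bigrams that never occurred in the text) is one design choice among several.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical minimal bigram model: absolute counts, relative frequencies
 * (share of all kept bigrams) and maximum-likelihood P(w2 | w1).
 * Bigrams containing a stopword are skipped entirely.
 */
public class BigramModel {
    private final Set<String> stopwords;
    private final Map<String, Integer> bigramCounts = new HashMap<>();
    private final Map<String, Integer> historyCounts = new HashMap<>();
    private int totalBigrams = 0;

    public BigramModel(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    /** Naive tokenizer: lowercases, then splits on non-letter/non-digit runs. */
    static String[] tokenize(String text) {
        // Strip leading delimiters so split() does not yield an empty first token.
        String trimmed = text.toLowerCase(Locale.ROOT).replaceAll("^[^\\p{L}\\p{N}]+", "");
        return trimmed.isEmpty() ? new String[0] : trimmed.split("[^\\p{L}\\p{N}]+");
    }

    /** Adds one text; bigrams never cross the boundary between two texts. */
    public void addText(String text) {
        String[] tokens = tokenize(text);
        for (int i = 0; i + 1 < tokens.length; i++) {
            String w1 = tokens[i];
            String w2 = tokens[i + 1];
            if (stopwords.contains(w1) || stopwords.contains(w2)) {
                continue; // stoplist: keep this pair out of the model
            }
            bigramCounts.merge(w1 + " " + w2, 1, Integer::sum);
            historyCounts.merge(w1, 1, Integer::sum);
            totalBigrams++;
        }
    }

    /** Absolute frequency: raw count of the bigram (w1, w2). */
    public int absoluteFrequency(String w1, String w2) {
        return bigramCounts.getOrDefault(w1 + " " + w2, 0);
    }

    /** Relative frequency: count of (w1, w2) divided by all kept bigrams. */
    public double relativeFrequency(String w1, String w2) {
        return totalBigrams == 0 ? 0.0 : (double) absoluteFrequency(w1, w2) / totalBigrams;
    }

    /** MLE conditional probability P(w2 | w1) = c(w1 w2) / c(w1 as history). */
    public double conditionalProbability(String w1, String w2) {
        int history = historyCounts.getOrDefault(w1, 0);
        return history == 0 ? 0.0 : (double) absoluteFrequency(w1, w2) / history;
    }

    public static void main(String[] args) {
        BigramModel model = new BigramModel(Set.of("the", "a", "of"));
        for (String text : List.of("The cat sat on the mat.", "A cat sat down.")) {
            model.addText(text);
        }
        System.out.println(model.absoluteFrequency("cat", "sat"));      // 2
        System.out.println(model.relativeFrequency("cat", "sat"));      // 0.5
        System.out.println(model.conditionalProbability("sat", "on"));  // 0.5
    }
}
```

With the two sample texts, four bigrams survive the stoplist ("cat sat" twice, "sat on", "sat down"), so "cat sat" has an absolute frequency of 2 and a relative frequency of 2/4. A real model would add smoothing for unseen bigrams and could load one of the Snowball stoplists mentioned above instead of a hard-coded set.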
