Hi all,

Thanks a lot for all the replies. I need to look into what Lucene provides and see how far I get with it.

@Jörn, I will make sure to log the JIRA tickets and will think about making a contribution. I am not sure whether my programming skills are sufficient, and I would need to look into the source code first, but I'll definitely check it out when (or if) time allows.

Cheers,

Martin

On 23.02.2014 at 15:24, Jörn Kottmann <[email protected]> wrote:

> Hello,
>
> the current trunk version includes the Porter and Snowball stemmers. We
> didn't develop them ourselves but redistribute them as part of OpenNLP.
> It would be nice to add more stemmers. In case you need a certain one, it
> would be nice if you could point it out, and we might be able to
> redistribute it as well. Or maybe just implement it.
>
> We don't have stoplists, but I think it will be easy to change that. We
> could probably use the ones from Snowball.
>
> There is no language modeling; it would be nice to get a contribution
> there. Maybe you are interested in implementing it?
>
> Anyway, it would be nice if you could open two JIRA issues to request
> stopword lists and the language model.
>
> Jörn
>
> On 02/23/2014 02:35 PM, Martin Wunderlich wrote:
>> Hi all,
>>
>> I recently started working with OpenNLP for a project in the area of text
>> classification with neural networks. So far, OpenNLP is a great library
>> and very useful.
>> There are just three things that I haven't been able to find, but maybe
>> they do exist:
>> - language models: e.g. to create a bigram language model with relative
>>   and absolute frequencies from several texts
>> - stemming: to reduce different word forms in inflected languages to a
>>   canonical root form
>> - stoplist: to remove certain words (e.g. from the language model) that
>>   are deemed irrelevant
>>
>> Do these functions exist in OpenNLP? If not, can you recommend another
>> library to complement these functions?
>>
>> Kind regards,
>>
>> Martin
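
For reference, the bigram language model with a stoplist described in the original question could be sketched roughly as follows. This is only an illustration in plain Java and does not use any OpenNLP or Lucene API. The class name `BigramModel`, its methods, the whitespace tokenization, and the reading of "relative frequency" as count(first second) / count(bigrams starting with first) are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP: counts bigrams over several texts.
class BigramModel {
    private final Map<String, Integer> bigramCounts = new HashMap<>();
    private final Map<String, Integer> firstWordCounts = new HashMap<>();
    private final Set<String> stoplist;

    BigramModel(Set<String> stoplist) {
        this.stoplist = stoplist;
    }

    // Adds one text: lowercases, splits on whitespace, drops stoplist words,
    // then counts each pair of neighbouring remaining tokens.
    void addText(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty() && !stoplist.contains(token)) {
                tokens.add(token);
            }
        }
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String first = tokens.get(i);
            String key = first + " " + tokens.get(i + 1);
            bigramCounts.merge(key, 1, Integer::sum);
            firstWordCounts.merge(first, 1, Integer::sum);
        }
    }

    // Absolute frequency: how often the bigram was seen across all texts.
    int absoluteFrequency(String first, String second) {
        return bigramCounts.getOrDefault(first + " " + second, 0);
    }

    // Relative frequency (assumed definition): bigram count divided by the
    // number of bigrams that start with the same first word.
    double relativeFrequency(String first, String second) {
        int total = firstWordCounts.getOrDefault(first, 0);
        return total == 0 ? 0.0 : (double) absoluteFrequency(first, second) / total;
    }
}
```

Note that removing stoplist words before forming bigrams makes words adjacent that were not adjacent in the text; whether that is wanted depends on the application. A stemmer could be applied to each token in `addText` before counting.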
