On Wed, Apr 10, 2013 at 11:43 AM, Jörn Kottmann <[email protected]> wrote:
> On 04/10/2013 11:27 AM, Rodrigo Agerri wrote:
>>
>> I have put together a very simple English lemmatizer using JWNL as
>> part of a POS module based on opennlp API. The lemmatizer uses the
>> constructor of JWNLDictionary in opennlp coref package.
>>
>> If there is interest in explicitly providing an English lemmatizer in
>> the project I would not mind providing the code.
>
> Yes, it would be nice to have a lemmatizer component in OpenNLP. The design
> should be flexible enough that we can later extend it with lemmatizers for
> languages other than English.
>
> As far as I remember, the WordNet approach is a dictionary lookup with
> the token and its POS tag, right?
>
Yes, POS tag and token lookup.

A flexible design would need to isolate the functions that access each
language's dictionary (including the tagset mappings) from the
lemmatization functionality itself. For example, JWNL only works with the
original English WordNet. I am working on a Spanish POS + lemmatizer using
this approach.

Of course, another method would be to re-implement the finite-state
approach of John Carroll and colleagues for English (and similar
rule-based approaches for other languages), which removes the dependence
on a dictionary. I will explore this further.

Cheers,

Rodrigo
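
P.S. To make the separation described above a bit more concrete, here is a
minimal sketch, not the actual code. All names in it (LemmaDictionary,
TagsetMapper, MapLemmaDictionary, PennTagsetMapper, DictionaryLemmatizer)
are hypothetical and are not part of OpenNLP or JWNL. The in-memory map only
stands in for a real resource such as WordNet, and the prefix-based mapping
from Penn Treebank tags to coarse classes is an assumption for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Language-specific lookup: (token, coarse POS) -> lemma, or null if unknown. */
interface LemmaDictionary {
    String lookup(String token, String coarsePos);
}

/** Language-specific mapping from a tagger's tagset to the dictionary's POS classes. */
interface TagsetMapper {
    String toCoarsePos(String posTag);
}

/** Hypothetical in-memory dictionary standing in for WordNet or another resource. */
final class MapLemmaDictionary implements LemmaDictionary {
    private final Map<String, String> entries = new HashMap<>();

    void add(String token, String coarsePos, String lemma) {
        entries.put(key(token, coarsePos), lemma);
    }

    @Override
    public String lookup(String token, String coarsePos) {
        return entries.get(key(token, coarsePos));
    }

    // Tokens are lower-cased so that "Houses" and "houses" share one entry.
    private static String key(String token, String coarsePos) {
        return token.toLowerCase(Locale.ROOT) + "\t" + coarsePos;
    }
}

/** Assumed mapping: Penn Treebank tags collapsed to coarse classes by prefix. */
final class PennTagsetMapper implements TagsetMapper {
    @Override
    public String toCoarsePos(String posTag) {
        if (posTag.startsWith("NN")) return "NOUN";
        if (posTag.startsWith("VB")) return "VERB";
        if (posTag.startsWith("JJ")) return "ADJ";
        if (posTag.startsWith("RB")) return "ADV";
        return "OTHER";
    }
}

/** Language-independent part: knows nothing about WordNet or any tagset. */
final class DictionaryLemmatizer {
    private final LemmaDictionary dictionary;
    private final TagsetMapper mapper;

    DictionaryLemmatizer(LemmaDictionary dictionary, TagsetMapper mapper) {
        this.dictionary = dictionary;
        this.mapper = mapper;
    }

    /** Returns the dictionary lemma, or the token unchanged when there is no entry. */
    String lemmatize(String token, String posTag) {
        String lemma = dictionary.lookup(token, mapper.toCoarsePos(posTag));
        return lemma != null ? lemma : token;
    }
}
```

With this split, supporting Spanish would mean supplying another
LemmaDictionary and TagsetMapper pair, while DictionaryLemmatizer stays
untouched. A rule-based (finite-state) lemmatizer could sit behind the same
lemmatize(token, posTag) call without needing a dictionary at all.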
