On 04/10/2013 12:06 PM, Rodrigo Agerri wrote:
A flexible design would require isolating the functions that access
each dictionary for each language (including tagset mappings) from
the lemmatization functionality itself. For example, JWNL only works
with the original English WordNet. I am working on a Spanish POS +
lemmatizer using this approach.

+1, it would be nice to have control over the dictionary. Maybe we can come up with a format to store it in. That would allow us to easily include it in our models as a resource for feature generation, and it would eliminate the dependency on external libraries.
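
To make the idea of a storage format concrete, here is a rough sketch of one possibility: a plain-text file with one entry per line, `word<TAB>tag<TAB>lemma`, and `#` for comments. Both the format and the class name `LemmaDictionaryFormat` are assumptions for illustration only, not something that exists or has been agreed on.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical format sketch: one "word<TAB>tag<TAB>lemma" entry per line.
class LemmaDictionaryFormat {

    // Parses the lines of a dictionary file into a map keyed by "word<TAB>tag".
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> entries = new HashMap<>();
        for (String line : lines) {
            // Skip blank lines and comment lines.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\t");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Malformed entry: " + line);
            }
            entries.put(fields[0] + "\t" + fields[1], fields[2]);
        }
        return entries;
    }
}
```

A format this simple could be packaged next to the other model resources and read without any external library; whether the key should also be case-normalized is left open here.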

Of course, another method would be to re-implement John Carroll and
colleagues' finite-state approach for English (and similar rule-based
approaches for other languages), which removes the dependence on a
dictionary. I will explore this further.

+1

We should define an interface that allows different implementations to be used,
as we did for the other components.
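
As a minimal sketch of what such an interface might look like, with one dictionary-backed and one toy rule-based implementation behind it: all names below (`Lemmatizer`, `DictionaryLemmatizer`, `SuffixRuleLemmatizer`) are hypothetical and only illustrate the idea; the suffix rule is a placeholder, not the finite-state approach mentioned above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface; the name and signature are assumptions.
interface Lemmatizer {
    // Returns the lemma for a token, given its POS tag.
    String lemmatize(String token, String posTag);
}

// Dictionary-backed implementation: looks up (token, tag) pairs.
class DictionaryLemmatizer implements Lemmatizer {
    private final Map<String, String> entries = new HashMap<>();

    void add(String token, String posTag, String lemma) {
        entries.put(key(token, posTag), lemma);
    }

    @Override
    public String lemmatize(String token, String posTag) {
        // Fall back to the surface form when there is no entry.
        return entries.getOrDefault(key(token, posTag), token);
    }

    private static String key(String token, String posTag) {
        return token.toLowerCase(Locale.ROOT) + "\t" + posTag;
    }
}

// Toy rule-based implementation that needs no dictionary.
class SuffixRuleLemmatizer implements Lemmatizer {
    @Override
    public String lemmatize(String token, String posTag) {
        // Placeholder rule: strip a final "s" from plural nouns.
        if ("NNS".equals(posTag) && token.length() > 1 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }
}
```

Callers would depend only on `Lemmatizer`, so a dictionary-based, rule-based, or language-specific implementation could be swapped in without touching the rest of the pipeline.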

Jörn
