>> Spanish POS + lemmatizer using this approach.
>
> +1, it would be nice to have control over the dictionary; maybe we can
> come up with a format to store it in. That would allow us to easily
> include it in our models as a resource for feature generation, and it
> eliminates the dependency on external libraries.

That would be great! The format should then take morphological features into account.

>> Of course, another method would be to re-implement John Carroll and
>> colleagues' finite-state approach for English (and similar rule-based
>> approaches for other languages), which removes the dependence on a
>> dictionary. I will be exploring this further on.
>
> +1
>
> We should define an interface which allows us to use different
> implementations, like we did for the other components.

+1. It seems that we have European languages represented here. Do we have anybody from the East? Chinese? It would be nice to check those too.
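To make the dictionary format and the common interface a bit more concrete, here is a rough sketch of one possible shape. Everything in it is an assumption for discussion, not an existing API: the names `Lemmatizer` and `DictionaryLemmatizer`, the tab-separated `form / tag / lemma` layout, and the made-up tags that fold the morphological features into the tag string. A rule-based (finite-state) implementation could sit behind the same interface without needing any dictionary.

```python
from typing import Dict, Iterable, Protocol, Tuple


class Lemmatizer(Protocol):
    """Hypothetical common interface: map a (token, POS tag) pair to a lemma."""

    def lemmatize(self, token: str, pos_tag: str) -> str: ...


class DictionaryLemmatizer:
    """Dictionary-backed implementation of the hypothetical interface.

    Assumed format: one entry per line, three tab-separated fields
    (surface form, tag including morphological features, lemma).
    Blank lines and lines starting with '#' are skipped.
    """

    def __init__(self, lines: Iterable[str]) -> None:
        self._entries: Dict[Tuple[str, str], str] = {}
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Raises ValueError if a line does not have exactly three fields.
            form, tag, lemma = line.split("\t")
            self._entries[(form.lower(), tag)] = lemma

    def lemmatize(self, token: str, pos_tag: str) -> str:
        # Fall back to the token itself when the pair is not in the dictionary.
        return self._entries.get((token.lower(), pos_tag), token)


# Illustrative entries only; the tag scheme here is invented for the example.
SAMPLE = [
    "# form\ttag\tlemma",
    "canto\tV.PRES.1.SG\tcantar",
    "casas\tN.F.PL\tcasa",
]

lemmatizer: Lemmatizer = DictionaryLemmatizer(SAMPLE)
print(lemmatizer.lemmatize("Casas", "N.F.PL"))
```

Keying on the (form, tag) pair rather than the form alone is what lets the morphological features disambiguate forms that have more than one lemma.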
