On 04/10/2013 10:43 PM, William Colen wrote:
Yes, that would be a good step.
But actually I was always talking about the lexical dictionary of the POS
Tagger. Its default XML implementation is called POSDictionary, and the
interface is named TagDictionary. Because it is an interface, implementing
it with a Morfologik FSA was easy. I also created a Featurizer component
for my thesis, but since I was in a hurry, I did not follow the OpenNLP
structure for that.
Yes, I know. I thought it might be a nice solution to have an additional
layer between the component and the actual dictionary implementation. In
that case we would of course keep the current XML dictionary, but from your
comment below I now get the impression that it is not worth it, because it
also makes things more complex.
It would probably be nice to have if the dictionary implementations we
could use were generic enough to be accessed via one common dictionary
interface, but with Morfologik that is probably not the case.
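
To make the "additional layer" idea concrete, here is a minimal sketch of one
tag-lookup interface with two interchangeable backends. The nested
TagDictionary interface is a local stand-in that only mirrors the general
shape of OpenNLP's interface of the same name; MapTagDictionary,
FunctionTagDictionary and describe are hypothetical names invented for this
illustration, and no Morfologik API is used or implied.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TagDictionaryDemo {

    // Local stand-in for a tag dictionary interface (assumption: one lookup
    // method that returns the allowed tags, or null for an unknown word).
    interface TagDictionary {
        String[] getTags(String word);
    }

    // Hypothetical in-memory backend, standing in for an XML-based dictionary.
    static final class MapTagDictionary implements TagDictionary {
        private final Map<String, String[]> entries = new HashMap<>();

        void put(String word, String... tags) {
            entries.put(word, tags.clone());
        }

        @Override
        public String[] getTags(String word) {
            String[] tags = entries.get(word);
            return tags == null ? null : tags.clone();
        }
    }

    // Hypothetical adapter: wraps any lookup function (for example one backed
    // by an FSA) so that it can be used through the same interface.
    static final class FunctionTagDictionary implements TagDictionary {
        private final Function<String, String[]> lookup;

        FunctionTagDictionary(Function<String, String[]> lookup) {
            this.lookup = lookup;
        }

        @Override
        public String[] getTags(String word) {
            return lookup.apply(word);
        }
    }

    // A consumer that depends only on the interface, not on a backend.
    static String describe(TagDictionary dict, String word) {
        String[] tags = dict.getTags(word);
        return tags == null
                ? word + ": <unknown>"
                : word + ": " + String.join(",", tags);
    }

    public static void main(String[] args) {
        MapTagDictionary mapDict = new MapTagDictionary();
        mapDict.put("run", "NN", "VB");

        // Toy rule used only for the demo; a real backend would query its own data.
        TagDictionary ruleDict = new FunctionTagDictionary(
                w -> w.endsWith("ing") ? new String[] {"VBG"} : null);

        System.out.println(describe(mapDict, "run"));
        System.out.println(describe(ruleDict, "walking"));
        System.out.println(describe(ruleDict, "xyz"));
    }
}
```

The cost mentioned above shows up in the adapter class: every backend that
does not already fit the interface needs one.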
I don't know if we can use the Morfologik FSA dictionary for the
conventional Dictionary class we have, whose entries are multiple tokens.
This is used for the abbreviation dictionaries and for the Name Finder.
Maybe we can use FSA, but we would have to adapt.
We should do the same for the lemmatizer, and if we change our minds one
day we could still introduce a dictionary interface.
Jörn