On 04/10/2013 10:43 PM, William Colen wrote:
Yes, that would be a good step.

But actually I was always talking about the lexical dictionary of the POS
Tagger. Its default XML implementation is called POSDictionary, and
the interface is named TagDictionary. Because it is an interface,
implementing it with the Morfologik FSA was easy. I also created a Featurizer
component for my thesis, but since I was in a hurry, I did not follow
the OpenNLP structure for that.

Yes, I know. I thought it might be a nice solution to have an additional
layer between the component and the actual dictionary implementation. In
that case we would of course keep the current XML dictionary, but from your
comment below I now get the impression that it is not worth it, because it
also makes things more complex.
It would probably be nice to have if the dictionary implementations we could
use were generic enough to be accessed via one common dictionary interface,
but with Morfologik that is probably not the case.
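
For illustration only, such an additional layer could look roughly like the
sketch below. All names in it (TagLookup, MapTagLookup, TagFilter) are
hypothetical and are not OpenNLP classes. The in-memory map merely stands in
for a real backend such as the XML dictionary or an FSA, and returning null
for unknown words is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Small demo: the component is wired to a backend through the interface only.
class TagLookupDemo {
    public static void main(String[] args) {
        MapTagLookup dictionary = new MapTagLookup();
        dictionary.put("run", "NN", "VB");

        TagFilter filter = new TagFilter(dictionary);
        System.out.println(filter.isAllowed("run", "VB"));
        System.out.println(filter.isAllowed("run", "JJ"));
        System.out.println(filter.isAllowed("xyzzy", "JJ"));
    }
}

// Hypothetical common interface (the "additional layer"); not an OpenNLP type.
interface TagLookup {
    // Returns the tags allowed for the word, or null if the word is unknown
    // (an assumption made for this sketch).
    String[] getTags(String word);
}

// Hypothetical in-memory backend standing in for a real dictionary format.
final class MapTagLookup implements TagLookup {
    private final Map<String, String[]> entries = new HashMap<>();

    void put(String word, String... tags) {
        // Copy the array so later changes by the caller do not leak in.
        entries.put(word, tags.clone());
    }

    @Override
    public String[] getTags(String word) {
        String[] tags = entries.get(word);
        return tags == null ? null : tags.clone();
    }
}

// Hypothetical component that depends on the interface, not on a backend.
final class TagFilter {
    private final TagLookup lookup;

    TagFilter(TagLookup lookup) {
        this.lookup = lookup;
    }

    boolean isAllowed(String word, String tag) {
        String[] tags = lookup.getTags(word);
        // Unknown words are left unrestricted in this sketch.
        return tags == null || Arrays.asList(tags).contains(tag);
    }
}
```

Swapping the backend would then mean writing another TagLookup
implementation, which is exactly the extra indirection whose cost is being
weighed here.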

I don't know if we can use the Morfologik FSA dictionary for the
conventional Dictionary class we have, whose entries are multiple tokens.
This is used for the abbreviation dictionaries and for the Name Finder.
Maybe we can use FSA, but we would have to adapt it.

We should do the same for the lemmatizer, and if we one day change our mind we could
still introduce a dictionary interface.

Jörn
