Hello, I have used Morfologik with LanguageTool for grammar correction. Creating and re-creating the binary dictionaries can be tricky, although it is true that once a dictionary is built, access to it is very fast.
In any case, that would also create a dependency on Morfologik for creating and accessing the dictionaries.

Cheers,

Rodrigo

On Wed, Apr 10, 2013 at 2:34 PM, William Colen wrote:
> Hi,
>
> +1 for a lemmatizer API
>
> For my Master's project I created a lemma dictionary whose keys were the
> [token + POS tag] and whose values were one or more lemmas.
>
> To store and access the entries I used a very nice Java tool, available
> under the BSD license, that is part of Morfologik
> (http://sourceforge.net/projects/morfologik). This tool encodes the
> dictionary as a finite-state automaton, which gives very efficient access and
> a compact dictionary.
> The tool also provides an efficient way of encoding and accessing lexical
> dictionaries.
>
> The LanguageTool developers wrote a tutorial on how to use Morfologik for
> this: http://wiki.languagetool.org/developing-a-tagger-dictionary
>
>
>
> On Wed, Apr 10, 2013 at 9:02 AM, Rodrigo Agerri wrote:
>
>> On Wed, Apr 10, 2013 at 1:00 PM, Jörn Kottmann wrote:
>> >
>> > +1, it would be nice to have control over the dictionary; maybe we can
>> > come up with a format to store it in. That would allow us to easily
>> > include it in our models as a resource for feature generation and would
>> > eliminate the dependency on external libraries.
>>
>> I do not know yet which dictionary format will be best, but I can try
>> to come up with a proposal independent of WordNet or other third-party
>> resources when I have it working, and then discuss it.
>>
>> >
>> > +1
>> >
>> > We should define an interface that allows using different
>> > implementations, as we did for the other components.
>>
>> OK.
>>
>> Cheers,
>>
>> Rodrigo
>>
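For illustration, here is a minimal sketch of the two ideas discussed above: a lemmatizer interface that hides the storage, and a dictionary keyed on [token + POS tag] with one or more lemmas per key. Everything in it is hypothetical: the names `TokenPosLemmatizer` and `MapLemmatizer` and the tab-separated "token, POS tag, lemma" text format are assumptions for this example, not OpenNLP or Morfologik APIs. A plain `HashMap` stands in for the finite-state automaton; an FSA-backed store such as Morfologik's could sit behind the same interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lemmatizer interface (not an OpenNLP or Morfologik API). */
interface TokenPosLemmatizer {
    /** Returns all known lemmas for the token/POS pair, or an empty list. */
    List<String> lemmatize(String token, String posTag);
}

/** HashMap-backed dictionary keyed on [token + POS tag]; each key maps to one or more lemmas. */
class MapLemmatizer implements TokenPosLemmatizer {
    private final Map<String, List<String>> entries = new HashMap<>();

    // Join token and tag with a tab. Fields read by load() can never contain
    // a tab, so the joined key is unambiguous for them.
    private static String key(String token, String posTag) {
        return token + "\t" + posTag;
    }

    void add(String token, String posTag, String lemma) {
        entries.computeIfAbsent(key(token, posTag), k -> new ArrayList<>()).add(lemma);
    }

    @Override
    public List<String> lemmatize(String token, String posTag) {
        List<String> lemmas = entries.get(key(token, posTag));
        return lemmas == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(lemmas);
    }

    /** Reads hypothetical "token TAB posTag TAB lemma" lines; blank lines are skipped. */
    static MapLemmatizer load(BufferedReader in) throws IOException {
        MapLemmatizer dict = new MapLemmatizer();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t", -1);
            if (fields.length != 3) {
                throw new IOException("expected 3 tab-separated fields: " + line);
            }
            dict.add(fields[0], fields[1], fields[2]);
        }
        return dict;
    }
}

public class LemmaDemo {
    public static void main(String[] args) throws IOException {
        // Same surface form, different POS tags; and one key with two lemmas.
        String data = "saw\tVBD\tsee\n"
                + "saw\tNN\tsaw\n"
                + "leaves\tNNS\tleaf\n"
                + "leaves\tNNS\tleave\n";
        TokenPosLemmatizer lemmatizer =
                MapLemmatizer.load(new BufferedReader(new StringReader(data)));
        System.out.println(lemmatizer.lemmatize("saw", "VBD"));    // [see]
        System.out.println(lemmatizer.lemmatize("saw", "NN"));     // [saw]
        System.out.println(lemmatizer.lemmatize("leaves", "NNS")); // [leaf, leave]
        System.out.println(lemmatizer.lemmatize("ran", "VBD"));    // []
    }
}
```

Because callers only see the interface, the in-memory map could later be replaced by a Morfologik-backed implementation or by whatever in-house format is agreed on, without changing the code that asks for lemmas.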
