They moved to GitHub; here is the new link:
https://github.com/morfologik
Jörn
On 04/10/2013 02:34 PM, William Colen wrote:
Hi,
+1 for a lemmatizer API
For my Master's project I created a lemma dictionary whose keys were
[token + POS tag] pairs and whose values were one or more lemmas.
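As a rough illustration of that layout, here is a minimal in-memory sketch in Java. The class and method names (LemmaDictionary, add, lookup) and the tab-joined key are assumptions made for this example only; they are not taken from Morfologik or from the original project, which stored the entries in an automaton rather than a hash map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not an existing library API.
class LemmaDictionary {

    // Key is "token<TAB>posTag"; value is the list of lemmas for that pair.
    private final Map<String, List<String>> entries = new HashMap<>();

    // Joins token and POS tag into one lookup key (assumed encoding).
    private static String key(String token, String posTag) {
        return token + "\t" + posTag;
    }

    // Registers one more lemma for the given [token + POS tag] pair.
    void add(String token, String posTag, String lemma) {
        entries.computeIfAbsent(key(token, posTag), k -> new ArrayList<>()).add(lemma);
    }

    // Returns all lemmas for the pair, or an empty list if the pair is unknown.
    List<String> lookup(String token, String posTag) {
        List<String> lemmas = entries.get(key(token, posTag));
        if (lemmas == null) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(lemmas);
    }
}
```

With this sketch, adding ("saw", "VBD") with lemma "see" and ("saw", "NN") with lemma "saw" lets the POS tag disambiguate the lookup.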
To store and access the entries I used a very nice Java tool, available
under a BSD license, that is part of Morfologik (
http://sourceforge.net/projects/morfologik). This tool encodes the
dictionary as a finite-state automaton, allowing very efficient access and
a compact dictionary.
The tool also provides an efficient way of encoding and accessing lexical
dictionaries.
The LanguageTool members wrote a tutorial on how to use Morfologik for
this: http://wiki.languagetool.org/developing-a-tagger-dictionary
On Wed, Apr 10, 2013 at 9:02 AM, Rodrigo Agerri <[email protected]> wrote:
On Wed, Apr 10, 2013 at 1:00 PM, Jörn Kottmann <[email protected]> wrote:
+1, it would be nice to have control over the dictionary; maybe we can
come up with a format to store it in. That will allow us to easily include
it in our models as a resource for feature generation and eliminate the
dependency on external libraries.
I do not know yet which dictionary format will be best, but I can try
to come up with a proposal independent of WordNet or other third-party
resources once I have it working, and then we can discuss it.
+1
We should define an interface that allows different implementations to be
used, as we did for the other components.
OK.
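To make the interface idea concrete, one possible shape is sketched below in Java. Everything here is an assumption for discussion: the names TokenLemmatizer and MapBackedLemmatizer, the parallel-array signature, and the fallback to the surface token for unknown entries are hypothetical, not an agreed design or an existing API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: callers depend only on this, not on a dictionary backend.
interface TokenLemmatizer {
    // Returns one lemma per token; tokens and posTags are parallel arrays.
    String[] lemmatize(String[] tokens, String[] posTags);
}

// One illustrative implementation backed by a plain hash map.
// An automaton-backed implementation could be swapped in behind the same interface.
class MapBackedLemmatizer implements TokenLemmatizer {

    private final Map<String, String> lemmas = new HashMap<>();

    // Stores a single lemma for a [token + POS tag] pair (assumed key encoding).
    void put(String token, String posTag, String lemma) {
        lemmas.put(token + "\t" + posTag, lemma);
    }

    @Override
    public String[] lemmatize(String[] tokens, String[] posTags) {
        if (tokens.length != posTags.length) {
            throw new IllegalArgumentException("tokens and posTags must have the same length");
        }
        String[] result = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Assumed fallback: return the token unchanged when no entry exists.
            result[i] = lemmas.getOrDefault(tokens[i] + "\t" + posTags[i], tokens[i]);
        }
        return result;
    }
}
```

Code that consumes lemmas would hold a TokenLemmatizer reference, so the dictionary format can change without touching the callers.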
Cheers,
Rodrigo