Lemmatizer BUG

2016-12-05 Thread Damiano Porta
Hello, I am doing some tests with the LemmatizerME. It is returning a wrong word, a word that never occurs in the training data. Basically, it is NOT an Italian word :) The output is: [O, O, O, O, *R1trR0ae*] The code: try (InputStream in = new FileInputStream("/home/damiano/lemmas.bin")

Re: Lemmatizer BUG

2016-12-05 Thread Rodrigo Agerri
Hello, The String[] lemmatize(String[] toks, String[] tags) method will give you the predicted "lemma class", which encodes the permutations required to go from the word form to the lemma. If the output is O, that means that no permutation is required, namely, the lemma and the word form
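
To make the "lemma class" idea concrete, here is a minimal, self-contained sketch of how a class string such as R1trR0ae could be applied to a word form. It is not OpenNLP's code. It assumes the class is a sequence of replace operations of the form R<position><from><to>, with a single-digit position counted from the end of the word, and that O means "leave the word unchanged". The class name LemmaClassSketch, the method decode, and the input word "stata" are all hypothetical, since the thread does not show the actual tokens.

```java
// Illustrative sketch only; not OpenNLP's implementation.
final class LemmaClassSketch {

    // Applies a class string made of replace operations "R<i><from><to>",
    // where <i> is a single-digit position counted from the END of the word.
    // "O" is treated as "no change". Both conventions are assumptions.
    static String decode(String wordForm, String lemmaClass) {
        if (lemmaClass.equals("O")) {
            return wordForm;
        }
        // Work on the reversed word so positions count from the word's end.
        StringBuilder reversed = new StringBuilder(wordForm).reverse();
        int i = 0;
        while (i + 3 < lemmaClass.length() && lemmaClass.charAt(i) == 'R') {
            int pos = Character.digit(lemmaClass.charAt(i + 1), 10);
            char from = lemmaClass.charAt(i + 2);
            char to = lemmaClass.charAt(i + 3);
            if (pos < 0 || pos >= reversed.length()) {
                return wordForm; // the script does not fit this word
            }
            // Replace only when the expected character is actually there.
            if (reversed.charAt(pos) == from) {
                reversed.setCharAt(pos, to);
            }
            i += 4; // move on to the next operation
        }
        return reversed.reverse().toString();
    }

    public static void main(String[] args) {
        // Hypothetical inputs chosen for illustration.
        System.out.println(decode("stata", "R1trR0ae"));
        System.out.println(decode("casa", "O"));
    }
}
```

Under these assumptions, the class string is an instruction for rewriting the word form, which is why it looks like a non-word when printed directly. For real use, check the LemmatizerME javadoc of your OpenNLP version for its own decoding step rather than relying on this sketch.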

Re: Lemmatizer BUG

2016-12-05 Thread Damiano Porta
Hello Rodrigo! Thank you so much! It works perfectly... but what is the reason behind the use of the permutations? Why can we not have the lemma directly? Thanks for the clarification, Damiano

Re: Lemmatizer BUG

2016-12-05 Thread Rodrigo Agerri
Hello, The javadoc says that the implementation of the statistical lemmatizer is based on: http://grzegorz.chrupala.me/papers/phd-single.pdf Check Chapter 6. This paper summarizes that chapter well: http://grzegorz.chrupala.me/papers/chrupala-etal-2008a/paper.pdf To cut a long story short,

Re: Lemmatizer BUG

2016-12-05 Thread Damiano Porta
Perfect! Thank you!