This was not intentional. As I said, I wanted to use the
POSTaggerTrainer with a tagset whose values are word lemmas.
Consequently, instead of having thirty tag values, I had thirty
thousand distinct tag values... I did it out of curiosity; the
approach works fine for predicting gender, number, person...

Here is an excerpt from my corpus:
Il_il est_être vrai_vrai ,_, si_si l'on_on en_en croit_croire le_le
rapport_rapport Delors_Delors que_que c'_ce est_être un_un
organisme_organisme du_de#:#le même_même genre_genre que_que l'on_on
veut_vouloir créer_créer au_à#:#le bénéfice_bénéfice de_de l'_le
Europe_Europe tout_entière_tout#:#entière ._.
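
To give an idea of how the tag inventory grows with this format, here is
a minimal sketch that collects the distinct tags from a line of word_tag
tokens. It assumes each token is split on its last underscore, so that
"tout_entière_tout#:#entière" yields the word "tout_entière" and the tag
"tout#:#entière"; the class and method names are hypothetical and are not
part of OpenNLP.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class LemmaTagCount {

    // Collects the distinct tags found in one line of word_tag tokens.
    // Assumption: the tag is whatever follows the LAST underscore.
    static Set<String> distinctTags(String line) {
        Set<String> tags = new LinkedHashSet<>();
        for (String token : line.trim().split("\\s+")) {
            int split = token.lastIndexOf('_');
            if (split < 0) {
                continue; // no separator in this token, skip it
            }
            tags.add(token.substring(split + 1));
        }
        return tags;
    }

    public static void main(String[] args) {
        String line = "Il_il est_être vrai_vrai ,_, si_si l'on_on en_en croit_croire";
        Set<String> tags = distinctTags(line);
        System.out.println(tags.size() + " distinct tags: " + tags);
    }
}
```

Run over a whole corpus, the same set would hold one entry per lemma,
which is where the thirty thousand values come from.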

I opened the following issue:
https://issues.apache.org/jira/browse/OPENNLP-578

/Nicolas

On Mon, May 13, 2013 at 6:13 PM, Jörn Kottmann <[email protected]> wrote:
> On 05/13/2013 03:44 PM, Nicolas Hernandez wrote:
>>
>> I've tried to use the postagger command to learn models of various
>> morphological features. Even though I know it is not designed for it,
>> I also tried to build a model for lemma tagging....
>
>
> Looks like we do not support strings for features larger than 64KB; as
> pointed out, this seems to be a bug in our serializer code. Anyway, why
> do you use such large strings for features? Is this intentional?
>
> Would you mind opening a jira issue for this?
>
> Thanks,
> Jörn
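
A 64KB ceiling on strings is what java.io.DataOutputStream.writeUTF
imposes: it stores the encoded length in two bytes and throws
UTFDataFormatException beyond 65535 bytes. Whether that is the call
involved in the serializer mentioned above is an assumption; the thread
does not say. The sketch below only demonstrates the limit and one
possible workaround (a 4-byte length prefix followed by UTF-8 bytes).
The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriteUtfLimit {

    // Builds a string made of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Returns true if writeUTF accepts the string, false if it is too long.
    static boolean fitsInWriteUtf(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false; // encoded form exceeds 65535 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Workaround sketch: int length prefix plus raw UTF-8 bytes.
    static byte[] writeLongString(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length);
            out.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("65535 bytes fit: " + fitsInWriteUtf(repeat('a', 65535)));
        System.out.println("65536 bytes fit: " + fitsInWriteUtf(repeat('a', 65536)));
        System.out.println("workaround size: " + writeLongString(repeat('a', 70000)).length);
    }
}
```

A reader for the workaround format would call readInt, then readFully
into a byte array of that size, and decode it as UTF-8.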



-- 
Dr. Nicolas Hernandez
Associate Professor (Maître de Conférences)
Université de Nantes - LINA CNRS UMR 6241
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
+33 (0)2 51 12 53 94
+33 (0)2 40 30 60 67
