Hi Alexandre,
"Unfortunately", model training for the Portuguese language went without any problems.

For example:
pt-pos-tagger.txt
=== EVALUATION INFO ===
Evaluation-Score=0.9232609658839167
Training-Sample-Size=8710
Evaluation-Sample-Size=967
Training-Algorithm=MAXENT

pt-lemmatizer.txt
=== EVALUATION INFO ===
Evaluation-Score=0.9815241470979176
Training-Sample-Size=8710
Evaluation-Sample-Size=967
Training-Algorithm=MAXENT


From: "Alexandre Rademaker" <[email protected]>
To: [email protected];
Sent: 12:25 Wednesday 2022-06-15
Subject: Re: Experiment: How good is quality of OpenNLP models for various 
languages.

> 
> Just a reminder that errors in UD treebanks can be reported as issues
in their repositories. As a UD treebank maintainer, Portuguese in my case,
I would love to receive feedback such as that mentioned below.
> 
> Alexandre 
> Sent from my iPhone
> 
> > On 14 Jun 2022, at 06:39, [email protected] wrote:
> > 
> > I observed that the lemmatizer fails for some languages:
> > German - Compound nouns are inconsistently lemmatized. Sometimes
they are lemmatized to the full word, but sometimes they are lemmatized to
their last word. For example: kundendienstzentrums => zentrum, geheimdienste
=> dienst
> >             This causes an enormous number of outcomes, and the lemmatizer
fails with an out-of-memory error.
> 

