Hi Alexandre,

"Unfortunately", model training for the Portuguese language went without any problems.
For example:

pt-pos-tagger.txt
=== EVALUATION INFO ===
Evaluation-Score=0.9232609658839167
Training-Sample-Size=8710
Evaluation-Sample-Size=967
Training-Algorithm=MAXENT

pt-lemmatizer.txt
=== EVALUATION INFO ===
Evaluation-Score=0.9815241470979176
Training-Sample-Size=8710
Evaluation-Sample-Size=967
Training-Algorithm=MAXENT

From: "Alexandre Rademaker" <[email protected]>
To: [email protected];
Sent: 12:25 Wednesday 2022-06-15
Subject: Re: Experiment: How good is quality of OpenNLP models for various languages.

>
> Just a reminder that errors in UD treebanks can be reported as issues in their repositories. As a UD treebank maintainer, Portuguese in my case, I would love to receive feedback such as that mentioned below.
>
> Alexandre
> Sent from my iPhone
>
> > On 14 Jun 2022, at 06:39, [email protected] wrote:
> >
> > I observed that the lemmatizer fails for some languages:
> > German - Compound nouns are inconsistently lemmatized. Sometimes they are lemmatized to the full word, but sometimes they are lemmatized to their last word. For example: kundendienstzentrums => zentrum, geheimdienste => dienst
> > This causes an enormous number of outcomes, and the lemmatizer fails with an out-of-memory error.
