Training a TokenizerME model

Zahurul Islam Mon, 02 Mar 2015 05:38:37 -0800

Dear All,
I would like to train a TokenizerME model by following the instruction in
the 1.5.3 manual (
https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.tokenizer.training.api
).


A portion of the training corpus:

Ich erkläre die am Freitag<SPLIT>, dem Dezember unterbrochene
Sitzungsperiode des Europäischen Parlaments für wiederaufgenommen<SPLIT>,
wünsche Ihnen nochmals alles Gute zum Jahreswechsel und hoffe<SPLIT>, daß
Sie schöne Ferien hatten .

Wie Sie feststellen konnten , ist der gefürchtete " Millenium-Bug " nicht
eingetreten .

Doch sind Bürger einiger unserer Mitgliedstaaten Opfer von schrecklichen
Naturkatastrophen geworden .

I use the following command:

opennlp TokenizerTrainer -model de-token.bin -lang de -data
de-token.train -encoding UTF-8

The tokenizer using the newly trained model outputs same as the input
string. Any idea, why the trained model is not working? Any help will be
appreciated. Thanks in advance.

Best Regards,

Zahurul

Training a TokenizerME model

Reply via email to