Re: abbreviation dictionary format

Jörn Kottmann Tue, 10 Apr 2012 07:52:00 -0700

On 04/10/2012 04:44 PM, Joan Codina wrote:

But to train the system I only found this file, which is small:
http://opennlp.cvs.sourceforge.net/viewvc/opennlp/opennlp/src/test/resources/opennlp/tools/tokenize/token.train?view=markup
It only contains 121 sentences. I don't know if this is enough, or if there are other annotated training models.

No, that is not enough. Get a training data set for the language you need. Most of the data sets referenced in the Corpora section can be used to train the tokenizer. These corpora are already tokenized and can be de-tokenized into training data for the tokenizer.
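
As a rough illustration of the de-tokenization step, here is a minimal Python sketch. It assumes the OpenNLP tokenizer training format of one sentence per line, where tokens that were separated by whitespace in the original text stay separated by a space and tokens that touched are joined with a `<SPLIT>` tag. The function names and the small `ATTACH_LEFT`/`ATTACH_RIGHT` rule sets are hypothetical; a real corpus needs language-specific rules (quotes, clitics, hyphens, and so on).

```python
# Hypothetical, illustrative rule sets: tokens that attach to the previous
# token (ATTACH_LEFT) or to the next token (ATTACH_RIGHT) in running text.
# Real detokenization rules are language- and corpus-specific.
ATTACH_LEFT = {".", ",", ";", ":", "!", "?", ")", "]", "}", "%", "'s", "n't"}
ATTACH_RIGHT = {"(", "[", "{", "$"}

SPLIT = "<SPLIT>"


def detokenize_to_training_line(tokens):
    """Join one tokenized sentence into a single tokenizer training line.

    Tokens that would be separated by whitespace in running text are joined
    with a space; tokens that would touch are joined with <SPLIT>.
    """
    if not tokens:
        return ""
    parts = [tokens[0]]
    for prev, tok in zip(tokens, tokens[1:]):
        # Decide whether this token touches its predecessor in the original text.
        if tok in ATTACH_LEFT or prev in ATTACH_RIGHT:
            parts.append(SPLIT)
        else:
            parts.append(" ")
        parts.append(tok)
    return "".join(parts)


def convert_corpus(sentences):
    """Convert an iterable of token lists into training lines, one per sentence."""
    return [detokenize_to_training_line(s) for s in sentences]


if __name__ == "__main__":
    sentence = ["Pierre", "Vinken", ",", "61", "years", "old", ",", "will",
                "join", "the", "board", "Nov.", "29", "."]
    print(detokenize_to_training_line(sentence))
    # prints: Pierre Vinken<SPLIT>, 61 years old<SPLIT>, will join the board Nov. 29<SPLIT>.
```

Writing the output of `convert_corpus` to a file, one line per sentence, yields a candidate training file; the attach rules are the part that has to be adapted to each corpus.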

Jörn
