OpenNLP Document Categorizer training format

Ajith Ganesan Wed, 15 Jan 2014 06:55:39 -0800

The OpenNLP 1.5.3 documentation for the Document Categorizer states, and I
quote:


"The Document Categorizer can be trained on annotated training material.
The data can be in OpenNLP Document Categorizer training format. This is
one document per line, containing category and text separated by a
whitespace. *Other formats can also be available.*"

In what other formats can the training file be prepared?
Where can I find that information?
Also, the format described is one document per line. That means all
newlines in existing documents would have to be replaced by some other
character before the model could be trained.
Is there a workaround?
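
One possible workaround, sketched here rather than taken from the OpenNLP
documentation, is to collapse the line breaks inside each document into
single spaces before writing the training file. The Python sketch below
assumes only the format quoted above (category, whitespace, then text, one
document per line). The function names to_training_line and
build_training_data and the sample categories are hypothetical, chosen
purely for illustration.

```python
def to_training_line(category, text):
    """Return one training line: category, a space, then the text on a single line."""
    # Assumption: the category must be a single token, because whitespace
    # is what separates it from the document text in the quoted format.
    if not category or category.split() != [category]:
        raise ValueError("category must be a single token without whitespace")
    # str.split() with no arguments splits on any run of whitespace,
    # so newlines, carriage returns, and tabs all collapse to one space.
    flattened = " ".join(text.split())
    return f"{category} {flattened}"


def build_training_data(documents):
    """Join (category, text) pairs into newline-terminated training lines."""
    return "".join(to_training_line(c, t) + "\n" for c, t in documents)


# Hypothetical sample documents containing embedded line breaks.
docs = [
    ("Sports", "The match\nwas close.\n\nFans cheered."),
    ("Politics", "Parliament met today.\r\nA vote followed."),
]

if __name__ == "__main__":
    print(build_training_data(docs), end="")
```

Collapsing to spaces discards paragraph boundaries. If those matter for the
task, a placeholder token could be substituted for the line breaks instead.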


Kind Regards,
Ajith
