The OpenNLP 1.5.3 documentation for the Document Categorizer states the
following:

"The Document Categorizer can be trained on annotated training material.
The data can be in OpenNLP Document Categorizer training format. This is
one document per line, containing category and text separated by a
whitespace. *Other formats can also be available.*"
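To make the quoted format concrete, here is a minimal plain-Java sketch of how one line in that format splits into a category and its text. It assumes the category is the first whitespace-delimited token and that everything after it is the document text. `CategorizerLineParser` and `parse` are hypothetical names for illustration only, not part of the OpenNLP API.

```java
public class CategorizerLineParser {

    // Hypothetical helper: splits one training line into {category, text},
    // assuming the first whitespace-delimited token is the category.
    static String[] parse(String line) {
        // Limit 2: split only at the first whitespace run, keep the rest intact.
        String[] parts = line.trim().split("\\s+", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected '<category> <text>': " + line);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = parse("sports The match ended 2-1 after extra time.");
        System.out.println("category=" + parts[0]); // prints: category=sports
        System.out.println("text=" + parts[1]);     // prints: text=The match ended 2-1 after extra time.
    }
}
```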

In what other formats can the training file be prepared?
Where can I find that information?
Also, the format described requires one document per line. That means
all newlines in existing documents must be replaced by some other
character before training.
Is there a workaround?
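One possible workaround, assuming line breaks carry no meaning for the categorizer, is to collapse every run of whitespace (including newlines) into a single space before writing each training line. This is a plain-Java sketch; `TrainingLineWriter` and `toTrainingLine` are hypothetical helper names, not OpenNLP classes.

```java
import java.util.List;

public class TrainingLineWriter {

    // Hypothetical helper: turns one (possibly multi-line) document into a
    // single line of "category<space>text", the layout quoted above.
    static String toTrainingLine(String category, String document) {
        // The category must be a single token; otherwise the first whitespace
        // would no longer separate the category from the text.
        if (category.isEmpty() || category.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("category must be one non-empty token: " + category);
        }
        // Collapse every run of whitespace (spaces, tabs, \n, \r\n) into one space.
        String flattened = document.replaceAll("\\s+", " ").trim();
        return category + " " + flattened;
    }

    public static void main(String[] args) {
        List<String[]> docs = List.of(
            new String[] {"sports", "The match ended 2-1.\nFans celebrated downtown."},
            new String[] {"politics", "Parliament met today.\r\n\r\nThe vote was postponed."});
        for (String[] doc : docs) {
            // One output line per document, ready to append to a training file.
            System.out.println(toTrainingLine(doc[0], doc[1]));
        }
    }
}
```

Because the line breaks are discarded, any paragraph structure in the original documents is lost; if that structure matters to the model, this simple flattening would not be enough.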


Kind Regards,
Ajith
