The OpenNLP 1.5.3 documentation for the Document Categorizer states, and I quote:

"The Document Categorizer can be trained on annotated training material. The data can be in OpenNLP Document Categorizer training format. This is one document per line, containing category and text separated by a whitespace. *Other formats can also be available.*"

In what other formats can the training file be prepared? Where can I find that information?

Also, the format described requires one document per line. That means every newline in an existing document would have to be replaced by some other character before the document can be used for training. Is there a workaround?

Kind Regards,
Ajith

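For illustration, the newline replacement described above could be sketched as follows in plain Java. This is only a sketch under stated assumptions: it uses no OpenNLP API, the class and method names (`DoccatLineFormatter`, `flatten`, `toTrainingLine`) are hypothetical, and replacing line breaks with a single space is an assumed choice, not something the documentation prescribes.

```java
public class DoccatLineFormatter {

    // Collapse every run of whitespace (spaces, tabs, \r, \n) into a single
    // space so that the whole document fits on one line.
    // Assumption: a space is an acceptable stand-in for a line break.
    public static String flatten(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    // Build one training line in the quoted format:
    // category, one whitespace, then the document text on the same line.
    public static String toTrainingLine(String category, String text) {
        return category + " " + flatten(text);
    }

    public static void main(String[] args) {
        String doc = "First paragraph.\n\nSecond paragraph,\r\nwrapped over lines.";
        // prints: Sports First paragraph. Second paragraph, wrapped over lines.
        System.out.println(toTrainingLine("Sports", doc));
    }
}
```

Each document would produce one such line, and the lines written to a single file would form the training data in the one-document-per-line layout quoted above. The category itself must not contain whitespace, since the first whitespace separates it from the text.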