Ahmed Abdeen (Home) wrote:
Hello UIMA Developers,
I am getting the following error when I run the UIMA Document Analyzer.
However, if I use the interactive mode, it works fine. I can't tell which
source file is causing this issue. I would appreciate any help.
Thanks,
Ahmed
Please see http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.xmi_emf.xml_character_issues

It appears that some String data being serialized contains character codes that are invalid from an XML viewpoint, namely an x'00' (the NUL character). There are several things you can do.

1) Don't serialize this data in the CAS consumer. You may not realize it, but the Document Analyzer adds a CAS consumer after your annotator in order to serialize the processed data. If you write your own UIMA application instead of using the Document Analyzer tool, you can choose what to write out and whether to use XML serialization at all.

2) If it is acceptable in your application, you can replace the invalid data with some valid alternative. What to substitute depends, of course, on your application's requirements.
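For option 2, or just to find out where the bad data is, it can help to check each String against the set of characters XML 1.0 allows before it reaches the CAS consumer. The following is a minimal sketch in plain Java, not part of the UIMA API: the class and method names (`XmlCharUtil`, `firstInvalidIndex`, `sanitize`) are hypothetical, replacing with a space is only one possible choice, and calling it on the document text before your application puts that text into the CAS is an assumption about how your application is structured.

```java
/**
 * Hypothetical helper (not part of the UIMA API): finds and replaces
 * characters that XML 1.0 cannot represent, such as x'00'.
 */
final class XmlCharUtil {

    private XmlCharUtil() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns the char index of the first character XML 1.0 cannot
     * represent, or -1 if the string is safe to serialize. Unpaired
     * surrogates are reported too: codePointAt returns them as values
     * in 0xD800-0xDFFF, which isXml10Char rejects.
     */
    static int firstInvalidIndex(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /**
     * Returns a copy of s with every XML 1.0-invalid code point replaced
     * by 'replacement'. Every invalid code point occupies a single UTF-16
     * char, so the result has the same length as s.
     */
    static String sanitize(String s, char replacement) {
        if (!isXml10Char(replacement)) {
            throw new IllegalArgumentException("replacement must itself be a valid XML char");
        }
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (isXml10Char(cp)) {
                sb.appendCodePoint(cp);   // keep valid chars, including surrogate pairs
            } else {
                sb.append(replacement);   // one char in, one char out
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "abc\u0000def";
        System.out.println(firstInvalidIndex(text));   // prints: 3
        System.out.println(sanitize(text, ' '));       // prints: abc def
    }
}
```

Replacing characters one-for-one, rather than deleting them, is a deliberate choice here: UIMA annotations record begin/end character offsets into the document text, so keeping the text length unchanged means those offsets still line up.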

Does this answer your question?

-Marshall
