Ahmed Abdeen (Home) wrote:
Hello UIMA Developers,
I am getting the following error when I run the UIMA
Document Analyzer.
However, if I use interactive mode, it works fine. I can't tell which
source file is causing this issue. I would appreciate any help.
Thanks,
Ahmed
Please see
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.xmi_emf.xml_character_issues
It appears that some String data being serialized contains a character
code that is invalid in XML, namely x'00' (the null character).
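If you want to locate the offending character before it reaches the
serializer, a small check against the XML 1.0 character range can help.
The sketch below is a hypothetical helper (XmlCharCheck is not part of
UIMA); it assumes you have the document text as a Java String and reports
the index of the first character that XML 1.0 does not allow, x'00'
included.

```java
/**
 * Hypothetical helper, not a UIMA API: finds characters that XML 1.0
 * forbids, such as x'00', in a Java String.
 */
final class XmlCharCheck {

    private XmlCharCheck() {}

    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char index of the first invalid character, or -1 if none. */
    static int firstInvalidIndex(String s) {
        int i = 0;
        while (i < s.length()) {
            // codePointAt joins valid surrogate pairs; a lone surrogate is
            // returned as-is and fails the range check above.
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "abc\0def"; // contains x'00' at index 3
        System.out.println(firstInvalidIndex(text)); // prints 3
    }
}
```

Logging that index together with the document name would show which
input file carries the bad data.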
There are a couple of things you can do.
1) Don't serialize this data in the CAS consumer. You may not realize this,
but the Document Analyzer adds a CAS consumer after your annotator
to serialize the processed data. If you write your own UIMA application
instead of using the Document Analyzer tool, you can choose what to write
out and whether to use XML serialization at all.
2) If your application can tolerate it, you can replace the invalid
characters with valid ones. What counts as a suitable replacement will,
of course, depend on your application's requirements.
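For option 2, here is one possible way to do the replacement, sketched
under two assumptions: a single space is an acceptable substitute (purely
an illustrative choice), and the cleanup runs on the text before it is
handed to UIMA. XmlCharSanitizer is a hypothetical name, not a UIMA API.
It swaps each forbidden character for exactly one substitute character,
so the string length stays the same; that matters because annotations
point into the document text by begin/end character offsets.

```java
/**
 * Hypothetical helper, not a UIMA API: replaces characters that XML 1.0
 * forbids with a single substitute, keeping the string length unchanged.
 */
final class XmlCharSanitizer {

    private XmlCharSanitizer() {}

    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns a copy of s with each invalid character replaced by substitute,
     * which must itself be a valid XML 1.0 character (for example ' ').
     */
    static String sanitize(String s, char substitute) {
        if (!isXml10Char(substitute)) {
            throw new IllegalArgumentException("substitute is not a valid XML 1.0 character");
        }
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (isXml10Char(cp)) {
                sb.appendCodePoint(cp);
            } else {
                // Every code point XML 1.0 rejects (control characters,
                // lone surrogates, U+FFFE, U+FFFF) is one UTF-16 char,
                // so one substitute char keeps all offsets aligned.
                sb.append(substitute);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("abc\0def", ' ')); // prints "abc def"
    }
}
```

Dropping the bad characters instead would also produce valid XML, but it
would shift every offset after the first removal, which is why this
sketch substitutes rather than deletes.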
Does this answer your question?
-Marshall