[ https://issues.apache.org/jira/browse/STANBOL-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021548#comment-13021548 ]
Olivier Grisel commented on STANBOL-176:
----------------------------------------
Sample exception we might get when trying to serialize such a "corrupted"
literal as RDF/XML:
com.hp.hpl.jena.shared.CannotEncodeCharacterException: cannot encode (char) in context XML
at com.hp.hpl.jena.rdf.model.impl.Util.substituteEntitiesInElementContent(Util.java:188)
at com.hp.hpl.jena.xmloutput.impl.Basic.writeLiteral(Basic.java:168)
at com.hp.hpl.jena.xmloutput.impl.Basic.writePredicate(Basic.java:104)
at com.hp.hpl.jena.xmloutput.impl.Basic.writeRDFStatements(Basic.java:77)
at com.hp.hpl.jena.xmloutput.impl.Basic.writeRDFStatements(Basic.java:66)
at com.hp.hpl.jena.xmloutput.impl.Basic.writeBody(Basic.java:40)
at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.writeXMLBody(BaseXMLWriter.java:500)
at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:472)
at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:458)
at org.apache.clerezza.rdf.jena.serializer.JenaSerializerProvider.serialize(JenaSerializerProvider.java:65)
at org.apache.clerezza.rdf.core.serializedform.Serializer.serialize(Serializer.java:144)
at org.apache.stanbol.enhancer.jersey.resource.ContentItemResource.getRdfMetadata(ContentItemResource.java:132)
> NER engine should not put control chars in text literals of the annotation
> graph
> --------------------------------------------------------------------------------
>
> Key: STANBOL-176
> URL: https://issues.apache.org/jira/browse/STANBOL-176
> Project: Stanbol
> Issue Type: Bug
> Reporter: Olivier Grisel
> Assignee: Olivier Grisel
>
> Some text to analyse might contain control chars such as "\x13", "\x14",
> "\x15"... Such characters cannot be serialized as XML and are generally
> worthless in the labels and context properties of enhancements.
> The NER engine should filter them out before writing its annotations to the
> content item graph.
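
A minimal sketch of the kind of filtering described above, assuming the fix is
applied to the literal text before it is added to the graph. The class and
method names (XmlCharFilter, stripInvalidXmlChars, isXml10Char) are
hypothetical and are not part of Stanbol, Clerezza or Jena. The check follows
the Char production of XML 1.0, which allows tab, line feed and carriage
return but no other character below 0x20.

```java
// Hypothetical helper, not an existing Stanbol class.
final class XmlCharFilter {

    private XmlCharFilter() {
    }

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of text without the code points XML 1.0 cannot represent. */
    static String stripInvalidXmlChars(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length();) {
            // Iterate by code point so that surrogate pairs are kept together;
            // an unpaired surrogate falls outside the allowed ranges and is dropped.
            int cp = text.codePointAt(i);
            if (isXml10Char(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

With such a helper, the engine would call
XmlCharFilter.stripInvalidXmlChars(...) on the selected text and context
strings before creating the corresponding literals. Dropping the characters,
rather than replacing them with a space, is a design choice made here for
simplicity; it changes character offsets relative to the original text.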
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira