[ https://issues.apache.org/jira/browse/STANBOL-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021548#comment-13021548 ]

Olivier Grisel commented on STANBOL-176:
----------------------------------------

Here is a sample exception we might get when trying to serialize such a
"corrupted" literal as RDF/XML:

com.hp.hpl.jena.shared.CannotEncodeCharacterException: cannot encode (char)  in context XML
        at com.hp.hpl.jena.rdf.model.impl.Util.substituteEntitiesInElementContent(Util.java:188)
        at com.hp.hpl.jena.xmloutput.impl.Basic.writeLiteral(Basic.java:168)
        at com.hp.hpl.jena.xmloutput.impl.Basic.writePredicate(Basic.java:104)
        at com.hp.hpl.jena.xmloutput.impl.Basic.writeRDFStatements(Basic.java:77)
        at com.hp.hpl.jena.xmloutput.impl.Basic.writeRDFStatements(Basic.java:66)
        at com.hp.hpl.jena.xmloutput.impl.Basic.writeBody(Basic.java:40)
        at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.writeXMLBody(BaseXMLWriter.java:500)
        at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:472)
        at com.hp.hpl.jena.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:458)
        at org.apache.clerezza.rdf.jena.serializer.JenaSerializerProvider.serialize(JenaSerializerProvider.java:65)
        at org.apache.clerezza.rdf.core.serializedform.Serializer.serialize(Serializer.java:144)
        at org.apache.stanbol.enhancer.jersey.resource.ContentItemResource.getRdfMetadata(ContentItemResource.java:132)

> NER engine should not put control chars in text literals of the annotation graph
> --------------------------------------------------------------------------------
>
>                 Key: STANBOL-176
>                 URL: https://issues.apache.org/jira/browse/STANBOL-176
>             Project: Stanbol
>          Issue Type: Bug
>            Reporter: Olivier Grisel
>            Assignee: Olivier Grisel
>
> Some text to analyse might contain control chars such as "\x13", "\x14",
> "\x15"... Such characters cannot be serialized as XML and are generally
> worthless in the labels and context properties of enhancements.
> The NER engine should filter them out before writing its annotations to the
> content item graph.
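
A minimal Java sketch of the filtering step described in the issue, assuming the NER engine can pass each label and context string through a helper before adding it as a literal to the content item graph. The class XmlCharFilter and its methods are hypothetical names for illustration, not existing Stanbol, Clerezza or Jena API. The helper keeps only code points allowed by the XML 1.0 "Char" production, so "\x13", "\x14" and "\x15" are dropped while tab, newline and carriage return (which are legal XML) are kept.

```java
/**
 * Hypothetical helper, not part of the Stanbol code base: strips characters
 * that XML 1.0 does not allow, so literals can be written as RDF/XML.
 */
public final class XmlCharFilter {

    private XmlCharFilter() {
        // static utility class, no instances
    }

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns a copy of text with every non-XML character removed.
     * Iterates by code point so valid surrogate pairs (e.g. emoji) survive,
     * while unpaired surrogates and C0 control chars are dropped.
     */
    public static String removeNonXmlChars(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isXmlChar(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample text polluted with the control chars mentioned in the issue
        String raw = "Paris\u0013 is\u0014 the capital\u0015";
        System.out.println(removeNonXmlChars(raw)); // prints "Paris is the capital"
    }
}
```

Dropping the characters outright, rather than replacing them with a space, follows the issue's point that they carry no useful information; if word boundaries matter for the context snippets, substituting a space would be a reasonable alternative.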

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
