[ 
https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286520#comment-13286520
 ] 

Rupert Westenthaler commented on STANBOL-583:
---------------------------------------------

Hi Alessio,

while testing I found another bug in your server implementation. In revision 
1344669 I added another unit test to the NER engine that reproduces it reliably.

The root cause is

    Caused by: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x19) was found in the element content of the document.

It is thrown by the call

    SOAPBody soapBody = message.getSOAPBody();

when parsing the response data of the NER service (NERserviceClientHTTP). Based 
on a short Google search, I assume that the server does not correctly escape 
special characters in the labels of detected entities. Most posts suggest that 
using "StringEscapeUtils.escapeXml(..)" solves this.
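
One caveat: 0x19 lies outside the character range that XML 1.0 allows (#x9, 
#xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF), so escaping alone may 
not be enough and the character may have to be dropped before the label is 
written to the response. I have not seen the server code, so the following is 
only a rough sketch of such a filter; the class and method names are 
hypothetical and not part of the CELI services or of Stanbol.

```java
// Hypothetical helper (not part of the CELI server or Stanbol): removes code
// points that XML 1.0 does not allow, such as the 0x19 from the stack trace.
final class XmlSanitizer {

    private XmlSanitizer() {}

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the text without characters that are illegal in XML 1.0. */
    static String stripInvalidXmlChars(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isValidXml10(cp)) {
                sb.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on the code point width.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

The filter would run on each entity label before the usual escaping of &, < 
and > is applied; it does not replace that escaping.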

NOTE: This does not block this issue, as it does not affect the contributed 
Engine.
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, 
> STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip, celiPatchNER.patch
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish (it creates an 
> annotation on the document whose content is the lemmatized form of the 
> document)
> - a Language Identifier for Italian, French, German, Spanish, Portuguese, 
> Polish, Hungarian, Dutch, Swedish, Arabic, Russian, Turkish, Romanian, Greek, 
> Norwegian
> - a Document Classification service for Italian, French, German, English, 
> Spanish, Portuguese that associates a document with DBpedia classes 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
