[
https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286520#comment-13286520
]
Rupert Westenthaler commented on STANBOL-583:
---------------------------------------------
Hi Alessio,
while testing I found another bug in your server implementation. In revision
1344669 I added another unit test to the NER engine that reproduces it reliably.
The root cause is
Caused by: org.xml.sax.SAXParseException: An invalid XML character
(Unicode: 0x19) was found in the element content of the document.
which is thrown while creating the
SOAPBody soapBody = message.getSOAPBody();
from the data of the NER response (NERserviceClientHTTP). Based on a
short Google search, I assume that the server does not correctly escape special
characters in the labels of detected entities. Most posts suggest that using
"StringEscapeUtils.escapeXml(..)" solves this.
NOTE: This does not block this issue, as it does not affect the contributed
Engine.
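For illustration, here is a minimal sketch of one way the server could clean
entity labels before they are written into the SOAP response. It is an
assumption, not the actual server code, which is not part of this issue. The
class and method names (XmlCharSanitizer, stripInvalidXmlChars) are
hypothetical. The sketch drops every code point outside the XML 1.0 Char
production. This matters because 0x19 is not a legal XML 1.0 character even
when written as a character reference, so entity escaping by itself may not
be enough.

```java
/**
 * Hypothetical helper (not part of Stanbol or the CELI server): removes
 * characters that are not allowed in an XML 1.0 document.
 */
public final class XmlCharSanitizer {

    private XmlCharSanitizer() {
    }

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the text without invalid XML 1.0 characters. */
    static String stripInvalidXmlChars(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Iterate by code point so that surrogate pairs stay intact.
            int cp = text.codePointAt(i);
            if (isValidXml10(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

The cleaned label would still need the usual escaping of &, < and > before
it is embedded in element content, unless the XML is produced by a
serializer that already does this.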
> CELI enhancement engine(s) - Contribution to stanbol
> -----------------------------------------------------
>
> Key: STANBOL-583
> URL: https://issues.apache.org/jira/browse/STANBOL-583
> Project: Stanbol
> Issue Type: New Feature
> Components: Enhancer
> Affects Versions: 0.9.0-incubating
> Environment: Enhancement Engines developed as web service clients
> Reporter: Alessio Bosca
> Assignee: Rupert Westenthaler
> Priority: Minor
> Labels: patch
> Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch,
> STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip, celiPatchNER.patch
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish (it creates an
> annotation on the document whose content is the lemmatized form of the
> document)
> - a Language Identifier for Italian, French, German, Spanish, Portuguese,
> Polish, Hungarian, Dutch, Swedish, Arabic, Russian, Turkish, Romanian, Greek,
> Norwegian
> - a Document Classification service for Italian, French, German, English,
> Spanish, Portuguese that associates a document with DBpedia classes