Another question: I assume there are perhaps two machines involved here (it's a
UIMA-AS setup).

From the exception, it appears that the error happens when the client sends the
CAS to the remote.

Can you print out the Linux (assuming that's the OS) default locale for both
machines?  (e.g. type "locale" into a command line and see what each machine
has as its default character encoding).

Please let us know what these are.
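
If it is easier to check from inside the JVM than from a shell, something like
the following could be run on both machines. This is only a rough sketch: the
class and method names (ShowDefaultEncoding, describeDefaults) are made up for
illustration, and what the JVM reports can differ from the shell's "locale"
output if the service is started with different environment settings.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper, not part of UIMA-AS: reports what this JVM
// considers its default character encoding and locale.
final class ShowDefaultEncoding {

    static String describeDefaults() {
        return "default charset = " + Charset.defaultCharset().name()
             + ", file.encoding = " + System.getProperty("file.encoding")
             + ", locale = " + Locale.getDefault();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

If the two machines report different charsets, that would be worth including
in your reply.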

Thanks. -Marshall



On 12/12/2016 1:58 PM, nelson rivera wrote:
> Yes, these are the values of the troublesome characters. Using
> Integer.toHexString() to print out each byte shows:
>
> fffffff0 ffffff96 ffffffa6 ffffff80
>
> fffffff0 ffffff96 ffffffa6 ffffff90
>
> ffffffef ffffffbf ffffffbd
>
> ffffffef ffffffbf ffffffbd
>
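
A note on the output above: the leading "ffffff" on each value is sign
extension, not part of the data. Integer.toHexString() takes an int, so a
negative byte such as (byte) 0xF0 is widened to 0xFFFFFFF0 before printing.
The low two hex digits are the actual byte values, and they match the expected
UTF-8 sequences. A minimal sketch of printing the bytes without the prefix,
assuming a hypothetical helper class Utf8HexDump (not part of UIMA):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class Utf8HexDump {

    /** Returns the UTF-8 bytes of s as two-digit uppercase hex, space separated. */
    static String toUtf8Hex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so a negative byte is not sign-extended to ffffffXX.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+16980 is the code point that encodes to F0 96 A6 80 in UTF-8.
        String bamum = new String(Character.toChars(0x16980));
        // U+FFFD is the Unicode replacement character.
        String replacement = new String(Character.toChars(0xFFFD));
        System.out.println(toUtf8Hex(bamum));        // prints F0 96 A6 80
        System.out.println(toUtf8Hex(replacement));  // prints EF BF BD
    }
}
```

Note that EF BF BD is the UTF-8 form of U+FFFD, the replacement character, so
the last two characters of the test string had already been replaced before
the bytes were printed.
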
> 2016-12-12 11:35 GMT-05:00, Marshall Schor <m...@schor.com>:
>> Hi Nelson,
>>
>> Looking into this... Can you please confirm that the UTF-8 coding of the
>> troublesome characters, in hexadecimal, is:
>>
>> F0 96 A6 80
>>
>> F0 96 A6 90
>>
>> EF BF BD
>>
>> EF BF BD
>>
>> If you have the string in Java, please try converting it to UTF-8 bytes
>> using something like:
>>   byte[] theBytes = myTestString.getBytes("UTF-8");
>>
>> and then print out theBytes in hex; they should look like the above.  If
>> not, please let us know what the values are instead.
>>
>>
>> Thanks. -Marshall
>>
>>
>> On 12/9/2016 9:02 AM, nelson rivera wrote:
>>> Hi, I read your explanation and looked at the link, but in my case I
>>> don't read any XML file. I just copy the text, get a new input CAS
>>> from UimaAsynchronousEngine with getCAS(), set the text in the CAS, and
>>> send the request with sendCAS(). I use the UIMA-AS API 2.9.0 on the
>>> client side. Apparently the characters are replaced by their
>>> corresponding entities when the CAS is serialized for sending, but I
>>> get the mentioned exception "org.xml.sax.SAXParseException; lineNumber: 1;
>>> columnNumber: 571; Character reference "&#"
>>> in the installed UIMA-AS framework when it tries to deserialize the CAS
>>> with deserializeCasFromXmi() so that the service can process it.
>>>
>>> 2016-12-08 16:48 GMT-05:00, Marshall Schor <m...@schor.com>:
>>>> Hi Nelson,
>>>>
>>>> I can't see the characters (sorry).
>>>>
>>>> This might be an issue caused by a discrepancy between the encoding of
>>>> the file being read and the encoding indicated in the XML header.  Can
>>>> you check that those two things are the same?
>>>>
>>>> See
>>>> http://stackoverflow.com/questions/5165347/what-use-is-the-encoding-in-the-xml-header
>>>> for example.
>>>>
>>>> -Marshall
>>>>
>>>> On 12/8/2016 4:20 PM, nelson rivera wrote:
>>>>> I tried to process the following text in a service deployed in UIMA-AS,
>>>>> because it is input to my application. This is the text: 𖦀  𖦐  �  �.
>>>>> These characters belong to the Bamum script, and apparently they are
>>>>> not invalid XML characters, because tools such as browsers interpret
>>>>> and display them. After getting a new input CAS for processing, setting
>>>>> the text, and sending the request, I get the exception shown below in
>>>>> UIMA-AS. The UIMA-AS framework keeps working and recovers correctly; it
>>>>> just does not process these characters.
>>>>> Could you tell me what happens with these characters? Is one of them
>>>>> an invalid character for the UIMA-AS framework?
>>>>>
>>>>>
>>>>>
>>>>> 04:00:31.606 - 14:
>>>>> org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteClient:
>>>>> WARNING:
>>>>> org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 571;
>>>>> Character reference "&#
>>>>>         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
>>>>>         at org.apache.uima.aae.UimaSerializer.deserializeCasFromXmi(UimaSerializer.java:187)
>>>>>         at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.deserializeCASandRegisterWithCache(ProcessRequestHandler_impl.java:222)
>>>>>         at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteClient(ProcessRequestHandler_impl.java:552)
>>>>>         at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handle(ProcessRequestHandler_impl.java:1090)
>>>>>         at org.apache.uima.aae.handler.input.MetadataRequestHandler_impl.handle(MetadataRequestHandler_impl.java:78)
>>>>>         at org.apache.uima.adapter.jms.activemq.JmsInputChannel.onMessage(JmsInputChannel.java:731)
>>>>>
>>
