Re: Processing Bamun characters

2016-12-12 Thread Marshall Schor
Another question: I assume there are perhaps two machines involved here (it's a UIMA-AS setup). From the exception, it appears that the error happens when the client sends the CAS to the remote service. Can you print out the default locale of the OS (I'm assuming Linux) on both machines? (e.g. type in
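Alongside the OS locale, it can help to print the JVM-side settings on each machine. The following is a minimal sketch, assuming both the client and the remote service run ordinary Java processes; the class name `ShowJvmLocale` is hypothetical. When a String is converted to or from bytes without an explicit charset, Java uses `Charset.defaultCharset()`, which on JDKs before 18 is normally derived from the OS locale (LANG / LC_ALL on Linux) unless `-Dfile.encoding` overrides it; since JDK 18 it defaults to UTF-8.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper: prints the settings that decide how a JVM converts
// between Strings and bytes when no charset is passed explicitly.
public class ShowJvmLocale {

    static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("default locale  = ").append(Locale.getDefault()).append('\n');
        sb.append("default charset = ").append(Charset.defaultCharset().name()).append('\n');
        sb.append("file.encoding   = ").append(System.getProperty("file.encoding")).append('\n');
        // LANG / LC_ALL may be unset; StringBuilder then appends "null".
        sb.append("LANG            = ").append(System.getenv("LANG")).append('\n');
        sb.append("LC_ALL          = ").append(System.getenv("LC_ALL")).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Running it on both machines and comparing the "default charset" lines shows quickly whether the two ends would encode the same String differently.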

2016-12-12 Thread nelson rivera
Yes, these are the values of the troublesome characters. Using Integer.toHexString() to print out each byte shows:
fff0 ff96 ffa6 ff80 fff0 ff96 ffa6 ff90 ffef ffbf ffbd ffef ffbf ffbd

2016-12-12 11:35 GMT-05:00, Marshall Schor:
> Hi Nels
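The leading `f` digits in those values come from sign extension: Java's `byte` is signed, so a byte such as 0xF0 is -16 and is widened to a negative number before `Integer.toHexString()` formats it. The low two hex digits of each value are the byte itself, so `fff0 ff96 ffa6 ff80` corresponds to F0 96 A6 80; how many `f` digits appear depends on the type the byte was widened to. A minimal sketch of printing the plain bytes by masking with `0xFF` (the class name `HexBytes` is hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: formats bytes as two-digit hex without sign extension.
public class HexBytes {

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            // b & 0xFF keeps only the low 8 bits, so 0xF0 prints as "f0".
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;
        System.out.println(Integer.toHexString(b));        // fffffff0 (widened to int -16)
        System.out.println(Integer.toHexString(b & 0xFF)); // f0

        // U+16980 is outside the BMP, so the String holds a surrogate pair;
        // encoding it explicitly as UTF-8 yields four bytes.
        String s = new String(Character.toChars(0x16980));
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_8))); // f0 96 a6 80
    }
}
```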

2016-12-12 Thread Marshall Schor
Hi Nelson, I'm looking into this. Can you please confirm that the UTF-8 encoding of the troublesome characters, in hexadecimal, is:
F0 96 A6 80 F0 96 A6 90 EF BF BD EF BF BD
If you have the string in Java, please try converting it to UTF-8 bytes using something like:
byte[] theBytes = myTest
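To see which characters that byte sequence actually contains, it can be decoded back into code points. A minimal sketch, assuming the bytes are available as a Java `byte[]` (the class name `DecodeBytes` is hypothetical). The first eight bytes decode to U+16980 and U+16990, in the Bamum Supplement block; EF BF BD is the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER, which a decoder substitutes for input it cannot decode, so those two characters were most likely already damaged before this point.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: lists the code points contained in UTF-8 bytes.
public class DecodeBytes {

    static String codePoints(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        // codePoints() walks surrogate pairs as single supplementary characters.
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The byte sequence quoted above.
        byte[] quoted = {
            (byte) 0xF0, (byte) 0x96, (byte) 0xA6, (byte) 0x80,
            (byte) 0xF0, (byte) 0x96, (byte) 0xA6, (byte) 0x90,
            (byte) 0xEF, (byte) 0xBF, (byte) 0xBD,
            (byte) 0xEF, (byte) 0xBF, (byte) 0xBD
        };
        System.out.println(codePoints(quoted)); // U+16980 U+16990 U+FFFD U+FFFD
    }
}
```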