The Xerces parser does not parse 3-byte UTF-8 characters correctly. It is supposed to return all 3 bytes of the character but instead returns 1 byte of uninitialized memory (0xCD).
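For reference, here is a minimal standalone sketch of what the three bytes quoted in the original message (EF A4 85) should decode to. It does not use Xerces or Axis at all. The helper names decodeUtf8 and checkSampleFromHexDump are made up for illustration, and the decoder only handles sequences of 1 to 3 bytes without rejecting overlong encodings, so treat it as a sanity check rather than a real transcoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Xerces or Axis): decode a single UTF-8
// sequence of 1 to 3 bytes starting at p, with n bytes available.
// Returns the code point, or U+FFFD if the bytes are malformed or the
// sequence is longer than this sketch supports. Overlong forms are not rejected.
static std::uint32_t decodeUtf8(const unsigned char* p, std::size_t n)
{
    // 1 byte: 0xxxxxxx
    if (n >= 1 && p[0] < 0x80)
        return p[0];

    // 2 bytes: 110xxxxx 10xxxxxx
    if (n >= 2 && (p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80)
        return ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);

    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    if (n >= 3 && (p[0] & 0xF0) == 0xE0 &&
        (p[1] & 0xC0) == 0x80 && (p[2] & 0xC0) == 0x80)
        return ((p[0] & 0x0Fu) << 12) | ((p[1] & 0x3Fu) << 6) | (p[2] & 0x3Fu);

    return 0xFFFDu;
}

// Usage: the three bytes from the hex dump form one code point, U+F905.
static void checkSampleFromHexDump()
{
    const unsigned char sample[] = { 0xEF, 0xA4, 0x85 };
    assert(decodeUtf8(sample, sizeof sample) == 0xF905u);
}
```

So the expected result is one character with the value 0xF905, which cannot fit in a single byte. Whatever the parser hands back needs to be either the original 3 bytes or a 16-bit (or wider) value.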
From: McCullough, Ryan [mailto:rmccullo...@rightnow.com]
Sent: Tuesday, February 03, 2009 6:48 PM
To: Apache AXIS C User List
Cc: Antonczyk, Ryszard
Subject: Axis with UTF-8

We are using Axis1 checked out from Subversion along with Xerces-C version 2.2.0. We are having trouble using Axis to retrieve UTF-8 characters. Is there any additional setup needed?

Here is where we think things are going awry, in axis\xml\xerces\XMLParserXerces.cpp::parse(bool ignoreWhitespace, bool peekIt). At about line 125 there is this:

    // parse next token
    m_bCanParseMore = m_pParser->parseNext(m_ScanToken);

It looks like the parseNext() function is converting a 3-byte UTF-8 character to 1 byte. Here is the hex data being returned from our web service:

    00000808h: EF A4 85 ; ï¤...

I have also attached the XML that was returned from the web service (xmlout14670.txt, which is logged on the server).

Ryan McCullough | RightNow Technologies | Integration Tools Engineer
406-556-3162 office | Bozeman, MT | rmccullo...@rightnow.com | http://www.rightnow.com