Plamen Kirov wrote:
Hi,
We have an application for parsing XML that works fine when the input
contains only digits or English letters. When the input XML file contains
Arabic letters, the TranscodeToLocalCodePage method returns an empty string.
Input XML file is UTF-8 encoded. OS: HP-UX B.11.23 ia64,
LANG=ar_SA.utf8, NLS_LANG=AMERICAN_AMERICA.AR8ISO8859P6.
The source code is:
…..
static string transStringXalan2Std(const XalanDOMString& xalanString){
    string result;
    if (xalanString.empty() == true){
        result = "";
    }
    else{
        //conversion from UTF-16 to UTF-8
        CharVectorType theString;
        TranscodeToLocalCodePage(xalanString, theString, true);
        // the result from TranscodeToLocalCodePage – theString, is zero sized
        // when xalanString contains Arabic letters
I don't know enough about the HP-UX environment to understand the
difference between LANG and NLS_LANG, but I suspect iconv is not
assuming UTF-8 for the local code page. A couple of questions:
1. What versions of Xerces-C and Xalan-C are you using?
2. If you execute "locale -a", do you see ar_SA.utf8 as one of the
installed locales?
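To go with the two questions above, a small diagnostic along these lines might show which codeset the C runtime actually picks up from your environment. This is only a sketch: localCodeset is a hypothetical helper name, it relies on the POSIX nl_langinfo call, and it is an assumption on my part that what it reports matches what the transcoder behind TranscodeToLocalCodePage ends up using.

```cpp
#include <clocale>
#include <langinfo.h>
#include <string>

// Hypothetical diagnostic helper (not part of Xalan-C or Xerces-C).
// Adopts the locale named by the environment (LANG, LC_ALL, ...) and
// returns the codeset name the C runtime reports for it.
std::string localCodeset()
{
    // setlocale returns a null pointer when the environment names a
    // locale that cannot be honored, e.g. one that is not installed.
    if (std::setlocale(LC_ALL, "") == 0) {
        return "(setlocale failed)";
    }

    // CODESET is the POSIX item for the character encoding of the
    // current locale.
    return nl_langinfo(CODESET);
}
```

Printing the result of localCodeset() from the same shell and environment that runs your application would tell you whether the process sees a UTF-8 codeset or something else.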
Invoking method:
...
Thanks in advance for any help!
You should avoid using TranscodeToLocalCodePage, since the results are
not predictable or portable with characters outside the ASCII character
set. If you want to transcode to UTF-8, use the Xerces-C
XMLTranscodingService to create a UTF-8 transcoder, and use that instead.
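To make clear what a locale-independent conversion involves, here is a minimal standalone sketch of UTF-16 to UTF-8 encoding. It is not the Xerces-C transcoder and uses no Xerces-C or Xalan-C calls. utf16ToUtf8 is a hypothetical helper name, it needs a C++11 compiler for std::u16string, and it assumes the input is a sequence of 16-bit UTF-16 code units (which is what I would expect XalanDOMString to hold in a typical build, but check yours).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xalan-C or Xerces-C): encodes UTF-16
// code units as UTF-8 bytes without consulting the process locale.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size() * 3);

    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            // One byte: ASCII.
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // Two bytes: covers the Arabic block (U+0600 to U+06FF).
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // Three bytes: rest of the Basic Multilingual Plane.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // Four bytes: supplementary planes.
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the Arabic letter U+0628 comes out as the two bytes 0xD8 0xA8 regardless of LANG or NLS_LANG. In real code I would still prefer the Xerces-C UTF-8 transcoder, since it is already there and tested.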
In the future, please post messages like this to the Xalan-C User list,
not the Developer list.
Dave