Plamen Kirov wrote:
Hi,

We have an application for parsing XML files that works fine when the input contains only digits or English letters. When the input XML file contains Arabic letters, the TranscodeToLocalCodePage method returns an empty string.

The input XML file is UTF-8 encoded. OS: HP-UX B.11.23 ia64, LANG=ar_SA.utf8, NLS_LANG=AMERICAN_AMERICA.AR8ISO8859P6.

The source code is:
…..

static string transStringXalan2Std(const XalanDOMString& xalanString){
    string result;

    if (xalanString.empty() == true){
        result = "";
    }
    else{
        //conversion from UTF-16 to UTF-8
        CharVectorType theString;
        TranscodeToLocalCodePage(xalanString, theString, true);

        // the result from TranscodeToLocalCodePage, theString, is zero sized when xalanString contains Arabic letters

I don't know enough about the HP-UX environment to understand the difference between LANG and NLS_LANG, but I suspect iconv is not assuming UTF-8 for the local code page. A couple of questions:

1. What versions of Xerces-C and Xalan-C are you using?

2. If you execute "locale -a," do you see ar_SA.utf8 as one of the installed locales?
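
As a complement to the two questions above, here is a minimal sketch for checking which code set the C runtime derives from a given locale name. It assumes a POSIX system that provides nl_langinfo; the helper name codesetForLocale is made up for illustration and is not part of Xerces-C or Xalan-C. Whether this value is what the transcoder ends up using on HP-UX is an assumption I have not verified.

```cpp
#include <clocale>
#include <string>
#include <langinfo.h>  // POSIX: nl_langinfo, CODESET

// Hypothetical helper (not a Xerces-C or Xalan-C function).
// Activates the named locale and returns the code set name the C runtime
// reports for it. Pass "" to use the LANG / LC_* environment variables.
// Returns an empty string if the locale cannot be activated.
// Note: this changes the process-wide locale as a side effect.
std::string codesetForLocale(const char* localeName) {
    if (std::setlocale(LC_ALL, localeName) == nullptr) {
        return std::string();  // locale not installed or name not recognized
    }
    const char* codeset = nl_langinfo(CODESET);
    return codeset != nullptr ? std::string(codeset) : std::string();
}
```

Calling codesetForLocale("") inside the failing process and printing the result should show whether the environment resolves to a UTF-8 code set or to something else. An empty result would suggest the locale named in LANG could not be activated at all.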

Invoking method:

...


Thanks in advance for any help!
You should avoid using TranscodeToLocalCodePage, since the results are not predictable or portable with characters outside the ASCII character set. If you want to transcode to UTF-8, use the Xerces-C XMLTranscodingService to create a UTF-8 transcoder, and use that instead.
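
To illustrate why a dedicated UTF-8 transcoder is safe for Arabic text, here is a standalone sketch of the UTF-16 to UTF-8 conversion itself. It is a hand-written stand-in, not the Xerces-C transcoder and not its API: the function name utf16ToUtf8 and the use of std::u16string are assumptions for the example, and replacing unpaired surrogates with U+FFFD is a choice made here, not documented library behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Xerces-C or Xalan-C function).
// Converts UTF-16 code units to a UTF-8 byte string.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    out.reserve(in.size() * 3);  // at most 3 bytes per UTF-16 code unit

    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;  // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired low surrogate
        }

        if (cp < 0x80) {
            // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // 2 bytes: covers the Arabic block (U+0600 to U+06FF)
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // 3 bytes: rest of the Basic Multilingual Plane
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // 4 bytes: supplementary planes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the Arabic letter seen (U+0633) comes out as the two bytes D8 B3. The conversion depends only on the code points, never on LANG or any other locale setting, which is what makes it predictable across platforms.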

In the future, please post messages like this to the Xalan-C User list, not the Developer list.

Dave
