Believe me, wchar_t is evil. Define your own 16-bit string type, or use std::basic_string<unsigned short> instead, which you can convert to and from XMLCh* easily. Although unsigned short is not theoretically guaranteed to be two bytes long, assuming so is far more reliable than assuming wchar_t is two bytes long.
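As a rough illustration of the "own 16-bit string" idea, here is a minimal sketch. It assumes char16_t as a stand-in for XMLCh (the thread describes XMLCh as a typedef for unsigned short); the names Utf16Unit, Utf16String, fromUnits and toUnits are hypothetical and are not part of Xerces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: char16_t stands in for Xerces' XMLCh. It is used here because
// the standard library supplies std::char_traits<char16_t>, which it does
// not guarantee for unsigned short.
using Utf16Unit = char16_t;                        // hypothetical name
using Utf16String = std::basic_string<Utf16Unit>;  // same type as std::u16string

static_assert(sizeof(Utf16Unit) == 2, "this sketch assumes a 2-byte code unit");

// Copy a null-terminated buffer of 16-bit code units into an owning string.
inline Utf16String fromUnits(const Utf16Unit* text) {
    assert(text != nullptr);
    return Utf16String(text);
}

// Borrow the string's buffer as a null-terminated pointer, for example to
// hand it to an API that expects 16-bit code units. The pointer is valid
// only while the string is alive and unmodified.
inline const Utf16Unit* toUnits(const Utf16String& s) {
    return s.c_str();
}
```

If XMLCh is a different 16-bit type in a given build, the pointer types will not match and a cast or an element-wise copy would be needed at the boundary.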
Best

-----Original Message-----
From: David N Bertoni/Cambridge/IBM [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 4 March 2003 20:27
To: [EMAIL PROTECTED]
Subject: RE: wchar_t and XMLCh

> Thanks for your suggestion. That will probably work in every case except this one.
> The reason is that we are building a wrapper library over Xerces, and our interface
> exposes only std::wstring. We don't expose internal Xerces types. In particular,
> on Solaris we want to link against the STLport library. Is there any requirement in
> Xerces that forces XMLCh to be 2 bytes? If all Xerces code uses sizeof(XMLCh),
> then it should probably be OK, but if there is any hard-coded value (which
> assumes 2 bytes), then the change won't work.

I suggest you typedef something which mirrors the Xerces XMLCh typedef and use std::basic_string<OutXMLChTypedef>. Otherwise, you risk some incompatibility with Xerces now, or in the future. You also inadvertently encourage the use of wide-character functions that may not be prepared to accept UTF-16 code points:

    // find the first newline character in the Xerces string.
    const wchar_t* const newlineChar = wcschr(xercesStr.c_str(), 10);

Will this work? Maybe, but who knows? A particular compiler/platform has a particular encoding for wchar_t, and you should not attempt to force improperly-encoded code points into it.

Of course, you can always change the Xerces typedef to wchar_t and do what you want, but that means you're on your own if there's a problem now or in the future. You also have to build a custom version of Xerces for every platform and be prepared to support it. It seems just a bit too scary for me.

Dave

From: qchen <[EMAIL PROTECTED]>
Date: 03/04/2003 09:41 AM
To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
cc: (bcc: David N Bertoni/Cambridge/IBM)
Subject: RE: wchar_t and XMLCh
Please respond to xerces-c-dev

David,

Thanks for your suggestion. That will probably work in every case except this one.
The reason is that we are building a wrapper library over Xerces, and our interface exposes only std::wstring. We don't expose internal Xerces types. In particular, on Solaris we want to link against the STLport library. Is there any requirement in Xerces that forces XMLCh to be 2 bytes? If all Xerces code uses sizeof(XMLCh), then it should probably be OK, but if there is any hard-coded value (which assumes 2 bytes), then the change won't work.

Qi Chen

-----Original Message-----
From: David N Bertoni/Cambridge/IBM [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 04, 2003 10:08 AM
To: [EMAIL PROTECTED]
Subject: Re: wchar_t and XMLCh

> Basically I need to convert the XMLCh* to a std::wstring and vice versa. In Xerces, XMLCh is
> typedef-ed to unsigned short (2 bytes). Under win32, there is no need for conversion since wchar_t
> is also typedef-ed to unsigned short. In Solaris/Linux/VMS, however, wchar_t is typedef-ed to
> unsigned long (4 bytes), so the conversion seems to be inevitable.

There are two reasons there's no need for conversion on Win32. One is that Visual C++ 6.0 does not implement wchar_t as a proper type, which is not correct. Most of the platforms to which you refer, depending on the age of the compiler, _do_ implement wchar_t as a proper type, and not as a typedef. The other, and more important, reason is that Win32 uses Unicode, so wide characters are known to be UCS-2/UTF-16 code points.

> My question is: Does the Xerces implementation require the size of XMLCh to be 2 bytes? If I
> change the typedef of XMLCh to wchar_t and recompile Xerces, would it work? I know the
> answer is probably no, but I just want to make sure. Of course the memory usage will be doubled
> if we change XMLCh to 4 bytes, but that is not a concern for me.

For any given operating system, the issue is not really the size of XMLCh; it's whether the operating system assumes wide characters are UCS-2/UTF-16 code points.
If not, there's no point in making XMLCh and wchar_t compatible, because the OS cannot process them. You should re-examine why you're storing UTF-16 encoded characters, like those Xerces produces, in std::wstring. std::basic_string<XMLCh> might be a better choice.

Dave

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
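For the case discussed in this thread, where a wrapper must expose std::wstring anyway, the conversion could look roughly like the sketch below. It is only an illustration: utf16ToWide is a hypothetical helper, char16_t stands in for XMLCh, and the code assumes that a wchar_t wider than 16 bits holds Unicode code points (UTF-32), which, as Dave notes, a given platform does not have to guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-16 code units to std::wstring.
// Assumption: if wchar_t is wider than 16 bits, the platform interprets
// wchar_t values as Unicode code points. If wchar_t is 16 bits wide, the
// code units are copied through unchanged.
inline std::wstring utf16ToWide(const std::u16string& in) {
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high/low surrogate pair into one code point, but only
        // when wchar_t is wide enough to hold the result.
        if (sizeof(wchar_t) > 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const char32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    // The result never has more elements than the input.
    assert(out.size() <= in.size());
    return out;
}
```

Unpaired surrogates are passed through as-is in this sketch; a stricter version might reject them or substitute U+FFFD.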
