> Thanks for your suggestion. That will probably work in every case except this one.
> The reason being we are building a wrapper library over Xerces and our interface
> exposes only std::wstring. We don't expose internal xerces types. In particular
> on Solaris, we want to link against STLport library. Is there any requirement in
> Xerces that will force XMLCh to be 2 bytes? If all xerces code uses sizeof(XMLCh)
> then it should probably be ok, but if there is any hard coded value (which
> assumes 2 bytes), then the change won't work.

I suggest you typedef something which mirrors the Xerces XMLCh typedef and
use std::basic_string<OutXMLChTypedef>.  Otherwise, you risk some
incompatibility with Xerces now, or in the future.  You also inadvertently
encourage the use of wide-character functions that may not be prepared to
accept UTF-16 code points:

   // find the first newline character in the Xerces string.
   const wchar_t* const    newlineChar = wcschr(xercesStr.c_str(), 10);

Will this work?  Maybe, but who knows?  A particular compiler/platform has
a particular encoding for wchar_t and you should not attempt to force
improperly-encoded code points into it.
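
As a rough illustration of the typedef suggestion above, here is a minimal
sketch.  The names OurXMLCh, OurXMLString, and findFirstNewline are
hypothetical, and char16_t is an assumption (it needs a C++11 compiler; at
the time of this thread XMLCh was an unsigned short).  The search goes
through basic_string::find, so no wide-character C function ever sees the
UTF-16 data.

```cpp
#include <cstddef>
#include <string>

// Hypothetical mirror of the library's 16-bit character type.
// Assumption: char16_t (C++11) is used as the UTF-16 code unit type.
typedef char16_t OurXMLCh;
typedef std::basic_string<OurXMLCh> OurXMLString;

// Find the first newline (U+000A) in a UTF-16 string.
// Returns OurXMLString::npos when there is none.
std::size_t findFirstNewline(const OurXMLString& s)
{
    return s.find(static_cast<OurXMLCh>(0x0A));
}
```

Because the string type carries its own character type, the compiler will
reject an accidental call to wcschr or similar on its contents.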

Of course, you can always change the Xerces typedef to wchar_t and do what
you want, but that means you're on your own if there's a problem now or in
the future.  You also have to build a custom version of Xerces for every
platform and be prepared to support it.  It seems just a bit too scary for
me.

Dave



From:    qchen <[EMAIL PROTECTED]>
Date:    03/04/2003 09:41 AM
To:      "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
cc:      (bcc: David N Bertoni/Cambridge/IBM)
Subject: RE: wchar_t and XMLCh

Please respond to xerces-c-dev



David,

Thanks for your suggestion. That will probably work in every case except
this one. The reason being we are building a wrapper library over Xerces
and our interface exposes only std::wstring. We don't expose internal
xerces types. In particular on Solaris, we want to link against STLport
library. Is there any requirement in Xerces that will force XMLCh to be 2
bytes? If all xerces code uses sizeof(XMLCh) then it should probably be
ok, but if there is any hard coded value (which assumes 2 bytes), then the
change won't work.


Qi Chen


-----Original Message-----
From: David N Bertoni/Cambridge/IBM [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 04, 2003 10:08 AM
To: [EMAIL PROTECTED]
Subject: Re: wchar_t and XMLCh






> Basically I need to convert the XMLCh* to a std::wstring and vice versa. In Xerces, XMLCh is
> typedef-ed to unsigned short (2 bytes). Under win32, there is no need for conversion since wchar_t
> is also typedef-ed to unsigned short. In Solaris/Linux/VMS, however, wchar_t is typedef-ed to
> unsigned long (4 bytes), so the conversion seems to be inevitable.

There are two reasons there's no need for conversion on Win32.  One is
that Visual C++ 6.0 does not implement wchar_t as a proper type but as a
typedef, which is not correct.  Most of the platforms to which you refer,
depending on the age of the compiler, _do_ implement wchar_t as a proper
type, and not as a typedef.  The other, and more important, reason is that
Win32 uses Unicode, so wide characters are known to be UCS-2/UTF-16 code
points.

> My question is: Does the Xerces implementation require the size of XMLCh to be 2 bytes?  If I
> change the typedef of XMLCh to wchar_t and recompile Xerces, would it work? I know the
> answer is probably no, but I just want to make sure. Of course the memory usage will be doubled
> if we change the XMLCh to 4 bytes, but that is not a concern for me.

For any given operating system, the issue is not really the size of XMLCh,
it's whether the operating system assumes wide characters are UCS-2/UTF-16
code points.  If not, there's no point in making XMLCh and wchar_t
compatible, because the OS cannot process them.
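
If a std::wstring really is required at the interface, the conversion has
to translate the encoding, not just widen each unit.  The following is only
a sketch, assuming wchar_t is at least 32 bits and that the platform treats
it as holding UTF-32 code points (neither is guaranteed).  Utf16Unit,
utf16ToWide, and wideToUtf16 are hypothetical names, not Xerces APIs.
Unpaired surrogates are copied through unchanged and no range validation is
done.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for XMLCh; assumes a C++11 compiler.
typedef char16_t Utf16Unit;

// Assumption: wchar_t can hold any code point up to U+10FFFF.
static_assert(sizeof(wchar_t) >= 4, "sketch assumes a 32-bit wchar_t");

// Decode UTF-16 code units into one wchar_t per code point.
std::wstring utf16ToWide(const Utf16Unit* src, std::size_t len)
{
    assert(src != nullptr || len == 0);
    std::wstring out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        unsigned long cp = src[i];
        // A high surrogate followed by a low surrogate forms one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len) {
            const unsigned long lo = src[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // consume the low surrogate as well
            }
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}

// Encode code points back into UTF-16 code units.
std::basic_string<Utf16Unit> wideToUtf16(const std::wstring& src)
{
    std::basic_string<Utf16Unit> out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(src[i]);
        if (cp >= 0x10000) {
            // Supplementary code points need a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<Utf16Unit>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<Utf16Unit>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<Utf16Unit>(cp));
        }
    }
    return out;
}
```

Note that the lengths of the two strings differ whenever a surrogate pair is
present, which is one reason a simple change of typedef size does not make
the two representations interchangeable.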

You should re-examine why you're storing UTF-16 encoded characters, such as
those Xerces produces, in std::wstring.  std::basic_string<XMLCh> might be a
better choice.
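
A wrapper interface built on that idea might look like the sketch below.
XMLChLike stands in for XMLCh and is an assumption (char16_t, C++11);
XMLChString and fromNullTerminated are hypothetical names, not part of
Xerces.

```cpp
#include <string>

// Assumption: the 16-bit character type is char16_t (C++11).
typedef char16_t XMLChLike;
typedef std::basic_string<XMLChLike> XMLChString;

// Copy a null-terminated UTF-16 buffer into an owning string.
// A null pointer yields an empty string instead of undefined behaviour.
XMLChString fromNullTerminated(const XMLChLike* s)
{
    return s ? XMLChString(s) : XMLChString();
}
```

The code units are copied as-is, so no encoding conversion takes place and
nothing is lost on platforms where wchar_t uses a different encoding.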

Dave


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
