Hi Brian,
We end up doing what Xerces does, so we can consume each other's UTF-16
strings. Xerces is a bit funny in that they always use unsigned short,
except on some platforms where the port was done by someone else.
Currently, Borland C++ is the only platform where wchar_t is used by
Xerces.
The algorithm really ought to be:
If wchar_t is known to be UTF-16, then use it. Otherwise, use unsigned
short.
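
As a rough illustration of that rule, here is a minimal sketch of the type selection. The names SKETCH_WCHAR_IS_UTF16, SketchUTF16Char and sketchLength are hypothetical and are not taken from Xalan or Xerces; the only platform test shown is _WIN32, and any other platform would need its own check once its wchar_t encoding is confirmed.

```cpp
#include <cstddef>

// Hypothetical configuration macro: define it only on platforms where
// wchar_t is known to hold UTF-16 code units. Windows is the one case
// wired up here; other platforms are left for whoever can verify them.
#if defined(_WIN32)
#define SKETCH_WCHAR_IS_UTF16 1
#endif

#if defined(SKETCH_WCHAR_IS_UTF16)
typedef wchar_t SketchUTF16Char;        // native wide char is already UTF-16
#else
typedef unsigned short SketchUTF16Char; // fall back to a plain 16-bit unit
#endif

// Either way, the chosen type must be exactly one 16-bit code unit wide.
static_assert(sizeof(SketchUTF16Char) == 2,
              "UTF-16 code unit type must be 16 bits wide");

// Counts UTF-16 code units (not characters) up to the terminating zero.
// A surrogate pair therefore counts as two.
inline std::size_t sketchLength(const SketchUTF16Char* s)
{
    std::size_t n = 0;
    while (s[n] != 0)
    {
        ++n;
    }
    return n;
}
```

Because both branches produce a 16-bit unit, code written against the typedef behaves the same whichever branch is taken; the choice only affects whether the strings can be passed to native wide-character APIs without a cast.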
So, all WIN32/64 compilers should use wchar_t, as should AIX 32-bit. I
don't know about any of the other Unix platforms. Linux is an entirely
different story, as wchar_t can even be EBCDIC, depending on the platform.
The only reason that ifdef exists in Xalan is that I started a Borland
port (but gave up), and had to add the ifdef first.
We never did figure out the whole URIResolver thing, did we? It's probably
too late to do that for the next release, but we ought to revive that
discussion and settle on something.
Dave
Brian Quinlan <brian@sweetapp.com>
To: [EMAIL PROTECTED]
cc: (bcc: David N Bertoni/Cambridge/IBM)
Subject: Semantics of string types
01/25/2003 01:45 PM
Please respond to xalan-dev
I'd like to get my head around the various string types used in Xalan-C.
#ifdef XALAN_USE_NATIVE_WCHAR_T
XalanDOMChar is a wchar_t (is it interpreted as UTF-16 or as UCS-2/UCS-4?)
#else
XalanDOMChar is a UTF-16 character
#endif
XMLCh is a wchar_t, how is that to be interpreted?
Cheers,
Brian
