wchar_t is messy. For the platforms you mentioned, if sizeof(wchar_t) == 2,
wchar_t will be UTF-16. If the size is 4 bytes and __STDC_ISO_10646__
is defined, wchar_t is UCS4, I think. But this definitely does not cover
all possible platforms.
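As a rough sketch of that check, something like the following would classify wchar_t from its size and the macro. The enum and function names are made up for illustration; they are not part of Xerces or ICU, and the result is only a heuristic for the common cases.

```cpp
#include <cstddef>

// Hypothetical names, for illustration only.
enum WCharFormat { WCHAR_UTF16, WCHAR_UCS4, WCHAR_UNKNOWN };

// Guess the wchar_t encoding from its size and __STDC_ISO_10646__.
// A 2-byte wchar_t is assumed to be UTF-16; a 4-byte wchar_t is treated
// as UCS4 only when the implementation defines __STDC_ISO_10646__.
// Anything else is reported as unknown rather than guessed at.
inline WCharFormat guessWCharFormat()
{
    if (sizeof(wchar_t) == 2)
        return WCHAR_UTF16;
#if defined(__STDC_ISO_10646__)
    if (sizeof(wchar_t) == 4)
        return WCHAR_UCS4;
#endif
    return WCHAR_UNKNOWN;
}
```

A caller would still need a platform-specific fallback for the WCHAR_UNKNOWN case.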
If you know that your Unicode data has no code points above U+FFFF, you can
do a quick and dirty conversion to UCS4 by just unpacking the 16 bit values
into 32 bits, with the high 16 bits being zero.
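The quick and dirty widening could look roughly like this. The typedefs and the function name are hypothetical, and the sketch assumes unsigned int is at least 32 bits wide. It does no surrogate handling at all, so it is only correct under the restriction stated above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Xerces or ICU.
typedef unsigned short Utf16Unit;  // same underlying type as XMLCh here
typedef unsigned int   Ucs4Unit;   // assumed to be at least 32 bits wide

// Widen UTF-16 code units to 32-bit values, one for one.
// Only correct when the input contains no surrogate pairs.
std::vector<Ucs4Unit> widenBmpOnly(const Utf16Unit* src, std::size_t len)
{
    std::vector<Ucs4Unit> out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i)
        out.push_back(static_cast<Ucs4Unit>(src[i]));  // upper bits stay zero
    return out;
}
```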
You'd think that there would be simple-to-use library functions for
converting to/from wchar_t, but there don't seem to be any. I'm lobbying to
get one added to ICU.
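A sketch of what such a simple helper might look like for the UTF-16 to wchar_t direction follows. It is not an ICU or Xerces function; the name and typedef are hypothetical, and it assumes wchar_t is either UTF-16 (2 bytes) or UCS4 (4 bytes). Unpaired surrogates are replaced with U+FFFD in the 4-byte case, which is just one possible policy.

```cpp
#include <cstddef>
#include <string>

typedef unsigned short Utf16Unit;  // stand-in for XMLCh

// Hypothetical helper: convert UTF-16 code units to a std::wstring.
// With a 2-byte wchar_t the units are copied through unchanged.
// With a 4-byte wchar_t, surrogate pairs are combined into one code point.
std::wstring utf16ToWide(const Utf16Unit* src, std::size_t len)
{
    std::wstring out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        unsigned long c = src[i];
        if (sizeof(wchar_t) >= 4) {
            if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len
                && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                // High surrogate followed by low surrogate: combine them.
                c = 0x10000 + ((c - 0xD800) << 10) + (src[i + 1] - 0xDC00);
                ++i;
            } else if (c >= 0xD800 && c <= 0xDFFF) {
                c = 0xFFFD;  // unpaired surrogate
            }
        }
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}
```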
Andy Heninger
IBM, Cupertino, CA
[EMAIL PROTECTED]
----- Original Message -----
From: "Mark A Russell" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, May 01, 2001 12:00 PM
Subject: RE: XMLCh & wchar_t conversion on multiple platforms
> So am I correct then in assuming that I will need to instantiate a
> transcoder of type ICU or Iconv just to do the conversion? If this is the
> case, then what are the encodingName values that the constructors take, the
> UConverter that ICU takes, and the block size that Iconv takes?
>
> Is there some sample code out there that gives a simple case of how this
> works?
> Also, how do you go about determining the wchar_t format (beyond just
> using #ifdefs)?
>
> Thanks,
>
> Mark R
>
> -----Original Message-----
> From: Andy Heninger [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, May 01, 2001 10:31 AM
> To: [EMAIL PROTECTED]
> Subject: Re: XMLCh & wchar_t conversion on multiple platforms
>
>
> wchar_t seems to be perpetually awkward, largely because its definition
> varies so much from platform to platform. You will end up with some
> platform-specific code to find the local wchar_t format. Once you have
> that you can use either iconv (UNIXes), ICU converters (all platforms,
> assuming you have ICU around), or nothing (when wchar_t encoding is
> UTF-16) to get from UTF-16 encoded XMLCh strings to wchar_t strings.
>
>
> Andy Heninger
> IBM, Cupertino, CA
> [EMAIL PROTECTED]
>
> ----- Original Message -----
> From: "Mark A Russell" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Tuesday, May 01, 2001 6:50 AM
> Subject: RE: XMLCh & wchar_t conversion on multiple platforms
>
>
> > That seems to be the issue I'm running into, but I can't seem to figure
> > out how to do the transcoding. I've looked through the docs, and more
> > importantly the headers, and the closest thing I can find is the
> > transcodeTo and transcodeFrom functions. The issue I have with those is
> > that you have to determine which Transcoder to use, i.e. Iconv or ICU,
> > you have to know the Unicode type when you instantiate the transcoder,
> > and also they are not static functions, meaning I have to instantiate a
> > transcoder just to do some conversions.
> >
> > Surely there is a simpler way to do the transcoding?
> >
> > Mark A Russell
> > NextGen Software Engineer
> > CSG Systems, Inc.
> > E-Mail: [EMAIL PROTECTED]
> >
> >
> > -----Original Message-----
> > From: Dean Roddey [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, April 30, 2001 4:44 PM
> > To: '[EMAIL PROTECTED]'
> > Subject: RE: XMLCh & wchar_t conversion on multiple platforms
> >
> >
> > A decision was made a while back, which I didn't really agree with, to
> > fix XMLCh to UTF-16 on all platforms. Partly this was because the DOM
> > committee chose UTF-16 for its representation. So, if this is not
> > compatible with your wchar_t, you must transcode all of the data to your
> > local wide string representation before using it. On NT, the stuff spit
> > out from the parser is directly usable, since UTF-16 is NT's native
> > representation of Unicode. On other platforms, you'll have to transcode
> > if they don't do the same.
> >
> > --------------
> > Dean Roddey
> > Software Geek Extraordinaire
> > Portal, Inc
> > [EMAIL PROTECTED]
> >
> >
> >
> > -----Original Message-----
> > From: Mark A Russell [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, April 30, 2001 3:25 PM
> > To: [EMAIL PROTECTED]
> > Subject: XMLCh & wchar_t conversion on multiple platforms
> >
> >
> > Is there a way to convert between XMLCh and wchar_t on both the AIX 4.3
> > & Solaris platforms that won't break my code on NT?
> >
> > I have some code that I'm trying to port from win32 that uses wchar_t
> > for Unicode support. This code currently makes use of some of the Xerces
> > functions that only take XMLCh's. An example is shown below:
> >
> > const wchar_t * szSourceBinding =
> > attributes.getValue(CBOITagFactory::ATTR_SOURCE_BINDING);
> >
> > The CBOITagFactory::ATTR_SOURCE_BINDING is simply a wchar_t. (XMLCh's
> > are currently unsigned shorts.)
> >
> > My requirement is to maintain Unicode support on all three platforms. I
> > thought about just redefining XMLCh's to wchar_t's like they used to be
> > around 1.2; however, after looking at the documentation, that seems like
> > a very bad idea because of an incompatibility that would arise on the
> > Solaris platform.
> >
> > Any help would be much appreciated.
> >
> > btw - What happened to the mailing list archives? They seem to be
> > unreachable.
> >
> > Mark A Russell
> > NextGen Software Engineer
> > CSG Systems, Inc.
> > E-Mail: [EMAIL PROTECTED]
> >
> >
>
>