On 09/21/2011 07:47 PM, Jiri Krutil wrote:
In my opinion it is not so obvious, because as far as I know:
- AMQP allows UTF-8 or UTF-16 strings.
- Many C++ applications supporting Unicode store strings in std::wstring
with UCS-2 encoding. Having a fixed character size of 2 bytes per code point
allows for simple and efficient string manipulations. If required,
conversions to/from UTF-8 are performed on interfaces to the outside world.
(BTW I think this is also the case for Java.)
- In C++ it is fairly common to use std::string as a container for binary
data. I would not say it is wrong to do that.

I agree with all your points here.

I personally would say that in C++ there is no "default" character encoding.
Defaulting to UTF-8 makes some sense because all 7-bit ASCII strings are
UTF-8. But it may be dangerous to assume UTF-8 for all strings, and it would
probably be safer to somehow force C++ programs to explicitly specify
the encoding when reading and writing strings.
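Forcing the encoding to be explicit could look roughly like this minimal C++17 sketch. It uses distinct wrapper types for text and binary values, plus a deleted overload for a bare std::string, so an unlabelled string does not compile. SketchMessage, Utf8Text and BinaryData are hypothetical names for illustration only, not the Qpid messaging API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <variant>

// Hypothetical wrapper types: the caller has to say what the bytes mean.
struct Utf8Text   { std::string bytes; };  // caller asserts: valid UTF-8 text
struct BinaryData { std::string bytes; };  // caller asserts: opaque octets

using PropertyValue = std::variant<Utf8Text, BinaryData>;

// Hypothetical stand-in for a message class; not qpid::messaging::Message.
class SketchMessage {
public:
    void setProperty(const std::string& key, Utf8Text v)   { props_[key] = std::move(v); }
    void setProperty(const std::string& key, BinaryData v) { props_[key] = std::move(v); }
    // A bare std::string (or string literal) carries no declared encoding,
    // so this overload is deleted and such calls fail to compile.
    void setProperty(const std::string&, const std::string&) = delete;

    // True if the property was declared as text, false if declared as binary.
    bool isText(const std::string& key) const {
        return std::holds_alternative<Utf8Text>(props_.at(key));
    }

private:
    std::map<std::string, PropertyValue> props_;
};
```

With this shape, `m.setProperty("k", std::string("x"))` is rejected at compile time, which is the point. The trade-off is noisier call sites.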

Again I agree in general, but what about making assumptions in specific contexts? E.g. in Message::setProperty(), what if we documented that passing in a std::string as the second parameter is only valid if it contains UTF-8 encoded character data? Any other encoding would then need to be stated explicitly.

The most likely source of error here is where the data is binary (e.g. a digest or signature for the message), or where it is extended ASCII. If it is some other Unicode encoding (e.g. UTF-16), then I think it would be reasonable to expect the caller to state that explicitly.
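If std::string is documented to mean UTF-8, the two error cases above (binary data and extended ASCII) can often be caught with a well-formedness check. Here is a minimal sketch following the RFC 3629 rules. The name isValidUtf8 is hypothetical, not a Qpid function. Treat it as a heuristic only: a short binary value can happen to be valid UTF-8 by chance.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8 per RFC 3629: rejects stray
// continuation bytes, truncated sequences, overlong forms, surrogate code
// points (U+D800..U+DFFF) and values above U+10FFFF.
bool isValidUtf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (c < 0x80) { ++i; continue; }                    // plain ASCII byte
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                                  // invalid lead byte
        if (i + len > n) return false;                      // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;          // must be 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Smallest code point that legitimately needs `len` bytes.
        static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < minForLen[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                                   // overlong, out of range, or surrogate
        i += len;
    }
    return true;
}
```

A Latin-1 "café" (`"caf\xE9"`) or a raw digest containing bytes such as 0xFF fails this check. In the setProperty() case, a failure could be taken as a hint that the value was meant to be sent as binary rather than as a string.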

In Java, the default encoding is apparently UTF-8, but the Java client
should still be able to accept strings encoded in UTF-16.

I think that the Qpid client libraries should support implicit conversions
between UTF-8 and UTF-16/UCS-2. I believe it is acceptable to support only
the UCS-2 character set (Unicode's Basic Multilingual Plane) in the C++
client.

So add support for std::wstring and convert as necessary? I think that would be a good thing to do regardless. As you say, where Unicode is used in earnest, wstring is the more obvious choice.
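The BMP-only conversion discussed above, between a std::wstring treated as UCS-2 (one code point per wchar_t) and a UTF-8 std::string, might look like the following minimal sketch. The function names are hypothetical. Both functions throw std::invalid_argument on anything UCS-2 cannot represent (surrogates, code points above U+FFFF), matching the BMP-only restriction suggested above. Because wchar_t is 16 bits on Windows but 32 bits on Linux, each value is range-checked rather than relying on the type's width. std::wstring_convert with std::codecvt_utf8 could also do this, but both were deprecated in C++17, so the sketch hand-rolls the encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Encode a UCS-2 wstring as UTF-8 (1 to 3 bytes per BMP code point).
std::string ucs2ToUtf8(const std::wstring& in) {
    std::string out;
    out.reserve(in.size());
    for (wchar_t wc : in) {
        // A negative (signed 32-bit) wchar_t wraps to a huge value and is rejected.
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a UCS-2 (BMP) code point");
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Decode UTF-8 into UCS-2; 4-byte sequences (outside the BMP) are rejected.
std::wstring utf8ToUcs2(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        unsigned long cp = 0;
        std::size_t len = 0;
        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else throw std::invalid_argument("invalid or non-BMP UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80)
                throw std::invalid_argument("bad UTF-8 continuation byte");
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or surrogate UTF-8 sequence");
        out += static_cast<wchar_t>(cp);
        i += len;
    }
    return out;
}
```

For example, `ucs2ToUtf8(L"\u20AC")` (the euro sign) yields the three bytes E2 82 AC, and decoding them gives back the original wstring. A wstring overload of a property setter could call ucs2ToUtf8() and then forward to the UTF-8 std::string path.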

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
