From: "Marcin 'Qrczak' Kowalczyk" [EMAIL PROTECTED]
Why would UTF-16 be easier for internal processing than UTF-8?
Both are variable-length encodings.
Performance tuning is easier with UTF-16. You can optimize for
BMP characters, knowing that surrogate pairs are sufficiently uncommon
that
From: "Marcin 'Qrczak' Kowalczyk" [EMAIL PROTECTED]
Why would UTF-16 be easier for internal processing than UTF-8?
Both are variable-length encodings.
Performance tuning is easier with UTF-16. You can optimize for
BMP characters, knowing that surrogate pairs are sufficiently uncommon
that
From: "Marcin 'Qrczak' Kowalczyk" [EMAIL PROTECTED]
Why would UTF-16 be easier for internal processing than UTF-8?
Both are variable-length encodings.
Performance tuning is easier with UTF-16. You can optimize for
BMP characters, knowing that surrogate pairs are sufficiently uncommon
that
From: "Marcin 'Qrczak' Kowalczyk" [EMAIL PROTECTED]
Why would UTF-16 be easier for internal processing than UTF-8?
Both are variable-length encodings.
Performance tuning is easier with UTF-16. You can optimize for
BMP characters, knowing that surrogate pairs are sufficiently uncommon
that
Andy Heninger writes:
Performance tuning is easier with UTF-16. You can optimize for
BMP characters, knowing that surrogate pairs are sufficiently uncommon
that it's OK for them to take a bail-out slow path.
Sure, but if you are using UTF-16 (or any other multibyte encoding)
you lose the
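As a concrete picture of the bail-out pattern Andy describes, here is a
minimal sketch of my own (not from the thread; the handler names are
hypothetical): scan UTF-16 code units, handle BMP characters inline, and
branch to a slow path only when a lead surrogate appears.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical handlers standing in for real per-character work. */
    static void process_bmp(uint16_t cu) { (void)cu; }
    static void process_supplementary(uint32_t cp) { (void)cp; }

    /* Scan UTF-16 with a BMP fast path and a surrogate bail-out. */
    void process_utf16(const uint16_t *s, size_t len) {
        for (size_t i = 0; i < len; i++) {
            uint16_t cu = s[i];
            if (cu < 0xD800 || cu > 0xDFFF) {
                process_bmp(cu);                 /* fast path: one code unit */
            } else if (cu <= 0xDBFF && i + 1 < len) {
                /* slow path: lead surrogate, combine with the trail unit */
                uint32_t cp = 0x10000 + (((uint32_t)(cu - 0xD800) << 10)
                                         | (uint32_t)(s[i + 1] - 0xDC00));
                process_supplementary(cp);
                i++;                             /* consumed two code units */
            }
            /* an unpaired surrogate would be treated as an error here */
        }
    }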
Carl W. Brown writes:
If you implement an array that is directly indexed by Unicode code point, it
would have to have 1,114,112 entries, one per code point through 0x10FFFF =
1,114,111. (I love the number) I don't think that
many applications can afford to have over a megabyte of storage per byte of
table width. If nothing else it would be
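Checking Carl's arithmetic with a quick worked example of my own: code
points run from U+0000 through U+10FFFF, so a directly indexed table needs
0x10FFFF + 1 = 1,114,112 entries, about 1.06 MB per byte of table width.

    #include <stdio.h>

    int main(void) {
        /* Code points run from U+0000 through U+10FFFF. */
        const long entries = 0x10FFFFL + 1;            /* 1,114,112 */
        printf("entries: %ld\n", entries);
        printf("MB per byte of table width: %.2f\n",
               entries / (1024.0 * 1024.0));           /* ~1.06 */
        return 0;
    }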
From: Tom Emerson [EMAIL PROTECTED]
But if I have a text string, and that string is encoded in UTF-16, and
I want to access Unicode character values, then I cannot index that
string in constant time.
To find character n I have to walk all of the 16-bit values in that
string, accounting for surrogates.
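A sketch of the contrast Tom describes (the function names are mine, purely
illustrative): UTF-32 gives character n in one array access, while UTF-16
requires a linear scan that steps over surrogate pairs.

    #include <stdint.h>
    #include <stddef.h>

    /* UTF-32: character n is one array access. */
    uint32_t char_at_utf32(const uint32_t *s, size_t n) {
        return s[n];
    }

    /* UTF-16: linear scan, since a character is 1 or 2 code units. */
    uint32_t char_at_utf16(const uint16_t *s, size_t len, size_t n) {
        for (size_t i = 0; i < len; ) {
            uint16_t cu = s[i];
            int pair = (cu >= 0xD800 && cu <= 0xDBFF && i + 1 < len);
            if (n == 0) {
                return pair ? 0x10000 + (((uint32_t)(cu - 0xD800) << 10)
                                         | (uint32_t)(s[i + 1] - 0xDC00))
                            : cu;
            }
            n--;
            i += pair ? 2 : 1;
        }
        return 0xFFFD;  /* out of range: U+FFFD REPLACEMENT CHARACTER */
    }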
Michael (michka) Kaplan writes:
To find character n I have to walk all of the 16-bit values in that
string accounting for surrogates. If I use UTF-32 I don't need to do
that. This very issue came up during the discussion of how to handle
surrogates in Python.
Would this not be the
Thu, 20 Sep 2001 12:46:49 -0700 (PDT), Kenneth Whistler [EMAIL PROTECTED] writes:
If you are expecting better performance from a library that takes UTF-8
APIs and then does all its internal processing in UTF-8 *without*
converting to UTF-16, then I think you are mistaken. UTF-8 is a bad
form
From: "Marcin 'Qrczak' Kowalczyk" [EMAIL PROTECTED]
Why would UTF-16 be easier for internal processing than UTF-8?
Both are variable-length encodings.
Good straw man!
Working with UTF-16 is immensely easier than working with UTF-8. As I am
sure you know! :-)
MichKa
Michael Kaplan
I would like to add that ICU 2.0 (in a few weeks) will have convenience functions for
in-process string transformations:
UTF-16 <-> UTF-8
UTF-16 <-> UTF-32
UTF-16 <-> wchar_t*
markus
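A usage sketch, under the assumption that the UTF-8 pair ships as
u_strFromUTF8()/u_strToUTF8() in unicode/ustring.h; the names here are my
recollection of the eventual ICU API, not a quote from Markus, and the
buffer sizes are illustrative.

    #include <stdio.h>
    #include <unicode/ustring.h>   /* u_strFromUTF8, u_strToUTF8 */

    int main(void) {
        UErrorCode status = U_ZERO_ERROR;
        UChar   u16[64];           /* UTF-16 buffer */
        char    u8[64];            /* UTF-8 buffer  */
        int32_t len16, len8;

        /* UTF-8 -> UTF-16 ("gruess" with u-umlaut and sharp s) */
        u_strFromUTF8(u16, 64, &len16, "gr\xC3\xBC\xC3\x9F", -1, &status);
        /* UTF-16 -> UTF-8 */
        u_strToUTF8(u8, 64, &len8, u16, len16, &status);

        if (U_SUCCESS(status))
            printf("round trip ok: %d UTF-16 units, %d UTF-8 bytes\n",
                   len16, len8);
        return 0;
    }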
Mozilla also uses Unicode internally and is cross-platform.
[EMAIL PROTECTED] wrote:
For cross-platform software (NT, Solaris, HP, AIX),
the only 3rd-party Unicode support
I found so far is IBM ICU.
It provides very good support for
cross-platform software internationalization. However,
ICU internally uses UTF-16.
Markus Scherer wrote:
I would like to add that ICU 2.0 (in a few weeks) will have convenience functions
for in-process string transformations:
UTF-16 <-> UTF-8
UTF-16 <-> UTF-32
UTF-16 <-> wchar_t*
Wait, be careful here. wchar_t is not an encoding, so in theory you cannot
convert between UTF-16 and wchar_t. You,
however, can convert between UTF-16 and wchar_t* on Win32,
since Microsoft declares UTF-16 as the encoding for wchar_t.
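To make the portability hazard concrete, a small sketch of my own (not from
the thread): the width of wchar_t, and therefore its natural encoding,
varies by platform, so a UTF-16 <-> wchar_t* transformation is only trivial
where wchar_t happens to hold 16-bit UTF-16 code units.

    #include <stdio.h>
    #include <wchar.h>

    int main(void) {
        /* Win32: wchar_t is 2 bytes and holds UTF-16 code units.
           Most Unix systems: 4 bytes, typically UTF-32.
           The C standard ties wchar_t to no Unicode encoding at all. */
        printf("sizeof(wchar_t) = %u\n", (unsigned)sizeof(wchar_t));
        if (sizeof(wchar_t) == 2)
            puts("16-bit wchar_t: UTF-16 is plausible (e.g. Win32)");
        else
            puts("wider wchar_t: UTF-16 <-> wchar_t* is a real conversion");
        return 0;
    }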
On Fri, Sep 21, 2001 at 04:16:50PM -0700, Yung-Fong Tang wrote:
Then... use Unicode internally in your software; regardless of whether you use
UTF-8 or UCS2 as the data type in the interface, eventually some code
needs to convert it to UCS2 for most of the processing.
Why? UCS2 shouldn't be used at all.
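For readers unsure of the UCS2/UTF-16 distinction, a worked example of mine:
UCS2 is the fixed-width 16-bit form and simply cannot represent code points
above U+FFFF, whereas UTF-16 encodes them as surrogate pairs. U+10400, for
instance, becomes the pair D801 DC00.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* UCS2 stops at U+FFFF; UTF-16 reaches U+10FFFF via surrogate
           pairs. Example: U+10400 (DESERET CAPITAL LETTER LONG I). */
        uint32_t cp    = 0x10400;
        unsigned lead  = 0xD800 + ((cp - 0x10000) >> 10);
        unsigned trail = 0xDC00 + ((cp - 0x10000) & 0x3FF);
        printf("U+%04X -> %04X %04X\n", (unsigned)cp, lead, trail);
        return 0;
    }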
Changjian Sun said:
For cross-platform software (NT, Solaris, HP, AIX), the only 3rd-party
Unicode support
I found so far is IBM ICU.
It provides very good support for cross-platform software internationalization.
However,
ICU internally uses UTF-16. For our application using UTF-8 as input
On Thu, Sep 20, 2001 at 02:02:37PM -0400, [EMAIL PROTECTED] wrote:
I'm worried about the performance overhead of this conversion.
How much is this performance overhead? Converting UTF-8 to UTF-16 is
computationally trivial; my guess is that it would be significant for
cat or grep (maybe ...
Ken
I have to convert from UTF-8 to UTF-16 before calling ICU
functions (such
as ucol_strcoll()).
I'm worried about the performance overhead of this conversion.
You shouldn't be.
The conversion from UTF-8 to UTF-16 and back is algorithmic and very
fast.
To make this conversion
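To illustrate why the conversion is cheap, here is a stripped-down
UTF-8-to-UTF-16 loop of my own (it assumes well-formed input; a production
converter such as ICU's also validates): a few shifts and masks per
character, with a surrogate pair emitted for supplementary code points.

    #include <stdint.h>
    #include <stddef.h>

    /* Convert well-formed UTF-8 to UTF-16; returns UTF-16 code units
       written. Validation of ill-formed sequences is omitted here. */
    size_t utf8_to_utf16(const uint8_t *src, size_t len, uint16_t *dst) {
        size_t i = 0, o = 0;
        while (i < len) {
            uint32_t cp;
            uint8_t b = src[i];
            if (b < 0x80) {                      /* 1 byte: ASCII */
                cp = b; i += 1;
            } else if (b < 0xE0) {               /* 2 bytes */
                cp = ((uint32_t)(b & 0x1F) << 6) | (src[i+1] & 0x3F);
                i += 2;
            } else if (b < 0xF0) {               /* 3 bytes */
                cp = ((uint32_t)(b & 0x0F) << 12)
                   | ((uint32_t)(src[i+1] & 0x3F) << 6)
                   |  (src[i+2] & 0x3F);
                i += 3;
            } else {                             /* 4 bytes */
                cp = ((uint32_t)(b & 0x07) << 18)
                   | ((uint32_t)(src[i+1] & 0x3F) << 12)
                   | ((uint32_t)(src[i+2] & 0x3F) << 6)
                   |  (src[i+3] & 0x3F);
                i += 4;
            }
            if (cp < 0x10000) {
                dst[o++] = (uint16_t)cp;
            } else {                             /* surrogate pair */
                dst[o++] = (uint16_t)(0xD800 + ((cp - 0x10000) >> 10));
                dst[o++] = (uint16_t)(0xDC00 + ((cp - 0x10000) & 0x3FF));
            }
        }
        return o;
    }

The resulting UTF-16 buffer can then go straight into ucol_strcoll() for
the collation call the poster needs.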