Michael (michka) Kaplan writes:
> > To find character n I have to walk all of the 16-bit values in that
> > string accounting for surrogates. If I use UTF-32 I don't need to do
> > that. This very issue came up during the discussion of how to handle
> > surrogates in Python.
>
> Would this not
From: "Tom Emerson" <[EMAIL PROTECTED]>
> But if I have a text string, and that string is encoded in UTF-16, and
> I want to access Unicode character values, then I cannot index that
> string in constant time.
>
> To find character n I have to walk all of the 16-bit values in that
> string accounting for surrogates. If I use UTF-32 I don't need to do
> that. This very issue came up during the discussion of how to handle
> surrogates in Python.
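A minimal C sketch of the walk Tom describes; the helper name and the
treatment of unpaired surrogates are assumptions for illustration, not
from any post here. With UTF-32 the same lookup is simply s[n].

    #include <stdint.h>
    #include <stddef.h>

    /* Return the code-unit index where code point n starts in a UTF-16
     * string, or (size_t)-1 if out of range. A surrogate pair consumes
     * two 16-bit units, so the scan must start from the beginning:
     * indexing by character is O(n), not O(1). */
    static size_t utf16_index_of_codepoint(const uint16_t *s, size_t len,
                                           size_t n)
    {
        size_t i = 0;
        while (i < len) {
            if (n == 0)
                return i;
            if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&       /* lead surrogate  */
                i + 1 < len &&
                s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) /* trail surrogate */
                i += 2;   /* one code point, two units */
            else
                i += 1;   /* BMP unit, or unpaired surrogate */
            n--;
        }
        return (size_t)-1;
    }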
Carl W. Brown writes:
> If you implement an array that is directly indexed by Unicode code point, it
> would have to have 1114111 entries. (I love the number.) I don't think that
> many applications can afford to have over a megabyte of storage per byte of
> table width. If nothing else it would
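The usual escape from that flat array is a two-stage table. Below is a
minimal C sketch; the block size, field names, and byte-wide values are
assumptions for illustration, not any particular library's layout.
Blocks with identical contents can share storage, so large unassigned
ranges cost almost nothing.

    #include <stdint.h>

    #define BLOCK_SHIFT 7
    #define BLOCK_SIZE  (1 << BLOCK_SHIFT)        /* 128 entries per block */
    #define NUM_BLOCKS  (0x110000 >> BLOCK_SHIFT) /* 8704 block slots      */

    typedef struct {
        uint16_t block_index[NUM_BLOCKS]; /* block number -> block id      */
        uint8_t (*blocks)[BLOCK_SIZE];    /* deduplicated 128-entry blocks */
    } TwoStageTable;

    /* Split the code point into a block number (upper bits) and an
     * offset (lower 7 bits): two small loads instead of one giant array. */
    static uint8_t lookup(const TwoStageTable *t, uint32_t cp)
    {
        if (cp > 0x10FFFF)
            return 0; /* outside the Unicode code space */
        return t->blocks[t->block_index[cp >> BLOCK_SHIFT]]
                        [cp & (BLOCK_SIZE - 1)];
    }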
Tom,
> Andy Heninger writes:
> > Performance tuning is easier with UTF-16. You can optimize for
> > BMP characters, knowing that surrogate pairs are sufficiently uncommon
> > that it's OK for them to take a bail-out slow path.
>
> Sure, but if you are using UTF-16 (or any other multibyte encoding)
Andy Heninger writes:
> Performance tuning is easier with UTF-16. You can optimize for
> BMP characters, knowing that surrogate pairs are sufficiently uncommon
> that it's OK for them to take a bail-out slow path.
Sure, but if you are using UTF-16 (or any other multibyte encoding)
you lose the ability to index characters in constant time.
From: "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]>
> Why would UTF-16 be easier for internal processing than UTF-8?
> Both are variable-length encodings.
>
Performance tuning is easier with UTF-16. You can optimize for
BMP characters, knowing that surrogate pairs are sufficiently uncommon
that it's OK for them to take a bail-out slow path.
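A C sketch of the optimization Andy describes: one range check keeps BMP
characters on the one-unit fast path, and only the surrogate range
(0xD800..0xDFFF) bails out. The function name and the unpaired-surrogate
policy are assumptions for illustration.

    #include <stdint.h>
    #include <stddef.h>

    static size_t utf16_count_codepoints(const uint16_t *s, size_t len)
    {
        size_t count = 0, i = 0;
        while (i < len) {
            uint16_t u = s[i];
            if ((u & 0xF800) != 0xD800) {
                i += 1;  /* fast path: BMP, one unit == one code point */
            } else if (u < 0xDC00 && i + 1 < len &&
                       (s[i + 1] & 0xFC00) == 0xDC00) {
                i += 2;  /* slow path: a valid surrogate pair */
            } else {
                i += 1;  /* unpaired surrogate; count as one (policy varies) */
            }
            count++;
        }
        return count;
    }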
From: "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]>
> Why would UTF-16 be easier for internal processing than UTF-8?
> Both are variable-length encodings.
>
Performance tuning is easier with UTF-16. You can optimize for
BMP characters, knowing that surrogate pairs are sufficiently uncommon
t
From: "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]>
> Why would UTF-16 be easier for internal processing than UTF-8?
> Both are variable-length encodings.
>
Performance tuning is easier with UTF-16. You can optimize for
BMP characters, knowing that surrogate pairs are sufficiently uncommon
t
From: "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]>
> Why would UTF-16 be easier for internal processing than UTF-8?
> Both are variable-length encodings.
>
Performance tuning is easier with UTF-16. You can optimize for
BMP characters, knowing that surrogate pairs are sufficiently uncommon
t
From: "Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]>
> Why would UTF-16 be easier for internal processing than UTF-8?
> Both are variable-length encodings.
Good straw man!
Working with UTF-16 is immensely easier than working with UTF-8. As I am
sure you know! :-)
MichKa
Michael Kaplan
Thu, 20 Sep 2001 12:46:49 -0700 (PDT), Kenneth Whistler <[EMAIL PROTECTED]> writes:
> If you are expecting better performance from a library that takes UTF-8
> API's and then does all its internal processing in UTF-8 *without*
> converting to UTF-16, then I think you are mistaken. UTF-8 is a bad
>
> > UTF-16 <-> wchar_t*
>
> Wait, be careful here. wchar_t is not an encoding. So, in
> theory, you cannot convert between UTF-16 and wchar_t. You,
> however, can convert between UTF-16 and wchar_t* ON win32
> since Microsoft declares UTF-16 as the encoding for wchar_t.
And he can also
On Fri, Sep 21, 2001 at 04:16:50PM -0700, Yung-Fong Tang wrote:
> Then... use Unicode internally in your software; regardless of whether you use
> UTF-8 or UCS2 as the data type in the interface, eventually some code
> needs to convert it to UCS2 for most of the processing.
Why? UCS2 shouldn't be used at all.
Yung-Fong Tang wrote:
> > UTF-16 <-> wchar_t*
>
> Wait, be careful here. wchar_t is not an encoding. So, in theory, you cannot
> convert between UTF-16 and wchar_t. You,
> however, can convert between UTF-16 and wchar_t* ON win32 since Microsoft declares
> UTF-16 as the encoding for wchar_t.
Markus Scherer wrote:
> I would like to add that ICU 2.0 (in a few weeks) will have convenience functions
> for in-process string transformations:
>
> UTF-16 <-> UTF-8
> UTF-16 <-> UTF-32
> UTF-16 <-> wchar_t*
Wait, be careful here. wchar_t is not an encoding. So, in theory, you cannot
convert between UTF-16 and wchar_t. You, however, can convert between UTF-16 and
wchar_t* ON win32 since Microsoft declares UTF-16 as the encoding for wchar_t.
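To make the caveat concrete, a small C sketch (illustrative, not from the
thread): the C standard fixes neither the width nor the encoding of
wchar_t, so "UTF-16 <-> wchar_t*" is a unit-for-unit relabeling only
where the platform defines it that way.

    #include <wchar.h>
    #include <stdio.h>

    int main(void)
    {
        if (sizeof(wchar_t) == 2) {
            /* Win32-style: wchar_t strings can carry UTF-16, surrogate
             * pairs included, only because the platform says so. */
            printf("16-bit wchar_t: UTF-16 fits unit for unit\n");
        } else {
            /* Typical Unix: 32-bit wchar_t; converting from UTF-16 means
             * decoding surrogate pairs into single code points. */
            printf("%u-byte wchar_t: a real conversion is required\n",
                   (unsigned)sizeof(wchar_t));
        }
        return 0;
    }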
Mozilla also uses Unicode internally and is cross-platform.
[EMAIL PROTECTED] wrote:
For cross-platform software (NT, Solaris, HP, AIX), the only 3rd-party Unicode
support I have found so far is IBM ICU. It provides very good support for
cross-platform software internationalization. However, ICU internally uses
UTF-16. For our application using UTF-8 as input and output, I have to convert
from UTF-8 to UTF-16 before calling ICU functions (such as ucol_strcoll()).
I would like to add that ICU 2.0 (in a few weeks) will have convenience functions for
in-process string transformations:
UTF-16 <-> UTF-8
UTF-16 <-> UTF-32
UTF-16 <-> wchar_t*
markus
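For readers wondering what calling those looks like: a minimal sketch
against ICU's ustring.h C API. u_strFromUTF8 is a real ICU entry point;
the wrapper name and the error policy are assumptions, and the exact
function set in your ICU version should be checked in its headers.

    #include <unicode/utypes.h>
    #include <unicode/ustring.h>

    /* Convert a NUL-terminated UTF-8 string into a caller-supplied UTF-16
     * buffer; returns the UTF-16 length in code units, or -1 on error. */
    static int32_t to_utf16(const char *src, UChar *dest, int32_t destCapacity)
    {
        UErrorCode status = U_ZERO_ERROR;
        int32_t destLength = 0;
        u_strFromUTF8(dest, destCapacity, &destLength, src, -1, &status);
        return U_SUCCESS(status) ? destLength : -1;
    }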
Ken,
> > I have to convert from UTF-8 to UTF-16, before calling ICU functions
> > (such as ucol_strcoll())
> >
> > I'm worried about the performance overhead of this conversion.
>
> You shouldn't be.
>
> The conversion from UTF-8 to UTF-16 and back is algorithmic and very
> fast.
To make this
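In support of that claim, a bare-bones C sketch of the UTF-8 to UTF-16
direction: a few shifts and masks per character, with no tables or locale
data involved. Validation is deliberately simplified (a real converter
must reject overlong and ill-formed sequences).

    #include <stdint.h>
    #include <stddef.h>

    /* Returns the number of UTF-16 code units written to dst. */
    static size_t u8_to_u16(const uint8_t *src, size_t srclen,
                            uint16_t *dst, size_t dstcap)
    {
        size_t si = 0, di = 0;
        while (si < srclen && di < dstcap) {
            uint32_t cp;
            uint8_t b = src[si];
            if (b < 0x80) {
                cp = b; si += 1;
            } else if ((b & 0xE0) == 0xC0 && si + 1 < srclen) {
                cp = ((b & 0x1Fu) << 6) | (src[si+1] & 0x3F);
                si += 2;
            } else if ((b & 0xF0) == 0xE0 && si + 2 < srclen) {
                cp = ((b & 0x0Fu) << 12) | ((src[si+1] & 0x3Fu) << 6)
                   | (src[si+2] & 0x3F);
                si += 3;
            } else if ((b & 0xF8) == 0xF0 && si + 3 < srclen) {
                cp = ((b & 0x07u) << 18) | ((src[si+1] & 0x3Fu) << 12)
                   | ((src[si+2] & 0x3Fu) << 6) | (src[si+3] & 0x3F);
                si += 4;
            } else {
                si += 1; continue;   /* skip an invalid lead byte */
            }
            if (cp < 0x10000) {
                dst[di++] = (uint16_t)cp;          /* BMP: one unit  */
            } else if (di + 1 < dstcap) {
                cp -= 0x10000;                     /* surrogate pair */
                dst[di++] = (uint16_t)(0xD800 | (cp >> 10));
                dst[di++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
            } else {
                break;                             /* out of room   */
            }
        }
        return di;
    }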
On Thu, Sep 20, 2001 at 02:02:37PM -0400, [EMAIL PROTECTED] wrote:
> I'm worried about the performance overhead of this conversion.
How much is this performance overhead? Converting UTF-8 to UTF-16 is
computationally trivial; my guess is that it would be significant only for
something like cat or grep (maybe).
Changjian Sun said:
> For cross-platform software (NT, Solaris, HP, AIX), the only 3rd-party
> Unicode support I have found so far is IBM ICU.
> It provides very good support for cross-platform software internationalization.
> However, ICU internally uses UTF-16. For our application using UTF-8 as input
> and output, I have to convert from UTF-8 to UTF-16 before calling ICU functions
> (such as ucol_strcoll()).
For cross-platform software (NT, Solaris, HP, AIX), the only 3rd-party Unicode support
I have found so far is IBM ICU.
It provides very good support for cross-platform software internationalization. However,
ICU internally uses UTF-16. For our application using UTF-8 as input and output,
I have to convert from UTF-8 to UTF-16 before calling ICU functions (such as
ucol_strcoll()). I'm worried about the performance overhead of this conversion.
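A hedged sketch of that workflow with ICU's C API. ucol_strcoll and
u_strFromUTF8 are real ICU functions; the fixed buffers, the wrapper
name, and the error policy here are simplifications for illustration.

    #include <unicode/ucol.h>
    #include <unicode/ustring.h>

    /* Convert two UTF-8 strings to UTF-16, then collate them.
     * Returns UCOL_LESS, UCOL_EQUAL, or UCOL_GREATER (0 on error). */
    static int compare_utf8(UCollator *coll, const char *a, const char *b)
    {
        UChar ua[256], ub[256];   /* assumes short strings, for the sketch */
        int32_t la = 0, lb = 0;
        UErrorCode status = U_ZERO_ERROR;
        u_strFromUTF8(ua, 256, &la, a, -1, &status);
        u_strFromUTF8(ub, 256, &lb, b, -1, &status);
        if (U_FAILURE(status))
            return 0;             /* real code should report the error */
        return ucol_strcoll(coll, ua, la, ub, lb);
    }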