Are you sure they are using UCS-2 and not some 16-bit code pages? Those
exist too ;)
Not really.
I checked the code points 0x0100 and 0x0101 (capital and lower-case "A"
with a macron). They are displayed correctly in the debugger when pointing
to the WideString variable. So I guess it indeed is un
Michael Schnell wrote:
>
>> The encoding can be important for speed:
>> For example the widestring xml parser is up to 10 times slower than
>> the ansistring xml parser.
>>
> That obviously is the reason why Turbo Delphi uses UCS-2 (16 bit)
> instead of UTF-8 or UTF-16 for WideStrings (an
That obviously is the reason why Turbo Delphi uses UCS-2 (16 bit)
instead of UTF-8 or UTF-16 for WideStrings (and WideChar is a 16-bit
(UCS-2) value).
You didn't read http://www.jacobthurman.com/?p=30 , did you?
They are talking about Delphi 2009, of which I don't have any
infor
On Monday, 29 September 2008 09:25, Michael Schnell wrote:
> > The encoding can be important for speed:
> > For example the widestring xml parser is up to 10 times slower than
> > the ansistring xml parser.
>
> That obviously is the reason why Turbo Delphi uses UCS-2 (16 bit)
> instead of UT
s[i]:='x' doesn't work in UTF-8, nor UTF-16, nor UTF-32.
It would work, but it would need an implementation that moves the tail
of the string around and thus would be really slow.
-Michael
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
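To make the cost concrete, here is a minimal sketch in Python (not FPC code) of what s[i]:='x' has to do on a UTF-8 buffer. The function name replace_char_utf8 and the 0-based index are assumptions made for illustration, and it treats one code point as one "character", which is itself a simplification.

```python
def replace_char_utf8(buf, index, new_char):
    """Replace the code point at 0-based position `index` in a UTF-8 bytearray."""
    # Step 1: scan from the start to find the byte offset of the code point.
    # Continuation bytes have the bit pattern 10xxxxxx and are skipped.
    pos = 0
    for _ in range(index):
        pos += 1
        while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
            pos += 1
    if pos >= len(buf):
        raise IndexError("code point index out of range")

    # Step 2: find where the encoded sequence of that code point ends.
    end = pos + 1
    while end < len(buf) and (buf[end] & 0xC0) == 0x80:
        end += 1

    # Step 3: splice in the new encoding. If the old and new sequences have
    # different lengths, the whole tail of the buffer is moved.
    buf[pos:end] = new_char.encode("utf-8")


text = bytearray("häll".encode("utf-8"))   # 5 bytes: 'ä' takes two
replace_char_utf8(text, 1, "x")            # 2-byte sequence replaced by 1 byte
print(text.decode("utf-8"), len(text))
```

Both the scan in step 1 and the tail move in step 3 are linear in the length of the string, which is the slowness described above.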
The encoding can be important for speed:
For example the widestring xml parser is up to 10 times slower than
the ansistring xml parser.
That obviously is the reason why Turbo Delphi uses UCS-2 (16 bit)
instead of UTF-8 or UTF-16 for WideStrings (and WideChar is a 16-bit
(UCS-2) value).
I don't think full UTF-16 would really be desirable over UCS-2.
Imagine you have a string of some million characters (e.g. a Book). All
functions that need to find the n-th character (like x[n], copy, ...)
would take forever, as they need to scan the complete string (if not
widest
On Sunday 28 September 2008 20.16:36 Graeme Geldenhuys wrote:
> On Sun, Sep 28, 2008 at 12:22 PM, Mattias Gaertner
> > Is this normalized form used only internally in msegui or must the user
> > use them too?
>
> I remember when I tried a MSEgui version some time back, that the IDE
> itself used t
On Sun, Sep 28, 2008 at 12:22 PM, Mattias Gaertner
<[EMAIL PROTECTED]> wrote:
>
> You cannot normalize the composed and decomposed state platform
> independently. For example, Linux ext3 does not normalize in any
> way and therefore distinguishes between composed a-umlaut and decomposed
> a-umlaut. Y
On Sun, 28 Sep 2008 09:23:14 +0200
Martin Schreiber <[EMAIL PROTECTED]> wrote:
> On Sunday 28 September 2008 00.10:43 Graeme Geldenhuys wrote:
> > On Fri, Sep 26, 2008 at 5:02 PM, Mattias Gaertner
> >
> > <[EMAIL PROTECTED]> wrote:
> > > s[i]:='x' doesn't work in UTF-8, nor UTF-16, nor UTF-32.
> >
On Sunday 28 September 2008 00.10:43 Graeme Geldenhuys wrote:
> On Fri, Sep 26, 2008 at 5:02 PM, Mattias Gaertner
>
> <[EMAIL PROTECTED]> wrote:
> > s[i]:='x' doesn't work in UTF-8, nor UTF-16, nor UTF-32.
> >
> > In short:
> > A single character for all purposes can not be defined. Unicode can not
On Sat, Sep 27, 2008 at 2:35 PM, Luiz Americo Pereira Camara
<[EMAIL PROTECTED]> wrote:
>> Good question and I have been wondering about this myself. In D2009
>> SizeOf(Char) = 2, so I have no idea how that works with surrogate
>> pairs. Can anybody explain this please?
>>
>
> In http://www.jacobt
On Fri, Sep 26, 2008 at 5:02 PM, Mattias Gaertner
<[EMAIL PROTECTED]> wrote:
>
> s[i]:='x' doesn't work in UTF-8, nor UTF-16, nor UTF-32.
>
> In short:
> A single character for all purposes can not be defined. Unicode can not
> be handled as array of character.
This is what I thought, but everybod
Graeme Geldenhuys wrote:
(AFAI understand, a Widechar is just 16 bit, it would need to
be 32 bit if surrogates were allowed in Widestrings).
Good question and I have been wondering about this myself. In D2009
SizeOf(Char) = 2, so I have no idea how that works with surrogate
pairs. Can any
On Fri, 26 Sep 2008 13:20:57 +0200
Michael Schnell <[EMAIL PROTECTED]> wrote:
> Nonetheless a type to hold a single character needs to exist. And
> it needs to be a 32-bit type if you want to store more than 2^16
> different values (as is possible with UTF-8 and UTF-16 but not with
> UCS-2).
Some c
In our previous episode, Michael Schnell said:
> >> Is UTF-16 Widestring in FPC (and Delphi 200x ? ) not done just ignoring the
> >> surrogates ?
> >
> > Let's hope not,
> I don't think full UTF-16 would really be desirable over UCS-2.
>
> Imagine you have a string of some million chara
Graeme Geldenhuys wrote:
Has anybody else got sample test code that clearly shows the claimed
"significant speed gain" in using UTF-16 for Windows APIs? If so,
could you please post the code and your comparative results (timing
values). I think most people's perception was that ANSI APIs will
Nonetheless a type to hold a single character needs to exist. And it
needs to be a 32-bit type if you want to store more than 2^16 different
values (as is possible with UTF-8 and UTF-16 but not with UCS-2).
-Michael
In our previous episode, Daniël Mantione said:
> > Taking a step back from Free Pascal and Tiburon How do other
> > frameworks handle string encodings etc... Frameworks like Java, Qt
> > etc... Can't we learn something from them as well? Both Java and Qt
> > run on multiple platforms, read/wri
In our previous episode, Michael Schnell said:
> >> need to be 32 bit if surrogates were allowed in Widestrings).
> >>
> How to squeeze a value > $ in a 16 Bit value ?
>
> Can you magically store two bits in a single hardware cell ?
As said before, unicode is more than just expanding the
How do other
frameworks handle string encodings etc
With .NET/Mono I suppose you don't need to bother. But I suppose this is
one of the reasons that strings are constants once they are assigned
some value; and you can't do things like s[n] := 'x'.
-Michael
On Friday 26 September 2008 12.30:27 Marco van de Voort wrote:
> In our previous episode, Martin Schreiber said:
> > Hmm, you should ask the Russian users for example if they prefer MSEgui
> > utf-16 internal encoding or Lazarus utf-8.
>
> Users always look short term, and want to change as little
On Fri, 26 Sep 2008, Graeme Geldenhuys wrote:
Taking a step back from Free Pascal and Tiburon How do other
frameworks handle string encodings etc... Frameworks like Java, Qt
etc... Can't we learn something from them as well? Both Java and Qt
run on multiple platforms, read/write to fil
need to be 32 bit if surrogates were allowed in Widestrings).
How to squeeze a value > $ in a 16 Bit value ?
Can you magically store two bits in a single hardware cell ?
-Michael
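For reference, UTF-16 does not squeeze a larger value into one 16-bit cell; it spends two 16-bit units on it. Code points above 0xFFFF are split into a high and a low surrogate. A minimal Python sketch of that arithmetic follows; the helper names to_utf16_units and from_utf16_units are hypothetical and chosen only for this illustration.

```python
def to_utf16_units(code_point):
    """Return the list of 16-bit code units that encode one code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x10000:
        return [code_point]                 # fits in a single 16-bit unit
    offset = code_point - 0x10000           # 20 bits remain
    high = 0xD800 | (offset >> 10)          # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)         # bottom 10 bits
    return [high, low]


def from_utf16_units(units):
    """Decode a list of 16-bit code units back into code points."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            result.append(0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            result.append(unit)             # BMP unit (or an unpaired surrogate)
            i += 1
    return result


print([hex(u) for u in to_utf16_units(0x0100)])
print([hex(u) for u in to_utf16_units(0x1D11E)])
```

A 16-bit WideChar therefore holds one code unit, and a character outside the Basic Multilingual Plane occupies two of them in the string.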
Is UTF-16 Widestring in FPC (and Delphi 200x ? ) not done just ignoring the
surrogates ?
Let's hope not,
I don't think full UTF-16 would really be desirable over UCS-2.
Imagine you have a string of some million characters (e.g. a Book). All
functions that need to find the n-
On Fri, Sep 26, 2008 at 12:34 PM, Marco van de Voort <[EMAIL PROTECTED]> wrote:
>> I guess that would be one of the best solutions. Having a system unicode
>> string type and then some specialized string types.
>>
>> SysString
>> UTF8String
>> UTF16String
>> UTF32String
>> Anyway, I still think som
Martin Schreiber wrote:
Hmm, you should ask the Russian users for example if they prefer MSEgui utf-16
internal encoding or Lazarus utf-8.
You are mixing things up a bit. People from the Russian forum prefer fewer
bugs, and the UTF-8 implementation of Lazarus brought them a lot. That is
the difference.
In our previous episode, Ivo Steinmann said:
> > in the native encoding per platform.
> >
> >
> I guess that would be one of the best solutions. Having a system unicode
> string type and then some specialized string types.
>
> SysString
> UTF8String
> UTF16String
> UTF32String
> Anyway, I still
Hello Graeme,
Friday, September 26, 2008, 10:50:43 AM, you wrote:
GG> Good question and I have been wondering about this myself. In D2009
GG> SizeOf(Char) = 2, so I have no idea how that works with surrogate
GG> pairs. Can anybody explain this please?
I don't know how D2009 and others do it, bu
In our previous episode, Martin Schreiber said:
> >
> Hmm, you should ask the Russian users for example if they prefer MSEgui
> utf-16
> internal encoding or Lazarus utf-8.
Users always look short term, and want to change as little as possible.
This goes both for UTF-16 (with the "is UCS2" app
Marco van de Voort wrote:
>
>
>> For many people Unicode is just "let's go UTF-8". It's far more than that
>> and 100% supporting Unicode is even next to impossible.
>>
>
> Correct, but that is what I'm suggesting. UTF-16 is not a cure all either,
> only at a first superficial glance.
On Friday 26 September 2008 11.51:14 Graeme Geldenhuys wrote:
> On Fri, Sep 26, 2008 at 11:46 AM, Martin Schreiber <[EMAIL PROTECTED]> wrote:
> > It seems you prefer utf-8 over utf-16 for internal string encoding in a
> > GUI framework. Why?
> > I prefer utf-16 over utf-8 for MSEide+MSEgui because
On Fri, Sep 26, 2008 at 11:46 AM, Martin Schreiber <[EMAIL PROTECTED]> wrote:
> It seems you prefer utf-8 over utf-16 for internal string encoding in a GUI
> framework. Why?
> I prefer utf-16 over utf-8 for MSEide+MSEgui because *all* current users
> (including the Chinese) can use simple string in
In our previous episode, Martin Schreiber said:
> > Well if you have Utf-8 versions of all basic string processing
> > functions like Pos, Length, Copy, Insert etc you don't have to think
> > of encoding or anything. fpGUI uses UTF-8 internally, and I never have
> > to think about what encoding I'm
On Friday 26 September 2008 09.34:44 Graeme Geldenhuys wrote:
>
> Well if you have Utf-8 versions of all basic string processing
> functions like Pos, Length, Copy, Insert etc you don't have to think
> of encoding or anything. fpGUI uses UTF-8 internally, and I never have
> to think about what enco
In our previous episode, Daniël Mantione said:
> >
> > Accepting both Arabic and Westernized Arabic numerals would possibly break a
> > lot of code anyway, since to string and back wouldn't be reversible.
>
> It has never been reversible. Think about val('$100',v);
See one line further down.
>
On Fri, 26 Sep 2008, Marco van de Voort wrote:
In our previous episode, Daniël Mantione said:
as I know D2009 (I think) handles this correctly, but I have no idea
how.
Let me put it like this: Someone writing a Russian/Arabic/Japanese spell
checker does not have to handle surrogates with
On Fri, Sep 26, 2008 at 11:31 AM, Marco van de Voort <[EMAIL PROTECTED]> wrote:
>> Someone writing a spell checker for old-Egyptian Hieroglyphs will have to
>> deal with surrogates. For those people UTF-16 has few advantages over
>> UTF-8 (although in practice it's still a bit easier to handle th
On Fri, 26 Sep 2008, Marco van de Voort wrote:
In our previous episode, Daniël Mantione said:
That's highly dependent on what your application does! If your
application primarily parses text files, it's relevant. :-)
Shortstrings & ansistrings won't go away. You'll still be able to code
f
On Fri, Sep 26, 2008 at 11:17 AM, Daniël Mantione
<[EMAIL PROTECTED]> wrote:
>
> Russian, Arabic, Japanese are languages in daily use on computers, countless
> electronic documents in these languages exist.
And most documents that exist in the world are in UTF-8 format: Save
to file, HTML document
In our previous episode, Daniël Mantione said:
> > as I know D2009 (I think) handles this correctly, but I have no idea
> > how.
>
> Let me put it like this: Someone writing a Russian/Arabic/Japanese spell
> checker does not have to handle surrogates with UTF-16, but he does with
> UTF-8, i.e. U
On 26 Sep 2008, at 10:43, Michael Schnell wrote:
Is UTF-16 Widestring in FPC (and Delphi 200x ? ) not done just
ignoring the surrogates ?
At least the Unix widestring manager fully supports surrogates (except
if you use the MSIDE-patched version, where it has been removed
because it is c
On Fri, Sep 26, 2008 at 11:11 AM, Ivo Steinmann <[EMAIL PROTECTED]> wrote:
>
> So in core, winnt is working with UTF16. All ANSI Winapi functions map
> to these winnt calls.
So then there is already a "conversion" going on. From ANSI api to
UTF16 api. I still think (and will try and put together
Ivo Steinmann wrote:
>
> At the core of all Windows NT systems there is the NT API. The normal
> WinAPI sits on top of the NT API. The NT API itself uses UTF-16 as its
> string type!
>
> type
>   UNICODE_STRING = record
>     Length: USHORT;
>     MaximumLength: USHORT;
>     Buffer: PWSTR;
>   end;
On Fri, 26 Sep 2008, Graeme Geldenhuys wrote:
On Fri, Sep 26, 2008 at 10:43 AM, Michael Schnell <[EMAIL PROTECTED]> wrote:
It's no different than UTF-16 if you want to do it properly. In both you
have to look out for surrogates.
Is UTF-16 Widestring in FPC (and Delphi 200x ? ) not done
In our previous episode, Daniël Mantione said:
> > That's highly dependent on what your application does! If your
> > application primarily parses text files, it's relevant. :-)
>
> Shortstrings & ansistrings won't go away. You'll still be able to code
> fast text file parsers. Note that in such
Graeme Geldenhuys wrote:
> On Thu, Sep 25, 2008 at 10:33 PM, Florian Klaempfl
> <[EMAIL PROTECTED]> wrote:
>
>> Who says that? UTF-16 is simply chosen because it has features (supporting
>> all characters basically) ANSI doesn't?
>>
>
> Sorry, my message was unclear and I got somewhat mix
In our previous episode, Michael Schnell said:
> > It's no different than UTF-16 if you want to do it properly. In both you
> > have to look out for surrogates.
> >
> Is UTF-16 Widestring in FPC (and Delphi 200x ? ) not done just ignoring
> the surrogates ?
No different from UTF-8 in principle.
On Fri, 26 Sep 2008, Graeme Geldenhuys wrote:
On Fri, Sep 26, 2008 at 9:12 AM, Daniël Mantione
<[EMAIL PROTECTED]> wrote:
For me the speed of input/output is less relevant, this is limited by disk
speed anyway. It's the speed of processing that should be decisive.
That's highly dependent
On Fri, Sep 26, 2008 at 10:43 AM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>
>> It's no different than UTF-16 if you want to do it properly. In both you
>> have to look out for surrogates.
>>
>
> Is UTF-16 Widestring in FPC (and Delphi 200x ? ) not done just ignoring the
> surrogates ?
Let's hope
It's no different than UTF-16 if you want to do it properly. In both you
have to look out for surrogates.
Is UTF-16 Widestring in FPC (and Delphi 200x ? ) not done just ignoring
the surrogates ? (AFAI understand, a Widechar is just 16 bit, it would
need to be 32 bit if surrogates were allow
Well if you have Utf-8 versions of all basic string processing
functions like Pos, Length, Copy, Insert etc
s[i] := 'x'; will be especially funny :).
-Michael
In our previous episode, Florian Klaempfl said:
> > On Fri, Sep 26, 2008 at 9:27 AM, Florian Klaempfl
> > <[EMAIL PROTECTED]> wrote:
> >> Being honest, imo UTF-8 is only a hack to get unicode on platforms like
> >> unix.
> >
> > I don't know where you get that information,
>
> Rather simple: ini
In our previous episode, Aleksa Todorovic said:
> > I suppose it would be viable doing timing results for saving text
> > files as well. After all, 99% of the time, text files are stored in
> > UTF-8. So in D2009 you would first have to convert UTF-16 to UTF-8 and
> > then save. And the opposite wh
In our previous episode, Graeme Geldenhuys said:
> Yes I know we have had lengthy discussions about this before.
> Everybody (whoever they might be) keeps saying that UTF-16 was chosen
> for Tiburon's UnicodeString because it makes "significant speed gains"
> when calling the Windows API based on U
Graeme Geldenhuys wrote:
> On Fri, Sep 26, 2008 at 9:27 AM, Florian Klaempfl
> <[EMAIL PROTECTED]> wrote:
>> Being honest, imo UTF-8 is only a hack to get unicode on platforms like
>> unix.
>
> I don't know where you get that information,
Rather simple: initially in unicode 1.0 there was only
On Fri, Sep 26, 2008 at 9:27 AM, Florian Klaempfl
<[EMAIL PROTECTED]> wrote:
>
> Being honest, imo UTF-8 is only a hack to get unicode on platforms like
> unix.
I don't know where you get that information, but it's surely not what
I read from the unicode.org website.
UTF-8, UTF-16 and UTF-32 were
On Fri, Sep 26, 2008 at 9:19 AM, Aleksa Todorovic <[EMAIL PROTECTED]> wrote:
> I support the decision to use UTF-16 over UTF-8. String processing is
> far simpler; it's actually as simple as it should be.
And is that guaranteed to work with surrogate pairs as well? The
problem is, most people as
Graeme Geldenhuys wrote:
> On Fri, Sep 26, 2008 at 9:04 AM, Graeme Geldenhuys
> <[EMAIL PROTECTED]> wrote:
>> So has anybody actually done a timing comparison? Do you have your
>> test code available? Do you have your results published? I'm
>> interested to see the timing results using different
On Fri, Sep 26, 2008 at 9:12 AM, Daniël Mantione
<[EMAIL PROTECTED]> wrote:
>
> For me the speed of input/output is less relevant, this is limited by disk
> speed anyway. It's the speed of processing that should be decisive.
That's highly dependent on what your application does! If your
applicatio
On Fri, Sep 26, 2008 at 09:04, Graeme Geldenhuys
<[EMAIL PROTECTED]> wrote:
> On Thu, Sep 25, 2008 at 10:33 PM, Florian Klaempfl
> <[EMAIL PROTECTED]> wrote:
>>
> I suppose it would be viable doing timing results for saving text
> files as well. After all, 99% of the time, text files are stored in
Graeme Geldenhuys wrote:
> On Thu, Sep 25, 2008 at 10:33 PM, Florian Klaempfl
> <[EMAIL PROTECTED]> wrote:
>> Who says that? UTF-16 is simply chosen because it has features (supporting
>> all characters basically) ANSI doesn't?
>
> Sorry, my message was unclear and I got somewhat mixed up betwee
On Fri, Sep 26, 2008 at 9:04 AM, Graeme Geldenhuys
<[EMAIL PROTECTED]> wrote:
>
> So has anybody actually done a timing comparison? Do you have your
> test code available? Do you have your results published? I'm
> interested to see the timing results using different hardware.
What I'm getting at
Graeme Geldenhuys wrote:
On Thu, Sep 25, 2008 at 10:33 PM, Florian Klaempfl
I suppose it would be viable doing timing results for saving text
files as well. After all, 99% of the time, text files are stored in
UTF-8.
Where did you get that number (99%) from? I don't think that is true,
exce
On Fri, 26 Sep 2008, Graeme Geldenhuys wrote:
On Thu, Sep 25, 2008 at 10:33 PM, Florian Klaempfl
<[EMAIL PROTECTED]> wrote:
Who says that? UTF-16 is simply chosen because it has features (supporting
all characters basically) ANSI doesn't?
Sorry, my message was unclear and I got somewhat
On Thu, Sep 25, 2008 at 10:33 PM, Florian Klaempfl
<[EMAIL PROTECTED]> wrote:
>
> Who says that? UTF-16 is simply chosen because it has features (supporting
> all characters basically) ANSI doesn't?
Sorry, my message was unclear and I got somewhat mixed up between ANSI
and UTF-8. I meant the encod
Hello Graeme,
Thursday, September 25, 2008, 9:50:04 PM, you wrote:
GG> Yes I know we have had lengthy discussions about this before.
GG> Everybody (whoever they might be) keeps saying that UTF-16 was chosen
GG> for Tiburon's UnicodeString because it makes "significant speed gains"
GG> when callin
Graeme Geldenhuys wrote:
Hi,
Yes I know we have had lengthy discussions about this before.
Everybody (whoever they might be) keeps saying that UTF-16 was chosen
for Tiburon's UnicodeString because it makes "significant speed gains"
when calling the Windows API based on UTF-16 - compared to the
Hi,
Yes I know we have had lengthy discussions about this before.
Everybody (whoever they might be) keeps saying that UTF-16 was chosen
for Tiburon's UnicodeString because it makes "significant speed gains"
when calling the Windows API based on UTF-16 - compared to the ANSI
API's. The whole debate