Re: [fpc-devel] Unicode resource strings

2012-08-23 Thread Sven Barth
On 22.08.2012 21:45, Graeme Geldenhuys wrote: On 22 August 2012 10:19, Sven Barth wrote: Depending on how they implement it this might indeed be an interesting feature that we could implement (cherry picking Delphi features ^^). It's already possible, just use IInterface (and TInterfacedOb

Re: [fpc-devel] Unicode resource strings

2012-08-23 Thread Michael Schnell
On 08/22/2012 07:30 PM, Ivanko B wrote: ... You need to stop your mailer from replying to the wrong message all the time. This makes the forum rather unreadable. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepasc

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Graeme Geldenhuys
On 22 August 2012 10:19, Sven Barth wrote: > Depending on how they implement it this might indeed be an interesting > feature that we could implement (cherry picking Delphi features ^^). It's already possible, just use IInterface (and TInterfacedObject) everywhere. :) -- Regards, - Graeme -

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Mattias Gaertner
On Wed, 22 Aug 2012 22:30:52 +0500 Ivanko B wrote: > Even if you would implement something like the Unix "find" or "ls" > programs, they would be more likely to be limited by I/O and all sorts > of file/directory attribute lookups than code page conversions of file > names. > > 1) I/

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Ivanko B
Even if you would implement something like the Unix "find" or "ls" programs, they would be more likely to be limited by I/O and all sorts of file/directory attribute lookups than code page conversions of file names. 1) I/O is heavily cached on modern a-lot-of-RAM machines & 2) conversi

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Hans-Peter Diettrich
Marco van de Voort wrote: In our previous episode, Hans-Peter Diettrich said: this is a huge move for a native code compiler. If FPC will follow, this sounds like a lot of work. I don't see much work here. The code for handling interface references exists, it only has to be applied to the new

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said: > > this is a huge move for a native code compiler. If FPC will follow, this > > sounds like a lot of work. > > I don't see much work here. The code for handling interface references > exists, it only has to be applied to the new TObject type,

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Jonas Maebe
Graeme Geldenhuys wrote on Wed, 22 Aug 2012: Accessing a 100k of files (filenames to be exact) in a UTF-8 environment (Linux), which must all be stored in a UTF-16 string type. That's lots and lots of encoding conversions right there - in a tight loop. It's nevertheless a bad example, because

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Michael Schnell
On 08/21/2012 02:53 PM, Graeme Geldenhuys wrote: I have a program that does exactly that... Loads files to do CRC checking to see what changed. Hmm. I feel that reading files takes a lot more CPU time than converting the strings at the border of the LCL. This of course does not include convert

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Hans-Peter Diettrich
Graeme Geldenhuys wrote: On 22 August 2012 00:54, Hans-Peter Diettrich wrote: IMO string conversion and CRC are mutually exclusive. Accessing a 100k of files (filenames to be exact) in a UTF-8 environment (Linux), which must all be stored in a UTF-16 string type. Filenames typically deser

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Hans-Peter Diettrich
Marco van de Voort wrote: In our previous episode, Hans-Peter Diettrich said: utf8/16 -> ansi are a bit more involved. (since mapping many chars to few, naive implementation requiring large lookup sets) A single 256 element array can be used for both directions. In Ansi to Unicode the char va

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Hans-Peter Diettrich
Michael Schnell wrote: On 08/21/2012 02:53 PM, Graeme Geldenhuys wrote: http://blogs.embarcadero.com/jtembarcadero/2012/08/20/xe3-and-beyond/ Other than politics, the big news regarding technology seems to be that Objects (or whatever) seem to get reference counted and thus I understand ".

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Sven Barth
On 22.08.2012 11:44, Marco van de Voort wrote: In our previous episode, Sven Barth said: Objects (or whatever) seem to get reference counted and thus I understand ".Free" gets obsolete (like with Prism). Without assessment - this is a huge move for a native code compiler. If FPC will follow, t

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Mattias Gaertner
On Wed, 22 Aug 2012 11:35:17 +0200 Michael Schnell wrote: > On 08/22/2012 10:56 AM, Mattias Gaertner wrote: > > The UTF-8 optimized functions need UTF-16 versions. But why do you > > think it needs a "really thorough rework"? > Guesswork :-) > > The LCL itself already has some widgetsets using

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Marco van de Voort
In our previous episode, Sven Barth said: > > Objects (or whatever) seem to get reference counted and thus I > > understand ".Free" gets obsolete (like with Prism). Without assessment - > > this is a huge move for a native code compiler. If FPC will follow, this > > sounds like a lot of work. > >

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Michael Schnell
On 08/22/2012 11:19 AM, Sven Barth wrote: Depending on how they implement it this might indeed be an interesting feature that we could implement (cherry picking Delphi features ^^). It will be interesting to watch if they might implement other Prism goodies as well (e.g. parallel loops and f

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Michael Schnell
On 08/22/2012 10:56 AM, Mattias Gaertner wrote: The UTF-8 optimized functions need UTF-16 versions. But why do you think it needs a "really thorough rework"? Guesswork :-) The LCL itself already has some widgetsets using UTF-16. Yep. So there the conversion needs to be dropped, while with the

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Sven Barth
On 22.08.2012 11:08, Michael Schnell wrote: On 08/21/2012 02:53 PM, Graeme Geldenhuys wrote: http://blogs.embarcadero.com/jtembarcadero/2012/08/20/xe3-and-beyond/ Other than politics, the big news regarding technology seems to be that Objects (or whatever) seem to get reference counted and th

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Michael Schnell
On 08/21/2012 02:53 PM, Graeme Geldenhuys wrote: http://blogs.embarcadero.com/jtembarcadero/2012/08/20/xe3-and-beyond/ Other than politics, the big news regarding technology seems to be that Objects (or whatever) seem to get reference counted and thus I understand ".Free" gets obsolete (like wi

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Mattias Gaertner
On Wed, 22 Aug 2012 10:37:45 +0200 Michael Schnell wrote: > On 08/21/2012 02:53 PM, Mattias Gaertner wrote: > > If the FCL moves to another string or starts enforcing an encoding the > > LCL has to be adapted. > > I believe if "String" becomes a sequence of 16 bit entities instead of 8 > bit e

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Michael Schnell
On 08/21/2012 02:53 PM, Mattias Gaertner wrote: If the FCL moves to another string or starts enforcing an encoding the LCL has to be adapted. I believe if "String" becomes a sequence of 16 bit entities instead of 8 bit entities, the LCL needs a really thorough rework. In the Lazarus form som

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Michael Schnell
On 08/21/2012 02:53 PM, Mattias Gaertner wrote: The LCL uses the same string as the FCL classes. Yep: type TCaption = TTranslateString; ... TTranslateString = type String; The FCL uses 8-bit strings ... Isn't this exactly what I tried to point out ? AFAIK in newer Delphi TCaption is St

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Graeme Geldenhuys
On 22 August 2012 00:54, Hans-Peter Diettrich wrote: > IMO string conversion and CRC are mutually exclusive. Accessing a 100k of files (filenames to be exact) in a UTF-8 environment (Linux), which must all be stored in a UTF-16 string type. That's lots and lots of encoding conversions right there

Re: [fpc-devel] Unicode resource strings

2012-08-22 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said: > > utf8/16 -> ansi are a bit more involved. (since mapping many chars to few, > > naive implementation requiring large lookup sets) > > A single 256 element array can be used for both directions. In Ansi to > Unicode the char value is used to i

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Hans-Peter Diettrich
Graeme Geldenhuys wrote: On 21 August 2012 13:03, Michael Schnell wrote: With "not so often" I meant program runtime: it is usually not called in a close long running loop. I have a program that does exactly that... Loads files to do CRC checking to see what changed. It's a recursive find-

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Hans-Peter Diettrich
Mattias Gaertner wrote: > length returns the number of characters. The number of elements, which can be of any size (in arrays in general). > UTF8Length the number of codepoints. > There must also be a function to return the number of bytes. > Does someone know the name? Length(s)*sizeof(s[1]) D

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Hans-Peter Diettrich
Marco van de Voort wrote: utf8/16 -> ansi are a bit more involved. (since mapping many chars to few, naive implementation requiring large lookup sets) A single 256 element array can be used for both directions. In Ansi to Unicode the char value is used to index the array of Unicode values,
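The single-table idea in the message above can be sketched roughly as follows. This is an illustration in Python rather than Pascal, not the RTL's implementation. The code page (ISO-8859-15, chosen because all 256 byte values are defined), the function names, and the "?" fallback are assumptions. The reverse direction is done here by searching the same array, which is a guess at what the truncated message goes on to describe.

```python
# Assumption: ISO-8859-15 stands in for "the Ansi code page".
CODEPAGE = "iso8859_15"

# One 256-element array: index = Ansi byte value, element = Unicode character.
TABLE = [bytes([i]).decode(CODEPAGE) for i in range(256)]


def ansi_to_unicode(data: bytes) -> str:
    """Ansi -> Unicode: each byte value indexes the table directly."""
    return "".join(TABLE[b] for b in data)


def unicode_to_ansi(text: str, fallback: int = ord("?")) -> bytes:
    """Unicode -> Ansi: search the same table for each character.

    Characters missing from the code page become the fallback byte.
    The linear search is simple, not fast.
    """
    out = bytearray()
    for ch in text:
        try:
            out.append(TABLE.index(ch))
        except ValueError:
            out.append(fallback)
    return bytes(out)


if __name__ == "__main__":
    print(ansi_to_unicode(b"G\xe4rtner"))
    print(unicode_to_ansi("G\u00e4rtner"))
```

The search in `unicode_to_ansi` costs up to 256 comparisons per character; a sorted copy of the table or a dictionary would trade memory for speed, which is the trade-off the thread is arguing about.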

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Sven Barth
On 21.08.2012 17:27, Paul Ishenin wrote: 21.08.12, 23:21, Sven Barth wrote: There must also be a function to return the number of bytes. Does someone know the name? Length(s) * SizeOf(s[1]) It has the name ByteLength() O.o Again what learned... Regards, Sven

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 17:21:27 +0200 Sven Barth wrote: >[...] > > length returns the number of characters. > > UTF8Length the number of codepoints. > > There must also be a function to return the number of bytes. > > Does someone know the name? > > Length(s) * SizeOf(s[1]) Cheater. ;) Mattias

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Paul Ishenin
21.08.12, 23:21, Sven Barth wrote: There must also be a function to return the number of bytes. Does someone know the name? Length(s) * SizeOf(s[1]) It has the name ByteLength() Best regards, Paul Ishenin
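The three counts discussed in this sub-thread (elements, code points, bytes) can be illustrated with a small sketch. It is written in Python rather than Pascal, and the helper names are made up for the example. `code_units` plays the role of `Length(s)` and `byte_length` the role of `Length(s) * SizeOf(s[1])` for a string stored in the given encoding.

```python
# Size in bytes of one code unit for each encoding used below.
UNIT_SIZE = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}


def byte_length(s: str, encoding: str) -> int:
    """Bytes occupied by s when stored in the given encoding."""
    return len(s.encode(encoding))


def code_units(s: str, encoding: str) -> int:
    """Number of elements (code units) s has in the given encoding."""
    return byte_length(s, encoding) // UNIT_SIZE[encoding]


def code_points(s: str) -> int:
    """Number of Unicode code points (what UTF8Length is said to return)."""
    return len(s)


if __name__ == "__main__":
    for text in ("Gaertner", "G\u00e4rtner", "\U0001F600"):
        print(
            repr(text),
            "code points:", code_points(text),
            "utf-8 units:", code_units(text, "utf-8"),
            "utf-16 units:", code_units(text, "utf-16-le"),
            "utf-16 bytes:", byte_length(text, "utf-16-le"),
        )
```

For plain ASCII all three counts agree in UTF-8, which is why the difference is easy to overlook in testing.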

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Sven Barth
On 21.08.2012 16:44, Mattias Gaertner wrote: On Tue, 21 Aug 2012 15:11:56 +0100 Graeme Geldenhuys wrote: On 21 August 2012 14:54, Marco van de Voort wrote: Doesn't sound wise. length(stringtype)=n should mean that the string takes sizeof(char)*n bytes. (give or take the #0#0) I'm not

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 15:38:31 +0200 "Ludo Brands" wrote: > > > There is the large category of network apps. Most protocols are utf8 or have a clear preference for utf8 (json for example). Databases are an extension of that and have the additional complication that they

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 15:11:56 +0100 Graeme Geldenhuys wrote: > On 21 August 2012 14:54, Marco van de Voort wrote: > > > > Doesn't sound wise. length(stringtype)=n should mean that the string takes > > sizeof(char)*n bytes. (give or take the #0#0) > > > I'm not sure what you are trying to accom

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 10:23:10 -0300 Marcos Douglas wrote: >[...] > >> I guess there is no good solution for TStrings. Whatever string type is > >> chosen, some programs will suffer. > > > > Why will some suffer? Simply default UnicodeString to the correct > > encoding on each platform, and no perf

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 14:54, Marco van de Voort wrote: > > Doesn't sound wise. length(stringtype)=n should mean that the string takes > sizeof(char)*n bytes. (give or take the #0#0) I'm not sure what you are trying to accomplish? Give me sample code that will cause a problem. In fpGUI I have UTF8L

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said: > The Char type would be defined as String[4] (max size in bytes of a > unicode codepoint) Doesn't sound wise. length(stringtype)=n should mean that the string takes sizeof(char)*n bytes. (give or take the #0#0)

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 14:13, Mattias Gaertner wrote: > One string type and native encoding. Do you mean the current AnsiString? I meant a string type that changes its encoding based on the platform it is compiled for. UTF-16 under Windows, UTF-8 under others. The RTL then uses that single string type

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ludo Brands
> > There is the large category of network apps. Most protocols are utf8 or have a clear preference for utf8 (json for example). Databases are an extension of that and have the additional complication that they can mix codepages at any level. These apps can be quite sensit

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Marcos Douglas
On Tue, Aug 21, 2012 at 6:09 AM, Graeme Geldenhuys wrote: > Hi, > > On 21 August 2012 09:32, Mattias Gaertner wrote: >> >> IMO unicodestring should be the same on all platforms, because >> otherwise the character size switches per platform, > > > Please define "character" in your sentence above.

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 13:53:14 +0100 Graeme Geldenhuys wrote: > On 21 August 2012 13:03, Michael Schnell wrote: > > With "not so often" I meant program runtime: it is usually not called in a > > close long running loop. > > I have a program that does exactly that... Loads files to do CRC > check

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 14:05:33 +0200 "Ludo Brands" wrote: > > > > > Yes. But maybe these applications can be adapted easily. > > This discussion should be about the issues where the > > conversions matter and there is no simple workaround. It > > would be good if everyone who knows such a probl

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 13:03, Michael Schnell wrote: > With "not so often" I meant program runtime: it is usually not called in a > close long running loop. I have a program that does exactly that... Loads files to do CRC checking to see what changed. It's a recursive find-all that goes through 100k

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 14:22:17 +0200 Michael Schnell wrote: > On 08/21/2012 11:22 AM, Mattias Gaertner wrote: > > > > Lazarus does not force "unicodestring" to anything for the simple > > reason, that it does not use it. It only provides some functions for > > converting UTF-8 to/from unicodestring

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 11:22 AM, Mattias Gaertner wrote: Lazarus does not force "unicodestring" to anything for the simple reason, that it does not use it. It only provides some functions for converting UTF-8 to/from unicodestring. At the moment Lazarus does not even use UTF8String, because the RTL does

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 02:11 PM, Michael Schnell wrote: So maybe it should not compile myString[i] at all ... and provide a decent enumerator syntax instead. -Michael

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 12:02 PM, Aleksa Todorovic wrote: Yes, they will most probably be scattered all around, but then - it's developer-related organizational challenge, not compiler one. The compiler should not in a large area produce code that does not work as a former version (that did not use Unicod

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ludo Brands
> Yes. But maybe these applications can be adapted easily. This discussion should be about the issues where the conversions matter and there is no simple workaround. It would be good if everyone who knows such a problem comes up with it now, so the FPC team can give an advice and/or

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 01:09 PM, Graeme Geldenhuys wrote: Maybe so, but it does debunk the statement "does not happen too often". With "not so often" I meant program runtime: it is usually not called in a close long running loop. -Michael

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Jonas Maebe
Mattias Gaertner wrote on Tue, 21 Aug 2012: But let's be realistic. Some conversions are not measurable and are ok. Case in point: the FPC Win32 RTL until now. It always uses the ansistring versions of OS interface functions, while NT-based Windows OSes internally all work with UTF-16. Th

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 11:09 AM, Graeme Geldenhuys wrote: Can't we just introduce UTF8String and UTF16String types. By the name they clearly state what encoding they hold. It does make sense to (optionally) provide dynamically encoded strings, so that it is possible to do library functions that work wit

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 12:09:52 +0100 Graeme Geldenhuys wrote: >[...] > >> This is a simple example, but look at all the conversions already. Now > >> if UnicodeString uses the correct encoding on each platform, the > >> conversions would be zero! > > > > No. On Windows you have to open UTF-8 files

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Hans-Peter Diettrich
Aleksa Todorovic wrote: The problem here is that libraries floating around (including RTL and FCL) use different string types (UnicodeString, UTF8String, AnsiString), so the question is - is it possible to (re)write those libraries in a generic way (RawByteString?), so they can work with any s

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Hans-Peter Diettrich
Aleksa Todorovic wrote: On Tue, Aug 21, 2012 at 10:16 AM, Ivanko B wrote: Handling 1..4(6) bytes is less efficient than handling surrogate *pairs*. === But surrogate pairs break array-like fast char access anyway, isn't it ? It's also "broken" in UTF8 in the same way - so none

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
Hi, On 21 August 2012 11:45, Mattias Gaertner wrote: > I agree that TStringList can easily create a performance problem, but > afaik loading a text into a GUI is not a good example to > show conversion overhead. Maybe so, but it does debunk the statement "does not happen too often". >> This is

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 15:12:03 +0500 Ivanko B wrote: > Because these documents are in UTF-8 parsing is about 2-3 > times faster on these documents, searching is about 20 to 50% faster > = > Because You name is latin ANSISTRING "Mattias Gaertner" :) Actually my name is Gärtner. The te

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 11:32, Marco van de Voort wrote: > All routines like capitalization (routinely used for case insensitive > comparison) get a lot more complicated. Obviously Unicode is a lot more complicated, because it is designed for _all_ spoken and non-spoken languages. ASCII is minute in comp

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 10:24:38 +0100 Graeme Geldenhuys wrote: > On 21 August 2012 10:01, Mattias Gaertner wrote: > >> > The conversion is done only when entering and exiting the OS / GUI > >> > framework > >> > calls. I understand this does not happen too often. > >> > >> I beg to differ. > > > >

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said: > On 21 August 2012 10:19, Ivanko B wrote: > > Sure no problems for GUI. But how about processing large texts ? > > Same experience as before. I must add "processing large text" is a > vague statement. I think unicode or not is a bigger performanc

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
Because these documents are in UTF-8 parsing is about 2-3 times faster on these documents, searching is about 20 to 50% faster = Because your name is latin ANSISTRING "Mattias Gaertner" :) But imagine gigabytes of 4 bytes/char UTF-8 text.

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 14:19:44 +0500 Ivanko B wrote: > I have implemented multiple text edit/display widgets that do plenty > of string manipulation... all based on the UTF-8 encoding. I have > suffered NO speed penalties. > > Sure no problems for GUI. But how about processing la

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Aleksa Todorovic
On Tue, Aug 21, 2012 at 11:41 AM, Mattias Gaertner wrote: > > Theoretically you could rewrite the FCL to support UTF8String, > UnicodeString and AnsiString. But not at the same time. In an > application there will always be only one of them. So you have to ship for > each flavor a whole FCL plus all

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Jonas Maebe
marcov wrote on Tue, 21 Aug 2012: In our previous episode, Mattias Gaertner said: For example under Linux file names are treated as UTF-8 but are only bytes. They can and they do contain invalid UTF-8 characters. If your program should support this, you must use a FindFirst with UTF-8. To be
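The point about Linux file names being plain bytes that may not be valid UTF-8 can be demonstrated with a short sketch. It is in Python rather than Pascal, and the sample name and helper functions are made up for the example. The `surrogateescape` error handler is a Python-specific way of carrying undecodable bytes through a Unicode string; it is shown only as one possible approach, not as what FindFirst does.

```python
# A file name as the kernel stores it: raw bytes. This one is Latin-1
# encoded ("caf\xe9.txt"), so it is NOT valid UTF-8.
RAW_NAME = b"caf\xe9.txt"


def is_valid_utf8(name: bytes) -> bool:
    """True if the byte sequence decodes as strict UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_text(name: bytes) -> str:
    """Decode without losing information: bad bytes become lone surrogates."""
    return name.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Inverse of to_text: restores the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    print(is_valid_utf8(RAW_NAME))
    print(to_bytes(to_text(RAW_NAME)) == RAW_NAME)
```

A conversion that replaces or drops the invalid byte cannot round-trip, so the program could list such a file but not reopen it under the converted name.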

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
How well will your "access char via index" code perform on that? = It'll mean "now is the time to switch to UCS-4" :)

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 11:07:26 +0200 Michael Schnell wrote: > On 08/21/2012 10:17 AM, Graeme Geldenhuys wrote: > > if you want to do string comparisons, one option is to normalise the > > text before you do a compare. > Other than the conversion necessary with system-calls when a different > en

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Marco van de Voort
In our previous episode, Mattias Gaertner said: > > On 08/21/2012 10:32 AM, Mattias Gaertner wrote: > > > IMO unicodestring should be the same on all platforms, because > > > otherwise the character size switches per platform, which is hard to > > > test and asking for trouble. > > This does see

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 11:17:24 +0200 Aleksa Todorovic wrote: > On Tue, Aug 21, 2012 at 9:53 AM, Martin Schreiber wrote: > > On 21.08.2012 09:31, Graeme Geldenhuys wrote: > > > > > > Ehm, I did both. In the beginning MSEgui switched from Widestring to utf-8 > > encoded Ansistring because of the b

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 10:19, Ivanko B wrote: > Sure no problems for GUI. But how about processing large texts ? Same experience as before. I must add "processing large text" is a vague statement. -- Regards, - Graeme - ___ fpGUI - a cross-platform Fre

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Marco van de Voort
In our previous episode, Mattias Gaertner said: > > IMO unicodestring should be the same on all platforms, because > otherwise the character size switches per platform, which is hard to > test and asking for trouble. I think the big issue is more about what "string" will be when the FPC is compil

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 10:16, Ivanko B wrote: > Though I'm sure that latin people don't suffer from slowness of > utf-8 where utf-8 = ansistring. And I gather you base your assumptions on MSEgui. MSEgui uses UCS-2, *not* UTF-16. I also believe MSEgui doesn't bother with surrogate pairs (please corr

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 10:01, Mattias Gaertner wrote: >> > The conversion is done only when entering and exiting the OS / GUI >> > framework >> > calls. I understand this does not happen too often. >> >> I beg to differ. > > Maybe you can name some example. OK, lets assume I'm under Linux and fpGUI

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 11:09:28 +0200 Michael Schnell wrote: > On 08/21/2012 10:32 AM, Mattias Gaertner wrote: > > IMO unicodestring should be the same on all platforms, because > > otherwise the character size switches per platform, which is hard to > > test and asking for trouble. > This does s

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
I have implemented multiple text edit/display widgets that do plenty of string manipulation... all based on the UTF-8 encoding. I have suffered NO speed penalties. Sure no problems for GUI. But how about processing large texts ?

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Aleksa Todorovic
On Tue, Aug 21, 2012 at 9:53 AM, Martin Schreiber wrote: > On 21.08.2012 09:31, Graeme Geldenhuys wrote: > > > Ehm, I did both. In the beginning MSEgui switched from Widestring to utf-8 > encoded Ansistring because of the buggy FPC widestring implementation > (MSEgui started with Delphi/Kylix).

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
Performance heavily depends on what you do and you can find good examples == Hmm.. are there implementations of UTF-8 substringing, string comparison etc - but not using intermediate HEAVY normalizations from/to fixed char length type for BOTH input arguments ? Though I'm sure th

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 08:53, Martin Schreiber wrote: >> >> Yet another myth > > > Ehm, I did both. In the beginning MSEgui switched from Widestring to utf-8 Just because you had a bad experience doesn't doom the utf-8 encoding forever. Maybe you just had a buggy implementation. No coder is perfe

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 09:41, Ivanko B wrote: > UTF-8 is very-very slow compared to UCS-2 as to string manipulations > so its best usage is encoding source files (as done in MSEide). Please supply a test program that proves this. I don't believe you are correct. I have implemented multiple text edit

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
Hi, On 21 August 2012 09:32, Mattias Gaertner wrote: > > IMO unicodestring should be the same on all platforms, because > otherwise the character size switches per platform, Please define "character" in your sentence above. Are you referring to a Unicode codepoint, or a "printable character"? I

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 10:32 AM, Mattias Gaertner wrote: IMO unicodestring should be the same on all platforms, because otherwise the character size switches per platform, which is hard to test and asking for trouble. This does seem appropriate. But right now Delphi compatibility forces 16 Bits and Laz

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 10:17 AM, Graeme Geldenhuys wrote: if you want to do string comparisons, one option is to normalise the text before you do a compare. Other than the conversion necessary with system-calls when a different encoding is used internally, comparing strings happens very often within t

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 10:15 AM, Graeme Geldenhuys wrote: You're in for a surprise... With a statement that reads "It provides direct access to serial ports, TAPI, and the Microsoft Speech API." it should start sounding alarm bells for Linux developers. Of course you are very right and silly me did not

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 09:23:30 +0100 Graeme Geldenhuys wrote: >[...] > > The conversion is done only when entering and exiting the OS / GUI framework > > calls. I understand this does not happen too often. > > I beg to differ. Maybe you can name some example. Concrete problems can be solved, abst

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Aleksa Todorovic
On Tue, Aug 21, 2012 at 10:16 AM, Ivanko B wrote: > > Handling 1..4(6) bytes is less efficient than handling surrogate > *pairs*. > === > But surrogate pairs break array-like fast char access anyway, isn't it ? It's also "broken" in UTF8 in the same way - so none of them gets +1 on
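The claim that array-style indexing is "broken" in both UTF-16 and UTF-8 once a character needs more than one code unit can be shown with a short sketch. It is in Python rather than Pascal, and the helper names are hypothetical. The string contains U+1F600, which lies outside the Basic Multilingual Plane.

```python
import struct


def utf16_units(s: str) -> list:
    """The 16-bit code units a UTF-16 string type would store for s."""
    raw = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def codepoint_at(units: list, n: int) -> int:
    """Return the n-th code point (0-based) by scanning from the start.

    Indexing by code point is O(n) because a surrogate pair takes two units.
    """
    i = 0
    while i < len(units):
        u = units[i]
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = u
            step = 1
        if n == 0:
            return cp
        n -= 1
        i += step
    raise IndexError("code point index out of range")


if __name__ == "__main__":
    text = "a\U0001F600b"
    units = utf16_units(text)
    print([hex(u) for u in units])                 # plain indexing sees surrogates
    print(hex(codepoint_at(units, 1)))             # scanning recovers the code point
    print([hex(b) for b in text.encode("utf-8")])  # UTF-8 has the same issue
```

Plain `units[2]` lands on the low surrogate rather than on "b", just as byte index 2 in the UTF-8 form lands in the middle of a four-byte sequence.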

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 13:41:38 +0500 Ivanko B wrote: > But if you are such a UTF-16 (actually UCS-2 as > that is what MSEgui supports) fan > = > If Martin can implement UTF-16 (with surrogate pair) support in MSEgui > string units (and these units fully cover absenting code of FPC

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Martin Schreiber
On 21.08.2012 09:32, Mattias Gaertner wrote: On Mon, 20 Aug 2012 20:56:46 +0200 Florian Klämpfl wrote: [...] The current situation is: - either somebody starts to implement support for unicodestring being utf-8 (or whatever) on linux in a compatible way with the current approach, then 2.8.0

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Martin Schreiber
On 21.08.2012 09:31, Graeme Geldenhuys wrote: On 21 August 2012 09:13, Martin Schreiber wrote: I disagree. Handling 1..4(6) bytes is less efficient than handling surrogate *pairs*. Yet another myth Ehm, I did both. In the beginning MSEgui switched from Widestring to utf-8 encoded Ans

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
But if you are such a UTF-16 (actually UCS-2 as that is what MSEgui supports) fan = If Martin can implement UTF-16 (with surrogate pair) support in MSEgui string units (and these units fully cover absenting code of FPC RTL ) then the things are excellent. PS: UTF-8 is very-very sl

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Mon, 20 Aug 2012 20:56:46 +0200 Florian Klämpfl wrote: >[...] > The current situation is: > - either somebody starts to implement support for unicodestring being > utf-8 (or whatever) on linux in a compatible way with the current > approach, then 2.8.0 will use this > - nobody works on it, the

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 09:13, Martin Schreiber wrote: > I disagree. Handling 1..4(6) bytes is less efficient than handling surrogate > *pairs*. Yet another myth But if you are such a UTF-16 (actually UCS-2 as that is what MSEgui supports) fan, why isn't MSEgui source code stored in UTF-16 encoding

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
Hi, On 21 August 2012 08:37, Michael Schnell wrote: > > But does that really suggest taking the effort to support other Unicode > variants ? Yes, if you want to make the statement "FPC fully supports Unicode" > The conversion is done only when entering and exiting the OS / GUI framework > c

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
Hi, On 21 August 2012 08:28, Michael Schnell wrote: > > How can it be OK regarding comparing strings, when all Unicode variants > allow for multiple codings for the same single printable "character" (and > moreover what "character" do the users regard as "equal"). The Unicode Standard covers al
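
The question quoted above, about one printable "character" having several codings, can be illustrated with a small sketch. This is not taken from the thread: it uses Python's standard `unicodedata` module, and the helper name `canonically_equal` is hypothetical. It assumes that canonical equivalence (NFC) is the notion of "equal" wanted; users may expect looser or stricter rules.

```python
import unicodedata

# Two codings of the same printable character "é".
composed = "\u00E9"       # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT, two code points


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain code-point comparison treats the two codings as different strings.
print(composed == decomposed)
# Comparing the normalized forms treats them as the same text.
print(canonically_equal(composed, decomposed))
```

The same issue exists in UTF-8, UTF-16 and UTF-32 alike, since it arises at the code-point level rather than in the byte encoding.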

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
Handling 1..4(6) bytes is less efficient than handling surrogate *pairs*. === But surrogate pairs break array-like fast char access anyway, isn't it ? And there's a lot of room for optimizing utf-8 operation for instance http://bjoern.hoehrmann.de/utf-8/decoder/dfa/. Also a publicatio
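
As a rough illustration of the point about array-like access (a sketch, not code from the thread), the following Python snippet counts UTF-8 bytes and UTF-16 code units per character. The function names `utf16_code_units` and `utf8_length` are hypothetical. It shows that both encodings are variable-length once a character lies outside the BMP.

```python
def utf16_code_units(ch: str) -> list[int]:
    """Return the UTF-16 code units for a single code point (hypothetical helper)."""
    cp = ord(ch)
    if cp < 0x10000:
        return [cp]                     # BMP: one 16-bit unit
    cp -= 0x10000                       # 20 bits remain
    high = 0xD800 + (cp >> 10)          # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (cp & 0x3FF)         # low (trail) surrogate: bottom 10 bits
    return [high, low]


def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# ASCII, Latin-1 supplement, BMP symbol, and a character outside the BMP.
samples = ["A", "\u00E9", "\u20AC", "\U0001D11E"]
for ch in samples:
    units = utf16_code_units(ch)
    print(f"U+{ord(ch):04X}: utf-8 bytes={utf8_length(ch)}, utf-16 units={len(units)}")
```

With surrogate pairs in play, the n-th code unit of a UTF-16 string is not necessarily the n-th character, so constant-time indexing by character only holds if the text is restricted to the BMP (that is, UCS-2).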

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 08:27, Michael Schnell wrote: > > I doubt that it will be possible to just compile it (e.g. for Linux) but > with optimum compatibility of the compiler, porting the source code should > be rather easy. You're in for a surprise... With a statement that reads "It provides direct

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Martin Schreiber
On Tuesday 21 August 2012 09:56:57 Ivanko B wrote: > For non-fixed char length there's nothing better than UTF8 (default > ASCII compatible, ready for any future alphabets,..). For fixed-char > length (fast string operations etc) also there's nothing better than > UCS-2 (the Earth coverage ) & UCS-

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
For non-fixed char length there's nothing better than UTF8 (default ASCII compatible, ready for any future alphabets,..). For fixed-char length (fast string operations etc) also there's nothing better than UCS-2 (the Earth coverage ) & UCS-4 (the galaxy coverage). The non-fixed char length UTF-

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
For non-fixed char length there's nothing better than UTF8 (default ASCII compatible, ready for any future alphabets,..). For fixed-char length (fast string operations etc) also there's nothing better than UCS-2 (the Earth coverage ) & UCS-4 (the galaxy coverage). The non-fixed char length UTF-16 (

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Mon, 20 Aug 2012 18:46:29 +0100 Hans-Peter Diettrich wrote: > Mattias Gaertner wrote: > > > I guess most people would say that "good multi language Unicode support > > in FPC" requires a Unicode supporting RTL. > > Please clarify: *Unicode* or UTF-16 support? > > Unicode is covered by bot

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
Hi, On 20 August 2012 23:26, Hans-Peter Diettrich wrote: > > UCS2 is nowadays known as the BMP (Basic Multilingual Plane) of full > Unicode. The UCS2 is considered obsolete! Nothing else needs to be said. :) > Have a look at the full Unicode codepages, what is and what is not > part of the BMP

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
Sorry: I do think it would not harm to use UTF-16 as a default. -Michael

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/20/2012 06:05 PM, Graeme Geldenhuys wrote: * UnicodeString is always UTF-16 (so everything but Windows takes a conversion penalty)! This is true of course, But does that really suggest taking the effort to support other Unicode variants ? The conversion is done only when entering and e

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/20/2012 08:53 PM, Ivanko B wrote: Really the team seems to fights to FPC + Lazarus be capable of building thousands of Delphi based components - archivers, cyphers, audio processors etc things which people mostly like Delphi for and which seldom use specific Delphi features causing problems
