Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Marco van de Voort
In our previous episode, Ivanko B said: > > Do you mean replacing a character in an UCS-2/UCS-4 string can be > > implemented more efficiently than in an UTF-8/UTF-16 string? > > > > Sure, just scan the string char by char as array elements and replace > as matches encounter. Like working with int

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Ivanko B
> Do you mean replacing a character in an UCS-2/UCS-4 string can be > implemented more efficiently than in an UTF-8/UTF-16 string? > Sure, just scan the string char by char as array elements and replace matches as they are encountered. Like working with integer arrays.
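A minimal sketch of the "scan it like an integer array" approach described above, assuming the string holds only BMP characters (so it is effectively UCS-2). The helper name ReplaceChar is made up for illustration and is not an RTL routine.

```pascal
program ReplaceCharDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: replaces every occurrence of OldCh with NewCh,
// treating the string as a plain array of 16-bit elements.
// Only correct while both characters lie in the BMP (no surrogate pairs).
procedure ReplaceChar(var S: UnicodeString; OldCh, NewCh: WideChar);
var
  I: Integer;
begin
  for I := 1 to Length(S) do
    if S[I] = OldCh then
      S[I] := NewCh;
end;

var
  Line: UnicodeString;
begin
  Line := 'a-b-c';
  ReplaceChar(Line, '-', '+');
  WriteLn(Line);  // prints a+b+c
end.
```

The in-place loop works because the replacement has the same element size as the original. In a variable-width encoding the same loop is still valid when both characters encode to the same number of code units; otherwise the string has to be rebuilt.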

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Ivanko B
Why deal with single characters, by index, when working with substrings also covers the single-character use? Possibly because it is ten times slower when multiple chars are processed. ___ fpc-devel maillist - fpc-devel@lists.freepascal

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Hans-Peter Diettrich
Martin Schreiber wrote: On 21.08.2012 12:52, Hans-Peter Diettrich wrote: The good ole Pos() can do that, why search for more complicated implementations? You still try to use old coding patterns which are simply inappropriate for dealing with Unicode strings. Why make a distinction between

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Hans-Peter Diettrich
Graeme Geldenhuys wrote: On 21 August 2012 13:03, Michael Schnell wrote: With "not so often" I meant program runtime: it is usually not called in a close long running loop. I have a program that does exactly that... Loads files to do CRC checking to see what changed. It's a recursive find-

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Hans-Peter Diettrich
Mattias Gaertner wrote: length returns the number of characters. the number of elements, which can be of any size (in arrays in general). UTF8Length the number of codepoints. There must also be a function to return the number of bytes. Does someone know the name? Length(s)*sizeof(s[1]) D

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Hans-Peter Diettrich
Marco van de Voort wrote: utf8/16 -> ansi are a bit more involved. (since mapping many chars to few, naive implementation requiring large lookup sets) A single 256 element array can be used for both directions. In Ansi to Unicode the char value is used to index the array of Unicode values,
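A rough sketch of the single 256-element table idea. The preview above is cut off before the reverse direction is described, so the linear search used here for Unicode to Ansi is an assumption, as are all names (TCodepageTable, InitTable, AnsiToUnicode, UnicodeToAnsi). The table is the ISO-8859-1 identity mapping with one Windows-1252 style override, just to have a non-trivial entry.

```pascal
program CodepageTableDemo;
{$mode objfpc}{$H+}

type
  TCodepageTable = array[Byte] of WideChar;

var
  Table: TCodepageTable;

// Fills the table with the ISO-8859-1 identity mapping, then overrides one
// entry the way Windows-1252 does ($80 -> U+20AC, the euro sign).
procedure InitTable;
var
  B: Integer;
begin
  for B := 0 to 255 do
    Table[B] := WideChar(B);
  Table[$80] := WideChar($20AC);
end;

// Ansi -> Unicode: the byte value indexes the table directly.
function AnsiToUnicode(C: AnsiChar): WideChar;
begin
  Result := Table[Ord(C)];
end;

// Unicode -> Ansi: linear search of the same table (assumed approach);
// returns '?' when the codepoint has no entry in this codepage.
function UnicodeToAnsi(W: WideChar): AnsiChar;
var
  B: Integer;
begin
  Result := '?';
  for B := 0 to 255 do
    if Table[B] = W then
    begin
      Result := AnsiChar(B);
      Exit;
    end;
end;

begin
  InitTable;
  WriteLn(Ord(AnsiToUnicode(AnsiChar($80))));     // prints 8364
  WriteLn(Ord(UnicodeToAnsi(WideChar($20AC))));   // prints 128
end.
```

The forward lookup is a single array access; the reverse search costs up to 256 comparisons per character, which is the trade-off for not keeping a second, much larger lookup structure.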

Re: [fpc-devel] FPC -Rintel and -alr options

2012-08-21 Thread ABorka
Yes, you are right. I just comment out the options within fpc.cfg after I got the asm files (*.s) for study. On 8/21/2012 00:53, Sven Barth wrote: On 21.08.2012 09:35, ABorka wrote: This is exactly what I needed. "-alr -sr -Amasm" does it. I just put them into my "fpc.cfg". Why did you

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Sven Barth
On 21.08.2012 17:27, Paul Ishenin wrote: On 21.08.12, 23:21, Sven Barth wrote: There must also be a function to return the number of bytes. Does someone know the name? Length(s) * SizeOf(s[1]) It has the name ByteLength() O.o Again what learned... Regards, Sven

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 17:21:27 +0200 Sven Barth wrote: >[...] > > length returns the number of characters. > > UTF8Length the number of codepoints. > > There must also be a function to return the number of bytes. > > Does someone know the name? > > Length(s) * SizeOf(s[1]) Cheater. ;) Mattias

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Paul Ishenin
On 21.08.12, 23:21, Sven Barth wrote: There must also be a function to return the number of bytes. Does someone know the name? Length(s) * SizeOf(s[1]) It has the name ByteLength() Best regards, Paul Ishenin
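For illustration, the Length(s) * SizeOf(s[1]) expression from this exchange applied to two string types. This sketch deliberately does not call ByteLength(), since its exact declaration is not shown in the thread; pure ASCII literals are used so no codepage conversion can change the result.

```pascal
program ByteCountDemo;
{$mode objfpc}{$H+}

var
  A: AnsiString;
  U: UnicodeString;
begin
  A := 'abc';
  U := 'abc';
  // Length counts elements (code units), not bytes or codepoints.
  // SizeOf(X[1]) is resolved from the element type at compile time.
  WriteLn(Length(A) * SizeOf(A[1]));  // 1-byte elements: prints 3
  WriteLn(Length(U) * SizeOf(U[1]));  // 2-byte elements: prints 6
end.
```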

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Sven Barth
On 21.08.2012 16:44, Mattias Gaertner wrote: On Tue, 21 Aug 2012 15:11:56 +0100 Graeme Geldenhuys wrote: On 21 August 2012 14:54, Marco van de Voort wrote: Doesn't sound wise. length(stringtype)=n should mean that the string takes sizeof(char)*n bytes. (give or take the #0#0) I'm not

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 15:38:31 +0200 "Ludo Brands" wrote: > > > > There is the large category of network apps. Most protocols > > are utf8 > > > or have a clear preference for utf8 (json for example). > > Databases are > > > an extension of that and have the additional complication that they

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 19:48:12 +0500 Ivanko B wrote: > If you replied to this mail then you lost me. > I don't understand what problem of UTF-8 for the RTL you want to point > out. Can you explain again? > == > Substringing etc manipulation only via normalizing to fixed-char type > wh

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Ivanko B
If you replied to this mail then you lost me. I don't understand what problem of UTF-8 for the RTL you want to point out. Can you explain again? == Substringing etc manipulation only via normalizing to fixed-char type which may be inefficient (especially because it performs for each i

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 15:11:56 +0100 Graeme Geldenhuys wrote: > On 21 August 2012 14:54, Marco van de Voort wrote: > > > > Doesn't sound wise. length(stringtype)=n should mean that the string takes > > sizeof(char)*n bytes. (give or take the #0#0) > > > I'm not sure what you are trying to accom

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 10:23:10 -0300 Marcos Douglas wrote: >[...] > >> I guess there is no good solution for TStrings. Whatever string type is > >> chosen, some programs will suffer. > > > > Why will some suffer? Simply default UnicodeString to the correct > > encoding on each platform, and no perf

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 14:54, Marco van de Voort wrote: > > Doesn't sound wise. length(stringtype)=n should mean that the string takes > sizeof(char)*n bytes. (give or take the #0#0) I'm not sure what you are trying to accomplish? Give me sample code that will cause a problem. In fpGUI I have UTF8L

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said: > The Char type would be defined as String[4] (max size in bytes of a > unicode codepoint) Doesn't sound wise. length(stringtype)=n should mean that the string takes sizeof(char)*n bytes. (give or take the #0#0)

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 18:15:07 +0500 Ivanko B wrote: > For example? > == > Sometime reading directory/file names. Sometime PostgreSQL produces > UTF-8 dumps with errors causing problems to converting to single byte > encoding (KOI8-R) - me have to use the "-c" switch of ICONV for such >

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 14:13, Mattias Gaertner wrote: > One string type and native encoding. Do you mean the current AnsiString? I meant a string type that changes its encoding based on the platform it is compiled for. UTF-16 under Windows, UTF-8 under others. The RTL then uses that single string type

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ludo Brands
> > There is the large category of network apps. Most protocols > are utf8 > > or have a clear preference for utf8 (json for example). > Databases are > > an extension of that and have the additional complication that they > > can mix codepages at any level. These apps can be quite > sensit

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Marcos Douglas
On Tue, Aug 21, 2012 at 6:09 AM, Graeme Geldenhuys wrote: > Hi, > > On 21 August 2012 09:32, Mattias Gaertner wrote: >> >> IMO unicodestring should be the same on all platforms, because >> otherwise the character size switches per platform, > > > Please define "character" in your sentence above.

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Ivanko B
For example? == Sometime reading directory/file names. Sometime PostgreSQL produces UTF-8 dumps with errors causing problems to converting to single byte encoding (KOI8-R) - me have to use the "-c" switch of ICONV for such conversions. Really not seldom errors, but You (latins) are just

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 13:53:14 +0100 Graeme Geldenhuys wrote: > On 21 August 2012 13:03, Michael Schnell wrote: > > With "not so often" I meant program runtime: it is usually not called in a > > close long running loop. > > I have a program that does exactly that... Loads files to do CRC > check

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 14:05:33 +0200 "Ludo Brands" wrote: > > > > > Yes. But maybe these applications can be adapted easily. > > This discussion should be about the issues where the > > conversions matter and there is no simple workaround. It > > would be good if everyone who knows such a probl

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 13:03, Michael Schnell wrote: > With "not so often" I meant program runtime: it is usually not called in a > close long running loop. I have a program that does exactly that... Loads files to do CRC checking to see what changed. It's a recursive find-all that goes through 100k

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 14:22:17 +0200 Michael Schnell wrote: > On 08/21/2012 11:22 AM, Mattias Gaertner wrote: > > > > Lazarus does not force "unicodestring" to anything for the simple > > reason, that it does not use it. It only provides some functions for > > converting UTF-8 to/from unicodestring

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 11:22 AM, Mattias Gaertner wrote: Lazarus does not force "unicodestring" to anything for the simple reason, that it does not use it. It only provides some functions for converting UTF-8 to/from unicodestring. At the moment Lazarus does not even use UTF8String, because the RTL does

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 02:11 PM, Michael Schnell wrote: So maybe it should not compile myString[i] at all ... and provide a decent enumerator syntax instead. -Michael ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailm

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 12:02 PM, Aleksa Todorovic wrote: Yes, they will most probably be scattered all around, but then - it's developer-related organizational challenge, not compiler one. The compiler should not in a large area produce code that does not work as a former version (that did not use Unicod

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ludo Brands
> > Yes. But maybe these applications can be adapted easily. > This discussion should be about the issues where the > conversions matter and there is no simple workaround. It > would be good if everyone who knows such a problem comes up > with it now, so the FPC team can give an advice and/or

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Martin Schreiber
On 21.08.2012 12:52, Hans-Peter Diettrich wrote: The good ole Pos() can do that, why search for more complicated implementations? You still try to use old coding patterns which are simply inappropriate for dealing with Unicode strings. Why make a distinction between searching for a single cha

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 01:09 PM, Graeme Geldenhuys wrote: Maybe so, but it does debunk the statement "does not happen too often". With "not so often" I meant program runtime: it is usually not called in a close long running loop. -Michael

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Jonas Maebe
Mattias Gaertner wrote on Tue, 21 Aug 2012: But let's be realistic. Some conversions are not measurable and are ok. Case in point: the FPC Win32 RTL until now. It always uses the ansistring versions of OS interface functions, while NT-based Windows OSes internally all work with UTF-16. Th

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 11:09 AM, Graeme Geldenhuys wrote: Can't we just introduce UTF8String and UTF16String types. By the name they clearly state what encoding the hold. It does make sense to (optionally) provide dynamically encoded strings, so that it is possible to do library functions that work wit

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 12:09:52 +0100 Graeme Geldenhuys wrote: >[...] > >> This is a simple example, but look at all the conversions already. Now > >> if UnicodeString uses the correct encoding on each platform, the > >> conversions would be zero! > > > > No. On Windows you have to open UTF-8 files

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Hans-Peter Diettrich
Graeme Geldenhuys wrote: On 20 August 2012 23:18, Hans-Peter Diettrich wrote: The Delphi developers wanted to implement what you suggest, but dropped that approach later again. When Embarcadero implemented Unicode support, Delphi was a pure Windows application. They had no need to think of

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Hans-Peter Diettrich
Ivanko B wrote: For that reason there is no speed difference between using a UTF-16 or UTF-8 encoded string. Both can be coded equally efficient. == No in common, since UTF-8 needs error handling, replacing for unconvertable bytes etc operations which may effect initial data which

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Hans-Peter Diettrich
Aleksa Todorovic wrote: The problem here is that libraries floating around (including RTL and FCL) use different string types (UnicodeString, UTF8String, AnsiString), so the question is - is it possible to (re)write those libraries in a generic way (RawByteString?), so they can work with any s

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Hans-Peter Diettrich
Martin Schreiber wrote: All "access a char by index into a string" code I have seen, 99.99% of the time work in a sequential manner. For that reason there is no speed difference between using a UTF-16 or UTF-8 encoded string. Both can be coded equally efficient. Graeme, this is simply not tru

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Hans-Peter Diettrich
Aleksa Todorovic wrote: On Tue, Aug 21, 2012 at 10:16 AM, Ivanko B wrote: Handling 1..4(6) bytes is less efficient than handling surrogate *pairs*. === But surrogate pairs break array-like fast char access anyway, isn't it ? It's also "broken" in UTF8 in the same way - so none

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
Hi, On 21 August 2012 11:45, Mattias Gaertner wrote: > I agree that TStringList can easily create a performance problem, but > afaik loading a text into a GUI is not a good example to > show conversion overhead. Maybe so, but it does debunk the statement "does not happen too often". >> This is

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 15:12:03 +0500 Ivanko B wrote: > Because these documents are in UTF-8 parsing is about 2-3 > times faster on these documents, searching is about 20 to 50% faster > = > Because You name is latin ANSISTRING "Mattias Gaertner" :) Actually my name is Gärtner. The te

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 11:32, Marco van de Voort wrote: > All routines like capitalization (routinely used for case insensitive > comparison) get a lot more complicated. Obviously Unicode is a lot more complicated, because it is designed for _all_ spoken and non-spoken languages. ASCII is minute in comp

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 14:59:57 +0500 Ivanko B wrote: > For that reason there is no > speed difference between using a UTF-16 or UTF-8 encoded string. Both > can be coded equally efficient. > == > No in common, since UTF-8 needs error handling, replacing for > unconvertable bytes etc o

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 10:24:38 +0100 Graeme Geldenhuys wrote: > On 21 August 2012 10:01, Mattias Gaertner wrote: > >> > The conversion is done only when entering and exiting the OS / GUI > >> > framework > >> > calls. I understand this does not happen too often. > >> > >> I beg to differ. > > > >

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said: > On 21 August 2012 10:19, Ivanko B wrote: > > Sure no problems for GUI. But how about processing large texts ? > > Same experience as before. I must add "processing large text" is a > vague statement. I think unicode or not is a bigger performanc

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
Because these documents are in UTF-8 parsing is about 2-3 times faster on these documents, searching is about 20 to 50% faster = Because You name is latin ANSISTRING "Mattias Gaertner" :) But Imagine gigabytes of 4 bytes/char UTF-8 text. __

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 14:19:44 +0500 Ivanko B wrote: > I have implemented multiple text edit/display widgets that do plenty > of string manipulation... all based on the UTF-8 encoding. I have > suffered NO speed penalties. > > Sure no problems for GUI. But how about processing la

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Aleksa Todorovic
On Tue, Aug 21, 2012 at 11:41 AM, Mattias Gaertner wrote: > > Theoretically you could rewrite the FCL to support UTF8String, > UnicodeString and AnsiString. But not at the same time. In an > application there is always be only one of them. So you have to ship for > each flavor a whole FCL plus all

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Ivanko B
Me always get excited how Graeme defends the solutions of his choice :)

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Jonas Maebe
marcov wrote on Tue, 21 Aug 2012: In our previous episode, Mattias Gaertner said: For example under Linux file names are treated as UTF-8 but are only bytes. They can and they do contain invalid UTF-8 characters. If your program should support this, you must use a FindFirst with UTF-8. To be

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Ivanko B
For that reason there is no speed difference between using a UTF-16 or UTF-8 encoded string. Both can be coded equally efficient. == No in common, since UTF-8 needs error handling, replacing for unconvertable bytes etc operations which may effect initial data which makes per-byte comp

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
How well will your "access char via index" code perform on that? = It'll mean "now is the time to switch to UCS-4" :)

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 11:07:26 +0200 Michael Schnell wrote: > On 08/21/2012 10:17 AM, Graeme Geldenhuys wrote: > > if you want to do string comparisons, one option is to normalise the > > text before you do a compare. > Other than the conversion necessary with system-calls when a different > en

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Marco van de Voort
In our previous episode, Mattias Gaertner said: > > On 08/21/2012 10:32 AM, Mattias Gaertner wrote: > > > IMO unicodestring should be the same on all platforms, because > > > otherwise the character size switches per platform, which is hard to > > > test and asking for trouble. > > This does see

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 11:17:24 +0200 Aleksa Todorovic wrote: > On Tue, Aug 21, 2012 at 9:53 AM, Martin Schreiber wrote: > > On 21.08.2012 09:31, Graeme Geldenhuys wrote: > > > > > > Ehm, I did both. In the beginning MSEgui switched from Widestring to utf-8 > > encoded Ansistring because of the b

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 10:19, Ivanko B wrote: > Sure no problems for GUI. But how about processing large texts ? Same experience as before. I must add "processing large text" is a vague statement. -- Regards, - Graeme - ___ fpGUI - a cross-platform Fre

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Marco van de Voort
In our previous episode, Mattias Gaertner said: > > IMO unicodestring should be the same on all platforms, because > otherwise the character size switches per platform, which is hard to > test and asking for trouble. I think the big issue is more about what "string" will be when the FPC is compil

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 10:16, Ivanko B wrote: > Though me'm sure that latin people don't suffer from slowliness of > utf-8 where utf-8 = ansistring. And I gather you base your assumptions on MSEgui. MSEgui uses UCS-2, *not* UTF-16. I also believe MSEgui doesn't bother with surrogate pairs (please corr

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 10:01, Mattias Gaertner wrote: >> > The conversion is done only when entering and exiting the OS / GUI >> > framework >> > calls. I understand this does not happen too often. >> >> I beg to differ. > > Maybe you can name some example. OK, lets assume I'm under Linux and fpGUI

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 11:09:28 +0200 Michael Schnell wrote: > On 08/21/2012 10:32 AM, Mattias Gaertner wrote: > > IMO unicodestring should be the same on all platforms, because > > otherwise the character size switches per platform, which is hard to > > test and asking for trouble. > This does s

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
I have implemented multiple text edit/display widgets that do plenty of string manipulation... all based on the UTF-8 encoding. I have suffered NO speed penalties. Sure no problems for GUI. But how about processing large texts ? ___ fpc

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Aleksa Todorovic
On Tue, Aug 21, 2012 at 9:53 AM, Martin Schreiber wrote: > On 21.08.2012 09:31, Graeme Geldenhuys wrote: > > > Ehm, I did both. In the beginning MSEgui switched from Widestring to utf-8 > encoded Ansistring because of the buggy FPC widestring implementation > (MSEgui started with Delphi/Kylix).

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
Performance heavily depends on what you do and you can find good examples == Hmm.. are there implementations of UTF-8 substringing, string comparision etc - but not using intermediate HEAVY normalizations from/to fixed char length type for BOTH input arguments ? Though me'm sure th

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Martin Schreiber
On 21.08.2012 09:55, Graeme Geldenhuys wrote: On 21 August 2012 07:10, Ivanko B wrote: How about supporting in the RTL all versions of UCS-2 & UTF-16 (for fast per-char access etc optimizations) and UTF-8 (for unlimited number of alphabets) ? All "access a char by index into a string" code

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 08:53, Martin Schreiber wrote: >> >> Yet another myth > > > Ehm, I did both. In the beginning MSEgui switched from Widestring to utf-8 Just because you had a bad experience doesn't doom the utf-8 encoding forever. Maybe you just had a buggy implementation. No coder is perfe

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 09:41, Ivanko B wrote: > UTF-8 is very-very slow compared to UCS-2 as to string manipulations > so its best usage is encoding source files (as done in MSEide). Please supply a test program that proves this. I don't believe you are correct. I have implemented multiple text edit

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
Hi, On 21 August 2012 09:32, Mattias Gaertner wrote: > > IMO unicodestring should be the same on all platforms, because > otherwise the character size switches per platform, Please define "character" in your sentence above. Are you referring to a Unicode codepoint, or a "printable character"? I

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 10:32 AM, Mattias Gaertner wrote: IMO unicodestring should be the same on all platforms, because otherwise the character size switches per platform, which is hard to test and asking for trouble. This does seem appropriate. But right now Delphi compatibility forces 16 Bits and Laz

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 10:17 AM, Graeme Geldenhuys wrote: if you want to do string comparisons, one option is to normalise the text before you do a compare. Other than the conversion necessary with system-calls when a different encoding is used internally, comparing strings happens very often within t

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/21/2012 10:15 AM, Graeme Geldenhuys wrote: You're in for a surprise... With a statement that reads "It provides direct access to serial ports, TAPI, and the Microsoft Speech API." it should start sounding alarm bells for Linux developers. Of course you are very right and silly me did not

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 09:23:30 +0100 Graeme Geldenhuys wrote: >[...] > > The conversion is done only when entering and exiting the OS / GUI framework > > calls. I understand this does not happen too often. > > I beg to differ. Maybe you can name some example. Concrete problems can be solved, abst

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Aleksa Todorovic
On Tue, Aug 21, 2012 at 10:16 AM, Ivanko B wrote: > > Handling 1..4(6) bytes is less efficient than handling surrogate > *pairs*. > === > But surrogate pairs break array-like fast char access anyway, isn't it ? It's also "broken" in UTF8 in the same way - so none of them gets +1 on

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Tue, 21 Aug 2012 13:41:38 +0500 Ivanko B wrote: > But if you are such a UTF-16 (actually UCS-2 as > that is what MSEgui supports) fan > = > If Martin can implement UTF-16 (with surrogate pair) support in MSEgui > string units (and these units fully cover absenting code of FPC

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 07:10, Ivanko B wrote: > How about supporting in the RTL all versions of UCS-2 & UTF-16 (for > fast per-char access etc optimizations) and UTF-8 (for unlimited > number of alphabets) ? All "access a char by index into a string" code I have seen, 99.99% of the time work in a sequ
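To illustrate the "sequential access" point, here is a minimal sketch of a single forward pass over UTF-8 that counts codepoints by skipping continuation bytes. The function name CountCodepoints is hypothetical, and the input is assumed to be well-formed UTF-8 (no validation or error handling, which is exactly the extra work other posters in this thread point to).

```pascal
program Utf8ScanDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: counts codepoints in one sequential pass.
// Every byte that is not a continuation byte (10xxxxxx) starts a codepoint.
// Assumes S holds well-formed UTF-8; nothing is validated.
function CountCodepoints(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: AnsiString;
begin
  // Build 'G', U+00E4 (bytes C3 A4), 'r' byte by byte to avoid any
  // source-codepage conversion of the literal.
  SetLength(S, 4);
  S[1] := 'G';
  S[2] := AnsiChar($C3);
  S[3] := AnsiChar($A4);
  S[4] := 'r';
  WriteLn(Length(S));           // prints 4 (bytes)
  WriteLn(CountCodepoints(S));  // prints 3 (codepoints)
end.
```

The loop is linear in the byte length, just like a UTF-16 scan that has to check for surrogates; what neither encoding offers is constant-time access to the n-th codepoint.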

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Martin Schreiber
On 21.08.2012 09:32, Mattias Gaertner wrote: On Mon, 20 Aug 2012 20:56:46 +0200 Florian Klämpfl wrote: [...] The current situation is: - either somebody starts to implement support for unicodestring being utf-8 (or whatever) on linux in a compatible way with the current approach, then 2.8.0

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Martin Schreiber
On 21.08.2012 09:31, Graeme Geldenhuys wrote: On 21 August 2012 09:13, Martin Schreiber wrote: I disagree. Handling 1..4(6) bytes is less efficient than handling surrogate *pairs*. Yet another myth Ehm, I did both. In the beginning MSEgui switched from Widestring to utf-8 encoded Ans

Re: [fpc-devel] Unicode in the RTL (my ideas)

2012-08-21 Thread Graeme Geldenhuys
Hi, On 20 August 2012 23:18, Hans-Peter Diettrich wrote: > The Delphi developers wanted to implement what you suggest, but dropped that > approach later again. When Embarcadero implemented Unicode support, Delphi was a pure Windows application. They had no need to think of anything other than wh

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
But if you are such a UTF-16 (actually UCS-2 as that is what MSEgui supports) fan = If Martin can implement UTF-16 (with surrogate pair) support in MSEgui string units (and these units fully cover absenting code of FPC RTL ) then the things are excellent. PS: UTF-8 is very-very sl

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Mon, 20 Aug 2012 20:56:46 +0200 Florian Klämpfl wrote: >[...] > The current situation is: > - either somebody starts to implement support for unicodestring being > utf-8 (or whatever) on linux in a compatible way with the current > approach, then 2.8.0 will use this > - nobody works on it, the

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 09:13, Martin Schreiber wrote: > I disagree. Handling 1..4(6) bytes is less efficient than handling surrogate > *pairs*. Yet another myth But if you are such a UTF-16 (actually UCS-2 as that is what MSEgui supports) fan, why isn't MSEgui source code stored in UTF-16 encoding

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
Hi, On 21 August 2012 08:37, Michael Schnell wrote: > > But does that really suggest taking the effort to support other Unicode > variants ? Yes, if you want to to make the statement "FPC fully supports Unicode" > The conversion is done only when entering and exiting the OS / GUI framework > c

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
Hi, On 21 August 2012 08:28, Michael Schnell wrote: > > How can it be OK regarding comparing strings, when all Unicode variants > allow for multiple codings for the same single printable "character" (and > moreover what "character" do the users regard as "equal"). The Unicode Standard covers al

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
Handling 1..4(6) bytes is less efficient than handling surrogate *pairs*. === But surrogate pairs break array-like fast char access anyway, isn't it ? And there's a lot of room for optimizing utf-8 operation for instance http://bjoern.hoehrmann.de/utf-8/decoder/dfa/. Also a publicatio

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
On 21 August 2012 08:27, Michael Schnell wrote: > > I doubt that it will be possible to just compile it (e.g. for Linux) but > with optimum compatibility of the compiler, porting the source code should > be rather easy. You're in for a surprise... With a statement that reads "It provides direct

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Martin Schreiber
On Tuesday 21 August 2012 09:56:57 Ivanko B wrote: > For non-fixed char length there's nothing better than UTF8 (default > ASCII compatible, ready for any future alphabets,..). For fixed-char > length (fast string operations etc) also there's nothing better than > UCS-2 (the Earth coverage ) & UCS-

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
For non-fixed char length there's nothing better than UTF8 (default ASCII compatible, ready for any future alphabets,..). For fixed-char length (fast string operations etc) also there's nothing better than UCS-2 (the Earth coverage) & UCS-4 (the galaxy coverage). The non-fixed char length UTF-

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Ivanko B
For non-fixed char length there's nothing better than UTF8 (default ASCII compatible, ready for any future alphabets,..). For fixed-char length (fast string operations etc) also there's nothing better than UCS-2 (the Earth coverage) & UCS-4 (the galaxy coverage). The non-fixed char length UTF-16 (
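
For readers who want numbers behind the fixed-length versus variable-length argument: a small sketch that reports how many bytes a single character takes in UTF-8, UTF-16 and UTF-32, and whether it would fit in one 16-bit UCS-2 unit. The helper names are assumptions made for this example, and the UCS-2 check is a simplification (it only tests the code point value).

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(ch: str) -> dict[str, int]:
    """Return the number of bytes `ch` occupies in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


def fits_in_ucs2(ch: str) -> bool:
    """True if the single character lies in the BMP (one 16-bit unit)."""
    return ord(ch) <= 0xFFFF


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac", "\U0001D11E"):
        print(f"U+{ord(ch):04X}", encoded_sizes(ch), fits_in_ucs2(ch))
```

Only UTF-32 (UCS-4) keeps a constant size per code point across all four samples; UTF-16 is constant only as long as the text stays inside the BMP.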

Re: [fpc-devel] FPC -Rintel and -alr options

2012-08-21 Thread Sven Barth
On 21.08.2012 09:35, ABorka wrote: This is exactly what I needed. "-alr -sr -Amasm" does it. I just put them into my "fpc.cfg" . Why did you put this into your fpc.cfg? You are aware that with the "-s" switch no binary code is generated? Or are you protecting that with an IFDEF? Regards,
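
What "protecting that with an IFDEF" could look like in practice: a sketch of an fpc.cfg fragment, assuming the conditional directives of the FPC configuration file. The define name ASMLIST is made up for this illustration; the three switches are the ones quoted in the thread.

```
# Hypothetical guard: these switches apply only when ASMLIST is defined,
# for example by passing -dASMLIST on the command line.
#IFDEF ASMLIST
-alr
-sr
-Amasm
#ENDIF
```

With a guard like this, an ordinary build without the define should be unaffected and still produce a binary, while the assembler listing is requested only on demand.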

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Mattias Gaertner
On Mon, 20 Aug 2012 18:46:29 +0100 Hans-Peter Diettrich wrote: > Mattias Gaertner schrieb: > > > I guess most people would say that "good multi language Unicode support > > in FPC" requires a Unicode supporting RTL. > > Please clarify: *Unicode* or UTF-16 support? > > Unicode is covered by bot

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Graeme Geldenhuys
Hi, On 20 August 2012 23:26, Hans-Peter Diettrich wrote: > > UCS2 is nowadays known as the BMP (Basic Multilingual Plane) of full > Unicode. The UCS2 is considered obsolete! Nothing else needs to be said. :) > Have a look at the full Unicode codepages, what is and what is not > part of the BMP

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
Sorry: I do think it would not harm to use UTF-16 as a default. -Michael

Re: [fpc-devel] FPC -Rintel and -alr options

2012-08-21 Thread ABorka
It would be nice to see it work with objdump also, but not a priority. With your help guys I was able to get the needed output using the fpc.cfg and the FPC parameters you guys mentioned. Thanks for the help <...snip...> On 8/21/2012 00:19, Sergei Gorelkin wrote: 21.08.2012 10:32, ABorka пише

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/20/2012 06:05 PM, Graeme Geldenhuys wrote: * UnicodeString is always UTF-16 (so everything but Windows takes a conversion penalty)! This is true of course, But does that really suggest taking the effort to support other Unicode variants ? The conversion is done only when entering and e

Re: [fpc-devel] FPC -Rintel and -alr options

2012-08-21 Thread ABorka
This is exactly what I needed. "-alr -sr -Amasm" does it. I just put them into my "fpc.cfg" . Thanks for all the help guys. On 8/21/2012 00:16, Jonas Maebe wrote: On 21 Aug 2012, at 08:32, ABorka wrote: On 8/20/2012 22:37, Sergei Gorelkin wrote: -R switch controls parsing assembler blocks i

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/20/2012 08:53 PM, Ivanko B wrote: Really, the team seems to be fighting for FPC + Lazarus to be capable of building thousands of Delphi-based components (archivers, cyphers, audio processors, etc.), things which people mostly like Delphi for and which seldom use specific Delphi features causing problems

Re: [fpc-devel] Unicode resource strings

2012-08-21 Thread Michael Schnell
On 08/20/2012 08:33 PM, Graeme Geldenhuys wrote: Such a restriction should NEVER be okay! How _can_ it be OK regarding comparing strings, when all Unicode variants allow for multiple codings for the same single printable "character" (and moreover what "character" do the users regard as "equal")

Re: [fpc-devel] FPC -Rintel and -alr options

2012-08-21 Thread Sergei Gorelkin
21.08.2012 10:32, ABorka wrote: That requires masm to compile the project. Likewise, using "-al" requires GNU AS to compile. The latter is typically installed together with FPC, so it just works transparently. What I actually want is to see the disassembled code from my project (as Intel Sy
