Re: [fpc-devel] Re: enumerators

2010-11-18 Thread Marco van de Voort
In our previous episode, Michael Schnell said: > > Either you have UTF-8 with surrogates, or you have ASCII (since UTF-8 > > without surrogates means that only char 0..127 are valid, which is ASCII) > In another post surrogate pairs have been denoted as a specialty of a 16 > Bit coding (UCS-2), an

Re: [fpc-devel] Re: enumerators

2010-11-18 Thread Michael Schnell
On 11/18/2010 02:31 PM, Marco van de Voort wrote: Either you have UTF-8 with surrogates, or you have ASCII (since UTF-8 without surrogates means that only char 0..127 are valid, which is ASCII) In another post surrogate pairs have been denoted as a specialty of a 16 Bit coding (UCS-2), and I di

Re: [fpc-devel] Re: enumerators

2010-11-18 Thread Marco van de Voort
In our previous episode, Michael Schnell said: > > found by a dumb byte/char scan; only few encodings have to be > > recognized and handled, based on the char size: MBCS (UTF-8...), > > WideChars (UTF-16/UCS2) and UTF-32. > > > In fact I suppose that for UTF-8 ("pure UTF-8" without surrogates) po

Re: [fpc-devel] Re: enumerators

2010-11-18 Thread Michael Schnell
On 11/18/2010 12:33 AM, Hans-Peter Diettrich wrote: Separator characters can be assumed as ASCII, so that they can be found by a dumb byte/char scan; only few encodings have to be recognized and handled, based on the char size: MBCS (UTF-8...), WideChars (UTF-16/UCS2) and UTF-32. In fact I su
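Hans-Peter's "dumb byte/char scan" works for UTF-8 because of how the encoding is built: every byte of a multi-byte sequence is $80 or above, so an ASCII separator byte can never occur inside another character. Here is a minimal Free Pascal sketch, assuming the separator is ASCII. SplitAscii and TFieldArray are hypothetical names, not RTL routines.

```pascal
program SplitAsciiDemo;

{$mode objfpc}{$H+}

type
  TFieldArray = array of AnsiString;   // hypothetical helper type

{ Hypothetical helper: splits S at every occurrence of the ASCII byte Sep
  (assumed < #128). Safe for UTF-8: lead bytes are >= $C0 and continuation
  bytes are $80..$BF, so no multi-byte character contains an ASCII byte. }
function SplitAscii(const S: AnsiString; Sep: AnsiChar): TFieldArray;
var
  I, Start, N: SizeInt;
begin
  Result := nil;
  N := 0;
  Start := 1;
  for I := 1 to Length(S) + 1 do
    // Boolean short-circuit evaluation ($B-, the default) keeps S[I] in range.
    if (I > Length(S)) or (S[I] = Sep) then
    begin
      SetLength(Result, N + 1);
      Result[N] := Copy(S, Start, I - Start);
      Inc(N);
      Start := I + 1;
    end;
end;

var
  Fields: TFieldArray;
  F: AnsiString;
begin
  { "Größe;Maß;x" written as UTF-8 byte escapes, so the demo does not
    depend on the encoding of the source file. }
  Fields := SplitAscii('Gr'#$C3#$B6#$C3#$9F'e;Ma'#$C3#$9F';x', ';');
  WriteLn(Length(Fields));   // 3
  for F in Fields do
    WriteLn(Length(F));      // byte lengths: 7, 4, 1
end.
```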

Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Hans-Peter Diettrich
Marco van de Voort wrote: It's a user's own choice to not be unicode compliant in his apps (e.g. if he knows he never goes to the Eastern Asiatic market etc), but a runtime should be as unicode compliant as reasonably possible. IMO there exist levels of compliance. The bottom level supplies

Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Jonas Maebe
On 17 Nov 2010, at 13:44, Michael Schnell wrote: In fact I was not aware of the UTF-16 coding scheme. I _supposed_ it would work similar as UTF-8 (highest bit set => 32 bit value composed from the 31 remaining bits of this and the next word and bit 31 reset) and thus could be decoded algor

Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Michael Schnell
On 11/17/2010 01:32 PM, Marco van de Voort wrote: Regarding OS X, iirc I saw a mention somewhere that some components of Mac OS X prefer decomposed characters. (aka UTF-8Mac). In another forum I saw this mentioned as surrogate pairs. Sorry for the confusion :(. -Michael
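Decomposed characters and surrogate pairs are unrelated. A decomposed "é" is two code points, U+0065 followed by the combining accent U+0301, and each of them is encoded normally in UTF-8. The sketch below shows what this means for byte-wise code: the two forms render alike, but they differ in byte length and compare unequal. A real comparison would first normalize both strings to one form (NFC or NFD). That step is not shown, and no particular RTL routine for it is assumed.

```pascal
program DecomposedDemo;

{$mode objfpc}{$H+}

const
  { Both constants spell "é" in UTF-8, written as byte escapes. }
  Precomposed = #$C3#$A9;     // U+00E9 LATIN SMALL LETTER E WITH ACUTE
  Decomposed  = 'e'#$CC#$81;  // U+0065 + U+0301 COMBINING ACUTE ACCENT

begin
  WriteLn(Length(Precomposed));       // 2 (bytes)
  WriteLn(Length(Decomposed));        // 3 (bytes)
  WriteLn(Precomposed = Decomposed);  // FALSE: the comparison is byte-wise
end.
```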

Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Michael Schnell
On 11/17/2010 01:20 PM, Jonas Maebe wrote: Surrogate pairs have nothing to do with Mac OS X. Surrogate pairs are required when encoding any codepoint in UTF-16 whose UTF32 value is >= $10000. In fact I was not aware of the UTF-16 coding scheme. I _supposed_ it would work similar as UTF-8 (hi
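The threshold Jonas gives translates directly into code. UTF-16 does not use a UTF-8-like "high bit set" scheme. Instead, a code point of $10000 or more has $10000 subtracted, and the remaining 20 bits are split into a high surrogate ($D800 plus the top 10 bits) and a low surrogate ($DC00 plus the bottom 10 bits). This is fixed arithmetic and needs no tables. Here is a minimal Free Pascal sketch. CodepointToUtf16 is a hypothetical name, and the input is assumed to be a valid, non-surrogate code point.

```pascal
program Utf16SurrogateDemo;

{$mode objfpc}{$H+}

{ Hypothetical helper: encodes a code point as UTF-16. Assumes
  CP <= $10FFFF and CP not in the surrogate range $D800..$DFFF
  (no validation here). }
function CodepointToUtf16(CP: Cardinal): UnicodeString;
begin
  if CP < $10000 then
  begin
    SetLength(Result, 1);
    Result[1] := WideChar(Word(CP));                      // one code unit
  end
  else
  begin
    Dec(CP, $10000);                                      // 20 bits remain
    SetLength(Result, 2);
    Result[1] := WideChar(Word($D800 + (CP shr 10)));     // high surrogate
    Result[2] := WideChar(Word($DC00 + (CP and $3FF)));   // low surrogate
  end;
end;

var
  U: UnicodeString;
  I: Integer;
begin
  U := CodepointToUtf16($1F600);
  WriteLn(Length(U));                        // 2 code units
  for I := 1 to Length(U) do
    Write(HexStr(Ord(U[I]), 4), ' ');        // D83D DE00
  WriteLn;
  WriteLn(Length(CodepointToUtf16($E9)));    // 1
end.
```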

Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Marco van de Voort
In our previous episode, Michael Schnell said: > > It is not viable not to. Either you implement unicode or not. > > > > It's a user's own choice to not be unicode compliant in his apps (e.g. if he > > knows he never goes to the Eastern Asiatic market etc), but a runtime should > > be as unicode co

Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Jonas Maebe
On 17 Nov 2010, at 12:23, Michael Schnell wrote: Regarding that handling surrogate pairs needs tables while UTF/UCS handling can be done by simple algorithms and that (AFAIK) surrogate pairs are used only in certain environments (Mac and what else ?) Surrogate pairs have nothing to do with

Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Michael Schnell
On 11/17/2010 12:02 PM, Marco van de Voort wrote: In our previous episode, Michael Schnell said: Only the ones where surrogates really matter. Is it really viable to have the compiler/RTL try to automatically handle these ugly beasts, It is not viable not to. Either you implement unicode or no

Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Marco van de Voort
In our previous episode, Michael Schnell said: > > > > Only the ones where surrogates really matter. > Is it really viable to have the compiler/RTL try to automatically handle > these ugly beasts, It is not viable not to. Either you implement unicode or not. It's a user's own choice to not be

Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Michael Schnell
On 11/17/2010 10:12 AM, Marco van de Voort wrote: Only the ones where surrogates really matter. Is it really viable to have the compiler/RTL try to automatically handle these ugly beasts, rather than presenting them to the poor user as two separate Unicode characters (and only handle the UTC/U

Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Michael Schnell
On 11/15/2010 01:24 PM, Marco van de Voort wrote: Typically I'd iterate by means outside the language (I've used simple iterators based on a record with a few inline methods in the past), and review the places where you iterate by char through strings, and reduce it significantly. Since the latt

Re: [fpc-devel] Re: enumerators

2010-11-17 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said: > > > > I don't consider it an extreme, on the contrary. Trying to fix this is > > extreme IMHO. > > Sorry, I understood that you want to replace all for loops by iterated > loops. Only the ones where surrogates really matter. > >>> And in

Re: [fpc-devel] Re: enumerators

2010-11-16 Thread Hans-Peter Diettrich
Marco van de Voort wrote: In our previous episode, Hans-Peter Diettrich said: Yes, but the realisation should be that the holding on array indexing is what makes it expensive. The problem could be strongly reduced by removing such array indexing skeleton from all routines where it is not neces

Re: [fpc-devel] Re: enumerators

2010-11-16 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said: > > Yes, but the realisation should be that the holding on array indexing is > > what makes it expensive. The problem could be strongly reduced by removing > > such array indexing skeleton from all routines where it is not necessary. > > Why fall

Re: [fpc-devel] Re: enumerators

2010-11-16 Thread Hans-Peter Diettrich
Marco van de Voort wrote: Yes, but the realisation should be that the holding on array indexing is what makes it expensive. The problem could be strongly reduced by removing such array indexing skeleton from all routines where it is not necessary. Why fall from one extreme into the other one

Re: [fpc-devel] Re: enumerators

2010-11-16 Thread Michael Van Canneyt
On Tue, 16 Nov 2010, Marco van de Voort wrote: Furthermore I think that in detail Unicode string handling should not be based on single characters at all, but instead should use (sub)strings all over, covering multibyte character representations, ligatures etc. as well This is dog slow. You

Re: [fpc-devel] Re: enumerators

2010-11-16 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said: > > First you would have to come up with a workable model for s[x] being > > utf32chars in general that doesn't suffer from O(N^2) performance > > degradation (read/write) > > Right, UTF-32 or UCS2 were much more useful in computations. I said s

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Hans-Peter Diettrich
Alexander Klenin wrote: The total order will be something between O(n^1) and O(n^2), depending on many factors (what is "n"?...). Huh? O(f(n)) has a precise definition, and of course we are talking worst-case complexity here (although average complexity would be the same in this case). n is

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Hans-Peter Diettrich
Marco van de Voort wrote: First you would have to come up with a workable model for s[x] being utf32chars in general that doesn't suffer from O(N^2) performance degradation (read/write) Right, UTF-32 or UCS2 were much more useful in computations. And for it to be useful, it must be workabl

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Alexander Klenin
On Tue, Nov 16, 2010 at 01:50, Hans-Peter Diettrich wrote: >> The order of the algorithm is then still O(n^2), since UTF8Char will >> already be O(n)? > > The total order will be something between O(n^1) and O(n^2), depending on > many factors (what is "n"?...). Huh? O(f(n)) has a precise definit
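Here is a worked version of the bound under discussion. It assumes that n is the string length in characters and that GetUTF8char(s, i), the hypothetical helper from the example, finds the i-th character by scanning from the start, so call i costs at most c·i for some constant c. A sequential walk that keeps its byte position visits each byte once.

```latex
T_{\mathrm{indexed}}(n) \;\le\; \sum_{i=1}^{n} c\,i \;=\; c\,\frac{n(n+1)}{2} \;=\; O(n^2),
\qquad
T_{\mathrm{sequential}}(n) \;=\; O(n)
```

For an all-ASCII string each call costs exactly c·i, so under this assumption the bound is tight and the indexed loop is Θ(n²) in the worst case.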

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Felipe Monteiro de Carvalho
On Mon, Nov 15, 2010 at 11:21 AM, Michael Schnell wrote: > ..forces the programmer to work with both UTF-8 and UCS32 coded Unicode > characters. This might blow his mind even more (regarding that e.g. the > Lazarus LCL forces him to work with UTF-8 coded Unicode in a string type > called "ANSIStri

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said: > >> At least the example code has to be made work, i.e. the nonsense statement > >>DoSomething(ch(i)); > >> has to be changed into something like > >>DoSomething(GetUTF8char(s,i)); > >> before we can talk honestly about the order of t

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Hans-Peter Diettrich
Marco van de Voort wrote: At least the example code has to be made work, i.e. the nonsense statement DoSomething(ch(i)); has to be changed into something like DoSomething(GetUTF8char(s,i)); before we can talk honestly about the order of the loop. The order of the algorithm is then

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Thaddy
On 15-11-2010 10:22, Vincent Snijders wrote: Maybe I did not understand Thaddy, but to give you O(1) access to the ith character, I was thinking about a translation table of the utf8 string, with key=index (1..length) and value=offset in bytes to the ith character. Such a translation table wou
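Here is a minimal sketch of the translation table Vincent describes, assuming well-formed UTF-8. One O(n) pass records the byte offset of every code point, and after that the i-th code point can be found in O(1). BuildOffsetTable is a hypothetical name, not an existing RTL function.

```pascal
program OffsetTableDemo;

{$mode objfpc}{$H+}

type
  TOffsetTable = array of SizeInt;

{ Hypothetical helper. Result[K] is the 1-based byte position in S of
  code point number K (0-based). Every byte outside the continuation
  range $80..$BF starts a code point; malformed input is not detected. }
function BuildOffsetTable(const S: AnsiString): TOffsetTable;
var
  I, N: SizeInt;
begin
  SetLength(Result, Length(S));   // upper bound: one code point per byte
  N := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
    begin
      Result[N] := I;
      Inc(N);
    end;
  SetLength(Result, N);           // trim to the real code point count
end;

var
  S: AnsiString;
  T: TOffsetTable;
begin
  S := 'a'#$C3#$A9'z';   // "aéz": 4 bytes, 3 code points
  T := BuildOffsetTable(S);
  WriteLn(Length(T));    // 3
  WriteLn(T[2]);         // 4: the third code point, 'z', starts at byte 4
end.
```

The price is one SizeInt per code point, and the table has to be rebuilt or patched whenever the string changes. This is the "sophisticated UTF8 string implementation" cost that comes up later in the thread.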

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Marco van de Voort
In our previous episode, Michael Schnell said: > > No, since that wouldn't describe the position of that char in the string > > that is being iterated. > Is this really wanted ? > > I suppose this would ask for a full blown iterator Typically I'd iterate by means outside the language (I've us
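Marco describes iterating with a record that has a few inline methods, and he insists that the iterator must report where each character sits in the string. Such a record might look like the sketch below. The type and method names are hypothetical, the decoder assumes well-formed UTF-8, and FPC's advancedrecords mode switch is needed for methods on records.

```pascal
program RecordIteratorDemo;

{$mode objfpc}{$H+}
{$modeswitch advancedrecords}

{ Decodes the code point starting at byte P of S; Len receives its length
  in bytes. No validation: a stray continuation byte is returned as-is,
  and a sequence cut off by the end of S is decoded from what is there. }
function DecodeUtf8(const S: AnsiString; P: SizeInt; out Len: SizeInt): Cardinal;
var
  B: Byte;
  I: SizeInt;
begin
  B := Ord(S[P]);
  case B of
    $00..$BF: begin Result := B;         Len := 1; end;
    $C0..$DF: begin Result := B and $1F; Len := 2; end;
    $E0..$EF: begin Result := B and $0F; Len := 3; end;
  else
    Result := B and $07;
    Len := 4;
  end;
  if P + Len - 1 > Length(S) then
    Len := Length(S) - P + 1;
  for I := 1 to Len - 1 do
    Result := (Result shl 6) or (Ord(S[P + I]) and $3F);
end;

type
  { Hypothetical forward-only iterator; not an RTL type. }
  TUtf8Iter = record
  private
    FS: AnsiString;
    FPos: SizeInt;   // byte position of the next code point
  public
    procedure Init(const S: AnsiString); inline;
    { False at the end; otherwise returns the code point and the byte
      position where it starts. }
    function Next(out CP: Cardinal; out StartPos: SizeInt): Boolean; inline;
  end;

procedure TUtf8Iter.Init(const S: AnsiString);
begin
  FS := S;
  FPos := 1;
end;

function TUtf8Iter.Next(out CP: Cardinal; out StartPos: SizeInt): Boolean;
var
  Len: SizeInt;
begin
  CP := 0;
  StartPos := FPos;
  Result := FPos <= Length(FS);
  if Result then
  begin
    CP := DecodeUtf8(FS, FPos, Len);
    Inc(FPos, Len);
  end;
end;

var
  It: TUtf8Iter;
  CP: Cardinal;
  P: SizeInt;
begin
  It.Init('a'#$C3#$A9'z');   // "aéz" in UTF-8
  while It.Next(CP, P) do
    WriteLn('U+', HexStr(Int64(CP), 4), ' at byte ', P);
  // U+0061 at byte 1, U+00E9 at byte 2, U+007A at byte 4
end.
```

A record keeps the iterator on the stack, so no heap allocation or Free is involved, and the byte position stays available to the loop body.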

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Michael Schnell
On 11/15/2010 11:40 AM, Marco van de Voort wrote: No, since that wouldn't describe the position of that char in the string that is being iterated. Is this really wanted ? I suppose this would ask for a full blown iterator -Michael

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Marco van de Voort
In our previous episode, Michael Schnell said: > > The comparison in the UTF-8 string example is very questionable. First > > ch(i) is not equivalent to ch, not even closely related, and the claim > > of O(N^2) operations deserves a proof - IMO it's simply wrong. > > With UTF-8 strings and frie

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Michael Schnell
On 11/15/2010 11:20 AM, Vincent Snijders wrote: I agree, and that is why you need enumerators to make it work. OK, in fact this _is_ an implementation of an enumerator, but same is hidden and so the application programmer is not forced to bother. He just sees the Unicode character in the loop

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Michael Schnell
unintentionally deleted; ..forces the programmer to work with both UTF-8 and UCS32 coded Unicode characters. This might blow his mind even more (regarding that e.g. the Lazarus LCL forces him to work with UTF-8 coded Unicode in a string type called "ANSIString" :( ) -Michael

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Vincent Snijders
2010/11/15 Michael Schnell : > On 11/15/2010 10:22 AM, Vincent Snijders wrote: >> >> I cannot imagine another way that a translation table can give you O(1) >> access. >> > Maybe I don't understand the O(1) correctly. Do you think it should be > necessary to access each character in the string wit

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Michael Schnell
On 11/15/2010 10:10 AM, Alexander Klenin wrote: Actually, I do not think so. I believe that an integer containing the codepoint is the preferable implementation. OK, Unicode always blows up the complexity of the code greatly ;). Your suggestion would result in a UTF-8 -> UCS32 translation and thu

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Michael Schnell
On 11/15/2010 10:22 AM, Vincent Snijders wrote: I cannot imagine another way that a translation table can give you O(1) access. Maybe I don't understand the O(1) correctly. Do you think it should be necessary to access each character in the string within each iteration in this way? What I

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Vincent Snijders
2010/11/15 Michael Schnell : > On 11/14/2010 03:33 PM, Vincent Snijders wrote: >> >> I did not have in mind such a sophisticated UTF8 string >> implementation, that included a translation table for easy indexing. > > I don't think you need a translation table to walk through a UTF-8 String Maybe

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Alexander Klenin
On Mon, Nov 15, 2010 at 18:38, Michael Schnell wrote: > On 11/13/2010 08:56 PM, Hans-Peter Diettrich wrote: >> >> >> The comparison in the UTF-8 string example is very questionable. First >> ch(i) is not equivalent to ch, not even closely related, and the claim of >> O(N^2) operations deserves an

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Michael Schnell
On 11/14/2010 10:12 PM, Hans-Peter Diettrich wrote: With regards to UTF-8 (or other MBCS) strings, what does Length(s) return in these cases? IMO other functions have to be used for the determination of the true character count (as opposed to the char=byte count). Of course its possible without
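For UTF-8 held in an AnsiString, Hans-Peter's question has a short answer: Length(s) counts bytes, and the code point count needs a separate scan. Below is a sketch of such a function. The name is hypothetical, and it counts every byte that is not a continuation byte. It counts code points, not user-perceived characters, so a decomposed "é" (e followed by U+0301) still counts as two.

```pascal
program CodepointCountDemo;

{$mode objfpc}{$H+}

{ Hypothetical helper: counts code points in well-formed UTF-8. Every
  byte outside the continuation range $80..$BF starts a new code point.
  O(n) in the number of bytes. }
function Utf8CodepointCount(const S: AnsiString): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: AnsiString;
begin
  S := 'Gr'#$C3#$B6#$C3#$9F'e';      // "Größe" in UTF-8
  WriteLn(Length(S));                // 7 (bytes)
  WriteLn(Utf8CodepointCount(S));    // 5 (code points)
end.
```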

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Michael Schnell
On 11/14/2010 03:33 PM, Vincent Snijders wrote: I did not have in mind such a sophisticated UTF8 string implementation, that included a translation table for easy indexing. I don't think you need a translation table to walk through a UTF-8 String Unicode-Character by Unicode-Character (and cre

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Michael Schnell
On 11/14/2010 09:47 PM, Hans-Peter Diettrich wrote: I wonder how FPC defines low() and high() for sets. IMHO it should not. An "in" loop on sets should not use a defined sequence. Relying on an "order" of the elements of a set mathematically is erroneous. -Michael

Re: [fpc-devel] Re: enumerators

2010-11-15 Thread Michael Schnell
On 11/13/2010 08:56 PM, Hans-Peter Diettrich wrote: The comparison in the UTF-8 string example is very questionable. First ch(i) is not equivalent to ch, not even closely related, and the claim of O(N^2) operations deserves a proof - IMO it's simply wrong. With UTF-8 strings and friends wo

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Alexander Klenin
On Mon, Nov 15, 2010 at 08:25, Marco van de Voort wrote: > In our previous episode, Hans-Peter Diettrich said: >> At least the example code has to be made work, i.e. the nonsense statement >>    DoSomething(ch(i)); >> has to be changed into something like >>    DoSomething(GetUTF8char(s,i)); >> be

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Alexander Klenin
On Sun, Nov 14, 2010 at 08:52, Graeme Geldenhuys wrote: > If you use full-blown Iterator classes (instead of just for-in style) > you get a lot more too: > >  * full control over iteration >   - move forward >   - move back >   - reset iteration >   - peek forward/back >   - skip, etc... >  * you

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said: > > the O(N^2) stems from the fact that it is hard to get the ith > > character in a UTF8String in O(1). Suppose it is O(N), then the loop > > is O(n^2). > > With regards to UTF-8 (or other MBCS) strings, what does Length(s) The base size of

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said: > > > A more grave reason though is that Delphi does not have low() and high() on > > sets and a request to add it by me in 2006 was closed with their equivalent > > of "won't fix". > > I wonder how FPC defines low() and high() for sets. See th

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Hans-Peter Diettrich
Vincent Snijders wrote: 2010/11/14 Thaddy : On 13-11-2010 20:56, Hans-Peter Diettrich wrote: The comparison in the UTF-8 string example is very questionable. First ch(i) is not equivalent to ch, not even closely related, and the claim of O(N^2) operations deserves a proof - IMO it's simply w

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Hans-Peter Diettrich
Marco van de Voort wrote: A more grave reason though is that Delphi does not have low() and high() on sets and a request to add it by me in 2006 was closed with their equivalent of "won't fix". I wonder how FPC defines low() and high() for sets. The static bounds can be obtained from the un
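Hans-Peter's remark about static bounds can be illustrated. Low() and High() of the base type give only the bounds of what a set could contain. Finding the smallest or largest element that a particular set actually contains needs a scan over the base type, as in this sketch. TDay, LowestIn and HighestIn are hypothetical names, not RTL routines. The scan imposes ordinal order explicitly, so it does not rely on any order implied by the set itself.

```pascal
program SetBoundsDemo;

{$mode objfpc}{$H+}

type
  TDay = (Mon, Tue, Wed, Thu, Fri, Sat, Sun);   // hypothetical example type
  TDays = set of TDay;

{ Hypothetical helper: smallest member present in S; False for []. }
function LowestIn(const S: TDays; out D: TDay): Boolean;
var
  E: TDay;
begin
  D := Low(TDay);
  for E := Low(TDay) to High(TDay) do
    if E in S then
    begin
      D := E;
      Exit(True);
    end;
  Result := False;
end;

{ Hypothetical helper: largest member present in S; False for []. }
function HighestIn(const S: TDays; out D: TDay): Boolean;
var
  E: TDay;
begin
  D := High(TDay);
  for E := High(TDay) downto Low(TDay) do
    if E in S then
    begin
      D := E;
      Exit(True);
    end;
  Result := False;
end;

var
  D: TDay;
begin
  WriteLn(Low(TDay), '..', High(TDay));   // Mon..Sun: static bounds only
  if LowestIn([Tue, Fri, Sat], D) then
    WriteLn(D);                           // Tue
  if HighestIn([Tue, Fri, Sat], D) then
    WriteLn(D);                           // Sat
  WriteLn(LowestIn([], D));               // FALSE
end.
```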

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Hans-Peter Diettrich
Thaddy wrote: The comparison in the UTF-8 string example is very questionable. First ch(i) is not equivalent to ch, not even closely related, and the claim of O(N^2) operations deserves a proof - IMO it's simply wrong. Yes, this caught my eye as well: O(N^2) seems only the case if "length"

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Marco van de Voort
In our previous episode, Thaddy said: > > would be evaluated every time. S > > the O(N^2) stems from the fact that it is hard to get the ith > > character in a UTF8String in O(1). Suppose it is O(N), then the loop > > is O(n^2). > > > "Hard to" is implementation detail and not part of any algorit

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Vincent Snijders
2010/11/14 Thaddy : > On 14-11-2010 13:22, Vincent Snijders wrote: >> >> would be evaluated every time. S >> the O(N^2) stems from the fact that it is hard to get the ith >> character in a UTF8String in O(1). Suppose it is O(N), then the loop >> is O(n^2). >> >> Vincent > > "Hard to" is implement

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Thaddy
On 14-11-2010 13:22, Vincent Snijders wrote: would be evaluated every time. S the O(N^2) stems from the fact that it is hard to get the ith character in a UTF8String in O(1). Suppose it is O(N), then the loop is O(n^2). Vincent "Hard to" is implementation detail and not part of any algorithm.
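Vincent's argument can be checked empirically with a sketch that counts the work. CodepointPos is a hypothetical helper that finds the i-th code point by scanning from the start of the string, the way a naive s[i] on a UTF-8 string would have to. Calling it for every i makes the total number of inspected bytes grow quadratically. The global counter exists only to make that cost visible.

```pascal
program QuadraticIndexDemo;

{$mode objfpc}{$H+}

var
  Steps: Int64 = 0;   // bytes inspected, only to make the cost visible

{ Hypothetical helper: byte position of the I-th (1-based) code point,
  found by scanning from the start of S, so each call costs O(I).
  Returns 0 if S has fewer than I code points. }
function CodepointPos(const S: AnsiString; I: SizeInt): SizeInt;
var
  P, Seen: SizeInt;
begin
  Seen := 0;
  for P := 1 to Length(S) do
  begin
    Inc(Steps);
    if (Ord(S[P]) and $C0) <> $80 then   // not a continuation byte
    begin
      Inc(Seen);
      if Seen = I then
        Exit(P);
    end;
  end;
  Result := 0;
end;

procedure Measure(N: SizeInt);
var
  S: AnsiString;
  I: SizeInt;
begin
  S := StringOfChar('a', N);
  Steps := 0;
  for I := 1 to N do
    CodepointPos(S, I);   // result discarded; only the cost matters here
  WriteLn(N, ' code points: ', Steps, ' bytes inspected');
end;

begin
  Measure(100);   // 100 code points: 5050 bytes inspected
  Measure(200);   // 200 code points: 20100 bytes inspected
end.
```

Doubling the length roughly quadruples the work, which is the O(n^2) behaviour of an indexed loop. A sequential walk over the same strings would inspect 100 and 200 bytes.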

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Vincent Snijders
2010/11/14 Thaddy : > On 13-11-2010 20:56, Hans-Peter Diettrich wrote: >> >> The comparison in the UTF-8 string example is very questionable. First >> ch(i) is not equivalent to ch, not even closely related, and the claim of >> O(N^2) operations deserves a proof - IMO it's simply wrong. >> > Yes,

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Marco van de Voort
In our previous episode, Thaddy said: > > The comparison in the UTF-8 string example is very questionable. First > > ch(i) is not equivalent to ch, not even closely related, and the claim > > of O(N^2) operations deserves a proof - IMO it's simply wrong. > > > Yes, this caught my eye as well: O(

Re: [fpc-devel] Re: enumerators

2010-11-14 Thread Thaddy
On 13-11-2010 20:56, Hans-Peter Diettrich wrote: The comparison in the UTF-8 string example is very questionable. First ch(i) is not equivalent to ch, not even closely related, and the claim of O(N^2) operations deserves a proof - IMO it's simply wrong. Yes, this caught my eye as well: O(N^

Re: [fpc-devel] Re: enumerators

2010-11-13 Thread Graeme Geldenhuys
On 13 November 2010 23:32, Sven Barth wrote: > On 13.11.2010 20:56, Hans-Peter Diettrich wrote: >> >> In general, what's the benefit of using enumerators? IMO a for loop >> executes faster on (linear) string and array types, where enumerator >> calls occur in for-in (see also my note on the UTF-8

Re: [fpc-devel] Re: enumerators

2010-11-13 Thread Sven Barth
On 13.11.2010 20:56, Hans-Peter Diettrich wrote: In general, what's the benefit of using enumerators? IMO a for loop executes faster on (linear) string and array types, where enumerator calls occur in for-in (see also my note on the UTF-8 string example). I'd say they simplify the code. They mi
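Sven's point that enumerators simplify the calling code is easiest to see with strings, the case this thread keeps returning to. The sketch below uses a class-based enumerator so that a for-in loop yields UTF-8 code points instead of bytes. The loop body never deals with byte offsets, which is the simplification. For the same reason, the position of each character is not available inside the loop. All names are hypothetical, and decoding assumes well-formed UTF-8.

```pascal
program ForInUtf8Demo;

{$mode objfpc}{$H+}

{ Decodes the code point at byte P of S; Len receives its byte length.
  No validation of malformed input. }
function DecodeUtf8(const S: AnsiString; P: SizeInt; out Len: SizeInt): Cardinal;
var
  B: Byte;
  I: SizeInt;
begin
  B := Ord(S[P]);
  case B of
    $00..$BF: begin Result := B;         Len := 1; end;
    $C0..$DF: begin Result := B and $1F; Len := 2; end;
    $E0..$EF: begin Result := B and $0F; Len := 3; end;
  else
    Result := B and $07;
    Len := 4;
  end;
  if P + Len - 1 > Length(S) then
    Len := Length(S) - P + 1;
  for I := 1 to Len - 1 do
    Result := (Result shl 6) or (Ord(S[P + I]) and $3F);
end;

type
  { Hypothetical enumerator: MoveNext + Current is what for-in needs. }
  TUtf8Enumerator = class
  private
    FS: AnsiString;
    FPos: SizeInt;       // byte position of the next code point
    FCurrent: Cardinal;
  public
    constructor Create(const S: AnsiString);
    function MoveNext: Boolean;
    property Current: Cardinal read FCurrent;
  end;

  { Hypothetical wrapper; its GetEnumerator makes "for CP in X" work. }
  TUtf8Codepoints = class
  private
    FS: AnsiString;
  public
    constructor Create(const S: AnsiString);
    function GetEnumerator: TUtf8Enumerator;
  end;

constructor TUtf8Enumerator.Create(const S: AnsiString);
begin
  inherited Create;
  FS := S;
  FPos := 1;
end;

function TUtf8Enumerator.MoveNext: Boolean;
var
  Len: SizeInt;
begin
  Result := FPos <= Length(FS);
  if Result then
  begin
    FCurrent := DecodeUtf8(FS, FPos, Len);
    Inc(FPos, Len);
  end;
end;

constructor TUtf8Codepoints.Create(const S: AnsiString);
begin
  inherited Create;
  FS := S;
end;

function TUtf8Codepoints.GetEnumerator: TUtf8Enumerator;
begin
  Result := TUtf8Enumerator.Create(FS);  // freed by the for-in loop code
end;

var
  Cps: TUtf8Codepoints;
  CP: Cardinal;
begin
  Cps := TUtf8Codepoints.Create('a'#$C3#$A9'z');   // "aéz" in UTF-8
  try
    for CP in Cps do
      WriteLn(CP);       // 97, 233, 122 (one per line)
  finally
    Cps.Free;
  end;
end.
```

FPC also accepts an operator Enumerator overload instead of a GetEnumerator method on a wrapper class. That variant is not shown here.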

[fpc-devel] Re: enumerators (was: Free Pascal 2.4.2 released!)

2010-11-13 Thread Hans-Peter Diettrich
Marco van de Voort wrote: we have placed a new major release of the Free Pascal Compiler, version 2.4.2 on our ftp-servers. Great :-) Some highlights are: Compiler: * Support D2006+ FOR..IN, with some FPC specific enhancements. Refer to http://wiki.freepascal.org/for-in_loop for m