Re: [fpc-devel] Unicode proceedings

2011-11-21 Thread Michael Schnell
On 11/18/2011 06:47 PM, Hans-Peter Diettrich wrote:
> IMO such a separation is of no use.
While of course producing a viable (written) definition is sometimes less fun than being "creative" and starting to implement what you "feel", very obviously this likely leads to sub-optimal results and fruitless

Re: [fpc-devel] Unicode proceedings

2011-11-21 Thread Michael Schnell
On 11/18/2011 06:47 PM, Hans-Peter Diettrich wrote:
> IMO such a separation is of no use. If you want a new string type,
I don't want anything dedicated, I just want to help to make FPC the best possible compiler.
> then make it a class,
Very obviously strings are not classes (in Pascal) -Mich

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread DaWorm
On Nov 18, 2011 1:14 PM, "Hans-Peter Diettrich" wrote: > That's not easily feasible, as long as empty strings are implemented as Nil pointers. When reference counting etc. should be preserved, the additional information would have to be moved into a static string descriptor, together with the pointe

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Hans-Peter Diettrich
DaWorm wrote: Perhaps a little extra compiler magic could be used? If the base definition of a string (the hidden stuff before the data) contains not only a field with the encoding, but a flag indicating the disposition of the encoding, then when a string type is aliased, that disposition co

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Hans-Peter Diettrich
Michael Schnell wrote:
> As already said, my request at this time is not to consider implementation before agreeing on a decently clear definition of what should be provided and what is supposed to happen when.
IMO such a separation is of no use. If you want a new string type, then make it

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Hans-Peter Diettrich
Graeme Geldenhuys wrote: On 2011-11-18 12:11, Michael Schnell wrote: Why should a type that is capable of holding multiple different UTF encodings be called "ANSIString"? IMHO this is very counterintuitive. Every time I see this used in Delphi too, I start to laugh as well. It makes no sense

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Sven Barth
Seems like the message you quote here went to you personally as well (that would explain why you sent this answer to me directly first...) Thus here is the original mail I wrote: === original mail begin === On 18.11.2011 10:22, Michael Schnell wrote: > On 11/17/2011 02:44 PM, Sven Barth wrote:

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Felipe Monteiro de Carvalho
On Fri, Nov 18, 2011 at 2:36 PM, Michael Schnell wrote: >> And now very recently I found out that this is no longer valid in 2.7: >> ansistring can be configured to hold a UTF-8 value in a valid and supported >> way, and this changes a lot of things for me. > > For the worse, I gather. Quite the con

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Michael Schnell
On 11/18/2011 01:37 PM, Sven Barth wrote: Because then you don't need to rely on the point that SizeOf(Char) = 1. Now imagine you have an application that uses strings as buffers and port that from let's say Delphi 7 to Delphi 2009. Have fun finding the bugs if you don't remember that you used a

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Michael Schnell
On 11/18/2011 01:48 PM, Felipe Monteiro de Carvalho wrote:
> And now very recently I found out that this is no longer valid in 2.7: ansistring can be configured to hold a UTF-8 value in a valid and supported way, and this changes a lot of things for me.
For the worse, I gather. -Michael

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Michael Schnell
On 11/18/2011 01:41 PM, Sven Barth wrote: This could indeed have been named better. But there are other examples like this: I still can't remember which of SmallInt and Short is the 1 Byte and the 2 Byte variant. Some type names like "Signed8" and "Unsigned16" would simplify this... but I won

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Michael Schnell
As already said, my request at this time is not to consider implementation before agreeing on a decently clear definition of what should be provided and what is supposed to happen when. -Michael

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Sven Barth
On 18.11.2011 14:09, Marco van de Voort wrote: In our previous episode, Graeme Geldenhuys said: like this: I still can't remember which of SmallInt and Short is the 1 Byte and the 2 Byte variant. Some type names like "Signed8" and "Unsigned16" would simplify this... but I won't go more into th

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said: > > like this: I still can't remember which of SmallInt and Short is the 1 > > Byte and the 2 Byte variant. Some type names like "Signed8" and > > "Unsigned16" would simplify this... but I won't go more into that > > direction now ^^ > > For exactly

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Graeme Geldenhuys
On 2011-11-18 14:41, Sven Barth wrote: > like this: I still can't remember which of SmallInt and Short is the 1 > Byte and the 2 Byte variant. Some type names like "Signed8" and > "Unsigned16" would simplify this... but I won't go more into that > direction now ^^ For exactly the same reason I hav

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Felipe Monteiro de Carvalho
On Fri, Nov 18, 2011 at 11:11 AM, Michael Schnell wrote: > Why should a type that is capable of holding multiple different UTF > encodings be called "ANSIString"? IMHO this is very counterintuitive. Yes, I have to agree here. It seems that my understanding in the Unicode discussions was plagued b

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread DaWorm
Perhaps a little extra compiler magic could be used? If the base definition of a string (the hidden stuff before the data) contains not only a field with the encoding, but a flag indicating the disposition of the encoding, then when a string type is aliased, that disposition could be overridden.

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Sven Barth
On 18.11.2011 11:11, Michael Schnell wrote: In theory the AnsiString type (which is now the code page aware string type) should be capable of holding UTF-8 and UTF-16 data, Why should a type that is capable of holding multiple different UTF encodings be called "ANSIString"? IMHO this is very c

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Michael Schnell
On 11/18/2011 11:21 AM, Graeme Geldenhuys wrote: Can't we just have a single damn string type like Java and some other languages? Let's just call it...ummm String! ;-) This has been discussed at great length here and in many other forums. This is what I tried to describe in (B). It has been t

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Graeme Geldenhuys
On 2011-11-18 12:11, Michael Schnell wrote: > Why should a type that is capable of holding multiple different UTF > encodings be called "ANSIString"? IMHO this is very counterintuitive. Every time I see this used in Delphi too, I start to laugh as well. It makes no sense. Call the damn thing Unico

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Michael Schnell
On 11/17/2011 03:01 PM, Marco van de Voort wrote: The ansistring and unicodestring types have the same memory layout except for the character data (iow the record before the character data is the same). My intention with starting this Thread was not to discuss any implementation details, but to

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Michael Schnell
On 11/17/2011 02:55 PM, Sven Barth wrote: On 17.11.2011 12:59, Michael Schnell wrote: Note that the Delphi2009 definition is theoretically capable of combining one and two bytes in one type (like Yury's). As I don't have such a Delphi please help me to understand: Is there a general type ded

Re: [fpc-devel] Unicode proceedings

2011-11-18 Thread Michael Schnell
On 11/17/2011 02:44 PM, Sven Barth wrote: One could implement a similar type for something like this (maybe even use the mentioned TBytes) and define operator overloads for it (at least for "+"). Why should one do this, regarding that a normal string type provides exactly what very often is

Re: [fpc-devel] Unicode proceedings

2011-11-17 Thread Marco van de Voort
In our previous episode, Sven Barth said: > > Is there a general type dedicated for being able to hold any encoding ? > > (be it ANSIxyz, UTF-8 or UTF-16) ? > > In theory the AnsiString type (which is now the code page aware string > type) should be capable of holding UTF-8 and UTF-16 data, but e

Re: [fpc-devel] Unicode proceedings

2011-11-17 Thread Sven Barth
On 17.11.2011 12:59, Michael Schnell wrote: Note that the Delphi2009 definition is theoretically capable of combining one and two bytes in one type (like Yury's). As I don't have such a Delphi please help me to understand: Is there a general type dedicated for being able to hold any encoding

Re: [fpc-devel] Unicode proceedings

2011-11-17 Thread Sven Barth
On 17.11.2011 10:04, Luca Olivetti wrote: On 17/11/2011 2:15, Hans-Peter Diettrich wrote: Abusing strings for binary data is a bad idea. I use strings extensively as buffers: strings in delphi/fpc {$H+} are so convenient to use that the possible performance hit doesn't matter to m

Re: [fpc-devel] Unicode proceedings

2011-11-17 Thread Michael Schnell
On 11/17/2011 02:43 AM, Hans-Peter Diettrich wrote: The only possible (expression) optimization again is based on UTF-16, where all sub-expressions are converted into UTF-16, so that only one more re-conversion is required when the result is stored. This is what mse does: using UTF-16 for the

Re: [fpc-devel] Unicode proceedings

2011-11-17 Thread Michael Schnell
On 11/16/2011 05:24 PM, Marco van de Voort wrote: The original proposal was like (A) but only for base unicode encodings (utf8/16 and maybe 32), but went down due to either excess conversions or the need for overloading. The amount of overloading for the current 3-4 stringtypes is already a bit mu

Re: [fpc-devel] Unicode proceedings

2011-11-17 Thread Michael Schnell
On 11/17/2011 02:15 AM, Hans-Peter Diettrich wrote: Right, you continue to provide suggestions that only result in slow code at runtime :-( I did not say _anything_ about any kind of implementation, nor did I suggest that any of the alternative "suggested variants of definitions" is preferab

Re: [fpc-devel] Unicode proceedings

2011-11-17 Thread Luca Olivetti
On 17/11/2011 2:15, Hans-Peter Diettrich wrote: Abusing strings for binary data is a bad idea. I use strings extensively as buffers: strings in delphi/fpc {$H+} are so convenient to use that the possible performance hit doesn't matter to me. You know, I like the fact that I can sim

Re: [fpc-devel] Unicode proceedings

2011-11-16 Thread Hans-Peter Diettrich
Marco van de Voort wrote: Note that the Delphi2009 definition is theoretically capable of combining one and two bytes in one type (like Yury's). Afaik there is no consensus why Embarcadero kept the two types separate, though I can think of several reasons: - performance - backwards compatibi

Re: [fpc-devel] Unicode proceedings

2011-11-16 Thread Hans-Peter Diettrich
Michael Schnell wrote: On 11/16/2011 02:56 PM, Hans-Peter Diettrich wrote: Delphi uses the native/generic AnsiString(0), A native/generic type is exactly what is _not_ available in the A) suggestion of definitions. Here a type name stands for exactly one encoding variant. No dynamic encodin

Re: [fpc-devel] Unicode proceedings

2011-11-16 Thread Marco van de Voort
In our previous episode, Michael Schnell said: > > Then there were fully dynamic encoding schemes proposed too (e.g. by > > Florian). > I do know this. It seems pros and cons have been discussed, but no > real decision has been made (i.e. a strict independent understandable > definition of wha

Re: [fpc-devel] Unicode proceedings

2011-11-16 Thread Michael Schnell
On 11/16/2011 02:56 PM, Hans-Peter Diettrich wrote: Delphi uses the native/generic AnsiString(0), A native/generic type is exactly what is _not_ available in the A) suggestion of definitions. Here a type name stands for exactly one encoding variant. No dynamic encoding (implemented by having a

Re: [fpc-devel] Unicode proceedings

2011-11-16 Thread Hans-Peter Diettrich
Michael Schnell wrote: Obviously a system of hard coded string types (such as A) is not what everybody (but some) wants (e.g. as a lot of such types would be needed and because EMB decided on implementing dynamic typing). It's the best system, performance-wise. An application should not use

Re: [fpc-devel] Unicode proceedings

2011-11-16 Thread Michael Schnell
On 11/16/2011 11:36 AM, Marco van de Voort wrote: Then there were fully dynamic encoding schemes proposed too (e.g. by Florian). I do know this. It seems pros and cons have been discussed, but no real decision has been made (i.e. a strict independent understandable definition of what is sup

Re: [fpc-devel] Unicode proceedings

2011-11-16 Thread Marco van de Voort
In our previous episode, Michael Schnell said: > On 11/16/2011 10:44 AM, Marco van de Voort wrote: > > > > It's a mix of all three actually. It is typed(A), there are two (B) > > implementations, and the memory layout of both implementations have >

Re: [fpc-devel] Unicode proceedings

2011-11-16 Thread Michael Schnell
On 11/16/2011 10:44 AM, Marco van de Voort wrote: It's a mix of all three actually. It is typed (A), there are two (B) implementations, and the memory layouts of both implementations have similarities that make Rawbytestring possible, making rawbytestring the base memory representation the others

Re: [fpc-devel] Unicode proceedings

2011-11-16 Thread Sven Barth
On 16.11.2011 10:44, Marco van de Voort wrote: In our previous episode, Sven Barth said: On 15.11.2011 12:41, Michael Schnell wrote: While neither A nor B is Delphi XE compatible in any way, C seems a bit similar to what Emb does. But AFAIK, Delphi does not provide an unambiguous, well define

Re: [fpc-devel] Unicode proceedings

2011-11-16 Thread Marco van de Voort
In our previous episode, Sven Barth said: > On 15.11.2011 12:41, Michael Schnell wrote: > > While neither A nor B is Delphi XE compatible in any way, C seems a bit > > similar to what Emb does. But AFAIK, Delphi does not provide an > > unambiguous, well defined and understandable paradigm (such as

Re: [fpc-devel] Unicode proceedings

2011-11-15 Thread Sven Barth
On 15.11.2011 12:41, Michael Schnell wrote: While neither A nor B is Delphi XE compatible in any way, C seems a bit similar to what Emb does. But AFAIK, Delphi does not provide an unambiguous, well defined and understandable paradigm (such as an Object-like Parent/Child relationship) for the featu

[fpc-devel] Unicode proceedings

2011-11-15 Thread Michael Schnell
Here, there have been lots of long-winded and partly quite fruitless discussions on the implementation of the new Unicode aware string type(s). IMHO, before trying to decide regarding any implementation details, there should be a _very_explicit_ decision on the general functionality. Here, I