On 12/03/2014 05:02 AM, Hans-Peter Diettrich wrote:
Michael Schnell schrieb:
- It does not result in additional conversions.
It does, e.g. in searching or sorting of StringList, when it can contain
strings of different encodings. The choice of a unique encoding for
application strings (maybe
On 12/03/2014 12:52 AM, Hans-Peter Diettrich wrote:
You forget that Jonas refers to *dynamic* string encodings, unknown at
compile time.
???
In your other mail you pointed out that fpc (other than Delphi) does not
provide *dynamic* string encoding with RawByteString (and where else
would it
On 12/03/2014 12:52 AM, Hans-Peter Diettrich wrote:
In Delphi *no* string can have a dynamic encoding of CP_NONE or CP_ACP,
If you really do have Dynamic strings, obviously, the *definition*
(i.e. CP_...) of such strings is strictly static (just for compiler use)
and can never be used as
On 11/28/2014 09:15 PM, Hans-Peter Diettrich wrote:
You suggested to use string as UTF-16 on Windows, and UTF-8 on
Linux. That's what I understand as a unique program-wide string
representation (not sourcecode-wide, instead program as *compiled*).
Then I cannot see any need or use for
On 12/02/2014 01:05 PM, Michael Schnell wrote:
But why do you say would be appreciated ? Is it not possible to use
RawByteString in a way the name suggests, by never bringing it
together with any String variable of a different encoding brand and
hence avoid any conversion - be same
On 11/29/2014 07:55 AM, Jonas Maebe wrote:
Exactly the same goes for converting strings with code page CP_NONE to
a different code page: your program is broken when it tries to do that,
While accessing an array beyond its bounds is not detectable at compile
time and accessing an array beyond
Michael Schnell schrieb:
On 11/28/2014 09:15 PM, Hans-Peter Diettrich wrote:
Apart from that, every encoding-tolerant code will execute much slower
than code without a need for checks and conversions everywhere.
As I pointed out I don't agree at all.
- The check is only two ASM instructions
Jonas Maebe schrieb:
On 28/11/14 21:30, Hans-Peter Diettrich wrote:
I prefer to specify and document everything *before* coding, so that
everybody can expect that the code will behave as specified.
If certain behaviour is explicitly undefined, it *is* specified and
documented. It means that
On 27 Nov 2014, at 17:11, Hans-Peter Diettrich drdiettri...@aol.com wrote:
Such statements come only from writers that do not believe that their words
can be understood in various ways ;-)
I'm sorry, but I simply cannot discuss with people that, when I literally state
the result is
On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote:
The universal paradigm would allow for extensions (e.g. UTF-32,
multiple 16 Bit Code pages, an additional fully dynamic String type,
n-byte un-encoded string types), as I described in the Wiki page.
Even if feasible, such arbitrary string
On 11/27/2014 07:29 PM, Hans-Peter Diettrich wrote:
Michael Schnell schrieb:
E.g. there are (at least) two "Code pages" for UTF-16 (LE and
BE), that would be worth supporting.
You are confusing codepages and encodings :-(
That is why I put goose-feet around Code pages. I used this wording
Jonas Maebe schrieb:
I'm sorry, but I simply cannot discuss with people that, when I
literally state the result is undefined, think that I may actually
have meant the result is defined and if you change the
implementation and/or keep it stable across compiler releases, then
it will also conform
Michael Schnell schrieb:
I fear that there will be code that relies on the flawed behavior of
RawByteString (it's a feature, not a bug) and using the same name with
different behavior would break it. And a really usable DynamicString
would not adhere to that description.
How can somebody
Michael Schnell schrieb:
On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote:
An *efficient* implementation would be based on a single program-wide
string representation, with different encodings being handled only in
an exchange with external data sources.
Yep. But it would result in severe
On 28/11/14 21:30, Hans-Peter Diettrich wrote:
I prefer to specify and document everything *before* coding, so that
everybody can expect that the code will behave as specified.
If certain behaviour is explicitly undefined, it *is* specified and
documented. It means that your program is buggy if
In our previous episode, Hans-Peter Diettrich said:
concatenated without data loss and that the result is then converted to
the target string's encoding (except in case the target is
RawByteString). How that is implemented exactly is undefined; again in
the meaning of undefined, not in the
On 11/26/2014 05:25 PM, Sven Barth wrote:
So seemingly you could do MyStringType = type
AnsiString(CP_UTF16), and seemingly the size information is set
according to this.
No, you can't, because the RTL does not handle that. For AnsiString
the element size is *always* 1. It's
On 11/26/2014 05:37 PM, Jonas Maebe wrote:
invalid (in the meaning of undefined) in both FPC and Delphi.
Sorry (I am not a native speaker). But to me undefined and invalid
have completely different meanings (in this context). An Invalid use
of the language would result in an error (compiler
On 11/26/2014 09:30 PM, Hans-Peter Diettrich wrote:
So seemingly you could do MyStringType = type
AnsiString(CP_UTF16), and seemingly the size information is set
according to this.
Not in Delphi XE.
Thanks for the clarification.
I did have some hope that fpc would be (or could be
On 11/26/2014 07:13 PM, Hans-Peter Diettrich wrote:
Not all codepages have a fixed number of bytes per character.
The string preamble contains the *element size* (1 for AnsiString),
just like with every dynamic array.
Sorry for sloppy wording. Of course I did mean element size
(Character here
On 26/11/14 23:41, Hans-Peter Diettrich wrote:
In this case the implementation is compiler specific, somewhat
different from undefined (in a RawByteString):
CP_NONE: this value indicates that no code page information has been
associated with the string data. The result of any explicit or
Michael Schnell schrieb:
I now understand that the Element Size field in the String header is
quite dummy, as under the hood there are two completely separate
concepts for one-byte-Strings and 2-Byte Strings and none for other
Element sizes.
After a code review I realized that the element
Michael Schnell schrieb:
On 11/26/2014 07:13 PM, Hans-Peter Diettrich wrote:
Not all codepages have a fixed number of bytes per character.
The string preamble contains the *element size* (1 for AnsiString),
just like with every dynamic array.
Sorry for sloppy wording. Of course I did mean
Jonas Maebe schrieb:
On 26/11/14 23:41, Hans-Peter Diettrich wrote:
In this case the implementation is compiler specific, somewhat
different from undefined (in a RawByteString):
CP_NONE: this value indicates that no code page information has been
associated with the string data. The result of
Michael Schnell schrieb:
On 11/26/2014 06:37 PM, Hans-Peter Diettrich wrote:
An AnsiString consists of AnsiChar's. The *meaning* of these char's
(bytes) depends on their encoding, regardless of whether the used
encoding is or is not stored with the string.
I understand that the
I fail to understand some of the text.
It seems to be unavoidable to use the name ANSIString even though I
always throw up when seeing a thing called ANSI containing Unicode
(e. g. UTF8String = type AnsiString(CP_UTF8) ).
Seemingly here the bytes per character setting implicitly is
On Wed, 26 Nov 2014 11:23:17 +0100
Michael Schnell mschn...@lumino.de wrote:
[...]
It seems to be unavoidable to use the name ANSIString even though I
always throw up when seeing a thing called ANSI containing Unicode
(e. g. UTF8String = type AnsiString(CP_UTF8) ).
Is there a question?
On 11/26/2014 11:40 AM, Mattias Gaertner wrote:
Ansistring supports only one byte per character code pages.
Even more confused. Am I wrong thinking that with code aware Strings,
for Delphi XE compatibility, in Windows CP_ACP needs to be UTF16 (if not
right, then due later)?
What is a
Am 26.11.2014 11:53 schrieb Michael Schnell mschn...@lumino.de:
On 11/26/2014 11:40 AM, Mattias Gaertner wrote:
Ansistring supports only one byte per character code pages.
Even more confused. Am I wrong thinking that with code aware Strings,
for Delphi XE compatibility, in Windows CP_ACP
On Wed, 26 Nov 2014 11:23:17 +0100
Michael Schnell mschn...@lumino.de wrote:
[...]
2) I fail to understand how with this explanation that seems to force
auto conversion for assignments between types with different code page
settings (also for CP_ACP) the static code page can differ from the
On Wed, 26 Nov 2014 11:52:50 +0100
Michael Schnell mschn...@lumino.de wrote:
On 11/26/2014 11:40 AM, Mattias Gaertner wrote:
Ansistring supports only one byte per character code pages.
Even more confused. Am I wrong thinking that with code aware Strings,
for Delphi XE compatibility, in
On 11/26/2014 12:09 PM, Sven Barth wrote:
In Delphi (and FPC) CP_ACP corresponds by default with the current
system codepage (e.g. CP1252 on a German Windows).
OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as
String(CP1252) but different from String without brackets which in
On 11/26/2014 12:13 PM, Mattias Gaertner wrote:
In mode delphiunicode String=UnicodeString.
I see.
So even in Delphi XE where UnicodeString is denoted by CP_UTF16, the
value of the constant CP_UTF16 is not the same as the value of the
(constant or) variable CP_ACP, (while OTOH using the
On 11/26/2014 12:10 PM, Mattias Gaertner wrote:
the results of conversions from/to the CP_NONE code page are undefined.
... because CP_NONE is not a real code page.
So you understand result as what you would get when printing.
In the context of this wiki page I would understand result as the
After re-reading yet another question:
In section String concatenations there is no mention of
auto-conversion. For statically typed Strings it's rather obvious that
they will be auto-converted if appropriate. Technically - if differently
encoded - they seem to be converted to Unicode
Am 26.11.2014 12:37 schrieb Michael Schnell mschn...@lumino.de:
On 11/26/2014 12:09 PM, Sven Barth wrote:
In Delphi (and FPC) CP_ACP corresponds by default with the current
system codepage (e.g. CP1252 on a German Windows).
OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as
On 26/11/14 12:53, Michael Schnell wrote:
[CP_NONE]
Is this undefined in the meaning of not predictable by the user in
the current version of fpc, or in the meaning of due to change when
updating fpc.
This undefined literally means undefined. It does not mean
undefined in a meaning that is
On 11/26/2014 03:05 PM, Sven Barth wrote:
OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as
String(CP1252) but different from String without brackets which in
turn is the same as String(CP_UTF16) ? Correct ?
There is no String with brackets. You can only use AnsiString
On 26/11/14 13:11, Michael Schnell wrote:
In section String concatenations there is no mention of
auto-conversion.
There is.
For statically typed Strings it's rather obvious that
they will be auto-converted if appropriate.
It's probably rather obvious because it is literally mentioned
Am 26.11.2014 15:30 schrieb Mattias Gaertner nc-gaert...@netcologne.de:
On Wed, 26 Nov 2014 15:05:16 +0100
Sven Barth pascaldra...@googlemail.com wrote:
[...]
While both AnsiString and UnicodeString have the current codepage and
the
character size in their header record
AFAIK
On 26/11/14 17:21, Sven Barth wrote:
Yes, nevertheless the header record is the same for UnicodeString and
AnsiString and thus it also has a codepage field which is always
initialized to CP_UTF16 however.
It can also be CP_UTF16BE (which it is on big endian FPC targets right now).
Jonas
On 26/11/14 16:19, Michael Schnell wrote:
So seemingly you could do MyStringType = type
AnsiString(CP_UTF16), and seemingly the size information is set
according to this.
As several people have told you several times, that is invalid (in the
meaning of undefined) in both FPC and Delphi.
On Wed, 26 Nov 2014 17:23:48 +0100
Jonas Maebe jonas.ma...@elis.ugent.be wrote:
On 26/11/14 17:21, Sven Barth wrote:
Yes, nevertheless the header record is the same for UnicodeString and
AnsiString and thus it also has a codepage field which is always
initialized to CP_UTF16 however.
It
On Wed, 26 Nov 2014 17:50:31 +0100
Mattias Gaertner nc-gaert...@netcologne.de wrote:
On Wed, 26 Nov 2014 17:23:48 +0100
Jonas Maebe jonas.ma...@elis.ugent.be wrote:
On 26/11/14 17:21, Sven Barth wrote:
Yes, nevertheless the header record is the same for UnicodeString and
AnsiString
Mattias Gaertner schrieb:
On Wed, 26 Nov 2014 11:23:17 +0100
Michael Schnell mschn...@lumino.de wrote:
Seemingly here the bytes per character setting implicitly is thought
of as a part of the code-page definition. Correct?
Code pages define bytes per character.
Huh?
Not all codepages
Michael Schnell schrieb:
On 11/26/2014 11:40 AM, Mattias Gaertner wrote:
Ansistring supports only one byte per character code pages.
Even more confused. Am I wrong thinking that with code aware Strings,
for Delphi XE compatibility, in Windows CP_ACP needs to be UTF16 (if not
right, then
Michael Schnell schrieb:
On 11/26/2014 12:09 PM, Sven Barth wrote:
In Delphi (and FPC) CP_ACP corresponds by default with the current
system codepage (e.g. CP1252 on a German Windows).
OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as
String(CP1252) but different from String
Michael Schnell schrieb:
I fail to understand some of the text.
It seems to be unavoidable to use the name ANSIString even though I
always throw up when seeing a thing called ANSI containing Unicode
(e. g. UTF8String = type AnsiString(CP_UTF8) ).
Seemingly here the bytes per character
Mattias Gaertner schrieb:
For example:
CP_ACP=0, DefaultSystemCodePage=1252
That means static code page is always 0, while dynamic code page can be
0 or 1252. Both describe the same encoding.
A *dynamic* encoding *never* can be CP_ACP nor CP_NONE (in Delphi).
These values are allowed only
Michael Schnell schrieb:
So seemingly you could do MyStringType = type
AnsiString(CP_UTF16), and seemingly the size information is set
according to this.
Not in Delphi XE.
DoDi
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
Jonas Maebe schrieb:
Technically, that section literally states that they will be
concatenated without data loss and that the result is then converted to
the target string's encoding (except in case the target is
RawByteString). How that is implemented exactly is undefined; again in
the meaning
On 26.11.2014 19:54, Hans-Peter Diettrich wrote:
UTF-16 is not a valid value for CP_ACP in Delphi, because it's a 2-byte
encoding. Even if the Delphi architects may have thought about a common
string type, with a variable element size (1,2,4), this certainly soon
turned out to be a stupid idea, so