Re: [Lazarus] UTF8 RTL for Windows

2014-11-26 Thread Michael Schnell
On 11/25/2014 09:39 PM, Hans-Peter Diettrich wrote: The Delphi model already broke that claimed type safety, by omitting conversions of RawByteString results, for speed optimization. That's dangerous, because the compiler can *only* check the static type of string variables, but not the dynam

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Hans-Peter Diettrich
Mattias Gaertner wrote: On Tue, 25 Nov 2014 14:49:52 +0100 Felipe Monteiro de Carvalho wrote: On Tue, Nov 25, 2014 at 2:45 PM, Mattias Gaertner wrote: Retype "Char" to "String" and the compiler will bark. For example in Graphics. What about changing to WideChar then? If you mean unit

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Hans-Peter Diettrich
Mattias Gaertner wrote: On Tue, 25 Nov 2014 13:10:26 +0100 Hans-Peter Diettrich wrote: [...] Maybe I don't understand the question, but it seems to me this is documented where static-, dynamic cp and rawbytestring are explained. More concrete questions: How can a user be sure that a strin

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Hans-Peter Diettrich
Mattias Gaertner wrote: On Tue, 25 Nov 2014 11:53:00 +0100 Hans-Peter Diettrich wrote: [...] Correction: *This* Char type needs to be extended. Please specify. The ThousandSeparator type is "Char", which does not work with Russian in UTF-8. Well, at least if you want the non breakable sp

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Frederic Da Vitoria
2014-11-25 14:45 GMT+01:00 Mattias Gaertner : > On Tue, 25 Nov 2014 11:53:00 +0100 > Hans-Peter Diettrich wrote: > > >[...] > > > Correction: *This* Char type needs to be extended. > > > > Please specify. > > The ThousandSeparator type is "Char", which does not work with > Russian in UTF-8. Well,

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Felipe Monteiro de Carvalho
On Tue, Nov 25, 2014 at 3:14 PM, Mattias Gaertner wrote: >> What about changing to WideChar then? > > If you mean unit Graphics: It checks for ASCII characters. So a change > to WideChar would add implicit conversions without any gain. > > In case of ThousandSeparator: > That would probably be suf

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Mattias Gaertner
On Tue, 25 Nov 2014 14:49:52 +0100 Felipe Monteiro de Carvalho wrote: > On Tue, Nov 25, 2014 at 2:45 PM, Mattias Gaertner > wrote: > > Retype "Char" to "String" and the compiler will bark. For example in > > Graphics. > > What about changing to WideChar then? If you mean unit Graphics: It chec

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Felipe Monteiro de Carvalho
On Tue, Nov 25, 2014 at 2:45 PM, Mattias Gaertner wrote: > Retype "Char" to "String" and the compiler will bark. For example in > Graphics. What about changing to WideChar then?

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Mattias Gaertner
On Tue, 25 Nov 2014 11:53:00 +0100 Hans-Peter Diettrich wrote: >[...] > > Correction: *This* Char type needs to be extended. > > Please specify. The ThousandSeparator type is "Char", which does not work with Russian in UTF-8. Well, at least if you want the non breakable space instead of the nor
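
To make the size problem concrete, here is a minimal sketch (not part of the thread) that encodes U+00A0 NO-BREAK SPACE as UTF-8 and prints its bytes. The program and variable names are illustrative only, and it assumes FPC 2.7.1 or later. It should report two bytes (C2 A0), which is why such a separator cannot be stored in a one-byte "Char" field once strings are UTF-8.

```pascal
program NbspSizeDemo; // hypothetical name, for illustration only
{$mode objfpc}{$H+}

var
  U: UnicodeString;
  S: UTF8String;
  I: Integer;
begin
  // U+00A0 NO-BREAK SPACE, built from its code point so the example
  // does not depend on the encoding of this source file.
  U := WideChar($00A0);

  // Convert the UTF-16 string to UTF-8.
  S := UTF8Encode(U);

  WriteLn('UTF-8 length in bytes: ', Length(S));
  Write('Bytes: ');
  for I := 1 to Length(S) do
    Write(HexStr(Ord(S[I]), 2), ' ');
  WriteLn;

  // A single-byte Char can hold only one of those bytes.
  WriteLn('SizeOf(AnsiChar) = ', SizeOf(AnsiChar));
end.
```

A WideChar field would hold U+00A0 in one UTF-16 code unit, which is the alternative raised elsewhere in the thread.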

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Mattias Gaertner
On Tue, 25 Nov 2014 13:10:26 +0100 Hans-Peter Diettrich wrote: >[...] > > Maybe I don't understand the question, but it seems to me this is > > documented where static-, dynamic cp and rawbytestring are explained. > > More concrete questions: > > How can a user be sure that a string parameter i

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Hans-Peter Diettrich
Mattias Gaertner wrote: On Mon, 24 Nov 2014 22:15:29 +0100 Hans-Peter Diettrich wrote: [...] The Delphi (and FPC) encoding model allows for strings of different static (declared) and dynamic (true content) encoding, see the special handling of RawByteString (Wiki). So far it's not a good

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Hans-Peter Diettrich
Mattias Gaertner wrote: On Mon, 24 Nov 2014 22:53:44 +0100 Hans-Peter Diettrich wrote: Graeme Geldenhuys wrote: How is ThousandSeparator and DecimalSeparator supposed to work in TFormatSettings? If you switched the RTL to UTF-8 or UTF-16 a Russian thousand separator (4-byte non-breaking

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Michael Schnell
On 11/24/2014 10:15 PM, Hans-Peter Diettrich wrote: I'm missing documentation for working safely (and efficiently) with such irregular strings, most probably none of the FPC (and Delphi) developers ever noticed how users are left alone with this problem :-( Hmm. In the fpc-devel, lazarus-de

Re: [Lazarus] UTF8 RTL for Windows

2014-11-25 Thread Graeme Geldenhuys
On 2014-11-24 23:13, Mattias Gaertner wrote: > In case of the new LCL mode we can extend the "LCL Unicode support" page. I don't know if that is the correct place though. The "not implemented yet" features affect other toolkits, console and web applications too, not just LCL based ones. So for no

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 16:40:06 + Graeme Geldenhuys wrote: >[...] > Where should we report this? Mantis or Unicode page of the Wiki? On second thought, a programmer needs to know what might fail and the alternative/workaround. The latter depends on settings. In case of the new LCL mode we can

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 22:53:44 +0100 Hans-Peter Diettrich wrote: > Graeme Geldenhuys wrote: > > > How is ThousandSeparator and DecimalSeparator supposed to work in > > TFormatSettings? If you switched the RTL to UTF-8 or UTF-16 a Russian > > thousand separator (4-byte non-breaking white space ch

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 22:15:29 +0100 Hans-Peter Diettrich wrote: >[...] > The Delphi (and FPC) encoding model allows for strings of different > static (declared) and dynamic (true content) encoding, see the special > handling of RawByteString (Wiki). > > So far it's not a good idea to simply *as

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Hans-Peter Diettrich
Graeme Geldenhuys wrote: How is ThousandSeparator and DecimalSeparator supposed to work in TFormatSettings? If you switched the RTL to UTF-8 or UTF-16 a Russian thousand separator (4-byte non-breaking white space character) for example will not fit into a Char type. The Char type is quite us

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Hans-Peter Diettrich
luiz americo pereira camara wrote: When DefaultSystemCodePage is CP_ACP the variable S will have the content of UTF8 but the encoding will be ACP (in my case 1252), just as it is today. With DefaultSystemCodePage as CP_UTF8 both content and code page will match. The Delphi (and FPC) encoding

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Graeme Geldenhuys
On 2014-11-24 16:36, Mattias Gaertner wrote: > It has not yet been converted. Many thanks for confirming that. > We can help the FPC team by collecting all places. Where should we report this? Mantis or Unicode page of the Wiki? Regards, - Graeme - -- fpGUI Toolkit - a cross-platform GUI

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 16:25:15 + Graeme Geldenhuys wrote: >[...] > Or is TFormatSettings just something that hasn't yet been converted to > be Unicode friendly? It has not yet been converted. We can help the FPC team by collecting all places. Mattias --

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Graeme Geldenhuys
On 2014-11-22 16:38, Michael Van Canneyt wrote: > The exact behaviour of the RTL is controlled by a couple of variables: > DefaultSystemCodePage, DefaultFileSystemCodePage , > DefaultRTLFileSystemCodePage. I've read the updated wiki page, but still confused about something... TFormatSettings =

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 12:45:54 -0300 luiz americo pereira camara wrote: > 2014-11-24 8:15 GMT-03:00 Mattias Gaertner : >[...] > > This works with or without {$codepage utf8}: > > > > S := 'João'; // constant to (Ansi or Short)string > > > > Without {$codepage utf8} > When DefaultSystemCodePage is
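
A test along the lines discussed here could look like the following sketch. It is not the program attached to the thread; the program name is made up, and it assumes the source file is saved as UTF-8 and compiled with FPC 2.7.1 or later. It prints the dynamic code page and the raw bytes of the string, so the declared encoding can be compared with the actual content. Removing the {$codepage utf8} line and recompiling shows the other case.

```pascal
program ConstCodePageDemo; // hypothetical name, for illustration only
{$mode objfpc}{$H+}
{$codepage utf8} // delete this line to test the "without" case

var
  S: AnsiString;
  I: Integer;
begin
  // Constant assigned to an AnsiString; the file itself is assumed
  // to be stored as UTF-8.
  S := 'João';

  // Dynamic code page recorded in the string header.
  WriteLn('StringCodePage(S) = ', StringCodePage(S));
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);

  // Raw bytes, independent of any console conversion done by WriteLn.
  Write('Bytes: ');
  for I := 1 to Length(S) do
    Write(HexStr(Ord(S[I]), 2), ' ');
  WriteLn;
end.
```

Printing the bytes as hex avoids the console code page conversion that writeln applies to the string itself.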

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread luiz americo pereira camara
2014-11-24 8:15 GMT-03:00 Mattias Gaertner : > On Sun, 23 Nov 2014 21:37:56 -0300 > luiz americo pereira camara wrote: > > > The attached program show how data loss can occur > > The program uses writeln, which converts to console CP. > When you save the strings to a file you can see what they co

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Graeme Geldenhuys
On 2014-11-24 10:52, Michael Schnell wrote: > I don't know the internals of the program(s). It's a huge system and > does anything that somehow might be possible :-) . Luckily you have everything unit tested right. So it would simply be a case of running the test suite to see what works and what

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
Please don't start a UTF war again. This has been discussed at length a zillion times. Mattias -- ___ Lazarus mailing list Lazarus@lists.lazarus.freepascal.org http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Michael Schnell
On 11/24/2014 02:50 PM, Hans-Peter Diettrich wrote: code, the user should be allowed to use the string encoding (and byte count per character), he finds the most convenient for his application. I'm not sure what exactly you mean here. Here I meant that for a *new project* the user might be wil

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Sven Barth
On 24.11.2014 at 14:55, "Hans-Peter Diettrich" wrote: > Please note that until now Windows did the Ansi to UTF conversions itself, in every API call with strings involved. If this was not noticed before, the conversions won't be noticeable afterwards either. This is something that one definitely s

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Michael Schnell
On 11/24/2014 02:19 PM, Hans-Peter Diettrich wrote: A move to UTF-16 instead will only favor Windows, Regarding the RTL interface, you of course are right. Doing the user software with UTF-16 instead of UTF-8 strings, in many cases (but of course not perfectly) allows for keeping old-style 1

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Hans-Peter Diettrich
Michael Schnell wrote: On 11/23/2014 07:52 PM, Felipe Monteiro de Carvalho wrote: Well, the first reports of how the unicode rtl would look like were pretty scary: Total break of the string part of millions of lines of code that people wrote with Lazarus since years. That is why I stopped re

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Sun, 23 Nov 2014 18:27:12 -0300 luiz americo pereira camara wrote: > 2014-11-20 13:21 GMT-03:00 Mattias Gaertner : >[...] > Please test and tell what you find out. > > > > > The FormatSettings fields are still encoded with System Code Page > regardless of DefaultSystemCodePage value. > > Whil

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 13:12:04 +0100 Michael Schnell wrote: > On 11/24/2014 12:01 PM, Juha Manninen wrote: > > See the request from Mattias : "Please test and tell what you find out." > > I have not enough knowledge to be able to patch the compiler :-( I asked for testing compiling with -dEnableU

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Michael Schnell
On 11/24/2014 12:01 PM, Juha Manninen wrote: See the request from Mattias : "Please test and tell what you find out." I have not enough knowledge to be able to patch the compiler :-( let's keep this thread in a more concrete level. Agreed (even if I don't think that will lead to anything fai

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Mon, 24 Nov 2014 12:15:03 +0100 Mattias Gaertner wrote: >[...] > I guess it would be a good idea to pass -Fcutf8 with FPC 2.7.1. For > both modes. On second thought: only for new mode. Passing it in the old mode will make the wide/unicode/utf8string work, but the Ansi/Shortstring will be wro

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Mattias Gaertner
On Sun, 23 Nov 2014 21:37:56 -0300 luiz americo pereira camara wrote: > 2014-11-20 13:21 GMT-03:00 Mattias Gaertner : >[...] First of all: Thanks for testing. > Without {$codepage utf8} directive String constants will get Code Page 0 > (CP_ACP) and not the 1200 (UTF16 - UnicodeString). Beware:

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Juha Manninen
On Mon, Nov 24, 2014 at 11:33 AM, Michael Schnell wrote: > IMHO that would be just GREAT to allow for doing portable software. The RTL > and LCL interface should be OS ignorant for portability. In user code, the > user should be allowed to use the string encoding (and byte count per > character), h

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Michael Schnell
On 11/24/2014 11:44 AM, luiz americo pereira camara wrote: If the program does not explicitly assume a specific encoding, i.e. uses only the String type and does not do low-level string handling, there will be no need to change. I don't know the internals of the program(s). It's a huge system and

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread luiz americo pereira camara
2014-11-24 6:29 GMT-03:00 Michael Schnell : > On 11/23/2014 07:52 PM, Felipe Monteiro de Carvalho wrote: > >> >> Well, the first reports of how the unicode rtl would look like were >> pretty scary: Total break of the string part of millions of lines of >> code that people wrote with Lazarus since

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Michael Schnell
On 11/22/2014 05:18 PM, Hans-Peter Diettrich wrote: Does this mean that Lazarus (new mode) ignores the OS system codepage setting? IMHO that would be just GREAT to allow for doing portable software. The RTL and LCL interface should be OS ignorant for portability. In user code, the user should

Re: [Lazarus] UTF8 RTL for Windows

2014-11-24 Thread Michael Schnell
On 11/23/2014 07:52 PM, Felipe Monteiro de Carvalho wrote: Well, the first reports of how the unicode rtl would look like were pretty scary: Total break of the string part of millions of lines of code that people wrote with Lazarus since years. That is why I stopped recommending Lazarus to my c

Re: [Lazarus] UTF8 RTL for Windows

2014-11-23 Thread Sven Barth
On 24.11.2014 03:19, luiz americo pereira camara wrote: I updated the test app to show the hexadecimal representation of the string. When {$codepage utf8} is set, the encoding and content of every string match each other, regardless of MultiByteConversionCodePage. Without {$codepage utf8}: Wh

Re: [Lazarus] UTF8 RTL for Windows

2014-11-23 Thread Sven Barth
On 24.11.2014 01:37, luiz americo pereira camara wrote: 2014-11-20 13:21 GMT-03:00 Mattias Gaertner: Please test and tell what you find out. Without {$codepage utf8} directive String constants will get Code Page 0 (CP_ACP) and not the 1200 (UTF16 - Un

Re: [Lazarus] UTF8 RTL for Windows

2014-11-23 Thread luiz americo pereira camara
I updated the test app to show the hexadecimal representation of the string. When {$codepage utf8} is set, the encoding and content of every string match each other, regardless of MultiByteConversionCodePage. Without {$codepage utf8}: When MultiByteConversionCodePage is CP_ACP (default) one str

Re: [Lazarus] UTF8 RTL for Windows

2014-11-23 Thread luiz americo pereira camara
I added {.$codepage utf8} and all strings output as "Joao". Got confused. I did not expect changes in the constant assigned to the UnicodeString variable. Need to check what is the correct UTF8 output: "JoA£o" or "Joao". Luiz 2014-11-23 21:37 GMT-03:00 luiz americo pereira camara : > > > 201

Re: [Lazarus] UTF8 RTL for Windows

2014-11-23 Thread luiz americo pereira camara
2014-11-20 13:21 GMT-03:00 Mattias Gaertner : > > Please test and tell what you find out. > Without {$codepage utf8} directive String constants will get Code Page 0 (CP_ACP) and not the 1200 (UTF16 - UnicodeString). String variables assigned to those constants will also have Code Page = 0 This

Re: [Lazarus] UTF8 RTL for Windows

2014-11-23 Thread luiz americo pereira camara
2014-11-20 13:21 GMT-03:00 Mattias Gaertner : > > 2. The new mode: The LCL, FCL and RTL treat all "String" as UTF-8 > encoded. Most RTL file functions now work with full Unicode. > For example FileExists and aStringList.LoadFromFile(Filename) now > support full Unicode. > [..] Please test and te

Re: [Lazarus] UTF8 RTL for Windows

2014-11-23 Thread Felipe Monteiro de Carvalho
On Sun, Nov 23, 2014 at 1:56 PM, Michael Van Canneyt wrote: > Don't worry. Computers are not scary, not really. Just look at "Terminator" > (or any other Sci-Fi involving computers), the humans always win in the > end... :-) Well, the first reports of how the unicode rtl would look like were pret

Re: [Lazarus] UTF8 RTL for Windows

2014-11-23 Thread Graeme Geldenhuys
On 2014-11-23 12:56, Michael Van Canneyt wrote: > the humans always win in the end... :-) ROFL > Phew... At least something we did better in the whole string mess ... ;) 9/10 times FPC does everything better than Delphi. Regards, - Graeme - -- fpGUI Toolkit - a cross-platform GUI toolkit u

Re: [Lazarus] UTF8 RTL for Windows

2014-11-23 Thread Mattias Gaertner
On Sun, 23 Nov 2014 13:56:42 +0100 (CET) Michael Van Canneyt wrote: >[...] > Anyway, I was just trying to say that a 1-byte string is not necessarily > UTF-8 in FPC 2.7.1. Yes, you can still store anything you like in strings. And you can store UTF-8 in a string and say it is not. Mattias --

Re: [Lazarus] UTF8 RTL for Windows

2014-11-23 Thread Michael Van Canneyt
On Sun, 23 Nov 2014, Mattias Gaertner wrote: True. Although many programmers misunderstand what this means. It is not as scary as it sounds. To all the scared people: Don't worry. Computers are not scary, not really. Just look at "Terminator" (or any other Sci-Fi involving computers), the

Re: [Lazarus] UTF8 RTL for Windows

2014-11-23 Thread Sven Barth
On 23.11.2014 at 00:15, "Mattias Gaertner" wrote: > > Additionally, most basic File I/O routines now correctly call the underlying > > OS-es file routines with the codepage the OS expects (which is WideString on Windows). > > Is it safe to say UTF-16? Or are there still UCS-2 Windows? Till NT 4 inc

Re: [Lazarus] UTF8 RTL for Windows

2014-11-22 Thread Mattias Gaertner
On Sat, 22 Nov 2014 17:38:33 +0100 (CET) Michael Van Canneyt wrote: >[...] > > Yes, with the UTF8 RTL. The default RTL uses system codepage. > > Careful, there is no such thing as the "UTF8 RTL". > > There is now a "Unicode and CodePage-aware RTL". Well, yes, you are right of course. But "Unic

Re: [Lazarus] UTF8 RTL for Windows

2014-11-22 Thread Mattias Gaertner
On Sat, 22 Nov 2014 17:18:35 +0100 Hans-Peter Diettrich wrote: > Mattias Gaertner wrote: > > > // GetCommandLineW returns a UTF-16 PWideChar > > // the compiler adds code to convert this to the > > // default system codepage (CP_ACP = CP_UTF8) > > // the resulting string has StringCode

Re: [Lazarus] UTF8 RTL for Windows

2014-11-22 Thread Hans-Peter Diettrich
Mattias Gaertner wrote: // GetCommandLineW returns a UTF-16 PWideChar // the compiler adds code to convert this to the // default system codepage (CP_ACP = CP_UTF8) // the resulting string has StringCodePage CP_ACP // and is encoded in UTF-8. Does this mean that Lazarus (new mode)

Re: [Lazarus] UTF8 RTL for Windows

2014-11-22 Thread Michael Van Canneyt
On Sat, 22 Nov 2014, Mattias Gaertner wrote: On Sat, 22 Nov 2014 16:18:09 +0100 Jürgen Hestermann wrote: On 2014-11-22 at 15:06, Mattias Gaertner wrote: > procedure TForm1.FormCreate(Sender: TObject); > var s: string; // String = AnsiString because of $H+ > begin > s:=GetCommandLineW

Re: [Lazarus] UTF8 RTL for Windows

2014-11-22 Thread Mattias Gaertner
On Sat, 22 Nov 2014 16:18:09 +0100 Jürgen Hestermann wrote: > On 2014-11-22 at 15:06, Mattias Gaertner wrote: > > procedure TForm1.FormCreate(Sender: TObject); > > var s: string; // String = AnsiString because of $H+ > > begin > > s:=GetCommandLineW; > > // GetCommandLineW returns a UTF

Re: [Lazarus] UTF8 RTL for Windows

2014-11-22 Thread Jürgen Hestermann
On 2014-11-22 at 15:06, Mattias Gaertner wrote: > procedure TForm1.FormCreate(Sender: TObject); > var s: string; // String = AnsiString because of $H+ > begin > s:=GetCommandLineW; > // GetCommandLineW returns a UTF-16 PWideChar > // the compiler adds code to convert this to the > // defa
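
The implicit conversion described in the quoted snippet can be observed outside the LCL with a small console sketch. This is not code from the thread: the program name is invented, a UnicodeString literal stands in for the GetCommandLineW result so that the example is not Windows-only, and the call to SetMultiByteConversionCodePage is an assumption about how a program might switch the default to UTF-8 in FPC 2.7.1 or later.

```pascal
program ImplicitConvDemo; // hypothetical name, for illustration only
{$mode objfpc}{$H+}

var
  W: UnicodeString;
  S: string; // String = AnsiString because of $H+
begin
  // Assumption: ask the RTL to treat CP_ACP strings as UTF-8,
  // roughly what the new mode is described as doing.
  SetMultiByteConversionCodePage(CP_UTF8);

  // Stand-in for a UTF-16 API result such as GetCommandLineW.
  // ASCII only, so no widestring manager is needed on Unix.
  W := 'example.exe --flag';

  // The compiler inserts a UTF-16 to default-code-page conversion here.
  S := W;

  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  WriteLn('StringCodePage(S)     = ', StringCodePage(S));
  WriteLn('S = ', S);
end.
```

Comparing the two printed code page values on a given platform shows what the resulting string actually reports, which is the question raised in the replies.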

Re: [Lazarus] UTF8 RTL for Windows

2014-11-22 Thread Mattias Gaertner
On Sat, 22 Nov 2014 14:37:00 +0100 Jürgen Hestermann wrote: > On 2014-11-20 at 17:21, Mattias Gaertner wrote: > > The development version of FPC 2.7.1 has extended Strings and many RTL > > functions now work for codepages other than the system codepage. > > > 2. The new mode: The LCL, FC

Re: [Lazarus] UTF8 RTL for Windows

2014-11-22 Thread Jürgen Hestermann
On 2014-11-20 at 17:21, Mattias Gaertner wrote: > The development version of FPC 2.7.1 has extended Strings and many RTL > functions now work for codepages other than the system codepage. > 2. The new mode: The LCL, FCL and RTL treat all "String" as UTF-8 encoded. ... > When accessing the Wi

Re: [Lazarus] UTF8 RTL for Windows

2014-11-20 Thread silvioprog
On Thu, Nov 20, 2014 at 1:21 PM, Mattias Gaertner wrote: > Hi all, especially Windows users, > > The development version of FPC 2.7.1 has extended Strings and many RTL > functions now work for codepages other than the system codepage. > > This means Lazarus can now be compiled in two modes: > > 1

[Lazarus] UTF8 RTL for Windows

2014-11-20 Thread Mattias Gaertner
Hi all, especially Windows users, The development version of FPC 2.7.1 has extended Strings and many RTL functions now work for codepages other than the system codepage. This means Lazarus can now be compiled in two modes: 1. The old mode: LCL treats all "String" as UTF-8 encoded. When accessing