On Fri, 5 May 2017 16:36:51 +0300
Juha Manninen via Lazarus wrote:
> On Fri, May 5, 2017 at 4:21 PM, Mattias Gaertner via Lazarus
> wrote:
> > Oops. Which one?
>
> The FAQ says:
> "Since FPC 3.0 you must add the flag -FcUTF8 or add {$codepage UTF8}
> at the beginning of the unit."
I improved
On Fri, May 5, 2017 at 4:21 PM, Mattias Gaertner via Lazarus
wrote:
> Oops. Which one?
The FAQ says:
"Since FPC 3.0 you must add the flag -FcUTF8 or add {$codepage UTF8}
at the beginning of the unit."
The same page in "String Literals" section says:
"In most cases {$codepage utf8} / -FcUTF8 is
On Fri, May 5, 2017 at 3:56 PM, Sven Barth via Lazarus
wrote:
> That is mainly due to the compiler not supporting surrogate pairs for the
> UTF-8 -> UTF-16 conversion. If it would support them, then there wouldn't be
> a problem anymore...
That is a serious bug. Getting codepoints right is the ab
On Fri, 5 May 2017 14:12:05 +0300
Juha Manninen via Lazarus wrote:
>[...]
> Then Mattias adds FAQs contradicting the earlier texts ...
Oops. Which one?
Mattias
--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
http://lists.lazarus-ide.org/lis
Am 05.05.2017 13:50 schrieb "Juha Manninen via Lazarus" <
lazarus@lists.lazarus-ide.org>:
>
> On Fri, May 5, 2017 at 2:29 PM, Michael Van Canneyt via Lazarus
> wrote:
> > Then what is still the problem ?
>
> With BOM you get:
> Error: UTF-8 code greater than 65535 found
> which is counter-intuiti
On 2017-05-05 12:49, Juha Manninen via Lazarus wrote:
> Wrong information propagates easily, thus it is important to get this right.
No worries, I agree. Thanks for correcting my terminology.
Regards,
Graeme
On Fri, May 5, 2017 at 2:02 PM, Graeme Geldenhuys via Lazarus
wrote:
> If so, then why does LCL also call the above two functions?
Graeme, they are called by LazUtils package, LazUTF8 unit, not by LCL.
It is not limited to GUI programming.
Wrong information propagates easily, thus it is importa
On Fri, May 5, 2017 at 2:29 PM, Michael Van Canneyt via Lazarus
wrote:
> Then what is still the problem ?
With BOM you get:
Error: UTF-8 code greater than 65535 found
which is counter-intuitive when the file and the string literal are both UTF-8.
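
A note on the 65535 in that error message: 65535 (U+FFFF) is the last code point of the BMP. Anything above it does not fit into a single 16-bit UTF-16 unit and has to be written as a surrogate pair. The Free Pascal sketch below shows only that arithmetic. The program and procedure names are made up for illustration, and this is not the compiler's actual conversion code.

```pascal
program SurrogateSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: split a code point above U+FFFF into a
// UTF-16 surrogate pair. Callers must pass a value in $10000..$10FFFF.
procedure SplitSurrogates(CodePoint: Cardinal; out HighSur, LowSur: Word);
var
  V: Cardinal;
begin
  V := CodePoint - $10000;                   // 20 significant bits remain
  HighSur := Word($D800 + (V shr 10));       // upper 10 bits
  LowSur  := Word($DC00 + (V and $3FF));     // lower 10 bits
end;

var
  HighSur, LowSur: Word;
begin
  // U+1F600 is above 65535, so it needs two UTF-16 code units.
  SplitSurrogates($1F600, HighSur, LowSur);
  WriteLn('U+1F600 -> ', HexStr(HighSur, 4), ' ', HexStr(LowSur, 4));
end.
```

For U+1F600 this prints D83D DE00, which is the pair a UTF-8 to UTF-16 conversion would have to produce for that character.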
It is related to changing the default codepage at
On 05.05.2017 13:02, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-05 10:41, Ondrej Pokorny via Lazarus wrote:
>> Just use "DefaultSystemCodePage := CP_UTF8" and every single-byte string
>> is unicode enabled.
> So does that mean you don't have to also call the following two functions
> (which LCL doe
On 2017-05-05 12:17, Mattias Gaertner via Lazarus wrote:
> I wonder if it would help if FPC would store UTF-8 string literals as
> UTF-8
Yeah, that would be the logical thing to do. FPC not doing that is what
really confused me.
Regards,
Graeme
On Fri, 5 May 2017, Mattias Gaertner via Lazarus wrote:
> On Fri, 5 May 2017 12:52:48 +0200 (CEST)
> Michael Van Canneyt via Lazarus wrote:
>> [...]
>> I propose to let the compiler observe the BOM.
>> But I don't think more is needed.
> FPC observes the BOM. Same as Delphi.
Then what is still the pr
On Fri, 5 May 2017 12:52:48 +0200 (CEST)
Michael Van Canneyt via Lazarus wrote:
>[...]
> I propose to let the compiler observe the BOM.
> But I don't think more is needed.
FPC observes the BOM. Same as Delphi.
I wonder if it would help if FPC would store UTF-8 string literals as
UTF-8 and how
On 05.05.2017 12:16, Graeme Geldenhuys via Lazarus wrote:
> In the end it’s about supporting Unicode. Does it really matter
> what internal encoding it is to achieve the “Unicode support”
> goal?
Yep it does.
There are ways around that issue (i.e. code aware strings) but in fact
these trigger a new
On 2017-05-05 11:55, Jürgen Hestermann via Lazarus wrote:
> I use UTF-8 internally and
> convert to/from UTF-16 for all Windows API functions and
> I never found any problem with it.
> The time that the API functions requires is so much longer than the
> time for string conversion that it does not
On Fri, May 5, 2017 at 1:20 AM, Graeme Geldenhuys via Lazarus
wrote:
> A case in point. Looking at the Wiki page you listed, I read the following:
> "
> Since FPC 3.0 you must add the flag -FcUTF8 or add {$codepage UTF8} at the
> beginning of the unit.
> ...
Uhhh, the same page in "String Litera
On 2017-05-05 10:41, Ondrej Pokorny via Lazarus wrote:
> Just use "DefaultSystemCodePage := CP_UTF8" and every single-byte string
> is unicode enabled.
So does that mean you don't have to also call the following two functions
(which LCL does).
SetMultiByteConversionCodePage(CP_UTF8);
SetMulti
On 05.05.2017 12:55, Jürgen Hestermann via Lazarus wrote:
> A situation where it may be a problem is when reading
> (UTF-16 encoded) text files.
No, not at all. If you convert the file on the fly, there is almost 0
performance penalty.
Ondrej
On 05.05.2017 12:01, Michael Van Canneyt via Lazarus wrote:
> On Fri, 5 May 2017, Ondrej Pokorny via Lazarus wrote:
>> Believe me, I use it in production without any problems: I have
>> unicode-aware TStrings, I can read files with unicode names, I can do
>> everything with plain FPC trunk.
> I am aware o
Am 2017-05-05 um 12:16 schrieb Graeme Geldenhuys via Lazarus:
> In the end it’s about supporting Unicode. Does it really matter
> what internal encoding it is to achieve the “Unicode support”
> goal?
From a performance perspective it may be unwanted
to convert string encodings back and forth all
On Fri, 5 May 2017, Juha Manninen via Lazarus wrote:
> On Fri, May 5, 2017 at 9:43 AM, Michael Van Canneyt via Lazarus
> wrote:
>> What tricks do you still need in 3.0.x ?
> The annoying tricky part with our UTF-8 solution is the assignment of
> Unicode string literals.
> With UTF-8 BOM it does not wor
On Fri, May 5, 2017 at 9:43 AM, Michael Van Canneyt via Lazarus
wrote:
> What tricks do you still need in 3.0.x ?
The annoying tricky part with our UTF-8 solution is the assignment of
Unicode string literals.
With UTF-8 BOM it does not work at all, as discussed here.
Without BOM it depends on str
On Fri, 5 May 2017 12:17:22 +0200
Ondrej Pokorny via Lazarus wrote:
>[...]
> Embarcadero realized they made a mistake when they disabled (yes, only
> disabled, not removed) 8-bit strings from NEXTGEN compilers. UTF8String
> and RawByteString are back for all NEXTGEN compilers since 10.1. You ca
On Fri, 5 May 2017 12:01:47 +0200 (CEST)
Michael Van Canneyt via Lazarus wrote:
>[...]
> > Believe me, I use it in production without any problems: I have
> > unicode-aware TStrings, I can read files with unicode names, I can do
> > everything with plain FPC trunk.
>
> I am aware of this, I
On 05.05.2017 12:08, Mattias Gaertner via Lazarus wrote:
> On Fri, 5 May 2017 10:56:41 +0100
> Graeme Geldenhuys via Lazarus wrote:
>> [...]
>>> or work with large amount of 8-bit strings.
>> Why would you want to? Unicode supports all languages,
> Maybe there is a misunderstanding. Let me rephrase my ques
On 2017-05-05 11:01, Michael Van Canneyt via Lazarus wrote:
> We claim Delphi compatibility.
> So IMHO we must provide a UTF-16 Delphi compatible RTL.
In the end it’s about supporting Unicode. Does it really matter
what internal encoding it is to achieve the “Unicode support”
goal?
Regards,
G
On Fri, 5 May 2017 10:56:41 +0100
Graeme Geldenhuys via Lazarus wrote:
>[...]
> > or work with large amount of 8-bit strings.
>
> Why would you want to? Unicode supports all languages,
Maybe there is a misunderstanding. Let me rephrase my question:
What string do you use in Linux Delphi when
On Fri, 5 May 2017, Ondrej Pokorny via Lazarus wrote:
> On 05.05.2017 11:23, Michael Van Canneyt via Lazarus wrote:
>> Yes, this somewhat alleviates the problem; but this still is a
>> single-byte TStrings, as opposed to the WideString
>> TStrings of Delphi. It's also still a single-byte filename argum
On 2017-05-05 10:41, Mattias Gaertner via Lazarus wrote:
> I wonder what they do when you need to access the raw 8-bit file names,
OSX, iOS, Android and Linux all use UTF-8 as standard, so filename access
is not going to be any problem. Windows is moving more and more towards
UTF-16 everywhere, s
On Fri, 5 May 2017 10:01:24 +0100
Graeme Geldenhuys via Lazarus wrote:
>[...]
> > AFAIK you are using UTF-8 in AnsiString in FPC 2.6.4. That works in
> > many cases, because of double fooling the compiler. This trick does not
> > work on Windows with RTL file functions though.
>
> Yes and true
On 05.05.2017 11:23, Michael Van Canneyt via Lazarus wrote:
> Yes, this somewhat alleviates the problem; but this still is a
> single-byte TStrings, as opposed to the WideString
> TStrings of Delphi. It's also still a single-byte filename argument.
Yes but you forget that unicode is also single-byte
On Fri, 5 May 2017, Ondrej Pokorny via Lazarus wrote:
> On 05.05.2017 11:17, Michael Van Canneyt via Lazarus wrote:
>> On Fri, 5 May 2017, Graeme Geldenhuys via Lazarus wrote:
>>> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>>>> As far as I know, you don't need any tricks to work with un
On 05.05.2017 11:24, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-05 10:17, Michael Van Canneyt via Lazarus wrote:
>>> Something like:
>>> sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>> sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>> sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>> Not
On 2017-05-05 10:17, Michael Van Canneyt via Lazarus wrote:
>> Something like:
>>
>> sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>> sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>> sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>
> Not yet. These are the exceptions I was talking ab
On Fri, 5 May 2017, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-05 10:17, Michael Van Canneyt via Lazarus wrote:
>>> Something like:
>>> sl.LoadFromFile('some_utf8_file.txt', CP_UTF8);
>>> sl.LoadFromFile('some_utf16_file.txt', CP_UTF16);
>>> sl.LoadFromFile('some_latin1_file.txt', CP_Latin1);
>> Not
On 2017-05-05 10:17, Ondrej Pokorny via Lazarus wrote:
> I don't know about 3.0.x but you can do it in trunk 3.1.1. I posted a
> patch for it (r34475).
Fantastic! Glad to see somebody was on the same train of thought
as I was. :)
Is that scheduled to be back-ported to FPC 3.0.x?
Regar
On 05.05.2017 11:17, Michael Van Canneyt via Lazarus wrote:
> On Fri, 5 May 2017, Graeme Geldenhuys via Lazarus wrote:
>> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>>> As far as I know, you don't need any tricks to work with unicode
>>> filenames or output in 3.0.2. Maybe with exception o
On Fri, 5 May 2017, Ondrej Pokorny via Lazarus wrote:
> On 05.05.2017 11:06, Graeme Geldenhuys via Lazarus wrote:
>> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>>> As far as I know, you don't need any tricks to work with unicode
>>> filenames or output in 3.0.2. Maybe with exception of T
On 2017-05-05 09:59, Michael Schnell via Lazarus wrote:
> (Most obvious drawback: not flexibly typed TStrings.)
I know not everybody likes Generics, but that is where I see
Generics could come in very handy. A single TStrings implementation
that supports multiple string types.
Or just implement a
On Fri, 5 May 2017, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>> As far as I know, you don't need any tricks to work with unicode
>> filenames or output in 3.0.2. Maybe with exception of TStrings and
>> TFileStream.
> Again, I didn't have time to
On 05.05.2017 11:06, Graeme Geldenhuys via Lazarus wrote:
> On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
>> As far as I know, you don't need any tricks to work with unicode
>> filenames or output in 3.0.2. Maybe with exception of TStrings and
>> TFileStream.
> Again, I didn't have time to fol
On 2017-05-05 09:31, Kostas Michalopoulos via Lazarus wrote:
> After all, BMP does include practically all languages used today.
The bottom line:
Unicode Standard <> BMP only!
If you think that, then rather promote your application as a UCS-2
compliant application, not a Unicode compliant app
On 2017-05-05 07:43, Michael Van Canneyt via Lazarus wrote:
> As far as I know, you don't need any tricks to work with unicode
> filenames or output in 3.0.2. Maybe with exception of TStrings and
> TFileStream.
Again, I didn't have time to follow FPC 3.x development much, and I was too
confused wi
On 2017-05-05 00:15, Mattias Gaertner via Lazarus wrote:
> I added a FAQ:
> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#What_happens_when_I_use_.24codepage_utf8.3F
Ah, thanks for that explanation.
> AFAIK you are using UTF-8 in AnsiString in FPC 2.6.4. That works in
> many cas
On 04.05.2017 16:56, Juha Manninen via Lazarus wrote:
> I believe everybody is happy to get rid of the horrendous Windows
If this is true, there is a decent need for backwards compatibility.
That is why, theoretically, code-aware strings are a good idea.
Unfortunately the implementation of thos
On Fri, 5 May 2017 11:31:00 +0300
Kostas Michalopoulos via Lazarus wrote:
>[...]
> To play the devil's advocate, the fact that ALL reviews said that it has
> excellent support for Unicode means that characters outside the BMP *are*
> rare. After all, BMP does include practically all languages use
On Thu, May 4, 2017 at 8:53 PM, Graeme Geldenhuys via Lazarus <
lazarus@lists.lazarus-ide.org> wrote:
> On 2017-05-04 15:56, Juha Manninen via Lazarus wrote:
> > I have seen comments saying that treating UTF-16 as fixed width
> > encoding is OK because the characters outside BMP are so rare. It is