Hi,
I have added a Roadmap section in the following wiki page. If you find
anything missing or not 100% implemented, please add it to the wiki
page.
http://wiki.freepascal.org/FPC_Unicode_support#Roadmap_of_RTL_Unicode_support
This applies to FPC 2.3.x
Regards,
- Graeme -
__
On 11/20/08, Michael Schnell <[EMAIL PROTECTED]> wrote:
>
> > UnicodeString (the type in FPC 2.3.1) is a UTF-16
> > type,
> >
> I was not aware that there is a type with this name. Why does it exist ?
> WideString that is not Unicode does not make much sense.
I'm new to this, but as far as I unde
On 11/20/08, Florian Klaempfl <[EMAIL PROTECTED]> wrote:
>
> Well, this is one of the thousands of little problems which need to be
> solved ...
OK, I'll add this to the RoadMap wiki page as well...
Regards,
- Graeme -
___
fpGUI - a cross-platform
Graeme Geldenhuys wrote:
> Hi
>
> How am I supposed to handle unicode characters for locale variables?
> All locale variables like ThousandSeparator are of type Char and there
> are no overloaded UnicodeChar versions. This causes problems in Russian
> locales as the example below shows.
>
>
> c
Zaher Dirkey wrote:
I meant TStringList must not do the conversion,
If it's known that a file is in some encoding and the instance of
TStringList uses another one, I suppose LoadFromFile needs to do the
re-encoding appropriately.
-Michael
UnicodeString (the type in FPC 2.3.1) is a UTF-16
type,
I was not aware that there is a type with this name. Why does it exist ?
WideString that is not Unicode does not make much sense.
-Michael
I meant TStringList must not do the conversion itself; string conversion
must happen outside of TStringList (or through special methods added to
it), and without detecting the encoding inside the file in LoadFromFile
or LoadFromStream. Detection may use the Seek function on the stream, and
that breaks loading from a tcp/ip connection or comp
On Thu, Nov 20, 2008 at 4:46 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>
> UTF8 _is_ a Unicode coding and thus UTF8String _should_be_ a Unicode String
> type (of course it is not in the current implementation, as the compiler
> can't tell it from ANSIString, but that is exactly what we are dis
As long as the ANSIString and UTF8String and String types are the same to
the compiler this question does not make too much sense.
Well those all refer to ANSI string types.
What do you mean by this ? These refer to "Byte String Types"
I was referring to
WideString and UnicodeString t
That should be called a conversion, not a hack.
It is the same when you work with the ANSI version of Lazarus/Delphi and
then try to load from a Unicode file.
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
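The idea of keeping the conversion outside of TStringList can be sketched roughly as follows. This is only an illustration, not code from the thread: the program name and variables are made up, the list is filled by hand as a stand-in for LoadFromFile, and it assumes a Free Pascal version where UTF8Decode and UnicodeString are available.

```pascal
program ConvertOutside;
{$mode objfpc}{$H+}
uses
  Classes;
var
  raw: TStringList;   // holds the bytes exactly as they were read
  u: UnicodeString;   // result of the explicit conversion step
begin
  raw := TStringList.Create;
  try
    // Stand-in for LoadFromFile: the list only stores the bytes it is given.
    // These five bytes are the UTF-8 encoding of a three-character word.
    raw.Add(#$C3#$A9't'#$C3#$A9);
    // The conversion is a separate, explicit step outside the list.
    u := UTF8Decode(raw[0]);
    WriteLn('bytes stored in the list:          ', Length(raw[0]));
    WriteLn('UTF-16 code units after decoding:  ', Length(u));
  finally
    raw.Free;
  end;
end.
```

The list never inspects or seeks in its input here, so the same pattern would apply to data arriving from a stream that cannot seek.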
Hi
How am I supposed to handle unicode characters for locale variables?
All locale variables like ThousandSeparator are of type Char and there
are no overloaded UnicodeChar versions. This causes problems in Russian
locales as the example below shows.
c := UnicodeChar($00A0); // non-breaking space
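As a rough illustration of why a one-byte Char cannot hold this separator once the text is UTF-8, the sketch below encodes the same code point and prints the sizes involved. It assumes a Free Pascal compiler that provides UnicodeChar, UnicodeString and UTF8Encode; the program and variable names are invented for the example.

```pascal
program NbspDemo;
{$mode objfpc}{$H+}
var
  c: UnicodeChar;
  u: UnicodeString;
  s: UTF8String;
begin
  c := UnicodeChar($00A0);   // non-breaking space, as in the message above
  u := c;                    // the character as a UTF-16 string
  s := UTF8Encode(u);        // the same character encoded as UTF-8
  WriteLn('UTF-16 code units: ', Length(u));
  WriteLn('UTF-8 bytes:       ', Length(s));
  WriteLn('SizeOf(Char):      ', SizeOf(Char));
end.
```

U+00A0 takes one UTF-16 code unit but two UTF-8 bytes, so with a one-byte Char a UTF-8 encoded separator does not fit into a variable such as ThousandSeparator.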
Graeme Geldenhuys wrote:
> On Thu, Nov 20, 2008 at 4:10 PM, Aleksa Todorovic <[EMAIL PROTECTED]> wrote:
>> Or... it could be implemented using generics, so one can choose:
>>
>> TStringList
>> TStringList
>> TStringList
>>
>> (sorry for C++ish syntax, but I hope you understand)
>
>
> I somehow
On Thu, Nov 20, 2008 at 4:10 PM, Aleksa Todorovic <[EMAIL PROTECTED]> wrote:
> Or... it could be implemented using generics, so one can choose:
>
> TStringList
> TStringList
> TStringList
>
> (sorry for C++ish syntax, but I hope you understand)
I somehow managed to skip the whole generics thing a
On Thu, Nov 20, 2008 at 4:09 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>
>> * I can't seem to find a UnicodeString version of TStrings or TStringList
>>
>
> As long as the ANSIString and UTF8String and String types are the same to
> the compiler this question does not make too much sense.
W
Or... it could be implemented using generics, so one can choose:
TStringList
TStringList
TStringList
(sorry for C++ish syntax, but I hope you understand)
On Thu, Nov 20, 2008 at 15:07, Florian Klaempfl <[EMAIL PROTECTED]> wrote:
> Graeme Geldenhuys wrote:
>> Hi,
>>
>> Is there any list of mis
* I can't seem to find a UnicodeString version of TStrings or TStringList
As long as the ANSIString and UTF8String and String types are the same
to the compiler this question does not make too much sense.
-Michael
Graeme Geldenhuys wrote:
> Hi,
>
> Is there any list of missing features for UnicodeString in the RTL?
>
> For example:
>
> * I can't seem to find a UnicodeString version of TStrings or TStringList
>
>
> Any more such cases?
No idea, nobody complained so far ;) Just create one.
> I wo
Hi,
Is there any list of missing features for UnicodeString in the RTL?
For example:
* I can't seem to find a UnicodeString version of TStrings or TStringList
Any more such cases? I would like to create an RTL UnicodeString
RoadMap, so the missing parts can be known and implemented.
Regards
Graeme Geldenhuys wrote:
Hello again,
We are seeing more and more "hacks" being applied to projects trying
to scramble around the missing FPC feature - no built-in Unicode
support.
A simple example in Lazarus: loading a UTF-8 encoded file into a TMemo.
Normally you would write code as
Quoting Graeme Geldenhuys <[EMAIL PROTECTED]>:
> On Thu, Nov 20, 2008 at 1:22 PM, peter green <[EMAIL PROTECTED]> wrote:
> >
> > The thing is we can't reasonably provide functions based on what a user
> > would see as a character because doing so would require huge lookup tables
> > (one user v
On 20 Nov 2008, at 13:13, Graeme Geldenhuys wrote:
I think basing those functions on code points should suffice. I also
think as soon as strings are assigned or loaded from file, they should
be normalized. So two code points like the A and Umlaut code points
would become one.
How would one
On Thu, Nov 20, 2008 at 1:22 PM, peter green <[EMAIL PROTECTED]> wrote:
>
> The thing is we can't reasonably provide functions based on what a user
> would see as a character because doing so would require huge lookup tables
> (one user visible character != one code point) so the best we can do is
On Thu, Nov 20, 2008 at 1:50 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>
>> Compiler support for a unicode string is not enough for the LCL.
>> As long as base classes like TStrings use ansistrings, the LCL must use a
>> string type, that does no conversion.
>
> Of course you are right that t
Ok, two questions for the example above:
- how do you maintain backward compatibility?
- how do you load a plain old ansi file?
You could alter the LoadFromFile(), LoadFromStream(), SaveToFile(),
SaveToStream() routines like below:
procedure TStringList.LoadFromFile(AFileName: TFilename; cons
Compiler support for a unicode string is not enough for the LCL.
As long as base classes like TStrings use ansistrings, the LCL must use a
string type, that does no conversion.
Of course you are right that the RTL needs to be made up accordingly.
Maybe TStrings and friends are needed in mul
type cp850string=ansistring(CP_850);
utf8string=ansistring(CP_UTF8);
Why not use the current locale for this ? Would that be just ANSIString ?
a:=b; {Compiler knows conversion to perform at compile time.
I suppose the conversion function is provided with the locale and this
it as
The thing is we can't reasonably provide functions based on what a
user would see as a character because doing so would require huge
lookup tables (one user visible character != one code point) so the
best we can do is code point based which isn't really much better for
most tasks than code
For best backward compatibility, I would say Copy, Length, Pos etc
should work "character based" by default.
Agreed.
Then introduce more
optimised versions like ElementCopy, ElementLength, etc... Old
programs will work out of the box, but might experience a minor speed
penalty, until the
For best backward compatibility, I would say Copy, Length, Pos etc
should work "character based" by default.
The thing is we can't reasonably provide functions based on what a user
would see as a character because doing so would require huge lookup
tables (one user visible character != one
On Thu, 20 Nov 2008, Martin Friebe wrote:
Daniël Mantione wrote:
On Thu, 20 Nov 2008, Felipe Monteiro de Carvalho wrote:
So, what kind of support could be implemented in Free Pascal to
improve things for Lazarus and its users?
Maybe a real UTF8String?
There will be a real UTF8string, i
Daniël Mantione wrote:
On Thu, 20 Nov 2008, Felipe Monteiro de Carvalho wrote:
So, what kind of support could be implemented in Free Pascal to
improve things for Lazarus and its users?
Maybe a real UTF8String?
There will be a real UTF8string, i.e. ansistring with UTF-8 encoding
as part of t
On Thu, 20 Nov 2008, Michael Schnell wrote:
Isn't this the same??
I understand that D2009 uses dynamic code information, while my suggestion is
based on several different (static) types.
As I understand it is static.
type cp850string=ansistring(CP_850);
utf8string=ansistring(
Quoting Felipe Monteiro de Carvalho <[EMAIL PROTECTED]>:
> if a real utf8string would be a solution for Lazarus (I am not saying
> it is, but it could be), we need to have a directive to change the
> default string into utf8string. To avoid a huge amount of code to need
> to be suddenly changed.
On Thu, Nov 20, 2008 at 12:55 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>>> * What about usage like: SomeString[x] := 'A';
>>
>> String element based.
>
> This also holds for Copy, Length, Pos, etc.
>
> I think it would be a good idea to provide dedicated functions for the
> "element based"
On Thu, 20 Nov 2008, Felipe Monteiro de Carvalho wrote:
So, what kind of support could be implemented in Free Pascal to
improve things for Lazarus and its users?
Maybe a real UTF8String?
There will be a real UTF8string, i.e. ansistring with UTF-8 encoding as
part of type information, th
UCS16
UTF16 :)
-Michael
Isn't this the same??
I understand that D2009 uses dynamic code information, while my
suggestion is based on several different (static) types.
I feel that static types are a lot easier to implement and if using them
correctly, the user can tune the program to be as fast as possible or as
System encoding is the encoding your files are written in when doing a
"echo Hello > file.txt".
nice point :)
I suppose that with my German WinXP the system encoding is German ANSI.
Does it hold only for files? I suppose WinXP provides an OS API with
WideStrings (supposedly UCS16).
But how do I ha
* Copy, Length, Pos etc...?
Yup.
* What about usage like: SomeString[x] := 'A';
String element based.
This also holds for Copy, Length, Pos, etc.
I think it would be a good idea to provide dedicated functions for the
"element based" (fast) and the "character based" (old style compa
On Thu, Nov 20, 2008 at 12:50 PM, Daniël Mantione
<[EMAIL PROTECTED]> wrote:
>>
>> What is "system encoding" regarding different OS, locale, ... ?
>
> System encoding is the encoding your files are written in when doing a
> "echo Hello > file.txt".
Good explanation Daniël. :-) I always wonder tha
On Thu, 20 Nov 2008, Michael Schnell wrote:
The file is assumed to be in system encoding (which can be UTF-8). Support
for reading of other encodings has not been decided on yet and is not
part of the initial plan.
What is "system encoding" regarding different OS, locale, ... ?
S
The file is assumed to be in system encoding (which can be UTF-8).
Support for reading of other encodings has not been decided on
yet and is not part of the initial plan.
What is "system encoding" regarding different OS, locale, ... ?
-Michael
___
On Thu, Nov 20, 2008 at 12:35:03PM +0200, Graeme Geldenhuys wrote:
> As for loading files. It's 99.9% that all files are in ANSI or UTF8
> encoding and UTF8 being fully backward compatible with ANSI makes
> this a good thing.
UTF-8 is not compatible with ANSI, but only with ASCII.
Petr
--
I
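Petr's point can be checked with a small sketch: an ASCII letter keeps its single byte value in UTF-8, while an accented letter that occupies one byte in a typical "ANSI" code page becomes two bytes. The program below is only an illustration under the assumption of a Free Pascal compiler with UnicodeString and UTF8Encode; the names are invented for the example.

```pascal
program AsciiVsAnsi;
{$mode objfpc}{$H+}
var
  u: UnicodeString;
  s: UTF8String;
  i: Integer;
begin
  u := 'A';                       // U+0041, plain ASCII
  u := u + UnicodeChar($00E9);    // U+00E9, e with acute accent
  s := UTF8Encode(u);
  // Print every byte of the UTF-8 encoding in hexadecimal.
  for i := 1 to Length(s) do
    Write(HexStr(Ord(s[i]), 2), ' ');
  WriteLn;
end.
```

This should print 41 C3 A9: the ASCII letter is unchanged, but U+00E9 is the single byte E9 in code page 1252 and the two bytes C3 A9 in UTF-8, so an ANSI file with such characters is not valid UTF-8 as it stands.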
But it seems, that not everybody is happy with the current Codegear
Unicode solution:
https://forums.codegear.com/thread.jspa?threadID=7140&tstart=0
This is neither backwards compatible, nor nice, nor fast nor small :(
After reading this thread, I am not sure, if Delphi 2009 compatibilit
On Thu, 20 Nov 2008, Bernd Mueller wrote:
Felipe Monteiro de Carvalho wrote:
I would like to hear if others actually have a better proposal for Lazarus.
sorry, I have no idea since I am doing primarily embedded stuff. Speed and
backward compatibility are the most important factors to me
Maybe a real UTF8String?
Does this mean teaching the compiler to tell the type UTF8String from the
type ANSIString and do the appropriate conversion automatically (and do
the assignment of constants appropriately)?
I suppose this in fact would solve a lot of problems for Lazarus.
If on top of th
On Thu, Nov 20, 2008 at 12:18 PM, Felipe Monteiro de Carvalho
<[EMAIL PROTECTED]> wrote:
> using Lazarus uses fpc), it would be interesting if we actually work
> more or less in the same direction to provide a good unicode solution,
> instead of each part ignoring what the other is doing. And also
Felipe Monteiro de Carvalho wrote:
I would like to hear if others actually have a better proposal for Lazarus.
sorry, I have no idea since I am doing primarily embedded stuff. Speed
and backward compatibility are the most important factors to me.
But it seems, that not everybody is happy wi
On Thu, 20 Nov 2008, Michael Schnell wrote:
If you want to help, we need to implement the Delphi 2009 encoding aware
string type, both runtime support as well as the compiler support.
A previous discussion showed that this also breaks a lot of old code and is
not really nice.
As I und
if a real utf8string would be a solution for Lazarus (I am not saying
it is, but it could be), we need to have a directive to change the
default string into utf8string. To avoid a huge amount of code to need
to be suddenly changed. Then only "ansistring" needs to be changed.
--
Felipe Monteiro de
If you want to help, we need to implement the Delphi 2009 encoding
aware string type, both runtime support as well as the compiler support.
A previous discussion showed that this also breaks a lot of old code and
is not really nice.
So a better concept seems to have a dedicated type for any
I started a separate thread for this lazarus part of the unicode talk.
On Thu, Nov 20, 2008 at 7:37 AM, Florian Klaempfl
<[EMAIL PROTECTED]> wrote:
>>And that's why I urge all core FPC
>>developers to try and finalize a Unicode design. Otherwise you leave
>>it up to developers to keep
On Thu, Nov 20, 2008 at 12:07 PM, Michael Schnell <[EMAIL PROTECTED]> wrote:
>
>> Russian locale requires a >1 byte char.
>
> Hmmm. We did lots of non-Unicode Delphi programs with a Russian ANSI
> variant.
Well, I have a Russian user of fpGUI. He noted quite a few issues with
FPC's locale variab
On Thu, 20 Nov 2008, Daniël Mantione wrote:
* Does UnicodeString work on all platforms? Linux, Windows for a start?
Yes, but all platforms will get string=unicodestring.
There is a "not" missing:
Yes, but not all platforms will get string=unicodestring.
Daniël
Full Unicode support is for FPC 2.4. If you need it today, widestrings
are your best option.
Unfortunately working with WideString in Lazarus is close to impossible
as the LCL API is done with UTF8String and there is no correct automatic
conversion between UTF8String and WideString, as the com
On Thu, 20 Nov 2008, Graeme Geldenhuys wrote:
On Thu, Nov 20, 2008 at 11:37 AM, Florian Klaempfl
<[EMAIL PROTECTED]> wrote:
FPC supports Unicode; in 2.3.x the UnicodeString type is available, being a
ref. counted UTF-16 string on all platforms.
OK, I'll try to switch fpGUI's TfpgString ty
Russian locale requires a >1 byte char.
Hmmm. We did lots of non-Unicode Delphi programs with a Russian ANSI
variant.
-Michael
FPC supports Unicode; in 2.3.x the UnicodeString type is available,
being a ref. counted UTF-16 string on all platforms.
Is the same used by TStringList?
I don't think so, otherwise LoadFromFile should need to be aware of
several possible file encodings. And I suppose the utf8-API of the LCL
w
On Thu, Nov 20, 2008 at 11:37 AM, Florian Klaempfl
<[EMAIL PROTECTED]> wrote:
>
> FPC supports Unicode; in 2.3.x the UnicodeString type is available, being a
> ref. counted UTF-16 string on all platforms.
OK, I'll try to switch fpGUI's TfpgString type to alias UnicodeString
and see what happens. Obv
On Thu, Nov 20, 2008 at 10:06, Graeme Geldenhuys
<[EMAIL PROTECTED]> wrote:
>
> Unfortunately that doesn't work if the file contains unicode content,
> so the following "hack" is required which is quite nasty:
>
> ls := TStringList.Create;
> ls.LoadFromFile('someunicodefile.txt');
> for i :=
On Thu, 20 Nov 2008, Graeme Geldenhuys wrote:
On Thu, Nov 20, 2008 at 11:28 AM, Daniël Mantione
<[EMAIL PROTECTED]> wrote:
These instructions are highly unproductive. Work on being able to compile
the RTL in either ansi/unicode depending on the platform has started.
Full Unicode support is
On Thu, Nov 20, 2008 at 10:39:00AM +0100, Daniël Mantione wrote:
>
>
> On Thu, 20 Nov 2008, Graeme Geldenhuys wrote:
>
>> On Thu, Nov 20, 2008 at 11:12 AM, Florian Klaempfl
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Ok, two questions for the example above:
>>> - how do you maintain backward compatibil
On Thu, Nov 20, 2008 at 11:28 AM, Daniël Mantione
<[EMAIL PROTECTED]> wrote:
>
> These instructions are highly unproductive. Work on being able to compile
> the RTL in either ansi/unicode depending on the platform has started.
> Full Unicode support is for FPC 2.4.
Well, that's the first I heard o
On Thu, 20 Nov 2008, Graeme Geldenhuys wrote:
On Thu, Nov 20, 2008 at 11:12 AM, Florian Klaempfl
<[EMAIL PROTECTED]> wrote:
Ok, two questions for the example above:
- how do you maintain backward compatibility?
- how do you load a plain old ansi file?
If the file is UTF-8 or ANSI, the a
Graeme Geldenhuys wrote:
On Thu, Nov 20, 2008 at 11:12 AM, Florian Klaempfl
<[EMAIL PROTECTED]> wrote:
Ok, two questions for the example above:
- how do you maintain backward compatibility?
- how do you load a plain old ansi file?
If the file is UTF-8 or ANSI, the above should work. UTF-8
On Thu, Nov 20, 2008 at 11:12 AM, Florian Klaempfl
<[EMAIL PROTECTED]> wrote:
>
> Ok, two questions for the example above:
> - how do you maintain backward compatibility?
> - how do you load a plain old ansi file?
If the file is UTF-8 or ANSI, the above should work. UTF-8 was
designed to be back
On Thu, 20 Nov 2008, Graeme Geldenhuys wrote:
All that crap just to load a simple text file that contains unicode
content!!! :-( And the other problem is that the hack above assumes
the files content is UTF-8 encoded. If the content is UTF-16 encoded,
you need yet another hack. :-(
As far a
Graeme Geldenhuys wrote:
Hello again,
We are seeing more and more "hacks" being applied to projects trying
to scramble around the missing FPC feature - no built-in Unicode
support.
A simple example in Lazarus: loading a UTF-8 encoded file into a TMemo.
Normally you would write code as
shorter (and faster) hacky crap:
ls := TStringList.Create;
ls.LoadFromFile('someunicodefile.txt');
Memo.Text := UTF8Encode(ls.Text);
ls.Free
Hello again,
We are seeing more and more "hacks" being applied to projects trying
to scramble around the missing FPC feature - no built-in Unicode
support.
A simple example in Lazarus: loading a UTF-8 encoded file into a TMemo.
Normally you would write code as follows (for ANSI text):