UpperCase, LowerCase, CapitalCase, WordBreak, ParagraphBreak, ...
almost all have some language exceptions.
I don't doubt that you are right here, but I don't think that there is
any support for this in the RTL. So it seems to be a lot less relevant
than general Unicode handling.
So I thin
On 2008-10-24 02:46, Felipe Monteiro de Carvalho wrote:
I agree with Daniël on this one. Simplify. ë --> Ë always
If you need something which takes into consideration the language then
build another routine with more parameters.
It's not that simple.
How would you uppercase this piece of str
I agree with Daniël on this one. Simplify. ë --> Ë always
If you need something which takes into consideration the language then
build another routine with more parameters.
--
Felipe Monteiro de Carvalho
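A minimal sketch of the split proposed here, written in Python rather than Pascal purely for illustration: a simple routine that always maps ë to Ë, plus a second routine "with more parameters" built on top of it. The names upper_simple, strip_marks and upper_for_language and the drop_diacritics flag are hypothetical, not RTL functions, and the diacritic-dropping convention is only what is claimed for Dutch in this thread.

```python
import unicodedata


def upper_simple(text: str) -> str:
    """Language-independent uppercasing: ë always becomes Ë."""
    return text.upper()


def strip_marks(text: str) -> str:
    """Drop combining marks after canonical decomposition, e.g. Ë -> E."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def upper_for_language(text: str, drop_diacritics: bool = False) -> str:
    """Hypothetical language-aware variant layered on the simple routine."""
    upper = upper_simple(text)
    return strip_marks(upper) if drop_diacritics else upper


print(upper_simple("Daniël"))
print(upper_for_language("Daniël", drop_diacritics=True))
print(upper_simple("sólo"))
```

Keeping the simple routine free of language parameters means callers who do not care about conventions pay nothing for them.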
___
fpc-devel maillist - fpc-devel@lists.freep
Hello listmember,
Thursday, October 23, 2008, 11:58:51 PM, you wrote:
l> Yes, it is imperative that we know the language the word is in, so that
l> UpperCase("sólo", langSpanish) --> "SÓLO"
l> UpperCase("solo", langSpanish) --> "SOLO"
l> Otherwise, we may end up altering the meaning of the te
Michael Van Canneyt wrote:
On Thu, 23 Oct 2008, Vincent Snijders wrote:
Michael Van Canneyt wrote:
And did you fix the 'TObject not found' with a short-term solution ? :-)
Maybe svn up -r11887 (in fpc/trunk)
home: >svn log -r 11887 .
> DM> Example: In Dutch uppercase characters generally do not get
> tremas: Daniël becomes DANIEL. Should an uppercase routine worry?
> No, this is a spelling convention, the correct uppercase of ë is
> Ë, we should not confuse spelling with uppercasing.
No. This is not a spelling convention. It
On Thu, 23 Oct 2008, Vincent Snijders wrote:
> Michael Van Canneyt wrote:
> >
> > And did you fix the 'TObject not found' with a short-term solution ? :-)
>
> Maybe svn up -r11887 (in fpc/trunk)
home: >svn log -r 11887 .
--
Michael Van Canneyt wrote:
And did you fix the 'TObject not found' with a short-term solution ? :-)
Maybe svn up -r11887 (in fpc/trunk)
Vincent
On Thu, 23 Oct 2008, Mattias Gaertner wrote:
> On Thu, 23 Oct 2008 08:53:27 +0200 (CEST)
> "Peter Vreman" <[EMAIL PROTECTED]> wrote:
>
> > > On Wed, 22 Oct 2008 10:32:36 +0200 (CEST)
> > > "Peter Vreman" <[EMAIL PROTECTED]> wrote:
> > >
> > >> > As of version 2.3.1, the compiler by itself indic
On Thu, 23 Oct 2008 08:53:27 +0200 (CEST)
"Peter Vreman" <[EMAIL PROTECTED]> wrote:
> > On Wed, 22 Oct 2008 10:32:36 +0200 (CEST)
> > "Peter Vreman" <[EMAIL PROTECTED]> wrote:
> >
> >> > As of version 2.3.1, the compiler by itself indicates all the
> >> > various features it supports with FPC_HAS_
On Thu, 23 Oct 2008, JoshyFun wrote:
Hello Daniël,
Thursday, October 23, 2008, 5:34:59 PM, you wrote:
DM> Don't exaggerate, this is true with plain ASCII as well. Non-English
DM> software has existed for over 5 decades and nothing has stopped us from
DM> writing code that performs the fu
Hello Daniël,
Thursday, October 23, 2008, 5:34:59 PM, you wrote:
DM> Don't exaggerate, this is true with plain ASCII as well. Non-English
DM> software has existed for over 5 decades and nothing has stopped us from
DM> writing code that performs the functions you name.
I'm not exaggerating,
On Thu, 23 Oct 2008, JoshyFun wrote:
Hello Michael,
Thursday, October 23, 2008, 1:46:48 PM, you wrote:
More importantly, most such routines will be implicitly tied to a
certain language or language group already.
MS> Which kinds of UCS-2-based functions do you think are tied to a
MS>
Hello Michael,
Thursday, October 23, 2008, 1:46:48 PM, you wrote:
>> More importantly, most such routines will be implicitly tied to a
>> certain language or language group already.
>>
MS> Which kinds of UCS-2-based functions do you think are tied to a
MS> language (group)?
UpperCase, Lowe
http://www.unicode.org/reports/tr9/
Thanks. I see. (In fact I even wrote embedded software for a display
that can show Hebrew text. But this was with ANSI code.)
-Michael
Michael Schnell wrote:
Since it converts the UTF8 file internally to UCS2 on read before
editing.
Seems really silly to me.
No, it's not. This way you only have to support two editors internally: one
with byte chars and one with word chars (ignoring surrogates and other stuff).
But the file len
I doubt that you will never need to support decomposed characters
(such as ä being encoded as basically "a¨"). It's not that uncommon.
This is the nasty old stuff that Unicode should help us get rid of.
-Michael
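To make the point about decomposed characters concrete, here is a small sketch (Python, chosen only for illustration) showing that ä can be stored either as one code point or as "a" followed by a combining diaeresis, and that the two forms compare unequal until normalized. The variable names are illustrative assumptions.

```python
import unicodedata

precomposed = "\u00e4"   # ä as a single code point (U+00E4)
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS (U+0308)

# The two strings render the same but differ as code point sequences.
same_raw = precomposed == decomposed
lengths = (len(precomposed), len(decomposed))

# Canonical normalization converts between the two forms.
nfc = unicodedata.normalize("NFC", decomposed)   # compose
nfd = unicodedata.normalize("NFD", precomposed)  # decompose

print(same_raw, lengths, nfc == precomposed, nfd == decomposed)
```

Any routine that counts or compares "characters" has to decide which form it works on, regardless of whether the storage is UTF-8 or UTF-16.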
On Thursday 23 October 2008 13.58:04 Michael Schnell wrote:
> > Bidi stuff? You are aware of the fact that unicode strings can contain
> > e.g. bidi markers?
>
> Sorry, never heard of bidi :(
>
Bidirectional text. Much more important than the hypothetical codepoints above
the BMP. MSEgui does not
Since it converts the UTF8 file internally to UCS2 on read before
editing.
Seems really silly to me.
But the file length really indicated that it's utf8 coded and when
looking at the file with WinCommander's hex viewer it's utf-8. So I
suppose that you are right and the nasty trick is Ultrae
Michael Schnell wrote:
>
>> Bidi stuff? You are aware of the fact that unicode strings can contain
>> e.g. bidi markers?
> Sorry, never heard of bidi :(
>
http://www.unicode.org/reports/tr9/
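As a rough illustration of what "bidi markers" inside a string look like, the sketch below (Python, for illustration only) scans a string for the directional formatting characters that UAX #9 describes. The helper name find_bidi_controls and the sample string are assumptions made for this example.

```python
import unicodedata

# Directional marks, embeddings, overrides and isolates from UAX #9.
BIDI_CONTROLS = {
    "\u061c", "\u200e", "\u200f",
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(text: str) -> list:
    """Return (index, code point) pairs for each bidi control in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text)
            if ch in BIDI_CONTROLS]


# Latin text with an embedded right-to-left override around two Hebrew letters.
sample = "abc \u202e\u05d0\u05d1\u202c def"
print(find_bidi_controls(sample))
print(unicodedata.bidirectional("\u05d0"))  # bidi class of HEBREW LETTER ALEF
```

These characters occupy positions in the string without being visible, which is one reason index-based string routines cannot be treated as purely language-neutral.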
Bidi stuff? You are aware of the fact that unicode strings can contain
e.g. bidi markers?
Sorry, never heard of bidi :(
-Michael
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
Michael Schnell wrote:
Ultraedit might fool you here. It edits either ANSI or UCS-2. If you
have a UTF-8 encoded file, it will show the contents in hex as being UCS-2.
That might be. But would it even virtually insert a BOM?! Why
should it do this when using the hex editor?
Since it conv
On Thursday 23 October 2008 13.31:30 Florian Klaempfl wrote:
> This is also a simplified view.
> - firstly, which real world (!) task really requires to execute an
> operation like this, mostly it's something like copy(s,pos(...),...);
> - secondly, a properly coded utf-16 application shouldn't do
Michael Schnell wrote:
>
>> More importantly, most such routines will be implicitly tied to a
>> certain language or language group already.
>>
> Which kinds of UCS-2-based functions do you think are tied to a
> language (group)?
Bidi stuff? You are aware of the fact that unicode strings ca
On 23 Oct 2008, at 13:41, Michael Schnell wrote:
utf-16 application shouldn't do this
either: it doesn't handle surrogates properly
Right you are. For me WideString is UCS2 and not UTF16, as I regard
it as a sequence of WideChar so that the Unicode user code can be
done using WideChar and W
More importantly, most such routines will be implicitly tied to a
certain language or language group already.
Which kinds of UCS-2-based functions do you think are tied to a
language (group)?
-Michael
Ultraedit might fool you here. It edits either ANSI or UCS-2. If you
have a UTF-8 encoded file, it will show the contents in hex as being UCS-2.
That might be. But would it even virtually insert a BOM?! Why
should it do this when using the hex editor?
-Michael
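One way to check what is really on disk, independent of what an editor's hex view displays, is to read the raw bytes and look at the byte-order mark. The sketch below (Python, for illustration only) does that; sniff_bom is a hypothetical helper, and the sample bytes are the FF FE 75 00 6E 00 69 00 74 00 20 00 55 00 6E 00 sequence reported in this thread.

```python
import codecs


def sniff_bom(data: bytes) -> str:
    """Guess the encoding from a leading byte-order mark, if any."""
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8-sig"
    # Note: a UTF-32 LE BOM also begins with FF FE; not handled in this sketch.
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be"
    return "unknown"


dump = bytes.fromhex("FF FE 75 00 6E 00 69 00 74 00 20 00 55 00 6E 00")
decoded = dump.decode("utf-16")                 # the codec consumes the BOM

utf8_file = codecs.BOM_UTF8 + "unit Un".encode("utf-8")

print(sniff_bom(dump), repr(decoded))
print(sniff_bom(utf8_file), len(utf8_file))
```

If the bytes on disk start with EF BB BF and the file length matches the UTF-8 size, the UCS-2 view is something the editor produced after conversion.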
In our previous episode, Florian Klaempfl said:
> > But if you use UTF8String you need to be aware that you can't do simple
> > and totally normal things like s := copy(s, 1, 3); to get the first three
> > characters of a string. Really finding the first three characters of a
> > string is an interest
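The concern about copying a fixed number of elements from a UTF-8 string can be sketched as follows (Python byte strings stand in for a UTF-8 ansistring; this is an illustration, not FPC code). The helper first_chars is hypothetical and counts code points, not user-perceived characters.

```python
def first_chars(utf8: bytes, count: int) -> bytes:
    """Return the first `count` code points, re-encoded as UTF-8."""
    return utf8.decode("utf-8")[:count].encode("utf-8")


utf8 = "Daniël".encode("utf-8")   # ë takes two bytes, so 7 bytes in total

# Naive byte-based copy: cuts the two-byte sequence for ë in half.
naive = utf8[:5]
try:
    naive.decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

safe = first_chars(utf8, 5)
print(len(utf8), naive_ok, safe.decode("utf-8"))
```

The safe version has to walk the string from the start, which is the cost being debated in this thread.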
utf-16 application shouldn't do this
either: it doesn't handle surrogates properly
Right you are. For me WideString is UCS2 and not UTF16, as I regard it
as a sequence of WideChar so that the Unicode user code can be done
using WideChar and WideString. WideChar only has 16 bits. So this
rest
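The difference between treating a WideString as UCS-2 and as UTF-16 shows up only for code points above the BMP, which UTF-16 stores as a surrogate pair of two 16-bit units. A small sketch (Python, for illustration; utf16_units and is_surrogate are hypothetical helpers):

```python
def utf16_units(text: str) -> list:
    """Return the 16-bit code units of text encoded as UTF-16."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def is_surrogate(unit: int) -> bool:
    """True for either half of a UTF-16 surrogate pair."""
    return 0xD800 <= unit <= 0xDFFF


clef = "\U0001D11E"               # MUSICAL SYMBOL G CLEF, above the BMP
units = utf16_units("a" + clef)   # one unit for "a", two for the clef

# Copying the first two 16-bit units would end on a lone high surrogate.
truncated = units[:2]
print([hex(u) for u in units], is_surrogate(truncated[-1]))
```

Code that indexes by 16-bit unit is correct for UCS-2 data but can split such a pair when the data is really UTF-16.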
Michael Schnell wrote:
The compiler definitely does not accept UCS-2 encoded sources.
I did check several times: My source file looks like this when I open it
with Ultra-Edit and tell it to show the file in hex:
FF FE 75 00 6E 00 69 00 74 00 20 00 55 00 6E 00   ..u.n.i.t. .U.n.
Ultraedit might fool you h
Michael Schnell wrote:
>
>> The conversion
>> utf-8<->utf-16 is a very expensive operation and the compiler has to
>> insert it all over the place and people would cry about the performance
>> of their programs.
> Of course I do agree.
>
> If you want to care about performance you need to know
If you want widestring, then maybe mseide is a better option for you.
Again, I do know this, and in fact I don't have a project that needs
Unicode. But the reason I started this thread is to help make
Lazarus / FPC even more useful.
-Michael
Michael Schnell wrote:
The conversion
utf-8<->utf-16 is a very expensive operation and the compiler has to
insert it all over the place and people would cry about the performance
of their programs.
Of course I do agree.
If you want to care about performance you need to know what to do:
Eit
The conversion
utf-8<->utf-16 is a very expensive operation and the compiler has to
insert it all over the place and people would cry about the performance
of their programs.
Of course I do agree.
If you want to care about performance you need to know what to do:
Either use WideString "all ov
As has been said before: the compiler itself simply does not support
UCS-2. Regardless of any BOM, compiler setting or Lazarus setting, it
will not understand it.
See my other post in this thread: Windows XP seems to play some tricks
on us here so that Ultraedit sees the UCS2 coded file whil
On Thu, 23 Oct 2008, Michael Schnell wrote:
The compiler definitely does not accept UCS-2 encoded sources.
I did check several times: My source file looks like this when I open it with
Ultra-Edit and tell it to show the file in hex:
FF FE 75 00 6E 00 69 00 74 00 20 00 55 00 6E 00   ..u.n.i.t. .U.n.
Now
The compiler definitely does not accept UCS-2 encoded sources.
I did check several times: My source file looks like this when I open it
with Ultra-Edit and tell it to show the file in hex:
FF FE 75 00 6E 00 69 00 74 00 20 00 55 00 6E 00   ..u.n.i.t. .U.n.
Now I created a Delphi program and read the file wi
On 23 Oct 2008, at 12:20, Michael Schnell wrote:
No no, a string with unicode characters is interpreted by the
compiler as widestring constant, never as UTF-8 ansistring
constant. If it does otherwise, the compiler probably does not
interpret your source code as Unicode.
The issue might be
No no, a string with unicode characters is interpreted by the compiler
as widestring constant, never as UTF-8 ansistring constant. If it does
otherwise, the compiler probably does not interpret your source code
as Unicode.
The issue might be the UCS-2 encoding of your source, perhaps try to
Michael Schnell wrote:
>
> A decent system should be able to do the necessary conversions
> automatically:
This is a simplified view which ignores the resource waste of this
approach, which is not visible in the academic example below. The conversion
utf-8<->utf-16 is a very expensive operation and
Daniël Mantione wrote:
> The issue might be the UCS-2 encoding of your source, perhaps try to
> feed the compiler UTF-8, I didn't even know the compiler accepts UCS-2,
> it may not work correctly.
>
The compiler definitely does not accept UCS-2 encoded sources.
On Thu, 23 Oct 2008, Michael Schnell wrote:
Then you don't understand it yet, I think.
Maybe.
If the compiler knows your source file is UTF-8 (by BOM or directive), the
compiler generates a widestring constant and no conversion function is
called when assigning to a widestring.
In m
Then you don't understand it yet, I think.
Maybe.
If the compiler knows your source file is UTF-8 (by BOM or directive),
the compiler generates a widestring constant and no conversion
function is called when assigning to a widestring.
In my test the source code is not UTF8 but UCS2 and do
On Thu, 23 Oct 2008, Michael Schnell wrote:
I suppose this might solve the constant assignment on the fly, but in fact I
feel that the compiler should generate a WideString constant at compile time
instead of calling a conversion function at run time.
Then you don't understand it yet, I t
Please read the entire thread, and if you have more questions
afterwards, then ask them.
In fact I don't have questions, but in this regard the way the compiler
(in Lazarus with default settings) works is very unsatisfactory. IMHO the
only cure is to make the compiler aware of the UTF8Type,
AFAIK the compiler reads the source as non-utf8 (latin or some 8 bit
encoding). This leads to other things too, like identifiers cannot
contain utf8.
This was discussed in the German Lazarus Forum. Here I got a funny
result: when I right-click the Lazarus-Code-Editor I see that the file
cod
Unless anybody says otherwise, "UTF8String" is just an alias for
"ansistring", so they are exactly the same thing under a different
name, which I use to make clear in code where things are
UTF-8 encoded.
I do know that in the current implementation "UTF8String" is just an
alias fo