[lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread Graeme Geldenhuys
Hi, I asked a similar question in the MSEgui newsgroup as well. What was the reason for choosing to support UTF-8 instead of UTF-16? - Quoted Mattias from 6 months ago -- The LCL will support UTF-8 and provide some extra functions for UTF-16, because UTF-8 is more compatible to

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread Michael Van Canneyt
On Fri, 5 Oct 2007, Graeme Geldenhuys wrote: > Hi, > > I asked a similar question in the MSEgui newsgroup as well. What was > the reason for choosing to support UTF-8 instead of UTF-16? > > - Quoted Mattias from 6 months ago -- > The LCL will support UTF-8 and provide some ex

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread Mattias Gaertner
On Fri, 5 Oct 2007 09:27:59 +0200 "Graeme Geldenhuys" <[EMAIL PROTECTED]> wrote: > Hi, > > I asked a similar question in the MSEgui newsgroup as well. What was > the reason for choosing to support UTF-8 instead of UTF-16? > > - Quoted Mattias from 6 months ago -- > The LCL will

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread Vincent Snijders
Michael Van Canneyt wrote: On Fri, 5 Oct 2007, Graeme Geldenhuys wrote: Hi, I asked a similar question in the MSEgui newsgroup as well. What was the reason for choosing to support UTF-8 instead of UTF-16? - Quoted Mattias from 6 months ago -- The LCL will support UTF-8 an

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread Paul Ishenin
Graeme Geldenhuys wrote: Does this mean UTF-8 was chosen only because it is more compatible with existing Pascal programs? Any other reasons? Does UTF-16 cover all languages? As far as I know, it has problems with Chinese and/or Japanese, while UTF-8 does not have such problems. Moreover m

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread Mattias Gaertner
On Fri, 5 Oct 2007 09:36:59 +0200 (CEST) Michael Van Canneyt <[EMAIL PROTECTED]> wrote: > > > On Fri, 5 Oct 2007, Graeme Geldenhuys wrote: > > > Hi, > > > > I asked a similar question in the MSEgui newsgroup as well. What > > was the reason for choosing to support UTF-8 instead of UTF-16? > >

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread Mattias Gaertner
On Fri, 05 Oct 2007 16:00:41 +0800 Paul Ishenin <[EMAIL PROTECTED]> wrote: > Graeme Geldenhuys wrote: > > Does this mean UTF-8 was chosen only because it is more compatible > > with existing Pascal programs? Any other reasons? > > > > Does UTF-16 cover all languages? As far as I know, it has problems wi

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread ik
On 10/5/07, Mattias Gaertner <[EMAIL PROTECTED]> wrote: > On Fri, 05 Oct 2007 16:00:41 +0800 > Paul Ishenin <[EMAIL PROTECTED]> wrote: > > > Graeme Geldenhuys wrote: > > > Does this mean UTF-8 was chosen only because it is more compatible > > > with existing Pascal programs? Any other reasons? > >

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread anteusz
Paul Ishenin wrote: Graeme Geldenhuys wrote: Does this mean UTF-8 was chosen only because it is more compatible with existing Pascal programs? Any other reasons? Does UTF-16 cover all languages? As far as I know, it has problems with Chinese and/or Japanese, while UTF-8 does not have such p

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread anteusz
Graeme Geldenhuys wrote: Hi, I asked a similar question in the MSEgui newsgroup as well. What was the reason for choosing to support UTF-8 instead of UTF-16? - Quoted Mattias from 6 months ago -- The LCL will support UTF-8 and provide some extra functions for UTF-16, because U

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread Mattias Gaertner
On Fri, 05 Oct 2007 13:14:23 +0200 Luca Olivetti <[EMAIL PROTECTED]> wrote: > [EMAIL PROTECTED] wrote: > > > * WideString allows indexed "[]" access to individual chars. > > > > This does not seem to be correct. I read that UTF-16 can be 4 bytes > > long. Then calculation is needed some

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread Mattias Gaertner
On Fri, 5 Oct 2007 10:45:18 +0200 ik <[EMAIL PROTECTED]> wrote: > On 10/5/07, Mattias Gaertner <[EMAIL PROTECTED]> wrote: > > On Fri, 05 Oct 2007 16:00:41 +0800 > > Paul Ishenin <[EMAIL PROTECTED]> wrote: > > > > > Graeme Geldenhuys wrote: > > > > Does this mean UTF-8 was chosen only because it is

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread Felipe Monteiro de Carvalho
On 10/5/07, Luca Olivetti <[EMAIL PROTECTED]> wrote: > Unless you're dealing with Klingon and ancient languages, I think you > can assume that for 99.99% of currently spoken languages every character > will be exactly 2 bytes long. You are forgetting about Chinese. Some billion people speak it =)

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-05 Thread Luca Olivetti
[EMAIL PROTECTED] wrote: * WideString allows indexed "[]" access to individual chars. This does not seem to be correct. I read that UTF-16 can be 4 bytes long. Then calculation is needed sometimes... Unless you're dealing with Klingon and ancient languages, I think you can assume t
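The point raised here, that a UTF-16 character may occupy 4 bytes, can be checked with a short sketch. It is written in Python rather than Pascal purely for illustration; the helper names `utf16_code_units` and `code_unit_at` are hypothetical and not part of the LCL or FPC. It assumes the sample character U+20000 (a CJK Extension B ideograph) as an example of a character outside the Basic Multilingual Plane.

```python
def utf16_code_units(text: str) -> int:
    """Return how many 16-bit code units are needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def code_unit_at(text: str, index: int) -> int:
    """Return the 16-bit code unit at a zero-based index, as "[]" on a
    UTF-16 string would (hypothetical helper, for illustration only)."""
    data = text.encode("utf-16-le")
    return int.from_bytes(data[2 * index:2 * index + 2], "little")


bmp_char = "\u4e2d"         # common CJK ideograph inside the BMP
astral_char = "\U00020000"  # CJK Extension B ideograph outside the BMP

units_bmp = utf16_code_units(bmp_char)        # one code unit, 2 bytes
units_astral = utf16_code_units(astral_char)  # surrogate pair, 4 bytes

# Indexing the astral character by code unit yields half of a surrogate
# pair, not a complete character.
first_unit = code_unit_at(astral_char, 0)
is_high_surrogate = 0xD800 <= first_unit <= 0xDBFF
```

So indexed access returns a whole character only while the text stays inside the BMP; once a surrogate pair appears, the Nth code unit is no longer the Nth character.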

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-07 Thread Felipe Monteiro de Carvalho
Hi, I was browsing Wikipedia and found a good reason not to use UCS-2. It seems to be prohibited to distribute software in mainland China that only partially supports Chinese characters (as is the case for UCS-2). Source: http://en.wikipedia.org/wiki/GB18030 "In a move of historic si

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-07 Thread Marco Ciampa
On Fri, Oct 05, 2007 at 01:14:23PM +0200, Luca Olivetti wrote: > [EMAIL PROTECTED] wrote: > >> * WideString allows indexed "[]" access to individual chars. >> This does not seem to be correct. I read that UTF-16 can be 4 bytes long. >> Then calculation is needed sometimes... > > Unless yo

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-08 Thread Luca Olivetti
Marco Ciampa wrote: On Fri, Oct 05, 2007 at 01:14:23PM +0200, Luca Olivetti wrote: [EMAIL PROTECTED] wrote: * WideString allows indexed "[]" access to individual chars. This does not seem to be correct. I read that UTF-16 can be 4 bytes long. Then calculation is needed some

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-08 Thread Luca Olivetti
Luca Olivetti wrote: You have to go through the string for UTF-8 and UTF-16 encodings so the advantages are at least questionable... Yes, but my (wrong) premise was that you could assume all characters are 2 bytes wide, so the Nth character would be at byte N*2. BTW, using strings

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-08 Thread Razvan Adrian Bogdan
On 10/8/07, Luca Olivetti <[EMAIL PROTECTED]> wrote: > Luca Olivetti wrote: > > >> You have to go through the string for UTF-8 and UTF-16 encodings so > >> the advantages are at least questionable... > > > > Yes, but my (wrong) premise was that you could assume all characters are > > 2 byt

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-08 Thread Mattias Gärtner
Quoting Razvan Adrian Bogdan <[EMAIL PROTECTED]>: > On 10/8/07, Luca Olivetti <[EMAIL PROTECTED]> wrote: > > Luca Olivetti wrote: > > > > >> You have to go through the string for UTF-8 and UTF-16 encodings so > > >> the advantages are at least questionable... > > > > > > Yes, but my (w

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-08 Thread Graeme Geldenhuys
On 08/10/2007, Razvan Adrian Bogdan <[EMAIL PROTECTED]> wrote: > char would be nice too, maybe even implemented in FPC for UTF8string > such as Length(utf8string) or indexing utf8string[1] to return the > char not the byte as UTF32. In fpGUI I have a few helper functions for UTF-8 strings (Length,
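As a rough idea of what character-aware helpers for UTF-8 strings involve, here is a minimal sketch in Python (used only for illustration). The names `utf8_length` and `utf8_char_at` are hypothetical and are not the fpGUI or FPC functions mentioned in the message. The sketch assumes the input is valid UTF-8 and treats "character" as "code point".

```python
def utf8_length(data: bytes) -> int:
    """Count code points by counting every byte that is not a
    continuation byte (continuation bytes look like 10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_char_at(data: bytes, index: int) -> bytes:
    """Return the bytes of the code point at a zero-based index.

    This is a linear scan: unlike byte indexing, it cannot jump
    straight to the Nth character."""
    if index < 0:
        raise IndexError("code point index out of range")
    count = -1
    start = None
    for pos, b in enumerate(data):
        if (b & 0xC0) != 0x80:      # lead byte: a new code point starts here
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:
                return data[start:pos]
    if start is None:
        raise IndexError("code point index out of range")
    return data[start:]             # requested code point was the last one


sample = "Gärtner".encode("utf-8")
byte_len = len(sample)              # length in bytes
char_len = utf8_length(sample)      # length in code points
second = utf8_char_at(sample, 1)    # the two bytes encoding "ä"
```

Both helpers walk the string from the start, which is the cost the thread discusses: character indexing in UTF-8 is linear in the string length, whereas byte indexing is constant time.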

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-08 Thread Mattias Gärtner
Quoting Graeme Geldenhuys <[EMAIL PROTECTED]>: > On 08/10/2007, Razvan Adrian Bogdan <[EMAIL PROTECTED]> wrote: > > char would be nice too, maybe even implemented in FPC for UTF8string > > such as Length(utf8string) or indexing utf8string[1] to return the > > char not the byte as UTF32. > > In f

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-08 Thread Luca Olivetti
Mattias Gärtner wrote: For most string operations, like computing the byte length or comparing strings ASCII case-insensitively, UTF-8 is 100% compatible. But not if you need the character length, say, limiting a text to 40 characters and indicating that the text has been truncated with '
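The truncation case mentioned here can be sketched as follows, in Python for illustration only. The function name `truncate_utf8` and the default marker `b"..."` are assumptions (the marker in the original message is cut off in this archive). The sketch assumes valid UTF-8 input and counts code points, not bytes.

```python
def truncate_utf8(data: bytes, max_chars: int, marker: bytes = b"...") -> bytes:
    """Keep at most max_chars code points of a UTF-8 byte string and
    append marker if anything was cut off (marker is an assumed default)."""
    seen = 0
    for pos, b in enumerate(data):
        if (b & 0xC0) != 0x80:       # lead byte: start of a code point
            if seen == max_chars:
                # Cut exactly at a code point boundary, never mid-sequence.
                return data[:pos] + marker
            seen += 1
    return data                      # short enough, returned unchanged


long_text = ("ä" * 50).encode("utf-8")   # 50 characters, 100 bytes
cut = truncate_utf8(long_text, 40)
short = truncate_utf8("abc".encode("utf-8"), 40)
```

Cutting at a fixed byte offset instead (for example `long_text[:41]`) could split a multi-byte sequence and leave invalid UTF-8, which is why the scan looks for lead bytes. Whether the marker should count toward the 40-character limit is a design choice this sketch leaves out.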

Re: [lazarus] UTF-8 vs UTF-16 support

2007-10-08 Thread Mattias Gärtner
Quoting Luca Olivetti <[EMAIL PROTECTED]>: > Mattias Gärtner wrote: > > > For most string operations, like computing the byte length or comparing > strings > > ASCII case-insensitively, UTF-8 is 100% compatible. > > But not if you need the character length, say, limiting a text to 40 characters >