Re: Latin-1-characters

2004-03-16 Thread Larry Wall
On Tue, Mar 16, 2004 at 10:17:57PM +0100, Karl Brodowsky wrote: : With FFFE and FEFF this seems obvious. In case of #! it would not be clear : to me if this defaults to ISO-8859-1 (latin-1) or to utf-8. See HTML : vs. XHTML as an example where the default has been changed. Perl 6 would certainly
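Karl's point about FFFE/FEFF versus a bare `#!` line can be made concrete. Below is a minimal Python sketch, not anything Perl 6 specifies, of sniffing a byte-order mark and falling back to a default when there is none. The function name `sniff_encoding` and its `default` parameter are hypothetical. Which default is right (UTF-8 or ISO-8859-1) is exactly what the thread is debating.

```python
import codecs


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess a source file's encoding from a leading byte-order mark.

    If no BOM is present (e.g. the file starts with "#!"), return `default`.
    The choice of default is an assumption, not something decided here.
    """
    # Longer BOMs must be checked first: the UTF-32-LE BOM (FF FE 00 00)
    # begins with the UTF-16-LE BOM (FF FE).
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),  # bytes FF FE
        (codecs.BOM_UTF16_BE, "utf-16-be"),  # bytes FE FF
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return default
```

With a BOM the answer is unambiguous. Without one, `sniff_encoding(b"#!/usr/bin/perl")` simply returns whatever default the caller picked.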

Re: Latin-1-characters

2004-03-16 Thread Karl Brodowsky
ns that Unicode is not fulfilling its design goals. Yes, we can consider any file to be unicode with some encoding. That is how the Java-guys do it, with the restriction that they don't easily let you choose anything other than latin-1 + \ucafe-stuff for non-latin-1 characters (or maybe I didn

Re: Latin-1-characters

2004-03-16 Thread James Mastros
Karl Brodowsky wrote: Mark J. Reed wrote: The UTF-8 encoding is not so attractive in locales that make heavy use of characters which require several bytes to encode therein, or relatively little use of characters in the ASCII range; utf-8 is fine for languages like German, Polish, Norwegian, Spanis

Re: Latin-1-characters

2004-03-16 Thread Mark J. Reed
On 2004-03-16 at 00:28:32, Karl Brodowsky wrote: > Mark J. Reed wrote: > > >Unicode per se doesn't do anything to file sizes; it's all in how you > >encode it. > > Yes. And basically there are common ways to encode this: utf-8 and utf-16 > (or similar variants requiring >= 2 bytes per character)

Re: Latin-1-characters

2004-03-16 Thread mark . a . biggar
Another possibility is to use a UTF-8 extended system where you use values over 0x10 to encode temporary code block swaps in the encoding. I.e., some magic value means the one byte UTF-8 codes now mean the Greek block instead of the ASCII block. But you would need broad agreement for that t
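The block-swap idea can be illustrated with a toy codec. This is a hypothetical Python sketch, not a proposal anyone adopted. For brevity it uses the single bytes 0xF8 and 0xF9 as the "magic values", because they can never begin a valid UTF-8 sequence, rather than full extended-UTF-8 sequences. In Greek mode, single bytes 0x00..0x7F mean U+0370..U+03EF instead of ASCII. Everything else stays ordinary multi-byte UTF-8. All names are illustrative. A standardized scheme built on similar window switching is SCSU (Unicode Technical Standard #6).

```python
# Hypothetical marker bytes: 0xF8..0xFF never start valid UTF-8.
# They are not part of any standard.
SWITCH_TO_GREEK = 0xF8
SWITCH_TO_ASCII = 0xF9
GREEK_BASE = 0x0370  # in Greek mode, 0x00..0x7F map to U+0370..U+03EF


def encode_with_swaps(text: str) -> bytes:
    out = bytearray()
    greek_mode = False
    for ch in text:
        cp = ord(ch)
        if GREEK_BASE <= cp < GREEK_BASE + 0x80:
            # Greek character: switch windows if needed, then emit one byte.
            if not greek_mode:
                out.append(SWITCH_TO_GREEK)
                greek_mode = True
            out.append(cp - GREEK_BASE)
        elif cp < 0x80:
            # ASCII (including spaces and punctuation) needs the ASCII window.
            if greek_mode:
                out.append(SWITCH_TO_ASCII)
                greek_mode = False
            out.append(cp)
        else:
            # Everything else is plain multi-byte UTF-8, valid in either mode.
            out += ch.encode("utf-8")
    return bytes(out)


def decode_with_swaps(data: bytes) -> str:
    chars = []
    greek_mode = False
    i = 0
    while i < len(data):
        b = data[i]
        if b == SWITCH_TO_GREEK:
            greek_mode = True
            i += 1
        elif b == SWITCH_TO_ASCII:
            greek_mode = False
            i += 1
        elif b < 0x80:
            # Single byte: its meaning depends on the current window.
            chars.append(chr(b + GREEK_BASE) if greek_mode else chr(b))
            i += 1
        else:
            # Multi-byte UTF-8: the lead byte gives the sequence length.
            length = 2 if b < 0xE0 else 3 if b < 0xF0 else 4
            chars.append(data[i:i + length].decode("utf-8"))
            i += length
    return "".join(chars)
```

For example, "αβγδ" takes 5 bytes here (one switch byte plus four letters) versus 8 in plain UTF-8. Every space between Greek words forces two extra switches, though, which is one reason this needs more design, and the broad agreement mentioned above, before it could be useful.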

Re: Latin-1-characters

2004-03-15 Thread Dan Sugalski
At 11:36 PM + 3/15/04, [EMAIL PROTECTED] wrote: Another possibility is to use a UTF-8 extended system where you use values over 0x10 to encode temporary code block swaps in the encoding. I.e., some magic value means the one byte UTF-8 codes now mean the Greek block instead of the ASCII b

Re: Latin-1-characters

2004-03-15 Thread Dan Sugalski
At 12:28 AM +0100 3/16/04, Karl Brodowsky wrote: Anyway, it will be necessary to specify the encoding of unicode in some way, which could possibly even allow specifying some non-unicode charsets. While I'll skip diving deeper into the swamp that is character sets and encoding (I'm already up

Re: Latin-1-characters

2004-03-15 Thread Karl Brodowsky
Mark J. Reed wrote: Unicode per se doesn't do anything to file sizes; it's all in how you encode it. Yes. And basically there are common ways to encode this: utf-8 and utf-16 (or similar variants requiring >= 2 bytes per character) The UTF-8 encoding is not so attractive in locales that make heav

Re: Latin-1-characters

2004-03-15 Thread Mark J. Reed
On 2004-03-13 at 09:02:50, Karl Brodowsky wrote: > For these guys Unicode is not so attractive, because it kind of doubles the > size of their files, Unicode per se doesn't do anything to file sizes; it's all in how you encode it. The UTF-8 encoding is not so attractive in locales that make heav
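Mark's point that file size depends on the encoding, not on Unicode itself, is easy to check. A small Python sketch follows. The helper `encoded_sizes` is hypothetical. It uses `utf-16-le` rather than `utf-16` so that Python does not prepend a 2-byte BOM. It reports `None` when a legacy 8-bit charset cannot represent the text.

```python
def encoded_sizes(text, encodings=("utf-8", "utf-16-le")):
    """Return the encoded length in bytes of `text` for each encoding."""
    sizes = {}
    for enc in encodings:
        try:
            sizes[enc] = len(text.encode(enc))
        except UnicodeEncodeError:
            sizes[enc] = None  # this charset cannot represent the text
    return sizes


print(encoded_sizes("αβγ", ("utf-8", "utf-16-le", "iso-8859-7")))
print(encoded_sizes("Größe", ("utf-8", "utf-16-le", "iso-8859-1")))
print(encoded_sizes("中文"))
```

Greek letters cost 2 bytes each in UTF-8 but 1 in ISO-8859-7. German stays close to its Latin-1 size in UTF-8. CJK text is 3 bytes per character in UTF-8 but only 2 in UTF-16, which is the case where UTF-8 is least attractive.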

Latin-1-characters

2004-03-12 Thread Karl Brodowsky
And I do think people would rebel at using Latin-1 for that one. I get enough grief for «...». :-) I can imagine that these cause some trouble with people using a charset other than ISO-8859-1 (Latin-1) that works well with 8 bit, like Greek, Arabic, Cyrillic and Hebrew. For these guys Unicode is