Re: UTF-8 library

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk
Thu, 08 Aug 2002 19:28:18 +1000 (EST), Manuel M T Chakravarty [EMAIL PROTECTED] writes: ANSI C guarantees that char is 1 byte (more precisely that sizeof (char) == 1). It says that sizeof (char) == 1 but doesn't say that it means 8 bits. sizeof is measured in chars, whatever it is. But limits

Re: UTF-8 library

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk
Thu, 8 Aug 2002 09:59:12 -0700 (PDT), anatoli [EMAIL PROTECTED] writes: I'd still rather associate locale with a handle. I agree. http://www.sf.net/projects/qforeign/ contains an experimental character recoding library with an IO module wrapper which associates encodings with Handles. But I

Re: UTF-8 library

2002-08-10 Thread Ashley Yakeley
At 2002-08-10 01:21, Marcin 'Qrczak' Kowalczyk wrote: Perhaps we can assume some widely true facts even if ANSI C doesn't guarantee that if it makes life easier. For example that a C type corresponding to Int32 exists at all, and that different pointer types have the same representation - we

Re: UTF-8 library

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk
09 Aug 2002 10:17:21 +0200, Sven Moritz Hallberg [EMAIL PROTECTED] writes: I argue _strongly_ against associating some sort of locale state with handles. 1) In agreement with Ashley's statements, file IO should use octets, because that's what's in a file. So it would imply two types raw

Re: UTF-8 library

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk
Sat, 10 Aug 2002 01:31:51 -0700, Ashley Yakeley [EMAIL PROTECTED] writes: that different pointer types have the same representation - we already rely on that, don't we? No, we have separate Ptrs and FunctionPtrs IIRC... Yes, but I mean the possibility that Ptr Word8 looks different from Ptr

Re: UTF-8 library

2002-08-10 Thread anatoli
--- Sven Moritz Hallberg [EMAIL PROTECTED] wrote: I argue _strongly_ against associating some sort of locale state with handles. 1) In agreement with Ashley's statements, file IO should use octets, because that's what's in a file. By the same token, we should handle CR/LF/CR-LF/LF-CR mess

Re: UTF-8 library

2002-08-10 Thread Ashley Yakeley
At 2002-08-10 03:03, anatoli wrote: --- Sven Moritz Hallberg [EMAIL PROTECTED] wrote: I argue _strongly_ against associating some sort of locale state with handles. 1) In agreement with Ashley's statements, file IO should use octets, because that's what's in a file. By the same token, we

Re: UTF-8 library

2002-08-10 Thread anatoli
--- Ashley Yakeley [EMAIL PROTECTED] wrote: By the same token, we should handle CR/LF/CR-LF/LF-CR mess by hand. (Files don't have lines in them, they are just sequences of octets.) Correct. Exactly what kind of newline do you want in your file? The correct answer depends on the level of

Re: UTF-8 library

2002-08-10 Thread Sven Moritz Hallberg
On Sat, 2002-08-10 at 12:03, anatoli wrote: --- Sven Moritz Hallberg [EMAIL PROTECTED] wrote: I argue _strongly_ against associating some sort of locale state with handles. 1) In agreement with Ashley's statements, file IO should use octets, because that's what's in a file. By the

Re: UTF-8 library

2002-08-10 Thread David Feuer
On Sat, 10 Aug 2002, Ashley Yakeley wrote: One of the things that really bothers me about C is the way its unspecifiedness about types can infect other languages. For instance, what exactly is a Haskell Int? I think it's the idea that's infectious, because it is a good idea. The C standard

Re: UTF-8 library

2002-08-10 Thread anatoli
[apologies if you see multiple copies; I forgot to Cc: the list the first time around.] --- Sven Moritz Hallberg [EMAIL PROTECTED] wrote: [...] I think that it's ugly, though, to do it somewhere outside, pretending the issue's not there. I value about Haskell its clean representation of

Re: UTF-8 library

2002-08-10 Thread Joe English
Ashley Yakeley wrote: One of the things that really bothers me about C is the way its unspecifiedness about types can infect other languages. For instance, what exactly is a Haskell Int? Java, at least, stands firm, but then platform-independence was one of Java's explicit design

Re: UTF-8 library

2002-08-10 Thread Manuel M T Chakravarty
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] wrote, Thu, 08 Aug 2002 19:28:18 +1000 (EST), Manuel M T Chakravarty [EMAIL PROTECTED] writes: ANSI C guarantees that char is 1 byte (more precisely that sizeof (char) == 1). It says that sizeof (char) == 1 but doesn't say that it means 8

Re: UTF-8 library

2002-08-09 Thread Ketil Z. Malde
anatoli [EMAIL PROTECTED] writes: Dependence on the current locale is EXTREMELY inconvenient. Imagine that you're writing a Web browser. Web browsers get input with MIME declarations, and shouldn't rely on *any* default setting. Instead, they should read [Word8] and decode the contents

Re: UTF-8 library

2002-08-09 Thread Fergus Henderson
On 06-Aug-2002, George Russell [EMAIL PROTECTED] wrote: Converting CStrings to [Word8] is probably a bad idea anyway, since there is absolutely no reason to assume a C character will be only 8 bits long, and under some implementations it isn't. That's true in general; the C standard only

Re: UTF-8 library

2002-08-08 Thread Joe English
anatoli wrote: I'd still rather associate locale with a handle. This way, all Char and String IO functions that exist, and those that are not written yet, can work with any encoding without relying on the abomination that is setlocale(). Seconded; this is the best approach. The libc