subject:"UTF\-8 library"

Re: UTF-8 library

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk

Thu, 08 Aug 2002 19:28:18 +1000 (EST), Manuel M T Chakravarty [EMAIL PROTECTED] wrote: ANSI C guarantees that char is 1 byte (more precisely that sizeof (char) == 1). It says that sizeof (char) == 1 but doesn't say that it means 8 bits. sizeof is measured in chars, whatever it is. But limits

Re: UTF-8 library

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk

Thu, 8 Aug 2002 09:59:12 -0700 (PDT), anatoli [EMAIL PROTECTED] wrote: I'd still rather associate locale with a handle. I agree. http://www.sf.net/projects/qforeign/ contains an experimental character recoding library with an IO module wrapper which associates encodings with Handles. But I

Re: UTF-8 library

2002-08-10 Thread Ashley Yakeley

At 2002-08-10 01:21, Marcin 'Qrczak' Kowalczyk wrote: Perhaps we can assume some widely true facts even if ANSI C doesn't guarantee that if it makes life easier. For example that a C type corresponding to Int32 exists at all, and that different pointer types have the same representation - we

Re: UTF-8 library

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk

09 Aug 2002 10:17:21 +0200, Sven Moritz Hallberg [EMAIL PROTECTED] wrote: I argue _strongly_ against associating some sort of locale state with handles. 1) In agreement with Ashley's statements, file IO should use octets, because that's what's in a file. So it would imply two types raw

Re: UTF-8 library

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk

Sat, 10 Aug 2002 01:31:51 -0700, Ashley Yakeley [EMAIL PROTECTED] wrote: that different pointer types have the same representation - we already rely on that, don't we? No, we have separate Ptrs and FunctionPtrs IIRC... Yes, but I mean the possibility that Ptr Word8 looks differently than Ptr

Re: UTF-8 library

2002-08-10 Thread anatoli

--- Sven Moritz Hallberg [EMAIL PROTECTED] wrote: I argue _strongly_ against associating some sort of locale state with handles. 1) In agreement with Ashley's statements, file IO should use octets, because that's what's in a file. By the same token, we should handle CR/LF/CR-LF/LF-CR mess

Re: UTF-8 library

2002-08-10 Thread Ashley Yakeley

At 2002-08-10 03:03, anatoli wrote: --- Sven Moritz Hallberg [EMAIL PROTECTED] wrote: I argue _strongly_ against associating some sort of locale state with handles. 1) In agreement with Ashley's statements, file IO should use octets, because that's what's in a file. By the same token, we

Re: UTF-8 library

2002-08-10 Thread anatoli

--- Ashley Yakeley [EMAIL PROTECTED] wrote: By the same token, we should handle CR/LF/CR-LF/LF-CR mess by hand. (Files don't have lines in them, they are just sequences of octets.) Correct. Exactly what kind of newline do you want in your file? The correct answer depends on the level of

Re: UTF-8 library

2002-08-10 Thread Sven Moritz Hallberg

On Sat, 2002-08-10 at 12:03, anatoli wrote: --- Sven Moritz Hallberg [EMAIL PROTECTED] wrote: I argue _strongly_ against associating some sort of locale state with handles. 1) In agreement with Ashley's statements, file IO should use octets, because that's what's in a file. By the

Re: UTF-8 library

2002-08-10 Thread David Feuer

On Sat, 10 Aug 2002, Ashley Yakeley wrote: One of the things that really bothers me about C is the way its unspecifiedness about types can infect other languages. For instance, what exactly is a Haskell Int? I think it's the idea that's infectious, because it is a good idea. The C standard

Re: UTF-8 library

2002-08-10 Thread anatoli

[apologies if you see multiple copies; I forgot to Cc: the list the first time around.] --- Sven Moritz Hallberg [EMAIL PROTECTED] wrote: [...] I think that it's ugly, though, to do it somewhere outside, pretending the issue's not there. I value about Haskell its clean representation of

Re: UTF-8 library

2002-08-10 Thread Joe English

Ashley Yakeley wrote: One of the things that really bothers me about C is the way its unspecifiedness about types can infect other languages. For instance, what exactly is a Haskell Int? Java, at least, stands firm, but then platform-independence was one of Java's explicit design

Re: UTF-8 library

2002-08-10 Thread Manuel M T Chakravarty

Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] wrote, Thu, 08 Aug 2002 19:28:18 +1000 (EST), Manuel M T Chakravarty [EMAIL PROTECTED] wrote: ANSI C guarantees that char is 1 byte (more precisely that sizeof (char) == 1). It says that sizeof (char) == 1 but doesn't say that it means 8

Re: UTF-8 library

2002-08-09 Thread Ketil Z. Malde

anatoli [EMAIL PROTECTED] writes: Dependence on the current locale is EXTREMELY inconvenient. Imagine that you're writing a Web browser. Web browsers get input with MIME declarations, and shouldn't rely on *any* default setting. Instead, they should read [Word8] and decode the contents

Re: UTF-8 library

2002-08-09 Thread Fergus Henderson

On 06-Aug-2002, George Russell [EMAIL PROTECTED] wrote: Converting CStrings to [Word8] is probably a bad idea anyway, since there is absolutely no reason to assume a C character will be only 8 bits long, and under some implementations it isn't. That's true in general; the C standard only

Re: UTF-8 library

2002-08-08 Thread Joe English

anatoli wrote: I'd still rather associate locale with a handle. This way, all Char and String IO functions that exist, and those that are not written yet, can work with any encoding without relying on the abomination that is setlocale(). Seconded; this is the best approach. The libc

16 matches
