subject:"Re\: Unicode"

[Haskell] Re: unicode/internalization issues

2006-03-28 Thread John Meacham

On Sun, Mar 26, 2006 at 03:22:38PM +0400, Bulat Ziganshin wrote: > 3. Unicode support in I/O routines, i.e. ability to read/write UTF-8 > encoded files and files what use other Unicode byte encodings: not > implemented in any compiler, afaik, but there are 3rd-party libs: > Streams library, New I/O

Re: Unicode + Re: Reading/Writing Binary Data in Haskell

2003-07-14 Thread George Russell

Glynn wrote (about my binary library, snipped): > This is similar to UTF-8; however, UTF-8 is a standard format which > can be read and written by a variety of other programs. > > If we want a mechanism for encoding arbitrary Haskell strings as octet > lists, and we have a free choice as to the enc

Re: Unicode + Re: Reading/Writing Binary Data in Haskell

2003-07-14 Thread Glynn Clements

George Russell wrote: > > OTOH, existing implementations (at least GHC and Hugs) currently read > > and write "8-bit binary", i.e. characters 0-255 get read and written > > "as-is" and anything else breaks, and changing that would probably > > break a fair amount of existing code. > > The bi

RE: Unicode again

2002-01-17 Thread Simon Peyton-Jones

| there was some discussion about Unicode and the Char type | some time ago. At the moment I'm writing some Haskell code | dealing with XML. The problem is that there seems to be no | consensus concerning Char so that it is difficult for me to | deal with the XML unicode issues appropriately.

RE: Unicode again

2002-01-16 Thread Kent Karlsson

This is getting a bit off-topic for Haskell... > Isn't it fairly common to use 32bit Unicode character types in C? Yes, in some implementations, but nobody but a few Linux and SunOS programmers use that... (Those systems are far from committed to Unicode.) In some other systems wchar_t is (exc

Re: Unicode again

2002-01-16 Thread Ketil's local user

"Kent Karlsson" <[EMAIL PROTECTED]> writes: > Everyone that is serious about Unicode and where efficiency > is also of concern(!) target UTF-16 (MacOS, Windows, Epoc, Java, > Oracle, ...). Isn't it fairly common to use 32bit Unicode character types in C? I'm not sure I see the efficiency gain

RE: Unicode again

2002-01-15 Thread Kent Karlsson

> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On > Behalf Of Wolfgang Jeltsch > Sent: den 5 januari 2002 13:04 > To: The Haskell Mailing List > Subject: Unicode again > > > Hello, > there was some discussion about Unicode and the Char type > some time ago. A

RE: Unicode stupidity (Was: Unicode support)

2001-10-24 Thread Karlsson Kent - keka

> None of that "But 21 bits *is* enough". > Yeah, like 640K was enough. And countless other examples. That is not comparable. Never was. > I thought we had learned, but I was wrong... I'm especially > disheartened to hear that ISO bought into the same crap. Who's going to invent all these gaz

Re: Unicode support

2001-10-10 Thread Marcin 'Qrczak' Kowalczyk

Tue, 9 Oct 2001 14:59:09 -0700, John Meacham <[EMAIL PROTECTED]> pisze: > I think a canonical way to get at iconv's ('man 3 iconv' for info.) > functionality in one of the standard libraries would be great. perhaps > I will have a go at it. even if the underlying platform does not have > iconv the

Re: Unicode support

2001-10-09 Thread John Meacham

On Tue, Oct 09, 2001 at 12:37:27PM +0200, Kent Karlsson wrote: > > At 2001-10-09 02:58, Kent Karlsson wrote: > > >In summary: > > >code position (=code point): a value between 0000 and 10FFFF. > > Would this be a reasonable basis for Haskell's 'Char' type? > > Yes. It's essentially UTF-32, b

Re: Unicode support

2001-10-09 Thread Marcin 'Qrczak' Kowalczyk

On Tue, 9 Oct 2001, Ashley Yakeley wrote: > Would it be worthwhile restricting Char to the 0-10FFFF range, just as a > Word8 is restricted to 0-FF even though in GHC at least it's stored > 32-bit? It is thus restricted in GHC. I think it's a good compromise between 32-bit-Unicode and 16-bit-U

Re: Unicode support

2001-10-09 Thread Ashley Yakeley

At 2001-10-09 03:37, Kent Karlsson wrote: >> >code position (=code point): a value between 0000 and 10FFFF. >> >> Would this be a reasonable basis for Haskell's 'Char' type? > >Yes. It's essentially UTF-32, but without the fixation to 32-bit >(21 bits suffice). UTF-32 (a.k.a. UCS-4 in 10646,

Re: Unicode support

2001-10-09 Thread Kent Karlsson

- Original Message - From: "Ashley Yakeley" <[EMAIL PROTECTED]> To: "Kent Karlsson" <[EMAIL PROTECTED]>; "Haskell List" <[EMAIL PROTECTED]>; "Libraries for Haskell List" <[EMAIL PROTECTED]> Sent: Tuesday, October 09, 20

Re: Unicode support

2001-10-09 Thread Ashley Yakeley

At 2001-10-09 02:58, Kent Karlsson wrote: >In summary: > >code position (=code point): a value between 0000 and 10FFFF. Would this be a reasonable basis for Haskell's 'Char' type? At some point perhaps there should be a 'Unicode' standard library for Haskell. For instance: encodeUTF8 :: S

Re: Unicode support

2001-10-09 Thread Kent Karlsson

Just to clear up any misunderstanding: - Original Message - From: "Ashley Yakeley" <[EMAIL PROTECTED]> To: "Haskell List" <[EMAIL PROTECTED]> Sent: Monday, October 01, 2001 12:36 AM Subject: Re: Unicode support > At 2001-09-30 07:29, Marcin 'Qrc

Re: Unicode support

2001-10-08 Thread Kent Karlsson

- Original Message - From: "Dylan Thurston" <[EMAIL PROTECTED]> To: "John Meacham" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Friday, October 05, 2001 5:47 PM Subject: Re: Unicode support > On Sun, Sep 30, 2001 at 11:01:38AM -0700, John Mea

Re: Unicode support

2001-10-08 Thread Kent Karlsson

- Original Message - From: "Wolfgang Jeltsch" <[EMAIL PROTECTED]> To: "The Haskell Mailing List" <[EMAIL PROTECTED]> Sent: Thursday, October 04, 2001 8:47 PM Subject: Re: Unicode support > On Sunday, 30 September 2001 20:01, John Meacham wrote: >

Re: Unicode support

2001-10-05 Thread Dylan Thurston

On Sun, Sep 30, 2001 at 11:01:38AM -0700, John Meacham wrote: > seeing as how the haskell standard is horribly vague when it comes to > character set encodings anyway, I would recommend that we just omit any > reference to the bit size of Char, and just say abstractly that each > Char represents o

Re: Unicode support

2001-10-04 Thread Wolfgang Jeltsch

On Sunday, 30 September 2001 20:01, John Meacham wrote: > sorry for the me too post, but this has been a major pet peeve of mine > for a long time. 16 bit unicode should be gotten rid of, being the worst > of both worlds, non backwards compatible with ascii, endianness issues > and no constant len

Re: Unicode support

2001-09-30 Thread Jens Petersen

Colin Paul Adams <[EMAIL PROTECTED]> writes: > > "Jens" == Jens Petersen <[EMAIL PROTECTED]> writes: > > Jens> 16 bits is enough to describe the Basic Multilingual Plane > Jens> and I think 24 bits all the currently defined extended > Jens> planes. So I guess the report just ref

Re: Unicode support

2001-09-30 Thread Ashley Yakeley

At 2001-09-30 07:29, Marcin 'Qrczak' Kowalczyk wrote: >Some time ago the Unicode Consortium slowly began switching to the >point of view that abstract characters are denoted by numbers in the >range U+0000..10FFFF. It's worth mentioning that these are 'codepoints', not 'characters'. Sometimes a

Re: Unicode support

2001-09-30 Thread John Meacham

sorry for the me too post, but this has been a major pet peeve of mine for a long time. 16 bit unicode should be gotten rid of, being the worst of both worlds, non backwards compatible with ascii, endianness issues and no constant length encoding utf8 externally and utf32 when working with ind

Re: Unicode support

2001-09-30 Thread Marcin 'Qrczak' Kowalczyk

30 Sep 2001 14:43:21 +0100, Colin Paul Adams <[EMAIL PROTECTED]> pisze: > I think it should either be amended to mention the BMP subset of > Unicode, or, better, change the reference from 16-bit to 24-bit. 24-bit is not accurate. The range from 0 to 0x10FFFF has 20.087462841250343 bits. There is
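
The bit count quoted in this snippet can be checked with a few lines of Haskell. This is only a sketch, and it assumes the range meant is the full Unicode code space U+0000..U+10FFFF (17 planes of 65536 code points each); the names `codePoints` and `bitsNeeded` are illustrative and do not come from the thread.

```haskell
-- Number of code points in U+0000..U+10FFFF (assumed range): 17 * 65536.
codePoints :: Double
codePoints = 0x110000

-- Base-2 logarithm of that count, i.e. the real-valued number of bits
-- needed to distinguish every code point.
bitsNeeded :: Double
bitsNeeded = logBase 2 codePoints

-- Whole bits needed in practice: the value above rounded up.
wholeBits :: Int
wholeBits = ceiling bitsNeeded
```

Under that assumption `bitsNeeded` comes out just above 20, so 21 whole bits are enough and neither 16 nor 24 describes the range exactly.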

Re: Unicode support

2001-09-30 Thread Marcin 'Qrczak' Kowalczyk

30 Sep 2001 22:28:52 +0900, Jens Petersen <[EMAIL PROTECTED]> pisze: > 16 bits is enough to describe the Basic Multilingual Plane > and I think 24 bits all the currently defined extended > planes. So I guess the report just refers to the BMP. In early days the Unicode Consortium was doing every

Re: Unicode support

2001-09-30 Thread Colin Paul Adams

> "Jens" == Jens Petersen <[EMAIL PROTECTED]> writes: Jens> 16 bits is enough to describe the Basic Multilingual Plane Jens> and I think 24 bits all the currently defined extended Jens> planes. So I guess the report just refers to the BMP. I guess it does, and I think back in 19

Re: Unicode support

2001-09-30 Thread Jens Petersen

Hamilton Richards <[EMAIL PROTECTED]> writes: > At 12:20 PM -0500 9/29/01, Colin Paul Adams wrote: > >I have just been reading through the Haskell report to refresh my > >memory of the language. I was surprised to see this: > > > >The character type Char is an enumeration and consists of 16 bit v

Re: Unicode support

2001-09-29 Thread Hamilton Richards

At 12:20 PM -0500 9/29/01, Colin Paul Adams wrote: >I have just been reading through the Haskell report to refresh my >memory of the language. I was surprised to see this: > >The character type Char is an enumeration and consists of 16 bit values, >conforming to >the Unicode standard [10]. > >Unic

Re: Unicode

2001-05-25 Thread Marcin 'Qrczak' Kowalczyk

Sat, 26 May 2001 03:17:40 +1000, Fergus Henderson <[EMAIL PROTECTED]> pisze: > Is there a way to convert a Haskell String into a UTF-16 > encoded byte stream without writing to a file and then > reading the file back in? Sure: all conversions are available as memory to memory conversions for dir

Re: Unicode

2001-05-25 Thread John Meacham

The algorithms for encoding unicode characters into the various transport formats, UTF16,UTF8,UTF32 are well defined, they can trivially be implemented in Haskell, for instance encodeUTF8 :: String -> [Byte] decodeUTF8 :: [Byte] -> Maybe String would be easily definable. BTW, since a char is no l
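
As a rough illustration of the `encodeUTF8` signature mentioned in this snippet, here is a minimal sketch of the encoding direction only. It uses `Word8` where the message writes `Byte`, which is an assumption, and the helper `encodeChar` is a hypothetical name. It does not reject surrogate code points (U+D800..U+DFFF), which a stricter encoder would.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode a single code point as one to four UTF-8 bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]                           -- 7 bits: ASCII
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]                  -- up to 11 bits
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]         -- up to 16 bits
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0] -- up to 21 bits
  where
    n = ord c
    -- High-order payload bits that go into the lead byte.
    lead s = fromIntegral (n `shiftR` s)
    -- Continuation byte: marker 10xxxxxx plus six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Encode a whole string by concatenating the per-character encodings.
encodeUTF8 :: String -> [Word8]
encodeUTF8 = concatMap encodeChar
```

The decoding direction would need to return `Maybe String`, as in the signature quoted above, because not every byte sequence is valid UTF-8.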

Re: Unicode

2001-05-25 Thread Fergus Henderson

On 24-May-2001, Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> wrote: > Thu, 24 May 2001 14:41:21 -0700, Ashley Yakeley <[EMAIL PROTECTED]> pisze: > > >> - Initial Unicode support - the Char type is now 31 bits. > > > > It might be appropriate to have two types for Unicode, a UCS2 type > > (16

Re: Unicode

2001-05-24 Thread Marcin 'Qrczak' Kowalczyk

Thu, 24 May 2001 14:41:21 -0700, Ashley Yakeley <[EMAIL PROTECTED]> pisze: >> - Initial Unicode support - the Char type is now 31 bits. > > It might be appropriate to have two types for Unicode, a UCS2 type > (16 bits) and a UCS4 type (31 bits). Actually it's 20.087462841250343 bits. Unicode

Re: Unicode and is*

2001-05-04 Thread Marcin 'Qrczak' Kowalczyk

Fri, 4 May 2001 15:20:02 +0100, Ian Lynagh <[EMAIL PROTECTED]> pisze: > Is there a reason why isUpper and isLower include all unicode > characters of the appropriate class but isDigit is only 0..9? There are also other weirdnesses, e.g. isSpace is specified to work only on ISO-8859-1 spacing cha
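
The asymmetry asked about here can still be observed with the `Data.Char` module in current GHC `base`. The sketch below is not taken from the thread, and it reflects present-day library behaviour, which may differ from the 2001 implementations under discussion; the names `arabicIndicThree`, `capitalAUmlaut`, and `checks` are illustrative.

```haskell
import Data.Char (isDigit, isNumber, isUpper)

-- U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit outside ASCII.
arabicIndicThree :: Char
arabicIndicThree = '\x0663'

-- U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS: an uppercase letter outside ASCII.
capitalAUmlaut :: Char
capitalAUmlaut = '\x00C4'

-- Each pair is (description, result of applying the predicate).
checks :: [(String, Bool)]
checks =
  [ ("isUpper on U+00C4",  isUpper capitalAUmlaut)
  , ("isDigit on '7'",     isDigit '7')
  , ("isDigit on U+0663",  isDigit arabicIndicThree)
  , ("isNumber on U+0663", isNumber arabicIndicThree)
  ]
```

With a recent `base`, `isUpper` accepts the non-ASCII capital, `isDigit` accepts only `'0'..'9'` and so rejects U+0663, and `isNumber` is the predicate that covers Unicode numeric characters.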

RE: Unicode, emacs, xterm, (Linux)

1999-10-12 Thread Karlsson Kent - keka

> But it is hard to use s

Re: Unicode (Re: Reverse composition)

1999-10-11 Thread Ralf Muschall

Lennart Augustsson wrote: > It's not hard to find a text editor, use e.g. wily. It's widely available. But it is hard to use some nonstandard (i.e. neither vi nor emacs) editor just for one special kind of source code - it means to lose all the keybindings, highlight settings, 100-lines-of-defi

Re: Unicode (Re: Reverse composition)

1999-10-11 Thread George Russell

Marcin 'Qrczak' Kowalczyk wrote: [snip] > But when Unicode finally comes... How should Haskell's textfile IO > work? I don't think the current standard functions for textfile IO would have too many problems. You can do hSeek in Haskell, but "The offset is given in terms of 8-bit bytes" (library

Re: Unicode (Re: Reverse composition)

1999-10-09 Thread Lennart Augustsson

Marcin 'Qrczak' Kowalczyk wrote: > Sat, 9 Oct 1999 12:42:20 +1300, Brian Boutel <[EMAIL PROTECTED]> pisze: > > > Be careful. '<-' is two symbols. Replacing it by one symbol can change the > > semantics of a program by affecting layout. > > No, because only the indent before the first non-whitespa

Layout (Re: Unicode)

1999-10-09 Thread Marcin 'Qrczak' Kowalczyk

Sat, 09 Oct 1999 13:08:39 +0200, Lennart Augustsson <[EMAIL PROTECTED]> pisze: > > No, because only the indent before the first non-whitespace character > > in a line matters. Haskell programs can be typeset even in proportional > > font as long as indents have correct relationships between their

37 matches
