On Sun, Mar 26, 2006 at 03:22:38PM +0400, Bulat Ziganshin wrote:
> 3. Unicode support in I/O routines, i.e. ability to read/write UTF-8
> encoded files and files that use other Unicode byte encodings: not
> implemented in any compiler, afaik, but there are 3rd-party libs:
> Streams library, New I/O
Glynn wrote (about my binary library, snipped):
> This is similar to UTF-8; however, UTF-8 is a standard format which
> can be read and written by a variety of other programs.
>
> If we want a mechanism for encoding arbitrary Haskell strings as octet
> lists, and we have a free choice as to the enc
George Russell wrote:
> > OTOH, existing implementations (at least GHC and Hugs) currently read
> > and write "8-bit binary", i.e. characters 0-255 get read and written
> > "as-is" and anything else breaks, and changing that would probably
> > break a fair amount of existing code.
>
> The bi
| there was some discussion about Unicode and the Char type
| some time ago. At the moment I'm writing some Haskell code
| dealing with XML. The problem is that there seems to be no
| consensus concerning Char so that it is difficult for me to
| deal with the XML unicode issues appropriately.
This is getting a bit off-topic for Haskell...
> Isn't it fairly common to use 32bit Unicode character types in C?
Yes, in some implementations, but nobody except a few Linux and SunOS
programmers use that... (Those systems are far from committed to
Unicode.)
In some other systems wchar_t is (exc
"Kent Karlsson" <[EMAIL PROTECTED]> writes:
> Everyone that is serious about Unicode and where efficiency
> is also of concern(!) target UTF-16 (MacOS, Windows, Epoc, Java,
> Oracle, ...).
Isn't it fairly common to use 32bit Unicode character types in C?
I'm not sure I see the efficiency gain
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of Wolfgang Jeltsch
> Sent: den 5 januari 2002 13:04
> To: The Haskell Mailing List
> Subject: Unicode again
>
>
> Hello,
> there was some discussion about Unicode and the Char type
> some time ago. A
> None of that "But 21 bits *is* enough".
> Yeah, like 640K was enough. And countless other examples.
That is not comparable. Never was.
> I thought we had learned, but I was wrong... I'm especially
> disheartened to hear that ISO bought into the same crap.
Who's going to invent all these gaz
Tue, 9 Oct 2001 14:59:09 -0700, John Meacham <[EMAIL PROTECTED]> pisze:
> I think a canonical way to get at iconv's ('man 3 iconv' for info.)
> functionality in one of the standard libraries would be great. perhaps
> I will have a go at it. even if the underlying platform does not have
> iconv the
On Tue, Oct 09, 2001 at 12:37:27PM +0200, Kent Karlsson wrote:
> > At 2001-10-09 02:58, Kent Karlsson wrote:
> > >In summary:
> > >code position (=code point): a value between 0000 and 10FFFF.
> > Would this be a reasonable basis for Haskell's 'Char' type?
>
> Yes. It's essentially UTF-32, b
On Tue, 9 Oct 2001, Ashley Yakeley wrote:
> Would it be worthwhile restricting Char to the 0-10FFFF range, just as a
> Word8 is restricted to 0-FF even though in GHC at least it's stored
> 32-bit?
It is thus restricted in GHC. I think it's a good compromise between
32-bit-Unicode and 16-bit-U
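
The restriction described here can be checked directly. The following is a minimal sketch, assuming a GHC where `maxBound :: Char` is U+10FFFF; the names `maxCodePoint` and `safeChr` are made up for illustration and are not part of any standard library.

```haskell
import Data.Char (chr, ord)

-- Largest code point a Char can hold (0x10FFFF in GHC).
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- Total conversion from Int to Char: Nothing outside 0 .. maxCodePoint,
-- where plain 'chr' would raise an error instead.
safeChr :: Int -> Maybe Char
safeChr n
  | n >= 0 && n <= maxCodePoint = Just (chr n)
  | otherwise                   = Nothing
```

Note that this range check does not exclude the surrogate range D800..DFFF; GHC accepts those values as `Char` too.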
At 2001-10-09 03:37, Kent Karlsson wrote:
>> >code position (=code point): a value between 0000 and 10FFFF.
>>
>> Would this be a reasonable basis for Haskell's 'Char' type?
>
>Yes. It's essentially UTF-32, but without the fixation to 32-bit
>(21 bits suffice). UTF-32 (a.k.a. UCS-4 in 10646,
- Original Message -
From: "Ashley Yakeley" <[EMAIL PROTECTED]>
To: "Kent Karlsson" <[EMAIL PROTECTED]>; "Haskell List" <[EMAIL PROTECTED]>;
"Libraries for Haskell List"
<[EMAIL PROTECTED]>
Sent: Tuesday, October 09, 20
At 2001-10-09 02:58, Kent Karlsson wrote:
>In summary:
>
>code position (=code point): a value between 0000 and 10FFFF.
Would this be a reasonable basis for Haskell's 'Char' type? At some point
perhaps there should be a 'Unicode' standard library for Haskell. For
instance:
encodeUTF8 :: S
Just to clear up any misunderstanding:
- Original Message -
From: "Ashley Yakeley" <[EMAIL PROTECTED]>
To: "Haskell List" <[EMAIL PROTECTED]>
Sent: Monday, October 01, 2001 12:36 AM
Subject: Re: Unicode support
> At 2001-09-30 07:29, Marcin 'Qrc
- Original Message -
From: "Dylan Thurston" <[EMAIL PROTECTED]>
To: "John Meacham" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, October 05, 2001 5:47 PM
Subject: Re: Unicode support
> On Sun, Sep 30, 2001 at 11:01:38AM -0700, John Mea
- Original Message -
From: "Wolfgang Jeltsch" <[EMAIL PROTECTED]>
To: "The Haskell Mailing List" <[EMAIL PROTECTED]>
Sent: Thursday, October 04, 2001 8:47 PM
Subject: Re: Unicode support
> On Sunday, 30 September 2001 20:01, John Meacham wrote:
>
On Sun, Sep 30, 2001 at 11:01:38AM -0700, John Meacham wrote:
> seeing as how the haskell standard is horribly vague when it comes to
> character set encodings anyway, I would recommend that we just omit any
> reference to the bit size of Char, and just say abstractly that each
> Char represents o
On Sunday, 30 September 2001 20:01, John Meacham wrote:
> sorry for the me too post, but this has been a major pet peeve of mine
> for a long time. 16 bit unicode should be gotten rid of, being the worst
> of both worlds, non backwards compatible with ascii, endianness issues
> and no constant len
Colin Paul Adams <[EMAIL PROTECTED]> writes:
> > "Jens" == Jens Petersen <[EMAIL PROTECTED]> writes:
>
> Jens> 16 bits is enough to describe the Basic Multilingual Plane
> Jens> and I think 24 bits all the currently defined extended
> Jens> planes. So I guess the report just ref
At 2001-09-30 07:29, Marcin 'Qrczak' Kowalczyk wrote:
>Some time ago the Unicode Consortium slowly began switching to the
>point of view that abstract characters are denoted by numbers in the
>range U+0000..10FFFF.
It's worth mentioning that these are 'codepoints', not 'characters'.
Sometimes a
sorry for the me too post, but this has been a major pet peeve of mine
for a long time. 16 bit unicode should be gotten rid of, being the worst
of both worlds, non backwards compatible with ascii, endianness issues
and no constant length encoding. utf8 externally and utf32 when
working with ind
30 Sep 2001 14:43:21 +0100, Colin Paul Adams <[EMAIL PROTECTED]> pisze:
> I think it should either be amended to mention the BMP subset of
> Unicode, or, better, change the reference from 16-bit to 24-bit.
24-bit is not accurate. The range from 0 to 0x10FFFF has
20.087462841250343 bits. There is
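
The figure quoted above is the base-2 logarithm of the number of code points in the range 0 to 0x10FFFF, that is, 17 planes of 65536 values each. A small sketch of that arithmetic, with names chosen only for illustration:

```haskell
-- Number of code points in 0 .. 0x10FFFF: 17 * 65536 = 1114112.
codePointCount :: Double
codePointCount = fromIntegral (0x10FFFF + 1 :: Int)

-- Bits needed to distinguish that many values: log2 of the count,
-- which equals 16 + log2 17.
bitsNeeded :: Double
bitsNeeded = logBase 2 codePointCount
```

Rounded up to a whole number this gives the 21 bits mentioned elsewhere in the thread.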
30 Sep 2001 22:28:52 +0900, Jens Petersen <[EMAIL PROTECTED]> pisze:
> 16 bits is enough to describe the Basic Multilingual Plane
> and I think 24 bits all the currently defined extended
> planes. So I guess the report just refers to the BMP.
In early days the Unicode Consortium was doing every
> "Jens" == Jens Petersen <[EMAIL PROTECTED]> writes:
Jens> 16 bits is enough to describe the Basic Multilingual Plane
Jens> and I think 24 bits all the currently defined extended
Jens> planes. So I guess the report just refers to the BMP.
I guess it does, and I think back in 19
Hamilton Richards <[EMAIL PROTECTED]> writes:
> At 12:20 PM -0500 9/29/01, Colin Paul Adams wrote:
> >I have just been reading through the Haskell report to refresh my
> >memory of the language. I was surprised to see this:
> >
> >The character type Char is an enumeration and consists of 16 bit v
At 12:20 PM -0500 9/29/01, Colin Paul Adams wrote:
>I have just been reading through the Haskell report to refresh my
>memory of the language. I was surprised to see this:
>
>The character type Char is an enumeration and consists of 16 bit values,
>conforming to
>the Unicode standard [10].
>
>Unic
Sat, 26 May 2001 03:17:40 +1000, Fergus Henderson <[EMAIL PROTECTED]> pisze:
> Is there a way to convert a Haskell String into a UTF-16
> encoded byte stream without writing to a file and then
> reading the file back in?
Sure: all conversions are available as memory to memory conversions
for dir
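
For readers without the library referred to in this reply, a memory-to-memory conversion can also be written by hand. The sketch below is not that library's API; it is a plain-Haskell illustration that assumes big-endian output with no byte order mark, and the names `charToUTF16`, `unitToBytesBE` and `encodeUTF16BE` are hypothetical.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16, Word8)

-- Encode one Char as one UTF-16 code unit (BMP) or a surrogate pair.
charToUTF16 :: Char -> [Word16]
charToUTF16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))  -- high surrogate
                  , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]  -- low surrogate
  where
    n = ord c
    m = n - 0x10000

-- Serialise a code unit as two bytes, most significant byte first.
unitToBytesBE :: Word16 -> [Word8]
unitToBytesBE u = [fromIntegral (u `shiftR` 8), fromIntegral (u .&. 0xFF)]

-- Convert a String to a UTF-16BE byte list entirely in memory.
encodeUTF16BE :: String -> [Word8]
encodeUTF16BE = concatMap unitToBytesBE . concatMap charToUTF16
```

A Char that is itself a lone surrogate is passed through unchanged here; a stricter encoder would reject it.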
The algorithms for encoding unicode characters into the various
transport formats, UTF16,UTF8,UTF32 are well defined, they can trivially
be implemented in Haskell, for instance
encodeUTF8 :: String -> [Byte]
decodeUTF8 :: [Byte] -> Maybe String
would be easily definable.
BTW, since a char is no l
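
The two signatures above can be filled in along the following lines. This is only a sketch: `Byte` is assumed to be a synonym for `Word8`, and the decoder is assumed to reject overlong forms, surrogates and values above 10FFFF, which the post itself does not specify.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

type Byte = Word8  -- assumption: the post's Byte is an 8-bit word

encodeUTF8 :: String -> [Byte]
encodeUTF8 = concatMap encodeChar
  where
    encodeChar c
      | n < 0x80    = [fromIntegral n]
      | n < 0x800   = [lead 0xC0 6, cont 0]
      | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
      | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
      where
        n = ord c
        -- Lead byte: length tag plus the top bits of the code point.
        lead tag s = fromIntegral (tag .|. (n `shiftR` s))
        -- Continuation byte: 10xxxxxx carrying six bits.
        cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

decodeUTF8 :: [Byte] -> Maybe String
decodeUTF8 [] = Just []
decodeUTF8 (b : bs)
  | b < 0x80              = (chr (fromIntegral b) :) <$> decodeUTF8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 0x80    (fromIntegral b .&. 0x1F)
  | b >= 0xE0 && b < 0xF0 = multi 2 0x800   (fromIntegral b .&. 0x0F)
  | b >= 0xF0 && b < 0xF5 = multi 3 0x10000 (fromIntegral b .&. 0x07)
  | otherwise             = Nothing  -- stray continuation or invalid lead byte
  where
    -- Read k continuation bytes, then validate the resulting code point.
    multi k minValue start =
      case splitAt k bs of
        (conts, rest)
          | length conts == k
          , all isCont conts
          , let n = foldl step start conts
          , n >= minValue             -- reject overlong encodings
          , n <= 0x10FFFF             -- reject values beyond Unicode
          , n < 0xD800 || n > 0xDFFF  -- reject surrogates
          -> (chr n :) <$> decodeUTF8 rest
        _ -> Nothing
    isCont x = x .&. 0xC0 == 0x80
    step acc x = (acc `shiftL` 6) .|. fromIntegral (x .&. 0x3F)
```

The `Maybe` result lets a caller distinguish malformed input from an empty string, which is presumably why the post gives the decoder that type.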
On 24-May-2001, Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> wrote:
> Thu, 24 May 2001 14:41:21 -0700, Ashley Yakeley <[EMAIL PROTECTED]> pisze:
>
> >> - Initial Unicode support - the Char type is now 31 bits.
> >
> > It might be appropriate to have two types for Unicode, a UCS2 type
> > (16
Thu, 24 May 2001 14:41:21 -0700, Ashley Yakeley <[EMAIL PROTECTED]> pisze:
>> - Initial Unicode support - the Char type is now 31 bits.
>
> It might be appropriate to have two types for Unicode, a UCS2 type
> (16 bits) and a UCS4 type (31 bits).
Actually it's 20.087462841250343 bits. Unicode
Fri, 4 May 2001 15:20:02 +0100, Ian Lynagh <[EMAIL PROTECTED]> pisze:
> Is there a reason why isUpper and isLower include all unicode
> characters of the appropriate class but isDigit is only 0..9?
There are also other weirdnesses, e.g. isSpace is specified to work
only on ISO-8859-1 spacing cha
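
The asymmetry raised in this exchange can be observed with the predicates in `Data.Char`. The sketch below assumes the behaviour of current GHC `base`, where `isDigit` accepts only ASCII '0'..'9' while `isUpper` and `isNumber` consult Unicode categories; the sample names and `classify` are illustrative only.

```haskell
import Data.Char (isDigit, isNumber, isUpper)

-- ARABIC-INDIC DIGIT THREE (U+0663): a decimal digit outside ASCII.
arabicIndicThree :: Char
arabicIndicThree = '\x0663'

-- CYRILLIC CAPITAL LETTER ZHE (U+0416): an uppercase letter outside Latin-1.
cyrillicZhe :: Char
cyrillicZhe = '\x0416'

-- Results of (isUpper, isDigit, isNumber) for one character.
classify :: Char -> (Bool, Bool, Bool)
classify c = (isUpper c, isDigit c, isNumber c)
```

Under that assumption the non-ASCII digit is recognised by `isNumber` but not by `isDigit`, while the Cyrillic capital is recognised by `isUpper`.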
> But it is hard to use s
Lennart Augustsson wrote:
> It's not hard to find a text editor, use e.g. wily. It's widely available.
But it is hard to use some nonstandard (i.e. neither vi nor emacs)
editor just for one special kind of source code - it means to lose
all the keybindings, highlight settings, 100-lines-of-defi
Marcin 'Qrczak' Kowalczyk wrote:
[snip]
> But when Unicode finally comes... How should Haskell's textfile IO
> work?
I don't think the current standard functions for textfile IO would
have too many problems. You can do hSeek in Haskell, but
"The offset is given in terms of 8-bit bytes" (library
Marcin 'Qrczak' Kowalczyk wrote:
> Sat, 9 Oct 1999 12:42:20 +1300, Brian Boutel <[EMAIL PROTECTED]> pisze:
>
> > Be careful. '<-' is two symbols. Replacing it by one symbol can change the
> > semantics of a program by affecting layout.
>
> No, because only the indent before the first non-whitespa
Sat, 09 Oct 1999 13:08:39 +0200, Lennart Augustsson <[EMAIL PROTECTED]> pisze:
> > No, because only the indent before the first non-whitespace character
> > in a line matters. Haskell programs can be typeset even in proportional
> > font as long as indents have correct relationships between their
37 matches