On Sat, Jun 14, 2014 at 7:48 AM, Mikhail Vorozhtsov
mikhail.vorozht...@gmail.com wrote:
Hello lists,
As some of you may know, GHC's support for Unicode characters in lexemes is
rather crude and hence prone to inconsistencies in their handling versus the
ASCII counterparts. For example
, Jun 14, 2014 at 7:48 AM, Mikhail Vorozhtsov
mikhail.vorozht...@gmail.com wrote:
Hello lists,
As some of you may know, GHC's support for Unicode characters in lexemes is
rather crude and hence prone to inconsistencies in their handling versus the
ASCII counterparts. For example, APOSTROPHE is treated
On 06/17/2014 03:13 AM, Tsuyoshi Ito wrote:
Hello,
Mikhail Vorozhtsov mikhail.vorozht...@gmail.com wrote:
I also worry (although not based on anything particular you said)
whether this will not change meaning of any existing programs. Does it
only allow new programs?
As far as I can see, no
and doesn't entail CPP concerns.
John
On Sun, Jun 15, 2014 at 5:26 PM, Mateusz Kowalczyk
fuuze...@fuuzetsu.co.uk wrote:
On 06/14/2014 04:48 PM, Mikhail Vorozhtsov wrote:
Hello lists,
As some of you may know, GHC's support for Unicode characters in lexemes
is rather crude and hence prone
On 2014-06-16 at 02:26:49 +0200, Mateusz Kowalczyk wrote:
[...]
While personally I like the proposal (wanted prime and sub/sup scripts
way too many times), I worry what this means for compatibility reasons:
suddenly we'll have code that fails to build on 7.8 and before because
someone using
On 06/16/2014 04:26 AM, Mateusz Kowalczyk wrote:
On 06/14/2014 04:48 PM, Mikhail Vorozhtsov wrote:
Hello lists,
As some of you may know, GHC's support for Unicode characters in lexemes
is rather crude and hence prone to inconsistencies in their handling
versus the ASCII counterparts
Hello,
Mikhail Vorozhtsov mikhail.vorozht...@gmail.com wrote:
I also worry (although not based on anything particular you said)
whether this will not change meaning of any existing programs. Does it
only allow new programs?
As far as I can see, no change in meaning. Some hacky operators and
On 06/14/2014 04:48 PM, Mikhail Vorozhtsov wrote:
Hello lists,
As some of you may know, GHC's support for Unicode characters in lexemes
is rather crude and hence prone to inconsistencies in their handling
versus the ASCII counterparts. For example, APOSTROPHE is treated
differently from
Hello lists,
As some of you may know, GHC's support for Unicode characters in lexemes
is rather crude and hence prone to inconsistencies in their handling
versus the ASCII counterparts. For example, APOSTROPHE is treated
differently from PRIME:
λ data a +' b = Plus a b
<interactive>:3:9
lists,
As some of you may know, GHC's support for Unicode characters in lexemes is
rather crude and hence prone to inconsistencies in their handling versus the
ASCII counterparts. For example, APOSTROPHE is treated differently from
PRIME:
λ data a +' b = Plus a b
<interactive>:3:9:
Unexpected
Hi
This is some file äöü.hs with three German umlauts in the file name:
main = putStrLn "äöü"
Now I want to get the dependency information. Therefore I call:
ghc -M äöü.hs
The following gets added to the Makefile:
# DO NOT DELETE: Beginning of Haskell dependencies
äöü.o :
On Tue, Mar 13, 2012 at 06:06:49PM +0100, Volker Wysk wrote:
I'm sending this to glasgow-haskell-users instead of glasgow-haskell-bugs,
because the latter does not seem to accept my messages. I receive nothing,
neither the message in the mailing list, nor any error message.
As I understand
replacing the POSIX layer isn't necessary to fix the Unicode
console output bug. I've made a ticket and in a comment I illustrate the
_setmode call that magically makes everything work:
http://hackage.haskell.org/trac/ghc/ticket/4471
I could attempt a ghc patch for this, but I don't have any
can make a small C
test case and send it to the Microsoft people. Some[1] are reporting success
with Unicode console output.
David
[1] http://www.codeproject.com/KB/cpp/unicode_console_output.aspx
On Tue, Nov 2, 2010 at 3:49 AM, Krasimir Angelov kr.ange...@gmail.com
wrote
On 2 November 2010 21:05, David Sankel cam...@gmail.com wrote:
Is there a ghc wontfix bug ticket for this? Perhaps we can make a small C
test case and send it to the Microsoft people. Some[1] are reporting success
with Unicode console output.
I confirmed that I can output Chinese unicode from
Hello Max,
Wednesday, November 3, 2010, 1:26:50 PM, you wrote:
1. You need to use chcp 65001 to set the console code page to UTF8
2. It is very likely that your Windows console won't have the fonts
required to actually make sense of the output. Pipe the output to
foo.txt. If you open this
On Wed, Nov 3, 2010 at 9:00 AM, Simon Marlow marlo...@gmail.com wrote:
On 03/11/2010 10:36, Bulat Ziganshin wrote:
Hello Max,
Wednesday, November 3, 2010, 1:26:50 PM, you wrote:
1. You need to use chcp 65001 to set the console code page to UTF8
2. It is very likely that your Windows
This is evidence for the broken Unicode support in the Windows
terminal and not a problem with GHC. I experienced the same many
times.
2010/11/2 David Sankel cam...@gmail.com:
On Mon, Nov 1, 2010 at 10:20 PM, David Sankel cam...@gmail.com wrote:
Hello all,
I'm attempting to output some
Is there a ghc wontfix bug ticket for this? Perhaps we can make a small C
test case and send it to the Microsoft people. Some[1] are reporting success
with Unicode console output.
David
[1] http://www.codeproject.com/KB/cpp/unicode_console_output.aspx
On Tue, Nov 2, 2010 at 3:49 AM, Krasimir
Hello all,
I'm attempting to output some Unicode on the windows console. I set my
windows console code page to utf-8 using chcp 65001.
The program:
-- Test.hs
main = putStr "λ.x→x"
The output of `runghc Test.hs`:
λ.x→
From within ghci, typing `main`:
λ*** Exception: stdout: hPutChar
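One thing that can be tried from the Haskell side is to fix the encoding of the output handle explicitly instead of relying on what the runtime infers from the console code page. The following is only a minimal sketch, assuming a GHC whose System.IO exports hSetEncoding; `putUtf8Line` is a hypothetical helper name, and nothing here is claimed to cure the Windows console behaviour reported in this thread.

```haskell
import System.IO (Handle, hPutStrLn, hSetEncoding, stdout, utf8)

-- Hypothetical helper (not part of base): force the handle's text
-- encoding to UTF-8, then write one line to it.
putUtf8Line :: Handle -> String -> IO ()
putUtf8Line h s = do
  hSetEncoding h utf8
  hPutStrLn h s
```

From GHCi, `putUtf8Line stdout "λ.x→x"` sends the UTF-8 bytes for the string to stdout; whether the console then displays them correctly still depends on its code page and font.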
On Mon, Nov 1, 2010 at 10:20 PM, David Sankel cam...@gmail.com wrote:
Hello all,
I'm attempting to output some Unicode on the windows console. I set my
windows console code page to utf-8 using chcp 65001.
The program:
-- Test.hs
main = putStr "λ.x→x"
The output of `runghc Test.hs
correctly, operators are named by (symbol {symbol | : }), where symbol is either an ascii symbol (including *) or a unicode symbol (defined as any Unicode symbol or punctuation). I'm pretty sure º is a unicode symbol or punctuation. I know I could get around this by using a different name
On Saturday 11 September 2010 03:12:11, Greg wrote:
If I read the Haskell Report correctly, operators are named by (symbol
{symbol | : }), where symbol is either an ascii symbol (including *) or
a unicode symbol (defined as any Unicode symbol or punctuation). I'm
pretty sure º is a unicode
On 9/10/10 21:39 , Daniel Fischer wrote:
On Saturday 11 September 2010 03:12:11, Greg wrote:
a unicode symbol (defined as any Unicode symbol or punctuation). I'm
pretty sure º is a unicode symbol or punctuation.
Prelude Data.Char
On 9/10/10 21:12 , Greg wrote:
unicode symbol (defined as any Unicode symbol or punctuation). I'm pretty
sure º is a unicode symbol or punctuation.
No, it's a raised lowercase o used by convention to indicate gender of
abbreviated ordinals. You
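The question of whether a character counts as "symbol or punctuation" can be checked directly with Data.Char. The sketch below is an approximation of the Report's uniSymbol class only (it ignores the special characters the Report excludes), and `isUniSymbol` is a hypothetical name, not something GHC's lexer exposes.

```haskell
import Data.Char (isAlpha, isPunctuation, isSymbol)

-- Hypothetical approximation of the Haskell Report's uniSymbol class:
-- any character whose Unicode general category is a symbol or a
-- punctuation category.
isUniSymbol :: Char -> Bool
isUniSymbol c = isSymbol c || isPunctuation c

-- '\186' is º (MASCULINE ORDINAL INDICATOR), '\8594' is → (RIGHTWARDS ARROW).
ordinalIsSymbol, arrowIsSymbol, ordinalIsLetter :: Bool
ordinalIsSymbol = isUniSymbol '\186'
arrowIsSymbol   = isUniSymbol '\8594'
ordinalIsLetter = isAlpha '\186'
```

Under this check º is classified as a letter rather than a symbol, which is consistent with it being rejected in an operator name, while → passes.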
--Greg
On Sep 10, 2010, at 06:49 PM, Brandon S Allbery KF8NH allb...@ece.cmu.edu wrote:
On 9/10/10 21:12 , Greg wrote:
unicode symbol (defined as any Unicode symbol or punctuation). I'm pretty
sure º is a unicode symbol or punctuation.
No, it's a raised
On Wed, Apr 21, 2010 at 12:51 AM, Yitzchak Gale g...@sefer.org wrote:
Yes, sorry. Either use TWO DOT LEADER, or remove
this Unicode alternative altogether
(i.e. leave it the way it is *without* the UnicodeSyntax extension).
I'm happy with either of those. I just don't like moving the dots
up
I wrote:
My opinion is that we should either use TWO DOT LEADER,
or just leave it as it is now, two FULL STOP characters.
Simon Marlow wrote:
Just to be clear, you're suggesting *removing* the Unicode alternative for
'..' from GHC's UnicodeSyntax extension?
Yes, sorry. Either use TWO DOT
On 15/04/2010 18:12, Yitzchak Gale wrote:
My opinion is that we should either use TWO DOT LEADER,
or just leave it as it is now, two FULL STOP characters.
Just to be clear, you're suggesting *removing* the Unicode alternative
for '..' from GHC's UnicodeSyntax extension?
I have no strong
I think the baseline ellipsis makes much more sense; it's
hard to see how the midline ellipsis was chosen.
--
Jason Dusek
My opinion is that we should either use TWO DOT LEADER,
or just leave it as it is now, two FULL STOP characters.
Two dots indicating a range is not the same symbol
as a three dot ellipsis.
Traditional non-Unicode Haskell will continue to be
around for a long time to come. It would be very
That is very interesting. I didn't know the history of those characters.
If we can't find a Unicode character that everyone agrees upon,
I also don't see any problem with leaving it as two FULL STOP
characters.
I agree. I don't like the current Unicode variant for .., therefore
I suggested
compatibility (even though it is a really small
change).
Regards,
Roel van Dijk
1 -
http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#unicode-syntax
2 - http://en.wikipedia.org/wiki/Ellipsis#In_mathematical_notation
3 - http://hackage.haskell.org/trac/ghc/ticket/3894
Max Vasin wrote:
Wouldn't it be more correct to separate binary IO, which
return [Word8] (or ByteString) and text IO which return
[Char] and deal with text encoding? IIRC that was done in
Bulat Ziganshin's streams library.
That's exactly what I meant.
Text IO could be then implemented on
Simon Marlow wrote:
The only change to the existing behaviour is that by default, text IO
is done in the prevailing encoding of the system. Handles created by
openBinaryFile use the Latin-1 encoding, as do Handles placed in
binary mode using hSetBinaryMode.
wouldn't be semantically correct
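To see the difference between a text handle with an explicit encoding and a binary handle, one can write a string through the former and read the file back through the latter. This is a sketch under the assumption that System.IO provides hSetEncoding, utf8 and latin1 as in the released version of this work; `encodedBytes` is a hypothetical helper.

```haskell
import System.IO

-- Hypothetical helper (not part of base): write `s` to `path` as text in
-- the given encoding, then read the file back through a binary handle,
-- where every byte shows up as a Char in the range '\0'..'\255'.
encodedBytes :: FilePath -> TextEncoding -> String -> IO String
encodedBytes path enc s = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h enc
    hPutStr h s
  withBinaryFile path ReadMode $ \h -> do
    bytes <- hGetContents h
    -- Force the lazy read before withBinaryFile closes the handle.
    length bytes `seq` return bytes
```

For the single character '\233' (é), the UTF-8 encoding yields the two bytes 0xC3 0xA9 and the Latin-1 encoding yields the single byte 0xE9.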
I've been working on adding proper Unicode support to Handle I/O in GHC,
and I finally have something that's ready for testing. I've put a patchset
here:
http://www.haskell.org/~simonmar/base-unicode.tar.gz
That is a set of patches against a GHC repo tree: unpack the tarball, and
say 'sh
Simon Marlow wrote:
I've been working on adding proper Unicode support to Handle I/O in GHC,
and I finally have something that's ready for testing. I've put a patchset
here:
Yay!
Comments below.
Comments/discussion please!
Do you expect Hugs will be able to pick up all
On Tue, Feb 03, 2009 at 10:56:13PM +, Duncan Coutts wrote:
Thanks to suggestions from Duncan Coutts, it's possible to call
hSetEncoding even on buffered read Handles, and the right thing
happens. So we can read from text streams that include multiple
encodings, such as an HTTP
On Tue, 2009-02-03 at 11:03 -0600, John Goerzen wrote:
Will there also be something to handle the UTF-16 BOM marker? I'm not
sure what the best API for that is, since it may or may not be present,
but it should be considered -- and could perhaps help autodetect encoding.
I think someone else
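As an illustration of what BOM-based autodetection could look like, here is a small sketch that inspects the first bytes of a stream. It is not the API discussed in this thread; `Bom` and `detectBom` are hypothetical names, and the sketch does not try to distinguish UTF-32LE (FF FE 00 00) from UTF-16LE.

```haskell
import Data.Word (Word8)

-- Hypothetical result type for the sketch.
data Bom = Utf8Bom | Utf16BE | Utf16LE | NoBom
  deriving (Eq, Show)

-- Look for a byte-order mark at the start of the input and return it
-- together with the remaining bytes. Input without a recognised mark is
-- returned unchanged.
detectBom :: [Word8] -> (Bom, [Word8])
detectBom (0xEF : 0xBB : 0xBF : rest) = (Utf8Bom, rest)
detectBom (0xFE : 0xFF : rest)        = (Utf16BE, rest)
detectBom (0xFF : 0xFE : rest)        = (Utf16LE, rest)
detectBom bytes                       = (NoBom, bytes)
```

Since the mark may or may not be present, a caller would still need a default encoding for the NoBom case.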
On Tue, 2009-02-03 at 17:39 -0600, John Goerzen wrote:
On Tue, Feb 03, 2009 at 10:56:13PM +, Duncan Coutts wrote:
Thanks to suggestions from Duncan Coutts, it's possible to call
hSetEncoding even on buffered read Handles, and the right thing
happens. So we can read from text
Duncan Coutts wrote:
Sorry, I think we've been talking at cross purposes.
I think so.
There always has to be *some* conversion from a 32-bit Char to the
system's selection, right?
Yes. In text mode there is always some conversion going on. Internally
there is a byte buffer and a char
Hi all,
We've been weighing up the options to solve the recent problems that
editline has given us, and we think that this is the best way forward:
For 6.12:
* http://hackage.haskell.org/trac/ghc/ticket/2811
Implement unicode support for text I/O
(we've had this on the TODO list for some
igloo:
Hi all,
We've been weighing up the options to solve the recent problems that
editline has given us, and we think that this is the best way forward:
For 6.12:
* http://hackage.haskell.org/trac/ghc/ticket/2811
Implement unicode support for text I/O
(we've had
On Tue, Nov 25, 2008 at 01:28:48PM -0800, Donald Bruce Stewart wrote:
Can we construct a set of tests that determines if a given line editing
code base works to our satisfaction?
If you can make some tests then that would be great. You need to be
careful though, e.g. input had better look
Hello Bulat,
Thursday, November 24, 2005, 4:17:24 AM, you wrote:
BZ but i propose to make these middle-level functions after stage 2 or
BZ even 3 in this scheme - so that they will be fully in Haskell world,
BZ only work with file descriptors instead of Handles. for example:
it's better one
with files with Unicode
filenames, nor it can tell/seek in files for positions larger than 4
GB. it is because Unix-compatible functions open/fstat/tell/... that
is supported in Mingw32 works only with char[] for filenames and
off_t (which is 32 bit) for file sizes/positions
half year ago i discussed
as a big deal.
SM It's more important to organise the codebase and make sure all the
SM #ifdefs are behind suitable abstractions.
so i will write the following:
-- Support for Unicode filenames and files >4GB
#ifdef mingw32_HOST_OS
in ALL the places where this feature test must take place
Am Montag, 21. November 2005 13:01 schrieb Bulat Ziganshin:
[...]
#ifdef mingw32_HOST_OS
type CFilePath = LPCTSTR
type CFileOffset = Int64
withCFilePath = withTString
peekCFilePath = peekTString
#else
type CFilePath = CString
type CFileOffset = COff
withCFilePath = withCString
Hello Sven,
Tuesday, November 22, 2005, 8:53:55 PM, you wrote:
#ifdef mingw32_HOST_OS
type CFilePath = LPCTSTR
type CFileOffset = Int64
SP Whatever will be done, please use *feature-based* ifdefs, not those
SP platform-dependent ones above, which will be proven wrong either
immediately
SP
Hello glasgow-haskell-users,
Simon, what you will say about the following plan?
ghc/win32 currently doesn't support operations on files with Unicode
filenames, nor can it tell/seek in files for positions larger than 4
GB. it is because Unix-compatible functions open/fstat/tell
Hello Simon,
Tuesday, May 17, 2005, 5:30:06 PM, you wrote:
The question is what Alex should see for a unicode character: Alex
currently assumes that characters are in the range 0-255 (you need a
fixed range in order to generate the lexer tables). One possibility
is to map all Unicode upper
Hello
Is it true that, to support Unicode source files, only the StringBuffer
implementation must be changed? If so, then the task can be simplified by
converting any files read by hGetStringBuffer to UTF-32 (PackedString)
representation and putting in memory array in this form. After this,
we must change
On 19 January 2005 05:31, John Meacham wrote:
A while ago I wrote a glibc specific implementation of the CWString
library. I have since made several improvements:
* No longer glibc specific, should compile and work on any system with
iconv (which is unix standard) (but there are still
On 14 January 2005 12:58, Dimitry Golubovsky wrote:
Now I need more advice on which flavor of Unicode support to
implement. In Haskell-cafe, there were 3 flavors summarized: I am
reposting the table here (its latest version).
|Sebastien's| Marcin's | Hugs
are basically int - int, it
does not affect the result.
The code I use is some draft code, based on what I submitted for Hugs
(pure Unicode basically, even without extra space characters).
Now I need more advice on which flavor of Unicode support to
implement. In Haskell-cafe, there were 3 flavors
On 11 January 2005 02:29, Dimitry Golubovsky wrote:
Bad thing is, LD_PRELOAD does not work on all systems. So I tried to
put the code directly into the runtime (where I believe it should be;
the Unicode properties table is packed, and won't eat much space). I
renamed foreign function names
Hi,
Following up the discussion in Haskell-Cafe about ways to bring better
Unicode support in GHC.
I may take care on putting this into the GHC runtime, but I need some
advice as I am completely new to this.
What needs to be done primarily, is to replace the FFI calls made from
GHC.Unicode
this is
the best way to go about it.
Sure, you can run Alex over the UTF-8 source, but the grammar will be huge. A simpler
way is to take advantage of the fact that Haskell only uses 5 classes of Unicode
characters: uniSmall, uniLarge, uniWhite, uniSymbol, and uniDigit. Alex has a good
input abstraction
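The idea of collapsing every character into one of the five classes before it reaches the lexer can be sketched with Data.Char. This is only an illustration of the classification step, not the lexer code under discussion; `UniClass` and `classify` are hypothetical names, and the mapping from Data.Char predicates to the Report's classes is an approximation.

```haskell
import Data.Char
  ( GeneralCategory (DecimalNumber), generalCategory
  , isLower, isPunctuation, isSpace, isSymbol, isUpper )

-- Hypothetical type for the sketch: the five classes named above, plus a
-- catch-all for characters that belong to none of them.
data UniClass = UniSmall | UniLarge | UniWhite | UniSymbol | UniDigit | UniOther
  deriving (Eq, Show)

-- Map a character to its class using the Unicode properties in Data.Char.
classify :: Char -> UniClass
classify c
  | isLower c                          = UniSmall
  | isUpper c                          = UniLarge  -- uppercase or titlecase letters
  | isSpace c                          = UniWhite
  | generalCategory c == DecimalNumber = UniDigit
  | isSymbol c || isPunctuation c      = UniSymbol
  | otherwise                          = UniOther
```

A lexer built this way only needs table entries for the ASCII range plus one representative per class, rather than a grammar over raw UTF-8 bytes.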
On Fri, Dec 19, 2003 at 12:17:42PM -0800, John Meacham wrote:
1. written the CWString library (now a part of the FFI) which lets you
call arbitrary C functions doing all the proper character set conversion
stuff.
Do you plan to update this and merge it with the hierarchical libraries
to
Whilst I appreciate the topic of show is not directly related to GHC,
what I would like to know is how to handle UNICODE properly... If I assume
I have a good unicode terminal, so stdin and stdout are in unicode format,
and all my text files are in unicode, how do I deal with this properly in
GHC
On Fri, Dec 19, 2003 at 04:51:50PM +, MR K P SCHUPKE wrote:
Whilst I appreciate the topic of show is not directly related to GHC,
what I would like to know is how to handle UNICODE properly... If I assume
I have a good unicode terminal, so stdin and stdout are in unicode format,
and all my
Dylan Thurston [EMAIL PROTECTED] writes:
Right. In Unicode, the concept of a character is not really so
useful;
After reading a bit about it, I'm certainly confused.
Unicode/ISO-10646 contains a lot of things that aren't really one
character, e.g. ligatures.
most functions
- Original Message -
From: Ketil Malde [EMAIL PROTECTED]
To: Dylan Thurston [EMAIL PROTECTED]
Cc: Andrew J Bromage [EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: Monday, October 08, 2001 9:02 AM
Subject: Re: UniCode
(The spelling is 'Unicode' (and none other).)
Dylan
G'day all.
On Fri, Oct 05, 2001 at 06:17:26PM +, Marcin 'Qrczak' Kowalczyk wrote:
This information is out of date. AFAIR about 4 of them is assigned.
Most for Chinese (current, not historic).
I wasn't aware of this. Last time I looked was Unicode 3.0. Thanks
for the update
Why Char is 32 bit. UniCode characters is 16 bit.
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] writes:
Fri, 5 Oct 2001 02:29:51 -0700 (PDT), Krasimir Angelov [EMAIL PROTECTED] pisze:
Why Char is 32 bit. UniCode characters is 16 bit.
No, Unicode characters have 21 bits (range U+0000..10FFFF).
We've been through all this, of course
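The 21-bit figure can be checked against the Char type itself. A small sketch, assuming a GHC where Char covers the full Unicode range; `maxCodePoint` and `bitsNeeded` are hypothetical names introduced for illustration.

```haskell
import Data.Char (ord)

-- The largest code point a Char can hold.
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- Number of binary digits needed to write a non-negative Int
-- (0 for the input 0).
bitsNeeded :: Int -> Int
bitsNeeded n = length (takeWhile (> 0) (iterate (`div` 2) n))
```

Here `maxCodePoint` is 0x10FFFF and `bitsNeeded maxCodePoint` is 21, so a 32-bit Char has room to spare while a 16-bit one does not.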
G'day all.
On Fri, Oct 05, 2001 at 02:29:51AM -0700, Krasimir Angelov wrote:
Why Char is 32 bit. UniCode characters is 16 bit.
It's not quite as simple as that. There is a set of one million
(more correctly, 1M) Unicode characters which are only accessible
using surrogate pairs (i.e. two UTF
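For reference, the surrogate-pair mechanism works as follows: a code point above U+FFFF has 0x10000 subtracted from it, and the resulting 20-bit value is split into two 10-bit halves that are added to 0xD800 and 0xDC00. A minimal sketch, with `toUtf16` as a hypothetical helper name:

```haskell
import Data.Char (ord)
import Data.Word (Word16)

-- Encode one character as UTF-16 code units: a single unit for code
-- points below 0x10000, a high/low surrogate pair otherwise.
toUtf16 :: Char -> [Word16]
toUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + m `div` 0x400)
                  , fromIntegral (0xDC00 + m `mod` 0x400) ]
  where
    n = ord c
    m = n - 0x10000
```

For example, U+1D11E encodes as the pair 0xD834 0xDD1E. The 20 bits carried by a pair account for the 1M (2^20) characters mentioned above.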
Fri, 5 Oct 2001 23:23:50 +1000, Andrew J Bromage [EMAIL PROTECTED] pisze:
There is a set of one million (more correctly, 1M) Unicode characters
which are only accessible using surrogate pairs (i.e. two UTF-16
codes). There are currently none of these codes assigned,
This information is out
05 Oct 2001 14:35:17 +0200, Ketil Malde [EMAIL PROTECTED] pisze:
Does Haskell's support of Unicode mean UTF-32, or full UCS-4?
It's not decided officially. GHC uses UTF-32. It's expected that
UCS-4 will vanish and ISO-10646 will be reduced to the same range
U+0000..10FFFF as Unicode
Hi all!
The question is really simple: how can I convert an Int into a Char?
Ghc 5.00.2 provides (initial) Unicode support, so I thought the chr function
would do. But it seems it still rejects Int values greater than 0xFF. So,
what function should I use?
Thanks in advance.
Regards,
Pablo
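For the Int-to-Char conversion itself, `chr` from Data.Char is the usual function. A minimal sketch of a range-checked wrapper, assuming a GHC where Char spans the full Unicode range; `safeChr` is a hypothetical name, not part of base.

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: convert an Int to a Char, returning Nothing for
-- values outside the range that chr accepts instead of raising an error.
safeChr :: Int -> Maybe Char
safeChr n
  | n >= 0 && n <= ord (maxBound :: Char) = Just (chr n)
  | otherwise                             = Nothing
```

With this, `safeChr 0x3BB` gives `Just '\955'` (the letter λ), while `safeChr 0x110000` and `safeChr (-1)` give `Nothing`.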
Tue, 11 Sep 2001 13:19:54 -0300 (GMT), Pablo Pedemonte [EMAIL PROTECTED]
pisze:
Ghc 5.00.2 provides (initial) Unicode support, so I thought the
chr function would do. But it seems it still rejects Int values
greater than 0xFF.
It doesn't.
--
__( Marcin Kowalczyk * [EMAIL PROTECTED] http
is the official identifier, it is
rather bad form to write a person's name in Kana (the
phonetic alphabets).
You're absolutely right. This fact slipped my mind.
Still, probably 85% (just a guess) of Japanese names can be written with
Jyouyou kanji, and the CJK set in Unicode is a strict superset
Marcin 'Qrczak' Kowalczyk wrote:
As for the language standard: I hope that Char will be allowed or
required to have >=30 bits instead of current 16; but never more than
Int, to be able to use ord and chr safely.
Er does it have to? The Java Virtual Machine implements Unicode with
16 bits. (OK
OTOH, it wouldn't be hard to change GHC's Char datatype to be a
full 32-bit integral data type.
Could we do it please?
It will not break anything if done slowly. I imagine that
{read,write}CharOffAddr and _ccall_ will still use only 8 bits of
Char. But after Char is wide, libraries
implements Unicode with
16 bits. (OK, so I suppose that means it can't cope with Korean or Chinese.)
Just to set the record straight:
Many CJK (Chinese-Japanese-Korean) characters are encodable in 16 bits. I am
not so familiar with the Chinese or Korean situations, but in Japan
Virtual Machine implements Unicode with
16 bits. (OK, so I suppose that means it can't cope with Korean or Chinese.)
So requiring Char to be >=30 bits would stop anyone implementing a
conformant Haskell on the JVM.
OK, "allowed", not "required"; currently it is not even allowed
Tue, 16 May 2000 12:26:12 +0200 (MET DST), Frank Atanassow [EMAIL PROTECTED] pisze:
Of course, you can always come up with specialized schemes involving stateful
encodings and/or "block-swapping" (using the Unicode private-use areas, for
example), but then, that subverts t
does it have to? The Java Virtual Machine implements Unicode with
16 bits. (OK, so I suppose that means it can't cope
with Korean or Chinese.)
Just to set the record straight:
Many CJK (Chinese-Japanese-Korean) characters are
encodable in 16 bits. I am not so familiar
How safe is representing Unicode characters as Chars unsafeCoerce#d
from large Ints? Seems to work in simple cases :-)
--
__("Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
\__/ GCS/M d- s+:-- a23 C+++$ UL++$ P+++ L++$ E-
^^
What is the status of the latest release (3.01) with respect to Unicode
support? Is it possible to write source in Unicode? How wide are
characters? Do the I/O library functions support it? etc.
--FC