On 20040426T104946-0700, David Brown wrote:
> Is anyone aware of any Haskell libraries for doing UTF-8 decoding and
> encoding? If not, I'll write something simple.
I wrote a simple Unicode library for my MSc project a couple of
years ago. It might not compile with recent GHC, but you can have a
David Brown wrote (snipped):
What license is your code covered under? As it stands now, it is an
informative example, but cannot be used by anybody.
As author, I am quite happy for it to be used and modified by other people
for non-commercial purposes. As far as I know my employers wouldn't
any p
On Tue, Apr 27, 2004 at 10:55:57AM +0200, George Russell wrote:
I have implemented UTF8-encode/decode. Unlike the code someone has already
posted it handles all UTF8 sequences, including those longer than 3 bytes.
It also catches all illegal UTF8 sequences (such as characters encoded
with a longer sequence than necessary). Here is the code.
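The code George posted is not part of this excerpt. As an illustration only, here is a minimal sketch of a decoder with the properties he describes: it handles sequences longer than 3 bytes and rejects malformed input, including overlong encodings. It is not his implementation. The name `decodeUTF8` is hypothetical. The sketch follows the RFC 3629 rules (at most 4 bytes, nothing above U+10FFFF, no UTF-16 surrogates), which are stricter than the older RFC 2279 form that allowed up to six bytes.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- | Hypothetical decoder: returns Nothing on any malformed UTF-8 input
-- (stray continuation bytes, truncated sequences, overlong forms,
-- surrogates, or code points above U+10FFFF).
decodeUTF8 :: [Word8] -> Maybe String
decodeUTF8 [] = Just []
decodeUTF8 (b : bs)
  | b < 0x80 = (chr (fromIntegral b) :) <$> decodeUTF8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 0x80 (b .&. 0x1F)    -- 110xxxxx: 2 bytes
  | b .&. 0xF0 == 0xE0 = multi 2 0x800 (b .&. 0x0F)   -- 1110xxxx: 3 bytes
  | b .&. 0xF8 == 0xF0 = multi 3 0x10000 (b .&. 0x07) -- 11110xxx: 4 bytes
  | otherwise = Nothing -- stray continuation byte or 0xF8..0xFF
  where
    -- n continuation bytes follow; minCode is the smallest code point
    -- that legitimately needs this many bytes (anything lower is overlong)
    multi :: Int -> Int -> Word8 -> Maybe String
    multi n minCode lead = do
      let (conts, rest) = splitAt n bs
      -- every continuation byte must have the form 10xxxxxx
      if length conts /= n || any (\c -> c .&. 0xC0 /= 0x80) conts
        then Nothing
        else do
          let code = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F))
                           (fromIntegral lead) conts
          -- reject overlong forms, UTF-16 surrogates and values past U+10FFFF
          if code < minCode || (code >= 0xD800 && code <= 0xDFFF) || code > 0x10FFFF
            then Nothing
            else (chr code :) <$> decodeUTF8 rest
```

For example, `[0xC0, 0x80]` is an overlong encoding of U+0000 and yields `Nothing`, while `[0xF0, 0x9F, 0x98, 0x80]` decodes to the single character U+1F600.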
--
On Mon, Apr 26, 2004 at 08:33:38PM +0200, Sven Panne wrote:
Duncan Coutts wrote:
> On Mon, 2004-04-26 at 18:49, David Brown wrote: [...]
> toUTF :: String -> String
Hmmm, "String -> [Word8]" would be nicer...
> fromUTF :: String -> String
... and here: "[Word8] -> String" or "[Word8] -> Maybe String".
Furthermore, UTF-8 is not restricted to a maximum of 3 bytes
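To make the suggested `String -> [Word8]` signature concrete, here is a minimal encoder sketch that also emits 4-byte sequences for characters above U+FFFF. The name `encodeUTF8` is hypothetical, and the code is neither the gtk2hs implementation nor anyone's posted code. It assumes every `Char` is at most U+10FFFF, which GHC guarantees, and it does not check for lone surrogates.

```haskell
import Data.Bits ((.&.), (.|.), shiftR)
import Data.Char (ord)
import Data.Word (Word8)

-- | Hypothetical encoder: each Char becomes 1 to 4 bytes.
-- Lone surrogate code points (U+D800..U+DFFF) are not rejected here.
encodeUTF8 :: String -> [Word8]
encodeUTF8 = concatMap (encodeChar . ord)
  where
    encodeChar :: Int -> [Word8]
    encodeChar c
      | c < 0x80 = [fromIntegral c]                               -- 0xxxxxxx
      | c < 0x800 = [0xC0 .|. lead 6, cont 0]                     -- 110xxxxx 10xxxxxx
      | c < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]          -- 1110xxxx ...
      | otherwise = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]   -- 11110xxx ...
      where
        -- high-order bits that go into the leading byte
        lead n = fromIntegral (c `shiftR` n)
        -- a continuation byte carrying six payload bits
        cont n = 0x80 .|. (fromIntegral (c `shiftR` n) .&. 0x3F)
```

For instance, `encodeUTF8 "\x20AC"` (the euro sign) gives `[0xE2, 0x82, 0xAC]`. A matching decoder would naturally have the type `[Word8] -> Maybe String`, so malformed input can be reported instead of silently mangled.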
On Mon, 2004-04-26 at 18:49, David Brown wrote:
> Is anyone aware of any Haskell libraries for doing UTF-8 decoding and
> encoding? If not, I'll write something simple.
The gtk2hs library uses the following functions internally.
Credit to Axel Simon, I believe, unless he swiped them from somewhere
I am writing some utilities to deal with UTF-8 encoded text files (not
source). Currently, I'm just reading in the UTF-8 directly, and things
work reasonably well: since my parse tokens are ASCII, they are easy to
parse.
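Reading the raw bytes works for ASCII tokens because of how UTF-8 is designed: every byte of a multi-byte sequence has its high bit set (0x80 or above), so no byte inside a non-ASCII character can ever equal an ASCII delimiter. The sketch below splits a raw byte list on an ASCII separator. The helper `splitOnByte` is hypothetical and only illustrates the point.

```haskell
import Data.Word (Word8)

-- | Hypothetical helper: split raw UTF-8 bytes on an ASCII delimiter.
-- Safe because bytes belonging to multi-byte sequences are all >= 0x80,
-- so they can never match an ASCII delimiter (< 0x80).
splitOnByte :: Word8 -> [Word8] -> [[Word8]]
splitOnByte d bytes = case break (== d) bytes of
  (field, []) -> [field]
  (field, _ : rest) -> field : splitOnByte d rest
```

Splitting the bytes of "é,€" (`[0xC3, 0xA9, 0x2C, 0xE2, 0x82, 0xAC]`) on a comma (`0x2C`) yields `[[0xC3, 0xA9], [0xE2, 0x82, 0xAC]]`. The trailing byte `0xAC` of the euro sign is not mistaken for a comma, and each field is still valid UTF-8.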
However, the character type seems perfectly happy with larger values for
eac