Re: [Haskell-cafe] XML parser recommendation?

2007-10-24 Thread Uwe Schmidt
Rene de Visser wrote: I think a step towards supporting medium-size documents in HXT would be to store the tags and content more efficiently. If I understand the coding correctly, every tag is stored as a separate Haskell string. As each byte of a string under GHC takes 12 bytes this alone

Re: [Haskell-cafe] XML parser recommendation?

2007-10-24 Thread Malcolm Wallace
Yitzchak Gale wrote: Another question about HaXml and HXT - what is the level of XML spec. compliance? In HaXml, I certainly tried pretty hard to match the (draft) XML 1.0 spec, since the library was originally developed for a commercial entity. But that was back in 1999,

Re: [Haskell-cafe] XML parser recommendation?

2007-10-23 Thread Malcolm Wallace
Yitzchak Gale wrote: Henning Thielemann wrote: HXT uses Parsec, which is strict. Is it strict to the extent that it cannot produce any output at all until it has read the entire XML document? That would make HXT (and Parsec, for that matter) useless for a large

Re: [Haskell-cafe] XML parser recommendation?

2007-10-23 Thread Yitzchak Gale
Malcolm Wallace wrote: HaXml now uses the polyparse library, and you can choose whether you want well-formedness checking with the original strict parser, or lazy space-efficient on-demand parsing. Initial performance results show that parsing XML lazily is always better than 2x as fast, and

Re: [Haskell-cafe] XML parser recommendation?

2007-10-23 Thread Yitzchak Gale
Another question about HaXml and HXT - what is the level of XML spec. compliance? The many who have tried to implement compliant XML parsers in various languages - and the few who have succeeded - all agree that this is much harder than it seems at first. Most of the time, the final result is an

Re: [Haskell-cafe] XML parser recommendation?

2007-10-23 Thread Yitzchak Gale
Malcolm Wallace wrote: I have been considering moving the lexer to use ByteString instead of String, which would neatly solve that problem too. Doesn't the lexer come only after decoding? Then you have Unicode. Does ByteString still help? -Yitz

Re: [Haskell-cafe] XML parser recommendation?

2007-10-23 Thread Uwe Schmidt
Yitzchak Gale wrote: Another question about HaXml and HXT - what is the level of XML spec. compliance? Implementing the XML 1.0 Standard was one of the goals of HXT when starting the project. This includes full support of DTD processing, which turned out to be the hardest part of the whole

Re: [Haskell-cafe] XML parser recommendation?

2007-10-23 Thread Ketil Malde
Ketil Malde writes: HaXml on my list after TagSoup, which I'm about to get to work, I think (got distracted a bit ATM). As it is, I managed to parse my document using TagSoup. One major obstacle was the need to process a sizeable partition of the file. Using 'partitions'

[Haskell-cafe] XML parser recommendation?

2007-10-22 Thread Ketil Malde
Hi, I'm struggling to get my HXT-based parser to parse a largish file (300MB), even after breaking into reasonably-sized chunks. The culprit appears to be parsing one element comprising 25K lines of text, which apparently requires more memory than the 2GB my computer is equipped with. I'm

Re: [Haskell-cafe] XML parser recommendation?

2007-10-22 Thread Neil Mitchell
Hi Ketil, I'm struggling to get my HXT-based parser to parse a largish file (300MB), even after breaking into reasonably-sized chunks. The culprit appears to be parsing one element comprising 25K lines of text, which apparently requires more memory than the 2GB my computer is equipped with.

Re: [Haskell-cafe] XML parser recommendation?

2007-10-22 Thread Henning Thielemann
On Mon, 22 Oct 2007, Ketil Malde wrote: I'm wondering what approach others use for non-toy XML data. Is the problem due to some error I have made, or should I just ignore the XML, and just parse it manually by dissecting bytestrings, or will another XML library serve better? HXT uses

Re: [Haskell-cafe] XML parser recommendation?

2007-10-22 Thread Yitzchak Gale
Henning Thielemann wrote: HXT uses Parsec, which is strict. Is it strict to the extent that it cannot produce any output at all until it has read the entire XML document? That would make HXT (and Parsec, for that matter) useless for a large percentage of tasks. Or is it just too strict in