Rene de Visser wrote:
I think a step towards supporting medium-sized documents in HXT would be to
store the tags and content more efficiently.
If I understand the code correctly, every tag is stored as a separate
Haskell String. As each character of a String under GHC takes 12 bytes, this alone
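
A minimal sketch of the kind of change being suggested, assuming nothing
about HXT's actual internals (the Node type below is hypothetical):
storing tag names and character data as strict ByteStrings instead of
[Char] removes the per-character cons-cell overhead described above.

    import qualified Data.ByteString.Char8 as B

    -- Hypothetical node type, for illustration only.
    data Node
      = Element B.ByteString [Node]  -- tag name in one compact buffer
      | Text    B.ByteString         -- character data likewise
      deriving Show

    example :: Node
    example = Element (B.pack "chapter")
                [Text (B.pack "a large document has millions of these")]

    main :: IO ()
    main = print example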
Yitzchak Gale wrote:
Another question about HaXml and HXT -
what is the level of XML spec. compliance?
In HaXml, I certainly tried pretty hard to match the (draft) XML 1.0
spec, since the library was originally developed for a commercial
entity. But that was back in 1999,
Yitzchak Gale wrote:
Henning Thielemann wrote:
HXT uses Parsec, which is strict.
Is it strict to the extent that it cannot produce any
output at all until it has read the entire XML document?
That would make HXT (and Parsec, for that matter)
useless for a large percentage of tasks.
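
A small sketch of the strictness being asked about, using today's Parsec
API; the toy parser and input below are made up. Because parse returns an
Either, no part of the result can be inspected until the whole parse has
succeeded or failed:

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- A toy parser: a run of "<a>" tags.
    tags :: Parser [String]
    tags = many (string "<a>")

    main :: IO ()
    main =
      -- Even with 'take 2', the full input is consumed before anything
      -- is printed: the Right constructor only exists once parsing ends.
      print (take 2 <$> parse tags "" (concat (replicate 5 "<a>")))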
Malcolm Wallace wrote:
HaXml now uses the polyparse library, and you can choose whether you
want well-formedness checking with the original strict parser, or lazy
space-efficient on-demand parsing. Initial performance results show
that parsing XML lazily is always better than 2x as fast, and
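
A sketch of the choice Malcolm describes, assuming the module layout of a
polyparse-era HaXml release (Text.XML.HaXml.Parse for the strict parser,
Text.XML.HaXml.ParseLazy for the lazy one, both exporting xmlParse at the
same type; the file name is made up):

    import Text.XML.HaXml.Parse (xmlParse)            -- strict, well-formedness checked
    import qualified Text.XML.HaXml.ParseLazy as Lazy -- lazy, on demand
    import Text.XML.HaXml.Types (Document)
    import Text.XML.HaXml.Posn (Posn)

    strictDoc, lazyDoc :: String -> Document Posn
    strictDoc = xmlParse      "input.xml"  -- whole tree built before returning
    lazyDoc   = Lazy.xmlParse "input.xml"  -- nodes materialise as consumed

    main :: IO ()
    main = do
      s <- readFile "input.xml"
      -- seq forces only the document's top constructor; with the lazy
      -- parser the children remain unevaluated at this point.
      lazyDoc s `seq` putStrLn "root available"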
Another question about HaXml and HXT -
what is the level of XML spec. compliance?
The many who have tried to implement compliant
XML parsers in various languages - and the few
who have succeeded - all agree that this is much
harder than it seems at first.
Most of the time, the final result is an
Malcolm Wallace wrote:
I have been considering moving the lexer to use
ByteString instead of String, which would neatly solve that problem too.
Doesn't the lexer come only after decoding?
Then you have Unicode. Does ByteString still help?
-Yitz
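
The pipeline in question, sketched with today's bytestring and text
packages (not what HaXml's lexer used at the time): the bytes must be
decoded to Unicode text before a lexer can see characters, so a
ByteString helps with I/O and storage but does not remove the decode
step. UTF-8 is assumed below, and the word splitter stands in for a real
lexer.

    import qualified Data.ByteString as B
    import qualified Data.Text as T
    import Data.Text.Encoding (decodeUtf8)

    lexNames :: B.ByteString -> [T.Text]
    lexNames bytes =
      let text = decodeUtf8 bytes   -- step 1: decode (UTF-8 assumed)
      in  T.words text              -- step 2: a stand-in for the lexer

    main :: IO ()
    main = print (lexNames (B.pack [60,102,111,111,62]))  -- bytes of "<foo>"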
Yitzchak Gale wrote:
Another question about HaXml and HXT -
what is the level of XML spec. compliance?
Implementing the XML 1.0 standard was
one of the goals of HXT when the project was started.
This includes full support of DTD processing,
which turned out to be the hardest part of the
whole
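
For illustration, here is that DTD processing switched on via the current
HXT-9 interface (Text.XML.HXT.Core; the original thread predates this
API, so take the option names as indicative, and the file name is made
up):

    import Text.XML.HXT.Core

    main :: IO ()
    main = do
      -- withValidate yes runs the DTD validator over the parsed tree;
      -- violations are reported as errors rather than yielding a result.
      trees <- runX (readDocument [withValidate yes] "doc.xml")
      putStrLn ("parsed " ++ show (length trees) ++ " document root(s)")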
Ketil Malde writes:
HaXml is on my list after TagSoup, which I'm about to get to work, I
think (got distracted a bit ATM).
As it is, I managed to parse my document using TagSoup. One major
obstacle was the need to process a sizeable partition of the file.
Using 'partitions'
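
A sketch of that partitions-based approach with TagSoup's current API;
the element name "entry" and the file name are made up:

    import Text.HTML.TagSoup

    main :: IO ()
    main = do
      s <- readFile "big.xml"   -- lazy I/O, so tags stream on demand
      -- partitions splits the lazy token stream at each opening tag,
      -- giving one sublist per record, processable in bounded space.
      let records = partitions (isTagOpenName "entry") (parseTags s)
      mapM_ (print . length) records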
Hi,
I'm struggling to get my HXT-based parser to parse a largish file
(300MB), even after breaking it into reasonably-sized chunks. The
culprit appears to be parsing one element comprising 25K lines of
text, which apparently requires more memory than the 2GB my computer
is equipped with.
I'm
Hi Ketil,
I'm struggling to get my HXT-based parser to parse a largish file
(300MB), even after breaking it into reasonably-sized chunks. The
culprit appears to be parsing one element comprising 25K lines of
text, which apparently requires more memory than the 2GB my computer
is equipped with.
On Mon, 22 Oct 2007, Ketil Malde wrote:
I'm wondering what approach others use for non-toy XML data. Is the
problem due to some error I have made, or should I just ignore the
XML and parse it manually by dissecting bytestrings, or will
another XML library serve better?
HXT uses Parsec, which is strict.
Henning Thielemann wrote:
HXT uses Parsec, which is strict.
Is it strict to the extent that it cannot produce any
output at all until it has read the entire XML document?
That would make HXT (and Parsec, for that matter)
useless for a large percentage of tasks.
Or is it just too strict in