haskell xml parsing for larger files?

2014-02-20 Thread Christian Maeder
Hi, I've got some difficulties parsing large XML files (> 100MB). A plain SAX parser, as provided by hexpat, is fine. However, constructing a tree consumes too much memory on a 32-bit machine. See http://trac.informatik.uni-bremen.de:8080/hets/ticket/1248 I suspect that sharing strings when

Re: haskell xml parsing for larger files?

2014-02-20 Thread Chris Smith
Have you looked at tagsoup?
On Feb 20, 2014 3:30 AM, Christian Maeder christian.mae...@dfki.de wrote:
Hi, I've got some difficulties parsing large XML files (> 100MB). A plain SAX parser, as provided by hexpat, is fine. However, constructing a tree consumes too much memory on a 32-bit machine.

Re: haskell xml parsing for larger files?

2014-02-20 Thread Christian Maeder
I've just tried:
    import Text.HTML.TagSoup
    import Text.HTML.TagSoup.Tree
    main :: IO ()
    main = getContents >>= putStr . renderTags . flattenTree . tagTree . parseTags
which also ends with the getMBlock error. Only renderTags . parseTags works fine (like the hexpat SAX parser). Why

Re: haskell xml parsing for larger files?

2014-02-20 Thread Mathieu Boespflug
Hi Christian, as regards your question about sharing strings, there are a number of libraries on Hackage to achieve this, e.g. in the context of compiler symbols. To cite only a few: intern, stringtable-atom, simple-atom. I'm sure there are others. Best, -- Mathieu Boespflug Founder at
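
The string-sharing idea can be sketched without any of the libraries named above. The following is a minimal illustration only, using Data.Map from the containers package; the names Pool, intern and internAll are hypothetical and are not the API of intern, stringtable-atom or simple-atom. Each distinct string is stored once, and every later occurrence is replaced by a reference to that first copy, so repeated element and attribute names in a tree would share one heap object.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- | Hypothetical pool type: maps each string seen so far to its
-- canonical (first-seen) copy.
type Pool = Map.Map String String

-- | Return the canonical copy of a string, adding it to the pool if
-- it has not been seen before.
intern :: Pool -> String -> (Pool, String)
intern pool s = case Map.lookup s pool of
  Just shared -> (pool, shared)            -- reuse the stored copy
  Nothing     -> (Map.insert s s pool, s)  -- first occurrence

-- | Intern every string in a list, threading the pool through.
internAll :: Pool -> [String] -> (Pool, [String])
internAll = mapAccumL intern
```

For example, internAll Map.empty ["item", "name", "item"] yields a pool with two entries, and the first and third result strings refer to the same copy. A real implementation would more likely hash the strings or use a compact text type; this sketch only shows the principle.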

Re: haskell xml parsing for larger files?

2014-02-20 Thread malcolm.wallace
Is your usage pattern over the constructed tree likely to be a lazy prefix traversal? If so, then HaXml supports lazy construction of the parse tree. Some plots appear at the end of this paper, showing how memory usage can be reduced to a constant, even for very large inputs (1 million tree

Re: haskell xml parsing for larger files?

2014-02-20 Thread Christian Maeder
I'm afraid our use case is not a lazy prefix traversal. I'm more shocked that about 100 MB of XML content does not fit (as a tree) into 3 GB of memory. Christian
On 20.02.2014 16:49, malcolm.wallace wrote:
Is your usage pattern over the constructed tree likely to be a lazy prefix traversal? If so,
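
A rough back-of-the-envelope estimate may make the 100 MB versus 3 GB observation less surprising. It rests on assumptions that are not stated in the thread: that the tree keeps its text as String (a linked list of Char), that each list cell takes three machine words in GHC, and that the characters themselves are shared rather than allocated per cell. The names below are illustrative only.

```haskell
-- Rough estimate only; assumes the tree stores its text as String.

-- | Bytes per machine word on a 32-bit machine.
wordBytes :: Integer
wordBytes = 4

-- | Words per list cell: header, pointer to the Char, pointer to the tail.
consCellWords :: Integer
consCellWords = 3

-- | Approximate heap bytes for the list cells of a String of n characters.
stringBytes :: Integer -> Integer
stringBytes nChars = nChars * consCellWords * wordBytes

-- | List cells alone for 100 MB of character data.
estimate :: Integer
estimate = stringBytes (100 * 1024 * 1024)
```

Under these assumptions the list cells alone come to about 1.2 GB (12 bytes per character), before counting any tree nodes. A copying garbage collector can also need roughly twice the live data during a major collection, which would put the total near the 3 GB limit.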

Re: haskell xml parsing for larger files?

2014-02-20 Thread Mateusz Kowalczyk
On 20/02/14 11:30, Christian Maeder wrote:
Hi, I've got some difficulties parsing large XML files (> 100MB). A plain SAX parser, as provided by hexpat, is fine. However, constructing a tree consumes too much memory on a 32-bit machine. see