Re: haskell xml parsing for larger files?

2014-02-20 Thread Mateusz Kowalczyk
On 20/02/14 11:30, Christian Maeder wrote: > Hi, > > I've got some difficulties parsing "large" xml files (> 100MB). > A plain SAX parser, as provided by hexpat, is fine. However, > constructing a tree consumes too much memory on a 32bit machine. > > see http://trac.informatik.uni-bremen.de:8080

RE: [Typeable2] Re: ANNOUNCE: GHC 7.8.1 Release Candidate 1

2014-02-20 Thread Simon Peyton Jones
I think you need to import Data.OldTypeable to get Typeable2 and friends. Pedro may remember more of the reasoning about backward compatibility. | -Original Message- | From: Christian Maeder [mailto:christian.mae...@dfki.de] | Sent: 20 February 2014 16:09 | To: Simon Peyton Jones; Daniil

Re: [Typeable2] Re: ANNOUNCE: GHC 7.8.1 Release Candidate 1

2014-02-20 Thread Christian Maeder
With "TypeableN " I mean, Typeable1, Typeable2, etc. Typeable2 was not supported (below) by ghc-7.8-rc1. Where is the "backward compat"? In fact, Typeable and Typeable2 mean the same thing for ghc-7.8-rc1 and ghc-7.6 resp.! C. Am 20.02.2014 17:00, schrieb Simon Peyton Jones: | Can you not s

Re: haskell xml parsing for larger files?

2014-02-20 Thread Christian Maeder
I'm afraid our use case is not a lazy prefix traversal. I'm more shocked that about 100 MB xml content do not fit (as tree) into 3 GB memory. Christian Am 20.02.2014 16:49, schrieb malcolm.wallace: Is your usage pattern over the constructed tree likely to be a lazy prefix traversal? If so, t

RE: [Typeable2] Re: ANNOUNCE: GHC 7.8.1 Release Candidate 1

2014-02-20 Thread Simon Peyton Jones
| Can you not simply let ghc-7.8 interpret TypeableN as Typeable? Not really: ghc-7.8 does still support TypeableN I think (for backward compat reasons). So it can't take it to mean two different things. | -Original Message- | From: Glasgow-haskell-users [mailto:glasgow-haskell-users- |

Re: haskell xml parsing for larger files?

2014-02-20 Thread malcolm.wallace
Is your usage pattern over the constructed tree likely to be a lazy prefix traversal?  If so, then HaXml supports lazy construction of the parse tree.  Some plots appear at the end of this paper, showing how memory usage can be reduced to a constant, even for very large inputs (1 million tree nodes

Re: [Typeable2] Re: ANNOUNCE: GHC 7.8.1 Release Candidate 1

2014-02-20 Thread Christian Maeder
Yes, changing Typeable2 to Typeable in: {-# LANGUAGE StandaloneDeriving, DeriveDataTypeable #-} ... deriving instance Typeable Gr goes through with ghc-7.8-rc1. However, this change refuses to compile with ghc-7.6.3: Expecting two more arguments to `Gr' In the stand-alone deriving inst

Re: haskell xml parsing for larger files?

2014-02-20 Thread Mathieu Boespflug
Hi Christian, as regards your question about sharing strings, there are a number of libraries on Hackage to achieve this, e.g. in the context of compiler symbols. To cite only a few: intern, stringtable-atom, simple-atom. I'm sure there are others. Best, -- Mathieu Boespflug Founder at http://twe

Re: haskell xml parsing for larger files?

2014-02-20 Thread Chris Smith
Ah, I'd misunderstood your question, and thought you were looking for a sax-like alternative. On Feb 20, 2014 6:57 AM, "Christian Maeder" wrote: > I've just tried: > > import Text.HTML.TagSoup > import Text.HTML.TagSoup.Tree > > main :: IO () > main = getContents >>= putStr . renderTags .

Re: haskell xml parsing for larger files?

2014-02-20 Thread Christian Maeder
I've just tried: import Text.HTML.TagSoup import Text.HTML.TagSoup.Tree main :: IO () main = getContents >>= putStr . renderTags . flattenTree . tagTree . parseTags which also ends with the getMBlock error. Only "renderTags . parseTags" works fine (like the hexpat SAX parser). Why sh

Re: haskell xml parsing for larger files?

2014-02-20 Thread Chris Smith
Have you looked at tagsoup? On Feb 20, 2014 3:30 AM, "Christian Maeder" wrote: Hi, I've got some difficulties parsing "large" xml files (> 100MB). A plain SAX parser, as provided by hexpat, is fine. However, constructing a tree consumes too much memory on a 32bit machine. see http://trac.inform

haskell xml parsing for larger files?

2014-02-20 Thread Christian Maeder
Hi, I've got some difficulties parsing "large" xml files (> 100MB). A plain SAX parser, as provided by hexpat, is fine. However, constructing a tree consumes too much memory on a 32bit machine. see http://trac.informatik.uni-bremen.de:8080/hets/ticket/1248 I suspect that sharing strings when