Re: Transforming huge XML-files - 3-4GB

Tatu Saloranta Sat, 22 Nov 2008 13:25:24 -0800

--- On Sat, 11/22/08, Mikael Jansson <[EMAIL PROTECTED]> wrote:

...
> it's an app for converting XML files from the Swedish tax authority with
> tax data (and some other stuff, like EU customs data).
> I'm doing this for a company whose business is to hold a database with this
> data and distribute it to their customers directly from database to database,
> so they won't have to create this app themselves. My app downloads,
> PGP-validates, unzips and converts the XML to SQL and then loads it into a
> database.
>
> They distribute a number of difference XML files every night and a number of
> total XML files every month or so. It's these totals that are so big.


This is actually quite a common use case for XML, although not necessarily for
XSLT or other approaches that require a full in-memory tree representation of
the whole document.

Your best bet really is to process the data one sub-tree at a time (build
in-memory sub-trees, process them separately, re-combine the output if
necessary), as suggested by others, or to use a fully streaming approach (for
example with the StaxMate library).
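
To make the sub-tree idea concrete, here is a minimal sketch using only the
plain StAX API that ships with the JDK (not StaxMate). It assumes a made-up
layout in which each record element contains only simple text-valued child
elements; the names `row`, `id` and `name` used with it are hypothetical, not
taken from the actual tax files. Only one record is held in memory at a time.

```java
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RecordStreamer {

    /**
     * Streams through the document and passes each record sub-tree to the
     * handler as a flat map of child element name to text. Returns the
     * number of records seen. The record element name is supplied by the
     * caller; nothing about the real file layout is assumed here.
     */
    public static int streamRecords(Reader in, String recordName,
                                    Consumer<Map<String, String>> handler)
            throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && recordName.equals(reader.getLocalName())) {
                    handler.accept(readRecord(reader));
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    /**
     * Reads one record. Assumption: the record has only simple children
     * that contain text; nested elements or mixed content make nextTag()
     * or getElementText() throw an XMLStreamException.
     */
    private static Map<String, String> readRecord(XMLStreamReader reader)
            throws XMLStreamException {
        Map<String, String> fields = new LinkedHashMap<>();
        // nextTag() skips whitespace and stops at the next start or end tag.
        while (reader.nextTag() == XMLStreamConstants.START_ELEMENT) {
            String name = reader.getLocalName();
            // getElementText() leaves the reader on the child's end tag.
            fields.put(name, reader.getElementText());
        }
        // The reader is now on the end tag of the record element.
        return fields;
    }
}
```

A caller would pass a handler that converts each map to SQL, for example
`RecordStreamer.streamRecords(fileReader, "row", fields -> ...)`, so memory
use should depend on the size of one record rather than on the size of the
file.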

Also, given that this sounds more like a data-oriented task (as opposed to a
document-oriented one), perhaps data mapping (XML-to-object) tools such as
JAXB might be a better fit for processing the data. JAXB, for example, can
also bind just sub-trees (when using a StAX parser as the source), which
avoids the size problems.

-+ Tatu +-
