Re: [xml] xmlReader: Possibility for cloning an xmlTextReader (or multi-pass reading)

Eric S. Eberhard Sun, 07 Apr 2013 01:19:40 -0700

It is also relatively trivial to do it yourself ... the offsets of the nodes are, in my experience, always the same for the same document. So you can keep in memory the offset of each node from the base node (not the address of the node, which is not constant). After the first read you would then have direct access to any data you wanted in subsequent passes.
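A minimal sketch of that idea, written in Python with the standard library's `xml.etree.ElementTree.iterparse` standing in for a streaming reader such as xmlTextReader. It assumes "offset" means a node's position in document order; the helper names `index_elements` and `read_at` are made up for illustration and are not part of libxml2.

```python
import io
import xml.etree.ElementTree as ET


def index_elements(xml_bytes, wanted):
    """First pass: record the document-order position of each wanted element.

    Position 0 is the root element; positions count element starts only.
    """
    positions = {}
    stream = ET.iterparse(io.BytesIO(xml_bytes), events=("start",))
    for position, (_event, elem) in enumerate(stream):
        if elem.tag in wanted:
            positions.setdefault(elem.tag, []).append(position)
    return positions


def read_at(xml_bytes, position):
    """Later pass: stream again and return the text of the element at `position`."""
    target = None
    starts = -1
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            starts += 1
            if starts == position:
                target = elem  # remember it; its text is complete at "end"
        elif elem is target:
            return elem.text
    return None


if __name__ == "__main__":
    doc = b"<doc><meta>small</meta><big>payload</big></doc>"
    index = index_elements(doc, {"meta", "big"})
    # Read the small data first, then the large element, regardless of file order.
    print(read_at(doc, index["meta"][0]), read_at(doc, index["big"][0]))
```

The positions stay valid only as long as the document itself does not change, which matches the "same document" condition above.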

I would also question the need for this if (as it sounds) you would only be reading the document twice. In the "old days" I agonized over writing super-efficient code -- and in more recent years I have found that two parses, one after the other, are hardly noticeable even at extreme volume, thanks to modern machines and caches. We tend to do a lot of smaller documents with libxml2, but we process in and out about 50,000 documents per hour on a relatively modest computer (4-core IBM AIX). Many are read over and over because it is lazier and easier to code that way, and the pressure to get things working now exceeds the pressure to write the ultimate performance code.

What I am saying is that you might cringe at -- and instinctively hate (as I do) -- the idea of just reading it twice -- but you might want to run some benchmarks and see if you really care or not.
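A rough way to run such a benchmark, again sketched in Python with the standard library's `iterparse` as a stand-in for xmlTextReader. The synthetic document built by `make_document` and the function names are assumptions for illustration; real numbers should come from your own files and your own libxml2 code.

```python
import io
import time
import xml.etree.ElementTree as ET


def count_elements(xml_bytes):
    """One full streaming pass; returns the number of elements seen."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        count += 1
        elem.clear()  # drop text and children that have already been processed
    return count


def time_passes(xml_bytes, passes):
    """Parse the same document `passes` times; return (seconds, elements seen)."""
    start = time.perf_counter()
    total = 0
    for _ in range(passes):
        total += count_elements(xml_bytes)
    return time.perf_counter() - start, total


def make_document(items):
    """Build a synthetic test document (hypothetical stand-in for a real file)."""
    body = "".join(f"<item id='{i}'>value {i}</item>" for i in range(items))
    return f"<doc>{body}</doc>".encode("utf-8")


if __name__ == "__main__":
    doc = make_document(100_000)
    once, _ = time_passes(doc, 1)
    twice, _ = time_passes(doc, 2)
    print(f"one pass: {once:.3f}s, two passes: {twice:.3f}s")
```

If the difference between one and two passes is small next to the rest of the processing, reading the document twice is probably the simpler choice.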


Eric

On 3/31/2013 12:00 AM, Liam R E Quin wrote:

On Sat, 2013-03-30 at 08:02 +0100, Martin B. wrote:
[...]

It turns out however, that the subtree where the large data resides has
to be read not in-order, but I have to collect some (small amount of)
data before the other.

Do you process the file only once, or many times, between times when it
changes?

If you process the large file multiple times, and you have to
re-engineer the code, you could consider something like dbxml, which
will parse the document once and can make an index; subsequent accesses
can be fairly fast because there's no need to read the file.

If you process the large file at most once every time it changes, using
an XQuery engine like dbxml isn't so obvious a win: it can still help
with the out-of-order access in some cases, but the gains might not be
worth the engineering effort of reworking your code.

As Daniel has said, libxml is pretty fast at parsing, so another
strategy might be fetching only the parts you need into a new, smaller
XML document and then using XSLT.

Liam


--
Eric S. Eberhard
VICS
PO Box 3661
Camp Verde, AZ  86322

928-567-3727  work                      928-301-7537  cell

http://www.vicsmba.com/index.html             (our work)
http://www.vicsmba.com/ourpics/index.html     (fun pictures)

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
https://mail.gnome.org/mailman/listinfo/xml
