Hi Conrad,

 thanks for that revised version :-)

On Sun, May 13, 2012 at 01:46:08AM -0700, Conrad Irwin wrote:
> On Fri, May 11, 2012 at 9:10 AM, Daniel Veillard <veill...@redhat.com> wrote:
> > that's interesting ! I was initially afraid of a sudden explosion of
> > memory allocations for building a tree since by default buffers tend to
> > "waste" memory by using doubling allocations, but that's not the case.
> >  xmllint --noout doc/libxml2-api.xml
> > when compiled with memory debug produce
> >
> > paphio:~/XML -> cat .memdump
> >      MEMORY ALLOCATED : 0, MAX was 12756699
> >
> > and without your patch 12755657, i.e. the increase is minimal.
> 
> Heh, I thought that too. Actually you're looking at the result with 
> XML_ALLOC_EXACT! This
> is because EXACT adds 10bytes "spare" on each alloc, and that interestingly 
> wastes about the
> same amount of space as XML_ALLOC_DOUBLEIT on this example (see below).
> 
> So it turns out that the default realloc() on my system actually handles this 
> case really
> well — and I guess that all the time in xmlRealloc() was actually in 
> xmlStrlen, not the
> underlying realloc() after all (sorry for misleading you). If you replace the 
> realloc()

  yeah, if you're using Linux, realloc() performances are really good,
but clearly xmlStrlen() will lead to a O(n^2) performance !
Windows realloc() used to be slow, so at some point in the past we
removed some critical realocs and switched to XML_ALLOC_DOUBLEIT,
but that was some time ago, and while it was about very large
text nodes those were not affected by entity expansion.

> with a bad one (like valgrind's), then the performance degrades severely.
> 
> This patch implements a HYBRID allocator which has the behaviour you describe 
> (it's
> like EXACT to start with, though without the spare 10 bytes; and switches to 
> DOUBLEIT
> after 4kb) — that gets the memory back down to 12755657, with no noticeable 
> impact on the
> performance of the synthetic pathological example under valgrind.
> 
> In summary:
> 
>      max_memory on ./xmllint --noout doc/libxml2-api.xml,
>      valgrind time on https://gist.github.com/2656940
> 
>             max_memory    valgrind time
> before   |  12755657    | 29:18.2
> EXACT    |  12756699    |  2:58.6 <-- this is the state after the first patch.
> DOUBLEIT |  12756727    |  0:02.7
> HYBRID   |  12755754    |  0:02.7 <-- this is the state with both patches.

  Okay, great :-)

> 
> >
> > There is also the cost of creating the buffers all the time.
> > I need to read the code and check but I may be interested in an hybrid
> > approach where we switch to buffer only when the text node starts to
> > become too big (4k would remove nearly all usuall types of "document"
> > usage, i.e. not blocks of data)
> 
> I tried to avoid too much buffer creation by introducing the xmlBufferDetach 
> function,
> which allows re-using one buffer to construct many strings. It's maybe a bit 
> of a "hack"
> in API terms though I thought the gains would be worth it.

  I have read that now, okay this looks sane, I need to fix one or two
things but I believe I will commit those two soon :-)

 thanks a lot !

Daniel

P.S.: if you have too much time and want to help me, there is a leak in
  the new Lzma/xz support code which looks nasty and need to be fixed
  before the next release:
paphio:~/XML -> xmllint --version
xmllint: using libxml version 20708-GITv2.7.8-108-g7d0d2a5
 compiled with: Threads Tree Output Push Reader Patterns Writer
SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer
XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas
Schematron Modules Debug Zlib Lzma 
paphio:~/XML -> valgrind --leak-check=full xmllint --noout dba100000.xml 
==13484== 7,152 bytes in 1 blocks are definitely lost in loss record 1
of 1
==13484==    at 0x4A074CD: malloc (vg_replace_malloc.c:236)
==13484==    by 0x3473409926: inflateInit2_ (in /lib64/libz.so.1.2.5)
==13484==    by 0x541EFE: xz_head (xzlib.c:377)
==13484==    by 0x542C21: xz_make (xzlib.c:584)
==13484==    by 0x542F2C: __libxml2_xzread (xzlib.c:688)
==13484==    by 0x452A1C: xmlXzfileRead (xmlIO.c:1410)
==13484==    by 0x45500D: xmlParserInputBufferGrow (xmlIO.c:3282)
==13484==    by 0x414BE6: xmlParserInputGrow (parserInternals.c:353)
==13484==    by 0x41B738: xmlGROW (parser.c:2003)
==13484==    by 0x433DE6: xmlParseDocument (parser.c:10245)
==13484==    by 0x43D895: xmlDoRead (parser.c:14656)
==13484==    by 0x43D9C0: xmlReadFile (parser.c:14716)
==13484==    by 0x406A43: parseAndPrintFile (xmllint.c:2381)
==13484==    by 0x40BB4D: main (xmllint.c:3726)
==13484== 
paphio:~/XML -> 
   Somehow it seems xmlXzfileRead doesn't free some memory allocated
   internally in the xz library :-\

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml

Reply via email to