Hi Daniel, thanks for your reply!

You are right that the buffer writes to memory: the author of the XMLSec 
library confirmed that he needs the whole document there because of c14n. 
It also seems to be a fundamental part of the process, so there is no 
easy fix on his side.
http://www.aleksey.com/pipermail/xmlsec/2012/009411.html

Since you say the data should be evacuated progressively and the buffer 
should never get that big, I must ask again whether we understand each 
other here. I spent all of yesterday debugging the process, and I can see 
the write callbacks consistently writing chunks of about 4KB, so no big 
buffering occurs per se. What seems odd is that all those small writes are 
tracked by the buffer struct I mentioned before, which has an 'int' 
counter that would IMHO overflow once the output exceeds 2GB, regardless 
of whether the destination is a file or memory.
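
To make the arithmetic concrete, here is a rough sketch of what a 32-bit 
signed counter would end up holding after many 4KB writes. The helper name 
wrapped_int32_total is made up for illustration and is not libxml2 code. 
Signed overflow is undefined behavior in C, so the sketch does the math in 
64 bits and only models the two's-complement wraparound one typically sees 
in practice.

```c
#include <stdint.h>

/* Hypothetical helper, not libxml2 code: returns the value a 32-bit signed
 * counter would typically hold after `chunks` writes of `chunk_size` bytes
 * each, assuming two's-complement wraparound. Inputs are assumed to be
 * non-negative. All arithmetic is 64-bit, so the sketch never overflows. */
int64_t wrapped_int32_total(int64_t chunks, int64_t chunk_size)
{
    int64_t total = chunks * chunk_size;      /* true number of bytes written */
    int64_t wrapped = total % 4294967296LL;   /* keep only the low 32 bits */

    if (wrapped >= 2147483648LL)              /* sign bit set: reads as negative */
        wrapped -= 4294967296LL;
    return wrapped;
}
```

With 4096-byte chunks the modelled counter stays positive up to 524287 
writes and turns negative on write number 524288, which is exactly the 
2GB mark.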

Please check the place of the overflow here:
http://git.gnome.org/browse/libxml2/tree/xmlIO.c#n3445
Correct me if I'm wrong, but the value of out->written seems to be the 
total of all the small writes that make up the final output. Nothing 
actually goes wrong when this counter overflows to a negative value until 
you later call xmlOutputBufferClose(), which on success returns the 
counter value. That value is now negative, yet it is not an error code. 
This is definitely at least interesting, don't you think? ;)
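
A minimal sketch of why the negative return value matters, assuming the 
caller treats any negative result as a failure. Both function names are 
hypothetical stand-ins, not the real libxml2 or XMLSec code.

```c
/* Hypothetical stand-in for a close routine that returns the running byte
 * counter on success and -1 on failure. */
int close_and_report(int written, int io_failed)
{
    if (io_failed)
        return -1;
    return written;   /* a wrapped counter comes back as a negative number */
}

/* Assumed caller-side check: any negative result is taken to be an error. */
int caller_sees_error(int close_result)
{
    return close_result < 0;
}
```

Under that assumption a successful close with a wrapped counter is 
indistinguishable from a real I/O failure on the caller's side.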

Thanks!
Vit

PS: When I add a simple hack that keeps the counter from ever becoming 
negative, the XML verification starts working, but the counter value 
becomes useless. It also suggests that the rest of the logic is correct 
for large files, which is good news.
BTW, the error manifests here in the XMLSec code:
http://git.gnome.org/browse/xmlsec/tree/src/c14n.c#n277 
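
For reference, a guard like the one mentioned in the PS could look roughly 
like this. It is only a sketch of the idea with a made-up function name, 
not the change I actually applied: it clamps the counter at INT_MAX instead 
of letting it wrap, so the value stops being an accurate byte count but 
never turns negative.

```c
#include <limits.h>

/* Hypothetical guard, not an actual libxml2 patch: add `n` bytes to an int
 * counter and clamp at INT_MAX rather than overflow. `written` is assumed
 * to be non-negative on entry. */
int add_written_saturating(int written, int n)
{
    if (n <= 0)
        return written;          /* nothing written, counter unchanged */
    if (written > INT_MAX - n)
        return INT_MAX;          /* would overflow: saturate instead */
    return written + n;
}
```

A proper fix would presumably widen the counter to a 64-bit type instead, 
but that changes a public struct, so I leave that judgment to you.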

Daniel Veillard <veill...@redhat.com> wrote on 05/24/2012 05:36:47 AM:
>
> On Wed, May 23, 2012 at 05:55:01PM +0200, Vit Zikmund wrote:
> > Greetings libxml gurus!
> > We are using XMLSec library built on top of libxml2 to process some
> > large XML files, however it doesn't seem to work for files >2GB,
> > which is unfortunately what we need.
> > 
> > I'd like to ask if the library should support processing that large
> > files (otherwise, this might be a bug).
> 
>   libxml2 certainly parses files larger than 2GB, I have tested with
> files larger than 4GB to make sure we had no 32 bits limitations on
> input.
>
> > It seems there's a limitation in the struct _xmlOutputBuffer, that
> > stores written bytes in a signed int - therefore the max limit is 2GB.
> > Here it is: 
> > http://git.gnome.org/browse/libxml2/tree/include/libxml/xmlIO.h#n141
> 
>   Then I would guess the _xmlOutputBuffer was created to output in
> memory which is the worst situation, because usually xmlOutputBuffer
> have a set of I/O routines associated and those are called to evacuate
> progressively the output data, we should never accumulate 2G of output
> in memory !
>
> > We'd really like if the library could support 64 bit sizes and I see
> > the struct _xmlParserInputBuffer, that's nearby, does. It uses
> > unsigned long that's 64bit for x86_64 architecture, we are building
> > for.
> > It might really help us if someone here could know what else will
> > need to be fixed for the whole thing to work. If it's going to be a
> > patch or a full scale project.
> 
>   Make sure first that you are not dumping to a memory buffer then
> if the problem persists we will try to fix things. So how was the
> xmlOutputBuffer allocated ?
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
