Hello Robert,

There's no particular reason why the vector is only being resized by 8
each time. I've modified the code in CVS so that it scales better, as
you suggested, and also modified reportError() so that it avoids adding
items to the vector if the
http://apache.org/xml/features/validation/schema/augment-psvi feature
has been disabled.
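
For anyone following the thread, here is a minimal sketch of the growth
behaviour being discussed. It is not the Xerces code: the class and
method names are hypothetical, and the initial capacity of 8 is an
assumption (only the capacityIncrement of 8 is stated in the report
below). It counts how often java.util.Vector reallocates its backing
array with a fixed increment versus the default doubling.

```java
import java.util.Vector;

// Hypothetical demo class, not part of Xerces.
public class VectorGrowthDemo {

    // Adds n elements and counts how many times the capacity changes,
    // i.e. how many times the backing array is reallocated and copied.
    public static int countResizes(Vector<Integer> v, int n) {
        int resizes = 0;
        int capacity = v.capacity();
        for (int i = 0; i < n; i++) {
            v.addElement(i);
            if (v.capacity() != capacity) {
                resizes++;
                capacity = v.capacity();
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        int n = 100_000;

        // Fixed increment of 8 (initial capacity 8 is assumed here).
        int fixed = countResizes(new Vector<Integer>(8, 8), n);

        // Default constructor: capacityIncrement is 0, so the capacity
        // doubles whenever the vector runs out of room.
        int doubling = countResizes(new Vector<Integer>(), n);

        System.out.println("fixed increment resizes: " + fixed);
        System.out.println("doubling resizes:        " + doubling);
    }
}
```

With a fixed increment the number of reallocations grows linearly with
the number of errors, and each one copies the whole array, so the total
copying cost grows roughly quadratically. With doubling, the number of
reallocations grows only logarithmically.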

Thanks.

"Devidi, Robert" <[EMAIL PROTECTED]> wrote on 02/10/2005 
10:24:43 AM:

> When using schema-based validation while SAX parsing large files
> containing large numbers of minor validation errors, performance
> degrades rapidly as the file size increases.  In my testing, a 250M
> file is processed in about 15 minutes, whereas a 500M file containing
> proportionately the same number of errors takes several hours to run
> through.
> 
> Running under OptimizeIt shows that, when working through the largest
> files, the application is spending the majority of its time inside the
> XMLSchemaValidator$XSIErrorReporter.reportError() method, specifically
> on the line
> 
>     fErrors.addElement(key); 
> 
> I believe that because the declaration for the XSIErrorReporter's
> fErrors attribute includes a capacityIncrement value of 8, the vector
> ends up having to resize its underlying buffer every 8th error.  As
> the vector grows (with my test data, into the hundreds of thousands or
> even millions of entries), this ends up consuming a great deal of CPU,
> as well as keeping the garbage collector quite busy.
> 
> I recompiled with a very minor code change, altering the declaration for
> fErrors from the current
> 
>     Vector fErrors = new Vector(INITIAL_STACK_SIZE, INC_STACK_SIZE);
> 
> to use the default constructor
> 
>     Vector fErrors = new Vector(); 
> 
> hence allowing the vector to use the default capacityIncrement value
> of 0, i.e. the capacity doubles each time the vector exceeds its
> current capacity.  With this change, the time required to process my
> largest test file is reduced from six or seven hours to a bit over 30
> minutes.
> 
> I don't pretend to fully understand everything that's going on in this
> class.  Is there a compelling reason for specifying a
> capacityIncrement for fErrors?
> 
> In case it's useful, I'm running version 2.6.2 on Solaris.  I'm
> guessing it wouldn't really be appropriate to attach my test files....
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
