Hi Christopher,
It's not a bug. <element name="a" maxOccurs="1000"/> is handled as a regex
"a,a?,a?,...,a?" where there are 999 "a?". Then we construct a state
machine from the regex, which would have 1001 states. This is the reason
why the parsing time grows.
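
To make the expansion concrete, here is a minimal sketch of the rewriting described above. The class and method names (OccursExpansion, expand, stateCount) are hypothetical and are not part of the Xerces API; the state count simply assumes a linear automaton with one start state plus one state per expanded particle, which matches the 1001 figure quoted above.

```java
import java.util.StringJoiner;

// Illustrative only: these names are assumptions, not Xerces classes.
final class OccursExpansion {

    // Rewrite an element particle with occurrence bounds as a flat
    // expression: minOccurs required copies, then optional copies.
    static String expand(String name, int minOccurs, int maxOccurs) {
        if (minOccurs < 0 || maxOccurs < minOccurs) {
            throw new IllegalArgumentException("invalid occurrence range");
        }
        StringJoiner expression = new StringJoiner(",");
        for (int i = 0; i < minOccurs; i++) {
            expression.add(name);           // required occurrence
        }
        for (int i = minOccurs; i < maxOccurs; i++) {
            expression.add(name + "?");     // optional occurrence
        }
        return expression.toString();
    }

    // Assumed model: a linear automaton needs a start state plus one
    // state per expanded particle.
    static int stateCount(int maxOccurs) {
        return maxOccurs + 1;
    }

    public static void main(String[] args) {
        System.out.println(expand("a", 1, 4));   // a,a?,a?,a?
        System.out.println(stateCount(1000));    // 1001
    }
}
```

The point of the sketch is only that the size of the expression, and therefore of the machine built from it, grows linearly with maxOccurs.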
You might ask why we don't use a single state and put a range (1~1000) on
that state. Well, that's easy to "think" of, and may be easy to implement
on leaf nodes (<element>, <any>), but consider a more complicated case:
  <sequence maxOccurs="1000">
    <element name="a"/>
    <element name="b" maxOccurs="1000"/>
  </sequence>
How do we process the maxOccurs="1000" on the <sequence>?
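
As a rough illustration of why the nested case is harder, the sketch below counts how many leaf particles would result if the same copy-based expansion were applied to the <sequence> as well. This is an assumption made for illustration, not a description of what Xerces actually does with sequences, and all names (ExpandedSize, Particle, Element, Sequence) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy particle tree, not the Xerces content model.
final class ExpandedSize {

    interface Particle {
        // Number of leaf particles after copying out every maxOccurs.
        long expandedLeaves();
    }

    static final class Element implements Particle {
        final String name;
        final int maxOccurs;

        Element(String name, int maxOccurs) {
            this.name = name;
            this.maxOccurs = maxOccurs;
        }

        public long expandedLeaves() {
            return maxOccurs;               // one copy per allowed occurrence
        }
    }

    static final class Sequence implements Particle {
        final int maxOccurs;
        final List<Particle> children;

        Sequence(int maxOccurs, List<Particle> children) {
            this.maxOccurs = maxOccurs;
            this.children = children;
        }

        public long expandedLeaves() {
            long perCopy = 0;
            for (Particle child : children) {
                perCopy += child.expandedLeaves();
            }
            return perCopy * maxOccurs;     // the whole body is copied
        }
    }

    // The <sequence> from the message: a once, b up to 1000 times,
    // the whole sequence up to 1000 times.
    static Particle exampleFromMessage() {
        return new Sequence(1000, Arrays.<Particle>asList(
                new Element("a", 1),
                new Element("b", 1000)));
    }

    public static void main(String[] args) {
        System.out.println(exampleFromMessage().expandedLeaves()); // 1001000
    }
}
```

Under that assumption the bounds multiply: 1000 x (1 + 1000) = 1,001,000 leaf particles. A single range counter per state does not obviously help either, because the count for "b" has to restart on every pass through the outer sequence.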
Of course, I believe state machine theory has solutions for this kind of
problem, but currently I don't think it's necessary to (dramatically)
complicate the algorithm. My suggestion is that if you don't mean it
(min/maxOccurs="1000" or "10000"), then don't say it. maxOccurs="unbounded"
might be a better choice in this case.
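
For comparison, here is a minimal sketch of why "unbounded" is cheap: a one-or-more loop needs only a fixed number of states, however many children appear. The matcher below is a hand-written toy with hypothetical names (UnboundedMatcher, matchesOneOrMore); it is not how Xerces represents content models.

```java
import java.util.Collections;
import java.util.List;

// Illustrative only: a two-state machine for "one or more <a>".
final class UnboundedMatcher {

    // State 0: nothing seen yet. State 1: at least one match (accepting).
    static boolean matchesOneOrMore(String expected, List<String> children) {
        int state = 0;
        for (String child : children) {
            if (!child.equals(expected)) {
                return false;               // no transition for this name
            }
            state = 1;                      // loop on the accepting state
        }
        return state == 1;
    }

    public static void main(String[] args) {
        List<String> many = Collections.nCopies(10000, "a");
        System.out.println(matchesOneOrMore("a", many)); // true
    }
}
```

The machine's size does not depend on the number of occurrences, which is the trade-off behind the suggestion: you give up the upper limit and get a small automaton in return.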
Cheers,
Sandy Gao
Software Developer, IBM Canada
(1-416) 448-3255
[EMAIL PROTECTED]
From: Christopher Knorr <foxxlf23@bungo.com>
Date: 06/07/2001 03:46 PM
To: [EMAIL PROTECTED]
cc:
Subject: Bug
Please respond to xerces-j-dev
When using maxOccurs with values over 1000, the time to validate grows
enormously, even though the data set size stays the same. Values of 10,000
simply cause a stack overflow.
Should I submit this as a bug and/or can anyone explain this?
Christopher Knorr
Software Engineer
NOMOS Corporation
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------