Hi Christopher,

It's not a bug. <element name="a" maxOccurs="1000"/> is handled as the regex
"a,a?,a?,...,a?", with 999 occurrences of "a?". We then construct a state
machine from that regex, which has 1001 states. That is why the parsing
time grows.
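
To make the arithmetic concrete, here is a minimal sketch of that expansion.
It is only an illustration, not the Xerces code: the function names are
hypothetical, and the "one state per particle plus a start state" count is a
simplifying assumption chosen to match the numbers above.

```python
def expand_occurs(name, min_occurs, max_occurs):
    """Expand name{min,max} into required copies followed by optional copies."""
    required = [name] * min_occurs
    optional = [name + "?"] * (max_occurs - min_occurs)
    return required + optional


def state_count(particles):
    # Assumption: one start state plus one state per expanded particle.
    return len(particles) + 1


particles = expand_occurs("a", 1, 1000)
print(",".join(expand_occurs("a", 1, 3)))  # small case: a,a?,a?
print(particles.count("a?"))               # number of optional copies
print(state_count(particles))              # states under the assumption above
```

The point is that the number of states grows linearly with maxOccurs, even
though the schema text stays one line long.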

You might ask why we don't use a single state and attach a range (1~1000)
to it. That is easy to say, and may even be easy to implement for leaf
nodes (<element>, <any>), but consider a more complicated case:

<sequence maxOccurs="1000">
   <element name="a"/>
   <element name="b" maxOccurs="1000"/>
</sequence>

How do we process the maxOccurs="1000" on the <sequence>?
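
As a rough illustration of why the nested case is harder, the sketch below
counts how many element positions you would get if the same copy-based
expansion were applied to every particle, including the <sequence>. This is
an assumption made for illustration only; the class and function names are
hypothetical and say nothing about how the parser actually treats this case.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Element:
    name: str
    max_occurs: int = 1


@dataclass
class Sequence:
    children: List[Union["Element", "Sequence"]]
    max_occurs: int = 1


def expanded_positions(particle):
    """Count leaf positions after copying each particle max_occurs times."""
    if isinstance(particle, Element):
        return particle.max_occurs
    per_copy = sum(expanded_positions(child) for child in particle.children)
    return particle.max_occurs * per_copy


model = Sequence(
    [Element("a"), Element("b", max_occurs=1000)],
    max_occurs=1000,
)
print(expanded_positions(model))  # 1000 * (1 + 1000) positions
```

The bounds multiply when particles nest, so a single counter on one state is
no longer enough: the count for "b" has to restart on every pass through the
sequence.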

Of course, I believe automata theory has solutions for this kind of
problem, but at the moment I don't think it is necessary to (dramatically)
complicate the algorithm. My suggestion is: if you don't really mean it
(min/maxOccurs="1000" or "10000"), don't say it. maxOccurs="unbounded"
might be a better choice in this case.

Cheers,
Sandy Gao
Software Developer, IBM Canada
(1-416) 448-3255
[EMAIL PROTECTED]



                                                                                       
                            
Christopher Knorr <foxxlf23@bungo.com>
06/07/2001 03:46 PM
Please respond to xerces-j-dev

To:      [EMAIL PROTECTED]
cc:
Subject: Bug



When using maxOccurs with values over 1000, the time to validate grows
enormously, even though the data set size stays the same. Values of 10,000
simply cause a stack overflow.

Should I submit this as a bug and/or can anyone explain this?

Christopher Knorr
Software Engineer
NOMOS Corporation







---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





