Stepping through the debugger, I'm seeing behavior that makes me wonder
if there's another part to this. The relevant data looks like this:

...
31...
32A...
32A...
32A...
32A...
79...
...

So there's a single F01_Record (31), four F02_Records (32A), and a
C01_Record (79).

Daffodil successfully parses the F01 and F02 records, resulting in an
infoset like this:

  <F_Records>
    <F_Record>
      <F01_Record>...</F01_Record>
      <F02_Records>
        <F02_Record>...</F02_Record>
        <F02_Record>...</F02_Record>
        <F02_Record>...</F02_Record>
        <F02_Record>...</F02_Record>
      </F02_Records>
    </F_Record>
  </F_Records>

It then tries to speculatively parse another F_Record since there can be
an unbounded number of these. Inside this F_Record it tries to parse an
F01_Record and an F02_Record, both of which fail since the next field is
79 (C01_Record), which is okay since they are optional.

Here's where I think things go wrong. Daffodil now needs to backtrack
and undo that second F_Record it incorrectly speculatively parsed. But
instead of undoing just the second F_Record it undoes the entire
F_Records element, including the fields that were successfully parsed,
and backtracks all the way back to the first 31 record. Instead of
undoing the entire F_Records, it should just undo the second F_Record.

Since it backtracked too far, it then continually tries to parse the
first 31 record, but that never works. And everything is optional so
Daffodil keeps chugging along, speculatively parsing and then undoing
the speculative parse.

Finally it gets to the Control_Record, which *is* required, and that
fails because 31 is not a Control_Record. That leads to the error about
Control_Record having an invalid value.

So I think the underlying issue here is that when maxOccurs is unbounded
we get to a point where we parse 0 bytes, try to deal with the lack of
forward progress by backtracking, but end up backtracking too far.

Things work in the bounded case because it tries to speculatively parse
998 F_Records, those all fail (just like in the unbounded case), but
then it doesn't remove the entire F_Records element.
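
In the meantime, the assertion suggested in the quoted message below
should avoid the zero-length F_Record that triggers this. Here is a rough
sketch of where it could go, assuming it is attached directly to the
F_Record element declaration from the original schema snippet (the
attached pug_records.xsd may place it differently, and I have not run
this exact fragment). It also assumes the xs and dfdl prefixes are bound
as in the rest of the schema:

```xml
<!-- Sketch only: the placement of the assertion is an assumption -->
<xs:element dfdl:lengthKind="implicit" name="F_Records" minOccurs="0">
  <xs:complexType>
    <xs:sequence>
      <xs:element dfdl:lengthKind="implicit" name="F_Record" minOccurs="0"
                  maxOccurs="unbounded">
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <!-- Fail this occurrence if it consumed no data -->
            <dfdl:assert>
              { dfdl:contentLength(.,'bits') gt 0 }
            </dfdl:assert>
          </xs:appinfo>
        </xs:annotation>
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="F01_Record" minOccurs="0" />
            <xs:element dfdl:lengthKind="implicit" name="F02_Records"
                        minOccurs="0">
              <xs:complexType>
                <xs:sequence>
                  <xs:element ref="F02_Record" minOccurs="0"
                              maxOccurs="unbounded" />
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

With this in place, an F_Record that consumes no data fails its
assertion, which should end the F_Record array at that point rather than
leaving a zero-length occurrence. B_Record would presumably need the same
treatment.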


On 12/4/19 7:59 PM, Sloane, Brandon wrote:
> The issue is that you have maxOccurs="unbounded" on elements which are 
> potentially 0 bits long.
> 
> In particular, F_Record and B_Record. Both of those elements have only
> optional children. This means that they will never fail to parse. Instead
> they will succeed in parsing, but consume 0 bits. Because they can occur
> an unbounded number of times, Daffodil considers this to be an error, and
> backtracks (and subsequently throws an unrelated error down the line).
> 
> When maxOccurs is finite, then Daffodil will parse the 0 bits a finite
> number of times before resuming the parse normally.
> 
> The simplest solution to this, is to add an explicit assertion that
> F_Record and B_Record are non-empty:
> 
>     <xs:annotation>
>       <xs:appinfo source="http://www.ogf.org/dfdl/">
>         <dfdl:assert>
>           { dfdl:contentLength(.,'bits') gt 0 }
>         </dfdl:assert>
>       </xs:appinfo>
>     </xs:annotation>
> 
> 
> Attached, you will find a version of pug_records.xsd that takes this approach.
> 
> While this is not technically a bug in Daffodil, it really should issue a 
> warning when this situation arises. I have opened a ticket to that effect: 
> https://issues.apache.org/jira/browse/DAFFODIL-2247
> 
> Given the above, you may be wondering why you do not see thousands of empty 
> instances of F_Record when maxOccurs="9999". I believe this is the correct 
> behavior in this case as defined by section 9.4.2.3, but I would need to read 
> the spec very closely to be sure that this is not a bug in Daffodil.
> 
> Regards,
> Brandon
> 
> --------------------------------------------------------------------------------
> *From:* Peter Kostouros <[email protected]>
> *Sent:* Wednesday, December 4, 2019 4:52 PM
> *To:* [email protected] <[email protected]>
> *Subject:* RE: Daffodil parsing fails on optional elements when maxOccurs
> set to "unbounded", passes when set to "999"
> 
> Hi
> 
> I have attached a dataset that shows the problem (PUG.IN) as well as its
> corresponding parsed output when the schema has set maxOccurs limits on
> selected optional elements (PUG_999.IN.XML).
> 
> The F_LOOP records in file PUG.IN start with “31” and “32A”.
> 
> Peter
> 
> *From:*Sloane, Brandon [mailto:[email protected]]
> *Sent:* Thursday, 5 December 2019 2:44 AM
> *To:* [email protected]
> *Subject:* Re: Daffodil parsing fails on optional elements when maxOccurs
> set to "unbounded", passes when set to "999"
> 
> The only thing that stands out to me is that the error you are seeing
> should be coming from ControlRecord, which isn't part of the quoted
> schema. Other than that, I am not sure what the issue could be (unless
> your data actually parses more than 999 instances when unbounded is used).
> 
> Do you have example data that you can share which demonstrates the problem?
> 
> --------------------------------------------------------------------------------
> 
> *From:* Peter Kostouros <[email protected]>
> *Sent:* Wednesday, December 4, 2019 12:35 AM
> *To:* [email protected] <[email protected]>
> *Subject:* Daffodil parsing fails on optional elements when maxOccurs set
> to "unbounded", passes when set to "999"
> 
> Hi
> 
> I hope someone can point me in the right direction to help me understand 
> behaviour I have seen with a particular schema when parsing a file.
> 
> I have a schema modelled on the NACHA schema files found in the 
> DFDLSchemas/NACHA directory on github. In my case, with respect to optional 
> (embedded) looping elements:
> 
> 1. Parsing is unsuccessful when maxOccurs attribute is set to “unbounded”
>    (Parse Error: Failed to populate ControlRecord[1]. Cause: Parse Error:
>    Assertion failed: Not Control Record);
> 
> 2. Parsing is successful when maxOccurs is limited to say “999”.
> 
> Below is a snippet from the schema referred to above that results in error:
> 
> <!-- F LOOP -->
> <xs:element dfdl:lengthKind="implicit" name="F_Records" minOccurs="0">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:element dfdl:lengthKind="implicit" name="F_Record" minOccurs="0"
>                   maxOccurs="unbounded">
>         <xs:complexType>
>           <xs:sequence>
>             <xs:element ref="F01_Record" minOccurs="0" />
>             <xs:element dfdl:lengthKind="implicit" name="F02_Records"
>                         minOccurs="0">
>               <xs:complexType>
>                 <xs:sequence>
>                   <xs:element ref="F02_Record" minOccurs="0"
>                               maxOccurs="unbounded" />
>                 </xs:sequence>
>               </xs:complexType>
>             </xs:element>
>           </xs:sequence>
>         </xs:complexType>
>       </xs:element>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
> 
> I have attached schema files that demonstrate this issue, so I hope
> someone can advise me on what I should be correctly doing.
> 
> I am using the Daffodil 2.4.0 release as well as the 2.5 snapshot Java
> APIs, and both show similar behaviour; I have also seen this behaviour
> when running the files through the daffodil command line, with the
> following tunables
> 
> "unqualifiedPathStepPolicy" = "defaultNamespace"
> 
> "suppressSchemaDefinitionWarnings" = "multipleChoiceBranches noEmptyDefault"
> 
> Peter
> 
> This e-mail and any attachment is intended for the party to which it is 
> addressed and may contain confidential information or be subject to 
> professional 
> privilege. Its transmission in not intended to place the contents into the 
> public domain. If you have received this e-mail in error, please notify us 
> immediately and delete the email and all copies. AWTA Ltd does not warrant 
> that 
> this e-mail is virus or error free. By opening this e-mail and any attachment 
> the user assumes all responsibility for any loss or damage resulting from 
> such 
> action, whether or not caused by the negligence of AWTA Ltd. The contents of 
> this e-mail and any attachments are subject to copyright and may not be 
> reproduced, adapted or transmitted without the prior written permission of 
> the 
> copyright owner.
> 
