Re: Question regarding behaviour of optional elements with dfdl assertions.

Sloane, Brandon Fri, 21 Feb 2020 19:59:22 -0800

To clarify, the behavior that surprises you is that an out-of-order element 
does not trigger an error?


Error handling is one of the more subtle points of DFDL. In your example, what 
is happening is that Daffodil parses C, validates it, throws a processing 
error, then backtracks to before it parsed C. What it sounds like you want to 
have happen is for Daffodil to parse C, throw a processing error, then fail the 
entire parse.

Unfortunately, I am not aware of a direct way of writing such an assertion. 
Instead you need to implement it in 2 pieces: 1) something which blocks 
backtracking, and 2) something which raises a processing error. In the language 
of the DFDL spec, you need to throw a processing error from a component that 
is known-to-exist.

There are 3 ways for a component to be known-to-exist (see section 9.3.1.1):


  1.  There is a dfdl:discriminator applying to the component and its 
expression evaluates to true or regular expression pattern matches.
  2.  The component is a direct child of an xs:sequence or xs:choice with 
dfdl:initiatedContent 'yes' and an initiator defined for the component is found.
  3.  The component is a direct child of an xs:choice with 
dfdl:choiceDispatchKey and the result of the dfdl:choiceDispatchKey expression 
matches the dfdl:choiceBranchKey property of the child.

If it matches your data format, the simplest solution would be to use (2), but 
this requires each of the fields to have an initiator.
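
As a rough sketch of (2), assuming each record starts with a distinct literal 
(the initiator values below are guesses based on your sample inputs, not 
something taken from your schema):

<xs:sequence dfdl:sequenceKind="unordered" dfdl:initiatedContent="yes">
    <!-- dfdl:initiator values are assumed for illustration only -->
    <xs:element ref="X_Record" minOccurs="0" dfdl:occursCountKind="parsed"
                dfdl:initiator="X" />
    <xs:element ref="C_Record" minOccurs="0" dfdl:occursCountKind="parsed"
                dfdl:initiator="C" />
    <!-- Y_Record, Z_Record, L_Record and S_Records follow the same pattern -->
</xs:sequence>

If the record elements already define their own initiators, only 
dfdl:initiatedContent should need to be added. Once the "C" initiator is found, 
C_Record is known-to-exist, so a failing assertion on it should no longer be 
backtracked over.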

Using 3 requires an xs:choice, which would not be consistent with using an 
unordered sequence.

The first option, a discriminator, is by far the most general. In my 
experience, it can almost always be made to work, but it can be tricky to get 
exactly right. In your implementation of C_Record, at some point you will have 
determined that the data you are parsing really is a C_Record. At that point, 
you would put in a discriminator of fn:true(). Any subsequent processing errors 
will no longer be able to backtrack to before that point, so your existing 
assertion on C_Record will become a fatal error.
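
A minimal sketch of that arrangement, assuming C_Record is a complex element 
identified by an initiator of "C" (the initiator and the empty sequence are my 
assumptions, not part of your schema, and other required properties are assumed 
to come from a default dfdl:format):

<xs:element name="C_Record" dfdl:initiator="C">
    <xs:complexType>
        <xs:sequence>
            <xs:annotation>
                <xs:appinfo source="http://www.ogf.org/dfdl/">
                    <!-- Reached only after the "C" initiator has matched,
                         so from here on the data is committed to C_Record. -->
                    <dfdl:discriminator test="{ fn:true() }" />
                </xs:appinfo>
            </xs:annotation>
        </xs:sequence>
    </xs:complexType>
</xs:element>

The dfdl:assert on the C_Record reference inside T_Records stays as it is. It 
is evaluated after the element has been parsed, that is, after the 
discriminator has already resolved the point of uncertainty.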
________________________________
From: Peter Kostouros <[email protected]>
Sent: Friday, February 21, 2020 5:28 PM
To: [email protected] <[email protected]>
Subject: RE: Question regarding behaviour of optional elements with dfdl 
assertions.

Hi

Thanks for your reply.

Basically what I would like to do is have a C_Record be preceded by X, Y, Z, L 
_Record(s), and S_Record(s) to be preceded by a C_Record. I was expecting the 
assertion in the C_Record element to have triggered an error in the case of, 
for example, an "LC" input. Here is the output with the updated schema:

(debug) c
<?xml version="1.0" encoding="UTF-8" ?>
<T_Records>
  <L_Record></L_Record>
</T_Records>
[warning] Left over data. Consumed 8 bit(s) with at least 24 bit(s) remaining.
Left over data (Hex) starting at byte 2 is: (0x43200a...)
Left over data (UTF-8) starting at byte 2 is: (C␣␊...)

Thanks for your suggestion.



Peter


________________________________
From: Sloane, Brandon [[email protected]]
Sent: Saturday, February 22, 2020 2:33 AM
To: [email protected]
Subject: Re: Question regarding behaviour of optional elements with dfdl 
assertions.

Attached is the version of your schema I am using. I modified it slightly to 
work as a stand alone schema.

> In some tests I am finding that if a C_Record is preceded by an L_Record (a 
> cause for an error), the system outputs the L_Record to the info set and does 
> not raise a fault.

This is expected. There is nothing in your schema that says that an L_Record is 
not permitted to occur before a C_Record. What is prohibited is an S_Record 
occurring before a C_Record. E.g., the string "XLC" gives:

<T_Records>
  <X_Record></X_Record>
  <C_Record></C_Record>
  <L_Record></L_Record>
</T_Records>

While XLSC gives:

<T_Records>
  <X_Record></X_Record>
  <L_Record></L_Record>
</T_Records>
[warning] Left over data. Consumed 16 bit(s) with at least 16 bit(s) remaining.
Left over data (Hex) starting at byte 3 is: (0x5343...)
Left over data (UTF-8) starting at byte 3 is: (SC...)

In contrast, XCLS gives:

<T_Records>
  <X_Record></X_Record>
  <C_Record></C_Record>
  <L_Record></L_Record>
  <S_Records>
    <S_Record></S_Record>
  </S_Records>
</T_Records>

Regarding setting minOccurs=1 on C, I cannot actually reproduce this causing 
any error (which I suspect is a bug, as I would expect it to trigger an error 
if C is not successfully parsed).

I would also point out that the infoset would get produced and unparsed in 
schema order, so it would appear that C occurs before Y or Z regardless of what 
the input was.

For what you are doing, it might make more sense to instead have a sequence of 
sequences, where X, Y and Z can occur as an unordered sequence, then C, then L 
and S as an unordered sequence. The only issue here is if you really want L to 
be able to occur in any position.
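
A rough sketch of that layout, reusing the element names from your schema. The 
S_Records wrapper is reduced to a plain reference here and the asserts are left 
out, so treat it as an outline rather than a drop-in replacement:

<xs:element name="T_Records" dfdl:lengthKind="implicit">
    <xs:complexType>
        <xs:sequence>
            <!-- X, Y and Z in any order -->
            <xs:sequence dfdl:sequenceKind="unordered">
                <xs:element ref="X_Record" minOccurs="0"
                            dfdl:occursCountKind="parsed" />
                <xs:element ref="Y_Record" minOccurs="0"
                            dfdl:occursCountKind="parsed" />
                <xs:element ref="Z_Record" minOccurs="0"
                            dfdl:occursCountKind="parsed" />
            </xs:sequence>
            <!-- then C -->
            <xs:element ref="C_Record" minOccurs="0"
                        dfdl:occursCountKind="parsed" />
            <!-- then L and S in any order -->
            <xs:sequence dfdl:sequenceKind="unordered">
                <xs:element ref="L_Record" minOccurs="0"
                            dfdl:occursCountKind="parsed" />
                <xs:element ref="S_Records" minOccurs="0"
                            dfdl:occursCountKind="parsed" />
            </xs:sequence>
        </xs:sequence>
    </xs:complexType>
</xs:element>

The ordering itself now guarantees that C cannot come after L or S. The rules 
"C needs at least one of X, Y or Z" and "S needs a C" would still have to be 
expressed with asserts.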


________________________________
From: Peter Kostouros <[email protected]>
Sent: Thursday, February 20, 2020 9:34 PM
To: [email protected] <[email protected]>
Subject: Question regarding behaviour of optional elements with dfdl assertions.


Hi



I am using dfdl assertions to implement business rules for a particular model. 
This model consists of an unordered sequence of optional records, however,



1.       At least one of those optional records must exist to have a valid 
model;

2.       A particular record type must be preceded by another specific record 
type for (the former record) to be accepted.



In the latter case, I am using the exists function to determine whether 
expected records are in place and the error function to raise a fault should 
the conditions not be met, as shown in the DFDL schema snippet below. The 
intention of this schema is that should a C_Record be found, it must be 
preceded by either an X, Y or Z _Record.



In some tests I am finding that if a C_Record is preceded by an L_Record (a 
cause for an error), the system outputs the L_Record to the info set and does 
not raise a fault. If element C_Record’s minOccurs is set to 1, the parsing 
raises an error of the form



Validation Error: cvc-complex-type.2.4.a: Invalid content was found starting 
with element '{"tc":L_Record}'. One of '{"tc":X_Record, "tc":C_Record}' is 
expected.



What is the expected behaviour in such situations? Or perhaps someone can point 
me in the right direction to help me achieve a schema that describes this type 
of model?



I am using Daffodil 2.5.0 public release.



<xs:element dfdl:lengthKind="implicit" name="T_Records" minOccurs="1">

    <xs:complexType>

        <xs:sequence dfdl:sequenceKind="unordered">

            <xs:element ref="X_Record" minOccurs="0" 
dfdl:occursCountKind="parsed" />

            <xs:element ref="C_Record" minOccurs="0" 
dfdl:occursCountKind="parsed">

                <xs:annotation>

                    <xs:appinfo source="http://www.ogf.org/dfdl/">

                        <dfdl:assert message="Unexpected record found: must be 
preceded by X_Record, Y_Record or Z_Record." testKind='expression'>

                            {

                                if (fn:exists(../X_Record)   or

                                    fn:exists(../Y_Record)   or

                                    fn:exists(../Z_Record))  then

                                        fn:true()

                                else

                                        fn:error()

                            }

                        </dfdl:assert>

                    </xs:appinfo>

                </xs:annotation>

            </xs:element>



            <xs:element ref="Y_Record" minOccurs="0" 
dfdl:occursCountKind="parsed" />

            <xs:element ref="Z_Record" minOccurs="0" 
dfdl:occursCountKind="parsed" />

            <xs:element ref="L_Record" minOccurs="0" 
dfdl:occursCountKind="parsed" />



            <xs:element dfdl:lengthKind="implicit" name="S_Records" 
minOccurs="0" dfdl:occursCountKind="parsed">

                <xs:complexType>

                    <xs:sequence>

                        <xs:element ref="S_Record" minOccurs="1" 
maxOccurs="unbounded">

                            <xs:annotation>

                                <xs:appinfo source="http://www.ogf.org/dfdl/">

                                    <dfdl:assert message='Unexpected record 
found: must be preceded by C_Record.' testKind='expression'>

                                        {

                                            if (fn:exists(../../C_Record)) then

                                                fn:true()

                                            else

                                                fn:error()

                                        }

                                    </dfdl:assert>

                                </xs:appinfo>

                            </xs:annotation>

                        </xs:element>

                    </xs:sequence>

                </xs:complexType>

            </xs:element>

        </xs:sequence>

    </xs:complexType>

</xs:element>





Peter



This e-mail and any attachment is intended for the party to which it is 
addressed and may contain confidential information or be subject to 
professional privilege. Its transmission is not intended to place the contents 
into the public domain. If you have received this e-mail in error, please 
notify us immediately and delete the email and all copies. AWTA Ltd does not 
warrant that this e-mail is virus or error free. By opening this e-mail and any 
attachment the user assumes all responsibility for any loss or damage resulting 
from such action, whether or not caused by the negligence of AWTA Ltd. The 
contents of this e-mail and any attachments are subject to copyright and may 
not be reproduced, adapted or transmitted without the prior written permission 
of the copyright owner.
