To clarify: the behavior that surprises you is that an out-of-order element does not trigger an error?
Error handling is one of the more subtle points of DFDL. In your example, what is happening is that Daffodil parses C, evaluates the assertion on it, throws a processing error, then backtracks to before it parsed C. What it sounds like you want to have happen is for Daffodil to parse C, throw a processing error, then fail the entire parse.

Unfortunately, I am not aware of a direct way of writing such an assertion. Instead, you need to implement it in 2 pieces: 1) something which blocks backtracking, and 2) something which raises a processing error. In the language of the DFDL spec, you need to throw a processing error from a component that is known-to-exist.

There are 3 ways for a component to be known-to-exist (see section 9.3.1.1):

1. There is a dfdl:discriminator applying to the component and its expression evaluates to true or regular expression pattern matches.
2. The component is a direct child of an xs:sequence or xs:choice with dfdl:initiatedContent 'yes' and an initiator defined for the component is found.
3. The component is a direct child of an xs:choice with dfdl:choiceDispatchKey and the result of the dfdl:choiceDispatchKey expression matches the dfdl:choiceBranchKey property of the child.

If it matches your data format, the simplest solution would be to use (2), but this requires each of the fields to have an initiator. Using (3) requires an xs:choice, which would not be consistent with using an unordered sequence. The first option, a discriminator, is by far the most general. In my experience, it can almost always be made to work, but it can be tricky to get exactly right.

In your implementation of C_Record, at some point you will have determined that the data you are parsing is actually a C_Record. At that point, you would put in a discriminator of fn:true().
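A minimal, untested sketch of that idea follows. It assumes, purely for illustration, that C_Record is a complex element identified by an initiator "C"; I do not know how your C_Record is actually defined, so the initiator, the dfdl:lengthKind and the elided fields are all hypothetical. The xs, dfdl and fn prefixes are assumed to be declared as in your schema. The discriminator sits on an empty inner sequence, so the existing dfdl:assert on the C_Record element reference can stay where it is.

```xml
<!-- Hypothetical C_Record definition: the initiator and the elided fields are assumptions. -->
<xs:element name="C_Record" dfdl:lengthKind="implicit" dfdl:initiator="C">
  <xs:complexType>
    <xs:sequence>
      <!-- Empty sequence that only carries the discriminator. By the time it
           is evaluated, the (assumed) initiator has already been matched, so
           the data has been identified as a C_Record. -->
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:discriminator testKind="expression">{ fn:true() }</dfdl:discriminator>
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
      <!-- ... the record's actual fields would go here ... -->
    </xs:sequence>
  </xs:complexType>
</xs:element>
```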
Any subsequent processing errors will no longer be able to backtrack to before that point, so your existing assertion on C_Record will become a fatal error.

________________________________
From: Peter Kostouros <[email protected]>
Sent: Friday, February 21, 2020 5:28 PM
To: [email protected] <[email protected]>
Subject: RE: Question regarding behaviour of optional elements with dfdl assertions.

Hi

Thanks for your reply.

Basically what I would like to do is have a C_Record be preceded by X, Y, Z, L _Record(s), and S_Record(s) to be preceded by a C_Record. I was expecting the assertion in the C_Record element to have triggered an error, as in the case of, for example, an "LC" input. Here is the output with the updated schema:

(debug) c
<?xml version="1.0" encoding="UTF-8" ?>
<T_Records>
  <L_Record></L_Record>
</T_Records>
[warning] Left over data. Consumed 8 bit(s) with at least 24 bit(s) remaining.
Left over data (Hex) starting at byte 2 is: (0x43200a...)
Left over data (UTF-8) starting at byte 2 is: (C␣␊...)

Thanks for your suggestion.

Peter

________________________________
From: Sloane, Brandon [[email protected]]
Sent: Saturday, February 22, 2020 2:33 AM
To: [email protected]
Subject: Re: Question regarding behaviour of optional elements with dfdl assertions.

Attached is the version of your schema I am using. I modified it slightly to work as a standalone schema.

> In some tests I am finding that if a C_Record is preceded by an L_Record (a
> cause for an error), the system outputs the L_Record to the info set and does
> not raise a fault.

This is expected. There is nothing in your schema that says that an L_Record is not permitted to occur before a C_Record. What is prohibited is an S_Record occurring before a C_Record.

E.g., the string "XLC" gives:

<T_Records>
  <X_Record></X_Record>
  <C_Record></C_Record>
  <L_Record></L_Record>
</T_Records>

While XLSC gives:

<T_Records>
  <X_Record></X_Record>
  <L_Record></L_Record>
</T_Records>
[warning] Left over data.
Consumed 16 bit(s) with at least 16 bit(s) remaining.
Left over data (Hex) starting at byte 3 is: (0x5343...)
Left over data (UTF-8) starting at byte 3 is: (SC...)

In contrast, XCLS gives:

<T_Records>
  <X_Record></X_Record>
  <C_Record></C_Record>
  <L_Record></L_Record>
  <S_Records>
    <S_Record></S_Record>
  </S_Records>
</T_Records>

Regarding setting minOccurs=1 on C, I cannot actually reproduce this causing any error (which I suspect is a bug, as I would expect it to trigger an error if C is not successfully parsed). I would also point out that the infoset would get produced and unparsed in schema order, so it would appear that C occurs before Y or Z regardless of what the input was.

For what you are doing, it might make more sense to instead have a sequence of sequences, where X, Y and Z can occur as an unordered sequence, then C, then L and S as an unordered sequence. The only issue here is if you really want L to be able to occur in any position.

________________________________
From: Peter Kostouros <[email protected]>
Sent: Thursday, February 20, 2020 9:34 PM
To: [email protected] <[email protected]>
Subject: Question regarding behaviour of optional elements with dfdl assertions.

Hi

I am using dfdl assertions to implement business rules for a particular model. This model consists of an unordered sequence of optional records, however,

1. At least one of those optional records must exist to have a valid model;
2. A particular record type must be preceded by another specific record type for (the former record) to be accepted.

In the latter case, I am using the exists function to determine whether expected records are in place and the error function to raise a fault should the conditions not be met, as shown in the DFDL schema snippet below.

The intention of this schema is that should a C_Record be found, it must be preceded by either an X, Y or Z _Record.
In some tests I am finding that if a C_Record is preceded by an L_Record (a cause for an error), the system outputs the L_Record to the info set and does not raise a fault: if element C_Record’s minOccurs is set to 1, the parsing raises an error of the form

Validation Error: cvc-complex-type.2.4.a: Invalid content was found starting with element '{"tc":L_Record}'. One of '{"tc":X_Record, "tc":C_Record}' is expected.

What is the expected behaviour in such situations? Or perhaps someone can point me in the right direction to help me achieve a schema that describes this type of model?

I am using Daffodil 2.5.0 public release.

<xs:element dfdl:lengthKind="implicit" name="T_Records" minOccurs="1">
  <xs:complexType>
    <xs:sequence dfdl:sequenceKind="unordered">
      <xs:element ref="X_Record" minOccurs="0" dfdl:occursCountKind="parsed" />
      <xs:element ref="C_Record" minOccurs="0" dfdl:occursCountKind="parsed">
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert message="Unexpected record found: must be preceded by X_Record, Y_Record or Z_Record." testKind='expression'>
              { if (fn:exists(../X_Record) or fn:exists(../Y_Record) or fn:exists(../Z_Record)) then fn:true() else fn:error() }
            </dfdl:assert>
          </xs:appinfo>
        </xs:annotation>
      </xs:element>
      <xs:element ref="Y_Record" minOccurs="0" dfdl:occursCountKind="parsed" />
      <xs:element ref="Z_Record" minOccurs="0" dfdl:occursCountKind="parsed" />
      <xs:element ref="L_Record" minOccurs="0" dfdl:occursCountKind="parsed" />
      <xs:element dfdl:lengthKind="implicit" name="S_Records" minOccurs="0" dfdl:occursCountKind="parsed">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="S_Record" minOccurs="1" maxOccurs="unbounded">
              <xs:annotation>
                <xs:appinfo source="http://www.ogf.org/dfdl/">
                  <dfdl:assert message='Unexpected record found: must be preceded by C_Record.'
                               testKind='expression'>
                    { if (fn:exists(../../C_Record)) then fn:true() else fn:error() }
                  </dfdl:assert>
                </xs:appinfo>
              </xs:annotation>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Peter

This e-mail and any attachment is intended for the party to which it is addressed and may contain confidential information or be subject to professional privilege. Its transmission in not intended to place the contents into the public domain. If you have received this e-mail in error, please notify us immediately and delete the email and all copies. AWTA Ltd does not warrant that this e-mail is virus or error free. By opening this e-mail and any attachment the user assumes all responsibility for any loss or damage resulting from such action, whether or not caused by the negligence of AWTA Ltd. The contents of this e-mail and any attachments are subject to copyright and may not be reproduced, adapted or transmitted without the prior written permission of the copyright owner.
