You likely want a combination of both solutions. For example, say the data was this:

  John Doe/INVALID-LATLON/Sally Smith

With only Mike's solution, you would get

 <Origin_>INVALID-LATLON</Origin_>

So we properly detect that this isn't a valid "Origin" element, but the nillable Origin_ element will then accept invalid non-null data. If you want to allow this (e.g. well-formed vs. valid), you could maybe add a restriction that Origin_ must have zero length. Though I'm not sure there's an XML Schema restriction that says it must be null, which is what you really want. Otherwise this data would parse and appear valid:

  John Doe//Sally Smith

On the other hand, with only my solution, this would detect that INVALID-LATLON is not a null Origin_ element, but parsing the complex Origin element would still run into the issue Mike described, where Daffodil happily consumes data, goes off the rails, and fails somewhere else.

In your case, you want assertions on all of these elements. With that, whether you put Origin_ first or second in the choice doesn't really matter.
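
Roughly, the combined choice could look like the sketch below. It is only pieced together from the fragments quoted in this thread, not a tested schema: the children of Origin are elided, and it assumes each of them carries the dfdl:checkConstraints(.) assert that Mike described.

    <xs:choice dfdl:choiceLengthKind="implicit">
      <!-- Complex branch. Children elided; each is assumed to assert
           dfdl:checkConstraints(.) so invalid data fails immediately. -->
      <xs:element name="Origin">
        <xs:complexType>
          <!-- sequence of LatitudeDegrees ... LongitudeHemisphere -->
        </xs:complexType>
      </xs:element>
      <!-- Nil branch. The assert rejects anything that is not the
           nil value "-", so arbitrary strings are not accepted here. -->
      <xs:element name="Origin_" type="xs:string" nillable="true"
                  dfdl:nilKind="literalValue" dfdl:nilValue="-">
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
          </xs:appinfo>
        </xs:annotation>
      </xs:element>
    </xs:choice>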

On 8/25/22 10:55 AM, Roger L Costello wrote:
Thanks Steve.

So now we have two solutions: the solution that Mike identified using 
dfdl:checkConstraints(.), and the solution that Steve identified switching the 
order of the choice branches and using fn:nilled(.).

I am writing this stuff up. Which solution should I recommend? I prefer Steve's 
solution since it is simpler and easier to describe. Thoughts?

/Roger

-----Original Message-----
From: Steve Lawrence <[email protected]>
Sent: Thursday, August 25, 2022 8:54 AM
To: [email protected]
Subject: [EXT] Re: Daffodil does not correctly parse variable length, nillable 
elements with complexType

Ah yeah, you're right. If Origin_ were first, you would also need an
assert that Origin_ must be nilled to cause it to backtrack and try the
other branch if it wasn't the nil value, e.g.:

    <xs:element name="Origin_" ... >
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>

Note that this is probably a good idea to add regardless of order.
Otherwise, if there was a parse error in the Origin element (which would
occur with Mike's changes on invalid data) then Origin_ would accept any
string, but it is intended to only parse a nil element.



On 8/25/22 7:47 AM, Roger L Costello wrote:
Thanks Mike and Steve!

I need to study carefully what Mike said.

Steve, I think your suggestion is not correct. I did as you suggested and 
reversed the order of the branches in the choice. Now, with this input:

John Doe/2006N-05912E/Sally Smith

I get this XML:

<Test>
    <A>John Doe</A>
    <Origin_>2006N-05912E</Origin_>
    <B>Sally Smith</B>
</Test>

which is not correct.

I conclude that switching the order of the branches in the choice is not 
correct. Do you concur?

/Roger

-----Original Message-----
From: Steve Lawrence <[email protected]>
Sent: Thursday, August 25, 2022 7:38 AM
To: [email protected]
Subject: [EXT] Re: Daffodil does not correctly parse variable length, nillable 
elements with complexType

Another option: put your nillable type as the first branch in the
choice. This way Daffodil will attempt to parse the nillable type first,
and will only attempt to parse the complex Origin if that fails.

You'll still likely want the validation that Mike suggested so that
when something fails it fails immediately instead of happily continuing
off the rails.


On 8/25/22 7:30 AM, Mike Beckerle wrote:
I think I know what is happening.

In the battle of delimiters vs. nested explicit length, explicit wins.

So if you have abc/-/cef

but after parsing abc then finding the separator /, the next field is
latitudeDegrees with explicit length 2, that "wins" and "-/" are the characters
of that string.

Validation will then issue a validation warning because Daffodil's "limited"
validation is done as the elements are parsed.

This does not cause backtracking, it's just a "warning" that the seemingly
well-formed data is invalid.

Then latitudeMinutes is parsed, and that uses the ever problematic lengthKind
pattern, which succeeds, with a zero-length string, which then also causes a
validation error.

Again, this is a validation error because this now zero-length string
doesn't look like the digits you expect.

Then it parses the hyphen element, which is just a string of length 1,

.... I'll stop here because things are clearly off the rails.

Here's my suggestion for how to fix this and get Daffodil to magically do what
you want, which is to pay attention to the facets.

<!-- vString = 'validated string'. Facets are checked while parsing. -->
<simpleType name="vString">
       <annotation><appinfo source="http://www.ogf.org/dfdl/">
           <dfdl:assert message="Invalid value">{ dfdl:checkConstraints(.) }</dfdl:assert>
       </appinfo></annotation>
        <restriction base="xs:string"/>
</simpleType>

Define all your strings with vString as your type, and it should behave much
more like you expect.
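
As a sketch of what that might look like for one of the fields in the schema below, the element keeps its pattern facet but derives from vString instead of xs:string. Two assumptions here: the schema has no target namespace, so the unprefixed name vString resolves, and the assert on the base type carries over to the derived type (I believe DFDL combines these annotations, but that is worth verifying).

    <xs:element name="LatitudeDegrees"
                dfdl:lengthKind="explicit" dfdl:length="2">
      <xs:simpleType>
        <!-- base was xs:string; vString adds the checkConstraints assert -->
        <xs:restriction base="vString">
          <xs:pattern value="[0-9]{2}"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>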

Now normally I tell people not to call checkConstraints(.) on everything because
it fails to distinguish well-formed data from invalid data, and often one wants
the parse to succeed even if the data is invalid.

In your case things are different. You have not provided enough information in
the DFDL properties to parse this data. The facets are necessary information to
successfully parse it.

You will want to complement vString with use of discriminators. For example I
think your schema should have a discriminator after the latitudeDegrees element
because if you successfully parse that element, backtracking to the nilled case
no longer makes sense.
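
One possible way to write that, as an untested sketch: put the discriminator on the LatitudeDegrees element itself, so it is evaluated once the two characters have been parsed. The message text is made up for illustration. This sketch keeps xs:string as the base type rather than vString, since I believe DFDL does not allow an assert and a discriminator on the same component.

    <xs:element name="LatitudeDegrees"
                dfdl:lengthKind="explicit" dfdl:length="2">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <!-- True: commit to the Origin branch of the choice.
               False: parse error, so the choice tries its next branch. -->
          <dfdl:discriminator message="LatitudeDegrees must be two digits">{ dfdl:checkConstraints(.) }</dfdl:discriminator>
        </xs:appinfo>
      </xs:annotation>
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="[0-9]{2}"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>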




On Thu, Aug 25, 2022 at 7:01 AM Roger L Costello <[email protected]> wrote:

       Hi Folks,

       Here are two sample inputs:

       John Doe/2006N-05912E/Sally Smith
       John Doe/-/Sally Smith

       It is the field in the middle that is of interest.

       The field is a composite field, i.e., it consists of a series of parts: lat
       degrees, lat minutes, lat hemisphere, hyphen, long degrees, long minutes,
       long hemisphere. No separator between the parts.

       The field is nillable and the hyphen is the nil value.

       The first input shown above succeeds, the second fails to parse.

       What we have here is a variable length, nillable element with a complexType
       and the nil value is not %ES;. As we have determined in previous posts,
       Daffodil does not support this. So, the workaround is to place the element
       in a choice, where the first branch of the choice is the element minus the
       nillable stuff and the second branch is a plain string element that is
       nillable. Well, I implemented that and Daffodil complains:

       [error] Parse Error: Failed to parse infix separator. Cause: Parse Error:
       Separator '/' not found

       When I use the -V limited parse option I get a completely different set of
       error messages, e.g.:

       [error] Validation Error: LatitudeMinutes failed facet checks due to: facet
       pattern(s):
       [0-9]{2}|[0-9]{2}\.[0-9]{1}|[0-9]{2}\.[0-9]{2}|[0-9]{2}\.[0-9]{3}|[0-9]{2}\.[0-9]{4}

       Am I doing something wrong in my DFDL schema (shown below) or is this a bug
       in Daffodil?  /Roger

       <?xml version="1.0" encoding="UTF-8"?>
       <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
                  xmlns:xs="http://www.w3.org/2001/XMLSchema">
         <xs:annotation>
           <xs:appinfo source="http://www.ogf.org/dfdl/">
             <dfdl:format
               alignment="1"
               alignmentUnits="bytes"
               emptyValueDelimiterPolicy="none"
               encoding="ASCII"
               encodingErrorPolicy="replace"
               escapeSchemeRef=""
               fillByte="%SP;"
               floating="no"
               ignoreCase="yes"
               initiatedContent="no"
               initiator=""
               leadingSkip="0"
               lengthKind="delimited"
               lengthUnits="characters"
               nilValueDelimiterPolicy="none"
               occursCountKind="implicit"
               outputNewLine="%CR;%LF;"
               representation="text"
               separator=""
               separatorSuppressionPolicy="anyEmpty"
               sequenceKind="ordered"
               textBidi="no"
               textPadKind="none"
               textTrimKind="none"
               trailingSkip="0"
               truncateSpecifiedLengthString="no"
               terminator=""
               textNumberRep="standard"
               textStandardBase="10"
               textStandardZeroRep="0"
               textNumberRounding="pattern"
               textStandardExponentRep="E"
               textNumberCheckPolicy="strict"/>
           </xs:appinfo>
         </xs:annotation>
         <xs:element name="Test">
           <xs:complexType>
             <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
               <xs:element name="A" type="xs:string"/>
               <xs:choice dfdl:choiceLengthKind="implicit">
                 <xs:element name="Origin">
                   <xs:complexType>
                     <xs:sequence dfdl:separator="">
                       <xs:element name="LatitudeDegrees"
                                   dfdl:lengthKind="explicit" dfdl:length="2">
                         <xs:simpleType>
                           <xs:restriction base="xs:string">
                             <xs:pattern value="[0-9]{2}"/>
                           </xs:restriction>
                         </xs:simpleType>
                       </xs:element>
                       <xs:element name="LatitudeMinutes"
                                   dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(N|S))">
                         <xs:simpleType>
                           <xs:restriction base="xs:string">
                             <xs:pattern value="[0-9]{2}"/>
                             <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
                             <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
                             <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
                             <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
                           </xs:restriction>
                         </xs:simpleType>
                       </xs:element>
                       <xs:element name="LatitudeHemisphere"
                                   dfdl:lengthKind="explicit" dfdl:length="1">
                         <xs:simpleType>
                           <xs:restriction base="xs:string">
                             <xs:enumeration value="N"/>
                             <xs:enumeration value="S"/>
                           </xs:restriction>
                         </xs:simpleType>
                       </xs:element>
                       <xs:element name="Hyphen"
                                   dfdl:lengthKind="explicit" dfdl:length="1">
                         <xs:simpleType>
                           <xs:restriction base="xs:string">
                             <xs:enumeration value="-"/>
                           </xs:restriction>
                         </xs:simpleType>
                       </xs:element>
                       <xs:element name="LongitudeDegrees"
                                   dfdl:lengthKind="explicit" dfdl:length="3">
                         <xs:simpleType>
                           <xs:restriction base="xs:string">
                             <xs:pattern value="[0-9]{3}"/>
                           </xs:restriction>
                         </xs:simpleType>
                       </xs:element>
                       <xs:element name="LongitudeMinutes"
                                   dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(E|W))">
                         <xs:simpleType>
                           <xs:restriction base="xs:string">
                             <xs:pattern value="[0-9]{2}"/>
                             <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
                             <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
                             <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
                             <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
                           </xs:restriction>
                         </xs:simpleType>
                       </xs:element>
                       <xs:element name="LongitudeHemisphere">
                         <xs:simpleType>
                           <xs:restriction base="xs:string">
                             <xs:enumeration value="E"/>
                             <xs:enumeration value="W"/>
                           </xs:restriction>
                         </xs:simpleType>
                       </xs:element>
                     </xs:sequence>
                   </xs:complexType>
                 </xs:element>
                 <xs:element name="Origin_" type="xs:string"
                             nillable="true" dfdl:nilKind="literalValue" dfdl:nilValue="-"/>
               </xs:choice>
               <xs:element name="B" type="xs:string"/>
             </xs:sequence>
           </xs:complexType>
         </xs:element>
       </xs:schema>

