You likely want a combination of both solutions. For example, say the
data was this:
John Doe/INVALID-LATLON/Sally Smith
With only Mike's solution, you would get
<Origin_>INVALID-LATLON</Origin_>
So we will properly detect that this isn't a valid "Origin" element, but
the nillable Origin_ element will accept invalid non-nil data. If you
want this behavior (i.e. well-formed vs valid), you could maybe add a
restriction that Origin_ must have zero length. Though I'm not sure
there's an XML schema restriction that says it must be nil, which is
what you really want. Otherwise this data would parse and appear valid:
John Doe//Sally Smith
On the other hand, with only my solution, this would detect that
INVALID-LATLON is not a nil Origin_ element, but parsing the complex
Origin element would still run into the issue Mike described, where
Daffodil happily consumes data, goes off the rails, and fails
somewhere else.
In your case, you want assertions on all of these elements. With that,
whether you put Origin_ first or second in the choice doesn't really
matter.
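As a rough sketch of what the combined choice might look like: only
LatitudeDegrees is shown, and the other Origin parts would carry the same
assert. This assumes the fn prefix is bound to
http://www.w3.org/2005/xpath-functions, which the posted schema does not
declare, and the message text is just illustrative.
<xs:choice dfdl:choiceLengthKind="implicit">
  <xs:element name="Origin">
    <xs:complexType>
      <xs:sequence dfdl:separator="">
        <xs:element name="LatitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="2">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <!-- Mike's check: a facet failure becomes a parse error, so the choice backtracks -->
              <dfdl:assert message="Invalid LatitudeDegrees">{ dfdl:checkConstraints(.) }</dfdl:assert>
            </xs:appinfo>
          </xs:annotation>
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="[0-9]{2}"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <!-- ... remaining Origin parts, each with the same kind of assert ... -->
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Origin_" type="xs:string" nillable="true"
              dfdl:nilKind="literalValue" dfdl:nilValue="-">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- my check: this branch only succeeds when the data is the nil value -->
        <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:choice>
With both asserts in place, John Doe/INVALID-LATLON/Sally Smith should fail
in both branches and be reported as a parse error instead of showing up as
an Origin_ string.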
On 8/25/22 10:55 AM, Roger L Costello wrote:
Thanks Steve.
So now we have two solutions: the solution that Mike identified using
dfdl:checkConstraints(.), and the solution that Steve identified switching the
order of the choice branches and using fn:nilled(.).
I am writing this stuff up. Which solution should I recommend? I prefer Steve's
solution since it is simpler and easier to describe. Thoughts?
/Roger
-----Original Message-----
From: Steve Lawrence <[email protected]>
Sent: Thursday, August 25, 2022 8:54 AM
To: [email protected]
Subject: [EXT] Re: Daffodil does not correctly parse variable length, nillable
elements with complexType
Ah yeah, you're right. If Origin_ were first, you would also need an
assert that Origin_ must be nilled to cause it to backtrack and try the
other branch if it wasn't the nil value, e.g.:
<xs:element name="Origin_" ... >
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
    </xs:appinfo>
  </xs:annotation>
</xs:element>
Note that this is probably a good idea to add regardless of order.
Otherwise, if there was a parse error in the Origin element (which would
occur with Mike's changes on invalid data) then Origin_ would accept any
string, but it is intended to only parse a nil element.
On 8/25/22 7:47 AM, Roger L Costello wrote:
Thanks Mike and Steve!
I need to study carefully what Mike said.
Steve, I think your suggestion is not correct. I did as you suggested and
reversed the order of the branches in the choice. Now, with this input:
John Doe/2006N-05912E/Sally Smith
I get this XML:
<Test>
<A>John Doe</A>
<Origin_>2006N-05912E</Origin_>
<B>Sally Smith</B>
</Test>
which is not correct.
I conclude that switching the order of the branches in the choice is not
correct. Do you concur?
/Roger
-----Original Message-----
From: Steve Lawrence <[email protected]>
Sent: Thursday, August 25, 2022 7:38 AM
To: [email protected]
Subject: [EXT] Re: Daffodil does not correctly parse variable length, nillable
elements with complexType
Another option: put your nillable type as the first branch in the
choice. This way Daffodil will attempt to parse the nillable type first,
and will only attempt to parse the complex Origin if that fails.
You'll still likely want the validation that Mike suggests so that
when something fails it fails immediately instead of happily continuing
off the rails.
On 8/25/22 7:30 AM, Mike Beckerle wrote:
I think I know what is happening.
In the battle of delimiters vs. nested explicit length, explicit wins.
So if you have abc/-/cef, then after parsing abc and finding the separator /,
the next field is latitudeDegrees with explicit length 2. That "wins", and
"-/" are the characters of that string.
Validation will then issue a validation warning because Daffodil's "limited"
validation is done as the elements are parsed.
This does not cause backtracking, it's just a "warning" that the seemingly
well-formed data is invalid.
Then latitudeMinutes is parsed, and that uses the ever-problematic lengthKind
pattern, which succeeds with a zero-length string, which then also causes a
validation error, again because this now zero-length string doesn't look like
the digits you expect.
Then it parses the hyphen element, which is just a string of length 1,
.... I'll stop here because things are clearly off the rails.
Here's my suggestion for how to fix this and get Daffodil to magically do what
you want, which is to pay attention to the facets.
<!-- vString = 'validated string'. Facets are checked while parsing. -->
<xs:simpleType name="vString">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:assert message="Invalid value">{ dfdl:checkConstraints(.) }</dfdl:assert>
    </xs:appinfo>
  </xs:annotation>
  <xs:restriction base="xs:string"/>
</xs:simpleType>
Define all your strings with vString as your type, and it should behave much
more like you expect.
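For instance, a minimal sketch of LatitudeDegrees rewritten on top of vString.
This assumes the assert on the base type is combined with the derived type, so
that checkConstraints(.) sees the pattern facet declared here, and that vString
is defined in the same (no-namespace) schema:
<xs:element name="LatitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="2">
  <xs:simpleType>
    <!-- derive from vString instead of xs:string to pick up the assert -->
    <xs:restriction base="vString">
      <xs:pattern value="[0-9]{2}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
With this, the two characters "-/" should fail the pattern and raise a parse
error at LatitudeDegrees, rather than just a validation warning.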
Now normally I tell people not to call checkConstraints(.) on everything because
it fails to distinguish well-formed data from invalid data, and often one wants
the parse to succeed even if the data is invalid.
In your case things are different. You have not provided enough information in
the DFDL properties to parse this data. The facets are necessary information to
successfully parse it.
You will want to complement vString with use of discriminators. For example I
think your schema should have a discriminator after the latitudeDegrees element
because if you successfully parse that element, backtracking to the nilled case
no longer makes sense.
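One way to sketch that, assuming the discriminator is placed directly on
LatitudeDegrees and reuses checkConstraints(.). As far as I know an assert and
a discriminator cannot both sit on the same component, so in this sketch the
discriminator takes the place of the vString assert for this one element, and
the message text is just illustrative:
<xs:element name="LatitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="2">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- false: parse error, so the choice backtracks to the nil branch -->
      <!-- true: the enclosing choice is resolved to Origin, no more backtracking -->
      <dfdl:discriminator message="Not a LatitudeDegrees">{ dfdl:checkConstraints(.) }</dfdl:discriminator>
    </xs:appinfo>
  </xs:annotation>
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{2}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Any failure later in Origin then fails the whole choice right there, instead of
quietly falling back to the nilled case.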
On Thu, Aug 25, 2022 at 7:01 AM Roger L Costello <[email protected]> wrote:
Hi Folks,
Here are two sample inputs:
John Doe/2006N-05912E/Sally Smith
John Doe/-/Sally Smith
It is the field in the middle that is of interest.
The field is a composite field, i.e., it consists of a series of parts: lat
degrees, lat minutes, lat hemisphere, hyphen, long degrees, long minutes,
long hemisphere. No separator between the parts.
The field is nillable and the hyphen is the nil value.
The first input shown above succeeds, the second fails to parse.
What we have here is a variable length, nillable element with a complexType
and the nil value is not %ES;. As we have determined in previous posts,
Daffodil does not support this. So, the workaround is to place the element
in a choice, where the first branch of the choice is the element minus the
nillable stuff and the second branch is a plain string element that is
nillable. Well, I implemented that and Daffodil complains:
[error] Parse Error: Failed to parse infix separator. Cause: Parse Error: Separator '/' not found
When I use the -V limited parse option I get a completely different set of
error messages, e.g.:
[error] Validation Error: LatitudeMinutes failed facet checks due to: facet pattern(s):
[0-9]{2}|[0-9]{2}\.[0-9]{1}|[0-9]{2}\.[0-9]{2}|[0-9]{2}\.[0-9]{3}|[0-9]{2}\.[0-9]{4}
Am I doing something wrong in my DFDL schema (shown below) or is this a bug
in Daffodil? /Roger
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format
        alignment="1"
        alignmentUnits="bytes"
        emptyValueDelimiterPolicy="none"
        encoding="ASCII"
        encodingErrorPolicy="replace"
        escapeSchemeRef=""
        fillByte="%SP;"
        floating="no"
        ignoreCase="yes"
        initiatedContent="no"
        initiator=""
        leadingSkip="0"
        lengthKind="delimited"
        lengthUnits="characters"
        nilValueDelimiterPolicy="none"
        occursCountKind="implicit"
        outputNewLine="%CR;%LF;"
        representation="text"
        separator=""
        separatorSuppressionPolicy="anyEmpty"
        sequenceKind="ordered"
        textBidi="no"
        textPadKind="none"
        textTrimKind="none"
        trailingSkip="0"
        truncateSpecifiedLengthString="no"
        terminator=""
        textNumberRep="standard"
        textStandardBase="10"
        textStandardZeroRep="0"
        textNumberRounding="pattern"
        textStandardExponentRep="E"
        textNumberCheckPolicy="strict"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="Test">
    <xs:complexType>
      <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
        <xs:element name="A" type="xs:string"/>
        <xs:choice dfdl:choiceLengthKind="implicit">
          <xs:element name="Origin">
            <xs:complexType>
              <xs:sequence dfdl:separator="">
                <xs:element name="LatitudeDegrees"
                            dfdl:lengthKind="explicit" dfdl:length="2">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:pattern value="[0-9]{2}"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
                <xs:element name="LatitudeMinutes"
                            dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(N|S))">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:pattern value="[0-9]{2}"/>
                      <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
                      <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
                      <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
                      <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
                <xs:element name="LatitudeHemisphere"
                            dfdl:lengthKind="explicit" dfdl:length="1">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:enumeration value="N"/>
                      <xs:enumeration value="S"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
                <xs:element name="Hyphen"
                            dfdl:lengthKind="explicit" dfdl:length="1">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:enumeration value="-"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
                <xs:element name="LongitudeDegrees"
                            dfdl:lengthKind="explicit" dfdl:length="3">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:pattern value="[0-9]{3}"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
                <xs:element name="LongitudeMinutes"
                            dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(E|W))">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:pattern value="[0-9]{2}"/>
                      <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
                      <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
                      <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
                      <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
                <xs:element name="LongitudeHemisphere">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:enumeration value="E"/>
                      <xs:enumeration value="W"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="Origin_" type="xs:string"
                      nillable="true" dfdl:nilKind="literalValue" dfdl:nilValue="-"/>
        </xs:choice>
        <xs:element name="B" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>