Hi Folks,
Please let me know of anything that is unclear. /Roger
--------------------------------------------------------------------------------------
4. Fixed length, nillable, composite, choice
A composite field is one that is composed of parts. There is no separator
between the parts. The parts may be fixed length or variable length. The parts
are non-nillable, although the composite field itself may be nillable.
This section deals with a nillable field whose value is a choice between two
composite fields, each of which is made up of fixed-length parts.
We will create a DFDL schema for a field containing the date that a book was
published. I named the field PublicationDate. There are two ways to express
the publication date:
1. 4-digit year followed by a 3-letter month
2. 3-letter month followed by a 4-digit year
Here is a sample value for the first way:
2022SEP
Here is a sample value for the second way:
SEP2022
In both cases, the field is composite with two parts. The field has a length of
7.
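Since the parts have no separator, they can only be told apart by position.
The following is a minimal Python sketch of that idea, not anything Daffodil
runs; the function names split_year_month and split_month_year are
hypothetical, chosen only for this illustration.

```python
FIELD_LENGTH = 7  # 4-digit year + 3-letter month, no separator


def split_year_month(field: str) -> dict:
    """Layout 1: characters 0-3 are the year, characters 4-6 the month."""
    if len(field) != FIELD_LENGTH:
        raise ValueError(f"expected {FIELD_LENGTH} characters, got {len(field)}")
    return {"Year": field[:4], "Month": field[4:]}


def split_month_year(field: str) -> dict:
    """Layout 2: characters 0-2 are the month, characters 3-6 the year."""
    if len(field) != FIELD_LENGTH:
        raise ValueError(f"expected {FIELD_LENGTH} characters, got {len(field)}")
    return {"Month": field[:3], "Year": field[3:]}


print(split_year_month("2022SEP"))
print(split_month_year("SEP2022"))
```

Slicing by length alone cannot tell which layout the data is in:
split_year_month("SEP2022") returns a Year of "SEP2" and a Month of "022"
without complaint.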
If no data is available, then the field will contain a hyphen.
Field Requirements:
>> Fixed length (7)
>> Nillable: the nil value is a hyphen, which may be positioned anywhere
   within the 7-character field
>> Choice of values
>> Each choice is composite, each choice has 2 parts
PublicationDate has a complexType, and its value may be nil. Recall from
section 2 that a nillable element with a complexType is a problem. The
workaround is to put a wrapper element around PublicationDate. The wrapper
element (PublicationDateWrapper) contains a choice with two branches: a
simpleType element (PublicationDate_) for the case where the input contains
the nil value, and the PublicationDate element itself:
<xs:element name="PublicationDateWrapper">
  <xs:complexType>
    <xs:choice>
      <xs:element name="PublicationDate_" type="xs:string"
                  nillable="true" />
      <xs:element name="PublicationDate">
        <!-- choice of Year, Month or Month, Year -->
      </xs:element>
    </xs:choice>
  </xs:complexType>
</xs:element>
Here is an XML Schema declaration of PublicationDate, sans any DFDL properties.
Note the field name (PublicationDate), its two choices (YearMonth and
MonthYear), and the part names within each choice (Year and Month):
<xs:element name="PublicationDate">
  <xs:complexType>
    <xs:choice>
      <xs:element name="YearMonth"> <!-- branch #1 -->
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Year">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[0-9]{4}"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="Month">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="JAN"/>
                  <xs:enumeration value="FEB"/>
                  <xs:enumeration value="MAR"/>
                  <xs:enumeration value="APR"/>
                  <xs:enumeration value="MAY"/>
                  <xs:enumeration value="JUN"/>
                  <xs:enumeration value="JUL"/>
                  <xs:enumeration value="AUG"/>
                  <xs:enumeration value="SEP"/>
                  <xs:enumeration value="OCT"/>
                  <xs:enumeration value="NOV"/>
                  <xs:enumeration value="DEC"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="MonthYear"> <!-- branch #2 -->
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Month">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="JAN"/>
                  <xs:enumeration value="FEB"/>
                  <xs:enumeration value="MAR"/>
                  <xs:enumeration value="APR"/>
                  <xs:enumeration value="MAY"/>
                  <xs:enumeration value="JUN"/>
                  <xs:enumeration value="JUL"/>
                  <xs:enumeration value="AUG"/>
                  <xs:enumeration value="SEP"/>
                  <xs:enumeration value="OCT"/>
                  <xs:enumeration value="NOV"/>
                  <xs:enumeration value="DEC"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="Year">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[0-9]{4}"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:choice>
  </xs:complexType>
</xs:element>
Each branch has two parts, and both parts are fixed length. Add these two DFDL
properties to each part, with a length of 4 for Year and 3 for Month:
dfdl:lengthKind="explicit"
dfdl:length="__"
Consider how this input:
SEP2022
is parsed. The first part (SEP) is the month. The parser starts down the first
branch and takes the first four characters (SEP2) as Year. That value does not
satisfy the facets of Year, but a facet violation by itself is a validation
error, not a parse error. The parser therefore does not back up and try the
other branch.
The solution is to add a dfdl:assert that calls checkConstraints() to the
declaration of Year:
<xs:element name="Year">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
    </xs:appinfo>
  </xs:annotation>
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
The assert with checkConstraints() tells the parser to validate the data
against the facets while parsing and, if validation fails, to back up and try
the other choice branch.
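To make the backtracking concrete, here is a small Python model of the choice,
written under the assumption that a failed assert behaves like an exception
that sends the parser on to the next branch. It is only a sketch of the
behavior described above, not Daffodil's implementation; ParseError,
year_month, month_year, and parse_publication_date are hypothetical names.

```python
import re


class ParseError(Exception):
    """Stands in for a DFDL processing error, which triggers backtracking."""


def year_month(field: str, assert_year: bool) -> dict:
    """Branch #1: Year (length 4) followed by Month (length 3)."""
    year, month = field[:4], field[4:]
    # Models <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert> on Year.
    if assert_year and not re.fullmatch(r"[0-9]{4}", year):
        raise ParseError(f"Year {year!r} fails its pattern facet")
    return {"YearMonth": {"Year": year, "Month": month}}


def month_year(field: str) -> dict:
    """Branch #2: Month (length 3) followed by Year (length 4); no assert."""
    return {"MonthYear": {"Month": field[:3], "Year": field[3:]}}


def parse_publication_date(field: str, assert_year: bool = True) -> dict:
    """Try the branches in schema order; move on only after a ParseError."""
    try:
        return year_month(field, assert_year)
    except ParseError:
        return month_year(field)


print(parse_publication_date("SEP2022"))                     # with the assert
print(parse_publication_date("SEP2022", assert_year=False))  # without it
```

With assert_year=False nothing in the first branch raises an error, so the
second branch is never tried and "SEP2" ends up in Year.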
Here's the DFDL schema with the DFDL properties added (dfdl:lengthKind and
dfdl:length on each part, plus the dfdl:assert on the first Year):
<xs:element name="PublicationDate">
  <xs:complexType>
    <xs:choice>
      <xs:element name="YearMonth"> <!-- branch #1 -->
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Year"
                        dfdl:lengthKind="explicit"
                        dfdl:length="4">
              <xs:annotation>
                <xs:appinfo source="http://www.ogf.org/dfdl/">
                  <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
                </xs:appinfo>
              </xs:annotation>
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[0-9]{4}"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="Month"
                        dfdl:lengthKind="explicit"
                        dfdl:length="3">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="JAN"/>
                  <xs:enumeration value="FEB"/>
                  ...
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="MonthYear"> <!-- branch #2 -->
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Month"
                        dfdl:lengthKind="explicit"
                        dfdl:length="3">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="JAN"/>
                  <xs:enumeration value="FEB"/>
                  ...
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="Year"
                        dfdl:lengthKind="explicit"
                        dfdl:length="4">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[0-9]{4}"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:choice>
  </xs:complexType>
</xs:element>
Notice that the parts of the second branch (Month and Year) carry the length
properties but no dfdl:assert. The second branch is the last alternative, so
there is no further branch for the parser to back up to.
The wrapper element and its child nil element are exactly analogous to those
shown in section 2.
<xs:element name="PublicationDateWrapper">
  <xs:complexType>
    <xs:choice dfdl:choiceLengthKind="implicit">
      <xs:element name="PublicationDate_" type="xs:string" nillable="true"
                  dfdl:nilKind="literalValue"
                  dfdl:nilValue="%WSP*;-%WSP*;">
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
          </xs:appinfo>
        </xs:annotation>
      </xs:element>
      <!-- see PublicationDate above -->
    </xs:choice>
  </xs:complexType>
</xs:element>
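The nil value %WSP*;-%WSP*; reads as optional whitespace, one hyphen, optional
whitespace, which is how the hyphen can sit anywhere in the 7-character field.
Here is a rough Python model of that match, assuming the field is padded with
whitespace and using \s as an approximation of the DFDL WSP class; is_nil is a
hypothetical helper, not a Daffodil function.

```python
import re

# Approximation of the DFDL nil value %WSP*;-%WSP*;
# (zero or more whitespace, a hyphen, zero or more whitespace).
NIL_PATTERN = re.compile(r"\s*-\s*")


def is_nil(field: str) -> bool:
    """Return True when the whole field is a hyphen with optional padding."""
    return NIL_PATTERN.fullmatch(field) is not None


for sample in ["-      ", "   -   ", "      -", "2022SEP"]:
    print(repr(sample), is_nil(sample))
```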
One last (important) point: when parsing input with Daffodil, use the -V
limited option. The option instructs Daffodil to validate each part of the
composite fields against the XSD facets. With this erroneous input value:
2022xxx
Daffodil gives this very helpful error message on parsing:
[error] Validation Error: Month failed facet checks due to: facet
enumeration(s): JAN|FEB|...
If you don't use the -V limited option, then Daffodil won't validate the parts
against the XSD facets. Consequently, Daffodil will not report any errors with
the above erroneous input. Why? Because if we ignore the facets in this element
declaration:
<xs:element name="Month"
            dfdl:lengthKind="explicit"
            dfdl:length="3">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="JAN"/>
      <xs:enumeration value="FEB"/>
      ...
    </xs:restriction>
  </xs:simpleType>
</xs:element>
then it is simply saying that the input is any text of length 3, and "xxx"
certainly fits that specification.
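The difference between the two modes can be sketched in Python. This is an
illustration only, assuming that without validation the only thing checked for
Month is its length; parses_as_month and satisfies_month_facets are
hypothetical names.

```python
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")
MONTH_LENGTH = 3


def parses_as_month(text: str) -> bool:
    """Without facet validation: any text of length 3 is accepted."""
    return len(text) == MONTH_LENGTH


def satisfies_month_facets(text: str) -> bool:
    """With facet validation: the text must be one of the enumerated months."""
    return text in MONTHS


for candidate in ("SEP", "xxx"):
    print(candidate, parses_as_month(candidate), satisfies_month_facets(candidate))
```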