Hi Folks,
A lot of complexity got replaced with simplicity, thanks to Mike and Steve.
Here's the updated information. Lots of changes. If you find any errors, let me
know. /Roger
------------------------------------------
Daffodil now supports the -V limited option. The -V limited option is a game
changer. It totally changes the strategy for creating DFDL schemas. You use
fewer DFDL properties and more XSD facets. This is huge!
That said, what I am about to describe may or may not fit your DFDL work.
For my work, there already exists an XML Schema (XSD). [If your work doesn't
already have an XSD, then create one!] The XSD is scaffolding and all I must do
is add the appropriate DFDL properties to the scaffolding. All the leaf
elements in the XSD are of type xs:string and are constrained using pattern or
enumeration facets. Some data fields are nillable and so their corresponding
XSD element declarations have nillable="true". Others are non-nillable. Some
data fields have fixed length. Others have variable length. This message shows
how to add appropriate DFDL properties to each type of leaf element.
Before doing so, however, let's see how the -V limited option changes DFDL
schema development. Prior to the availability of the -V limited option I was
using dfdl:lengthPattern="regex" to specify leaf elements. As a result, I had
to:
* Convert each enumeration list in the XSD to a regex, where the
enumeration values became regex alternatives. Then I would sort the
alternatives longest-to-shortest. For fixed fields I would pad the alternatives
that weren't of the required length. And then I would set the sorted, padded
regex as the value of dfdl:lengthPattern. Now, with the -V limited option I
leave the enumeration list as it is. I ditched dfdl:lengthPattern. It's not
needed anymore.
* Convert pattern facets in the XSD to a single regex containing
alternatives. Then sort the alternatives longest-to-shortest. For fixed fields
I would pad the alternatives that weren't of the required length. And then I
would set the sorted, padded regex as the value of dfdl:lengthPattern. With the
-V limited option, I no longer process the pattern facet; I use it as is.
The -V limited option means greater use of XSD facets and less need for DFDL
properties. It means less processing: no more converting enumeration values
into regex alternatives, no more converting pattern facets into regex
alternatives, no more sorting regex alternatives in longest-to-shortest order,
and for fixed fields no more padding alternatives.
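For comparison, the retired workflow can be sketched in a few lines of Python.
This is only an illustration of the steps described above (convert, pad, sort
longest-to-shortest). The function name enumeration_to_length_pattern is
hypothetical, and the sketch assumes the enumeration values contain no regex
metacharacters.

```python
def enumeration_to_length_pattern(values, fixed_length=None, pad=" "):
    """Build the alternation that used to be hand-written for dfdl:lengthPattern.

    Hypothetical helper: assumes the values contain no regex metacharacters.
    """
    if fixed_length is not None:
        # Fixed fields: pad every alternative on the right to the required length.
        values = [v.ljust(fixed_length, pad) for v in values]
    # Longest first, so a short alternative cannot match a prefix of a longer one.
    ordered = sorted(values, key=len, reverse=True)
    return "|".join(ordered)


enum = ["JUPT", "VENUSS", "MARSSS", "SUNNYY", "EAR"]
print(enumeration_to_length_pattern(enum))
print(enumeration_to_length_pattern(enum, fixed_length=6))
```

With the -V limited option none of this is needed; the enumeration stays in the
XSD exactly as written.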
Here is the Desired Parsing Behavior: If data is well-formed and valid, I want
parsing to produce XML and display no errors. If data is well-formed but not
valid, I want parsing to produce XML and display errors. If data is not
well-formed, I want parsing to not produce XML and display errors.
I use the Daffodil -V limited option, as it results in the desired parsing
behavior.
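The three cases of the desired parsing behavior can be written down as a small
decision function. This is just a restatement of the paragraph above in Python,
not anything Daffodil exposes; the name expected_outcome is hypothetical.

```python
def expected_outcome(well_formed, valid):
    """Return (produces_xml, displays_errors) for the desired parsing behavior."""
    if not well_formed:
        return (False, True)   # parse fails: no XML, errors displayed
    if not valid:
        return (True, True)    # XML is produced, validation errors displayed
    return (True, False)       # XML is produced, nothing to report


for wf, ok in [(True, True), (True, False), (False, False)]:
    print(wf, ok, expected_outcome(wf, ok))
```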
As I said above, in my XSD the leaf elements are nillable or not, fixed length
or not. In other words, there are four types of data fields:
1. Data field is fixed length, nillable
The following element declaration shows how to specify fixed length, nillable
fields.
Field specification:
>> Fixed length (3)
>> Nillable, hyphen is the nil value, the hyphen may be positioned anywhere
   within the 3-character field
>> Values must be left-justified
>> Values shorter than 3 characters must be padded with spaces
<xs:element name="RunwayStatus"
            nillable="true"
            dfdl:nilKind="literalValue"
            dfdl:nilValue="%WSP*;-"
            dfdl:lengthKind="explicit"
            dfdl:length="3"
            dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
            dfdl:textStringPadCharacter="%SP;"
            dfdl:textStringJustification="left">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="FLT"/>
            <xs:enumeration value="GVL"/>
            <xs:enumeration value="BRK"/>
            <xs:enumeration value="GDD"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
In this case all the enumeration values are of the required length (3). Suppose
some were shorter, would you need to pad them with spaces? No, there is no need
to pad enumeration values. The combination of dfdl:length="3" and
dfdl:textStringPadCharacter="%SP;" means that parsing reads exactly 3
characters and, if the value is shorter than 3, expects it to be padded on the
right with spaces. The padding is trimmed before the value is checked against
the enumeration. The dfdl:textStringJustification="left" property specifies
that values must be left-justified. This means that this input is okay:
.../AB /...
but this is not:
.../ AB/...
If there is no input data available to populate the field, a hyphen is to be
inserted. In other words, hyphen is the nil value. Of course, even with a nil
value the field is still required to have length 3, so the hyphen must be
padded with spaces. dfdl:nilValue="%WSP*;-" specifies that the hyphen may be
positioned anywhere within the 3-character field.
Let's see how a DFDL processor parses the element. With the following input
(note the spaces around the hyphen):
.../ - /...
parsing produces this output:
<RunwayStatus xsi:nil="true"></RunwayStatus>
and unparsing produces this output:
.../-  /...
Notice that unparsing results in moving the hyphen to the left side of the
field.
With this input:
.../FLT/...
parsing produces this output:
<RunwayStatus>FLT</RunwayStatus>
and unparsing produces this output:
.../FLT/...
If a pattern facet had been used instead of the enumeration facet:
<xs:simpleType>
    <xs:restriction base="xs:string">
        <xs:pattern value="FLT|GVL|BRK|GDD" />
    </xs:restriction>
</xs:simpleType>
everything works the same. That is, the same set of DFDL properties is used.
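To make the nil round trip concrete, here is a minimal Python model of the
behavior described above for the 3-character RunwayStatus field. It is a
sketch, not Daffodil code: the function names are hypothetical, the regex
\s*- is only a rough stand-in for %WSP*;-, and it assumes that left
justification means only right-hand padding is trimmed.

```python
import re

# Rough stand-in for dfdl:nilValue="%WSP*;-": optional whitespace, then a hyphen.
NIL = re.compile(r"\s*-")


def parse_fixed_nillable(field, length=3):
    """Return None for nil, otherwise the value with right-hand padding trimmed."""
    if len(field) != length:
        raise ValueError("field must be exactly %d characters" % length)
    trimmed = field.rstrip(" ")  # left-justified: padding sits on the right
    return None if NIL.fullmatch(trimmed) else trimmed


def unparse_fixed_nillable(value, length=3):
    """Write the nil literal or the value, left-justified and space-padded."""
    text = "-" if value is None else value
    return text.ljust(length, " ")


print(repr(unparse_fixed_nillable(parse_fixed_nillable(" - "))))  # hyphen moves left
print(repr(unparse_fixed_nillable(parse_fixed_nillable("FLT"))))
```

The model reproduces the observation that unparsing a nil always writes the
hyphen at the left edge of the field, wherever it sat in the input.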
2. Data field is fixed length, non-nillable
The following element declaration shows how to specify fixed length,
non-nillable fields.
Field specification:
>> Fixed length (6)
>> Values must be left-justified
>> Values shorter than 6 characters must be padded with spaces
<xs:element name="TimeLabel"
            dfdl:lengthKind="explicit"
            dfdl:length="6"
            dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
            dfdl:textStringPadCharacter="%SP;"
            dfdl:textStringJustification="left">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="JUPT"/>
            <xs:enumeration value="VENUSS"/>
            <xs:enumeration value="MARSSS"/>
            <xs:enumeration value="SUNNYY"/>
            <xs:enumeration value="EAR"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
Notice that some of the enumeration values have a length less than the required
length (6). For example, EAR has a length of only 3. Does that mean we need to
pad those values with length less than 6? No, there is no need to pad any
enumeration value. The combination of dfdl:length="6" and
dfdl:textStringPadCharacter="%SP;" means that parsing will check the input
field to see that it has length 6 and if it contains a value that is shorter
than 6, check that it is padded on the right with spaces. The
dfdl:textStringJustification="left" property specifies that values must be
left-justified. In other words, this input is okay:
.../EAR   /...
but this is not:
.../   EAR/...
Let's see how a DFDL processor parses the element. With the following input
(notice the value is only 4 characters long, so it is padded with 2 spaces):
.../JUPT  /...
parsing produces this output:
<TimeLabel>JUPT</TimeLabel>
and unparsing produces this output:
.../JUPT  /...
In our example, the enumeration facet is used. If a pattern facet had been used
instead of the enumeration facet:
<xs:simpleType>
    <xs:restriction base="xs:string">
        <xs:pattern value="JUPT|VENUSS|MARSSS|SUNNYY|EAR" />
    </xs:restriction>
</xs:simpleType>
everything works the same. That is, the same set of DFDL properties is used.
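Why unpadded enumeration values are enough can be seen in a small Python model
of the TimeLabel field. This is a sketch under the assumption that left
justification trims padding on the right only and that the facet check runs on
the trimmed value; parse_time_label is a hypothetical name, not a Daffodil
function.

```python
TIME_LABELS = {"JUPT", "VENUSS", "MARSSS", "SUNNYY", "EAR"}


def parse_time_label(field, length=6):
    """Return (value, is_valid) for one fixed-length, left-justified field."""
    if len(field) != length:
        raise ValueError("not well-formed: expected %d characters" % length)
    value = field.rstrip(" ")            # trim right-hand padding only
    return value, value in TIME_LABELS   # facet check uses the trimmed value


print(parse_time_label("EAR   "))   # left-justified: valid
print(parse_time_label("   EAR"))   # right-justified: leading spaces survive, invalid
print(parse_time_label("JUPT  "))
```

A right-justified value keeps its leading spaces after trimming, so it no
longer matches any enumeration value and is reported as invalid.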
3. Data field is variable length, nillable
The following element declaration shows how to specify variable length,
nillable fields.
Field specification:
>> Variable length (2-20 characters)
>> Nillable, hyphen is the nil value, if a hyphen is present, it is the only
   character in the field
<xs:element name="MessageID"
            nillable="true"
            dfdl:nilKind="literalValue"
            dfdl:nilValue="-">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Z0-9 ]{2,20}" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>
Let's see how a DFDL processor parses the element. With this input:
.../-/...
parsing produces this output:
<MessageID xsi:nil="true"></MessageID>
and unparsing produces this output:
.../-/...
With this input:
.../XRAY/...
parsing produces this output:
<MessageID>XRAY</MessageID>
and unparsing produces this output:
.../XRAY/...
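The variable-length, nillable case is simpler to model because there is no
padding to trim. The Python sketch below assumes the field has already been
isolated from its surrounding delimiters; parse_message_id is a hypothetical
name. XSD patterns are implicitly anchored, which is why fullmatch is used.

```python
import re

MESSAGE_ID = re.compile(r"[A-Z0-9 ]{2,20}")  # the pattern facet from the XSD


def parse_message_id(field):
    """Return (value, is_valid); value is None when the field is the nil literal."""
    if field == "-":
        return None, True  # nil: the pattern facet is not applied
    return field, MESSAGE_ID.fullmatch(field) is not None


print(parse_message_id("-"))
print(parse_message_id("XRAY"))
print(parse_message_id("X"))   # too short for {2,20}
```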
4. Data field is variable length, non-nillable
The following element declaration shows how to specify variable length,
non-nillable fields.
Field specification:
>> Variable length (1-7 characters)
<xs:element name="MessageNumber">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Z0-9 ]{1,7}" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>
Let's see how a DFDL processor parses the element. With this input:
.../BRAVO/...
parsing produces this output:
<MessageNumber>BRAVO</MessageNumber>
and unparsing produces this output:
.../BRAVO/...
The following table shows how to assign XSD and DFDL properties. The nil values
and length values shown in the table are from the above examples. Obviously for
your data you need to replace them with your values.
Properties to add onto the XSD element declaration

                              | Fixed length, | Fixed length, | Variable length, | Variable length,
Property                      | nillable      | non-nillable  | nillable         | non-nillable
------------------------------+---------------+---------------+------------------+-----------------
nillable                      | true          | n/a           | true             | n/a
dfdl:nilKind                  | literalValue  | n/a           | literalValue     | n/a
dfdl:nilValue                 | %WSP*;-       | n/a           | -                | n/a
dfdl:lengthKind               | explicit      | explicit      | delimited        | delimited
dfdl:length                   | 3             | 6             | n/a              | n/a
dfdl:textTrimKind             | padChar       | padChar       | n/a              | n/a
dfdl:textPadKind              | padChar       | padChar       | n/a              | n/a
dfdl:textStringPadCharacter   | %SP;          | %SP;          | n/a              | n/a
dfdl:textStringJustification  | left          | left          | n/a              | n/a
It should be possible to convert this table into a form that can be used to
automate the adding of DFDL properties onto element declarations.
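One possible starting point for that automation, sketched in Python with the
standard xml.etree.ElementTree module. The function names properties_for and
annotate are hypothetical, and deciding whether a given element is fixed or
variable length, and what its length and nil value are, is assumed to come
from somewhere else (for example, a field catalog).

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"
DFDL_NS = "http://www.ogf.org/dfdl/dfdl-1.0/"


def properties_for(fixed, nillable, length=None, nil_value=None):
    """Return the attributes from the table for one kind of data field."""
    props = {}
    if nillable:
        props["nillable"] = "true"
        props["dfdl:nilKind"] = "literalValue"
        props["dfdl:nilValue"] = nil_value
    if fixed:
        props["dfdl:lengthKind"] = "explicit"
        props["dfdl:length"] = str(length)
        props["dfdl:textTrimKind"] = "padChar"
        props["dfdl:textPadKind"] = "padChar"
        props["dfdl:textStringPadCharacter"] = "%SP;"
        props["dfdl:textStringJustification"] = "left"
    else:
        props["dfdl:lengthKind"] = "delimited"
    return props


def annotate(element, props):
    """Set the attributes on an xs:element, mapping dfdl: names to the DFDL namespace."""
    for name, value in props.items():
        if name.startswith("dfdl:"):
            name = "{%s}%s" % (DFDL_NS, name[len("dfdl:"):])
        element.set(name, value)
    return element


ET.register_namespace("xs", XS_NS)
ET.register_namespace("dfdl", DFDL_NS)
elem = ET.Element("{%s}element" % XS_NS, {"name": "RunwayStatus"})
annotate(elem, properties_for(fixed=True, nillable=True, length=3, nil_value="%WSP*;-"))
print(ET.tostring(elem, encoding="unicode"))
```

Keeping the table as a plain function makes it easy to review against the
table above, one branch per column.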