Hi Folks,
I am jumping around in my writeups.
As always, please let me know of anything that is unclear. /Roger
--------------------------------------------------------------------------------------
11. Variable length, nillable, composite, no choice
A composite field is one that is composed of parts. There is no separator
between the parts. The parts may be fixed length or variable length. The parts
are non-nillable, although the composite field itself may be nillable.
This section deals with nillable composite fields that contain variable-length
parts.
We will create a DFDL schema for a "Location" field that has a latitude and
longitude, separated by a dash. Here is a sample value:
2006N-05912E
That is one value with 7 parts:
The first two digits (20) represent the latitude in degrees.
The next two digits (06) represent the latitude in minutes.
The N indicates the latitude's hemisphere.
The dash ( - ) separates the latitude values from the following longitude
values.
The 059 represents the longitude in degrees.
The 12 represents the longitude in minutes.
The E represents the longitude hemisphere.
In other words, the location is latitude 20 degrees, 6 minutes North, longitude
59 degrees, 12 minutes East.
Both the latitude minutes and the longitude minutes are variable length: each
is expressed either as a two-digit integer or as a decimal value. If a decimal,
there may be 1-4 digits to the right of the decimal point. Here are Location
values whose minute parts have decimal values:
4221.6N-71003.5W
4221.63N-71003.57W
4221.630N-71003.576W
4221.6300N-71003.5760W
Here is one more example of a valid Location value:
-
That value means: no data was available to populate the field.
To re-emphasize, Location is a variable length, nillable, composite field.
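To make the format concrete, here is a small Python sketch of it (Python is my choice for illustration only; it is not part of DFDL). The function name split_location and the regex group names are assumptions that simply mirror the part names used in this section. It treats a lone hyphen as the nil value and otherwise splits a value into its 7 parts:

```python
import re

# One regex for the whole Location value; group names are illustrative.
LOCATION = re.compile(
    r"(?P<LatitudeDegrees>[0-9]{2})"
    r"(?P<LatitudeMinutes>[0-9]{2}(?:\.[0-9]{1,4})?)"
    r"(?P<LatitudeHemisphere>[NS])"
    r"(?P<Hyphen>-)"
    r"(?P<LongitudeDegrees>[0-9]{3})"
    r"(?P<LongitudeMinutes>[0-9]{2}(?:\.[0-9]{1,4})?)"
    r"(?P<LongitudeHemisphere>[EW])"
)

def split_location(value):
    """Return None for the nil value "-", else a dict of the 7 parts."""
    if value == "-":
        return None
    m = LOCATION.fullmatch(value)
    if m is None:
        raise ValueError(f"not a valid Location: {value!r}")
    return m.groupdict()
```

For example, split_location("2006N-05912E")["LongitudeDegrees"] gives "059", and split_location("-") gives None.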
Here is an XML Schema declaration of Location, sans any DFDL properties:
<xs:element name="Location" nillable="true">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="LatitudeDegrees">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LatitudeMinutes">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LatitudeHemisphere">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="N" />
            <xs:enumeration value="S" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="Hyphen">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="-" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LongitudeDegrees">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{3}" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LongitudeMinutes">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LongitudeHemisphere">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="E" />
            <xs:enumeration value="W" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
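A note on the five xs:pattern facets in the two Minutes parts: they are alternatives, so a value is valid if it matches any one of them in full. Here is a quick Python sketch for checking which strings they accept. The helper name is_valid_minutes is hypothetical, and Python's re module is only a stand-in for the XML Schema regex flavor (for patterns this simple I expect the two to agree):

```python
import re

# The five xs:pattern values, copied from the Minutes declarations.
MINUTES_PATTERNS = [
    r"[0-9]{2}",
    r"[0-9]{2}\.[0-9]{1}",
    r"[0-9]{2}\.[0-9]{2}",
    r"[0-9]{2}\.[0-9]{3}",
    r"[0-9]{2}\.[0-9]{4}",
]

def is_valid_minutes(text):
    # XML Schema patterns must match the whole value, hence fullmatch.
    return any(re.fullmatch(p, text) for p in MINUTES_PATTERNS)

for candidate in ["06", "21.6", "21.6300", "21.", "21.63000", "6"]:
    print(candidate, is_valid_minutes(candidate))
```

The first three candidates are accepted and the last three are rejected. The five facets could equally be written as the single pattern [0-9]{2}(\.[0-9]{1,4})? but the sketch keeps them separate to match the declaration.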
These parts have fixed length: LatitudeDegrees, LatitudeHemisphere, Hyphen,
LongitudeDegrees, and LongitudeHemisphere.
These parts have variable length: LatitudeMinutes and LongitudeMinutes.
For the fixed length parts, add these two DFDL properties:
dfdl:lengthKind="explicit"
dfdl:length="__"
For example, LatitudeDegrees has a fixed length of 2. Here is its declaration,
with the DFDL properties added:
<xs:element name="LatitudeDegrees"
            dfdl:lengthKind="explicit"
            dfdl:length="2">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{2}" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Use the same strategy for the other fixed length parts.
LatitudeMinutes is variable length. The part that follows it
(LatitudeHemisphere) has a fixed length (its value is either N or S). To
declare LatitudeMinutes, add these two DFDL properties:
dfdl:lengthKind="pattern"
dfdl:lengthPattern="regex"
In the regex, use a lookahead pattern. Here is LatitudeMinutes, extended with
the DFDL properties:
<xs:element name="LatitudeMinutes"
            dfdl:lengthKind="pattern"
            dfdl:lengthPattern=".*?(?=(N|S))">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{2}"/>
      <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
      <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
      <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
      <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
Read that as: the content of LatitudeMinutes is the text up to, but not
including, the first N or S.
Use the same regex lookahead strategy for LongitudeMinutes.
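If the lookahead is unfamiliar, this Python sketch shows what the LatitudeMinutes regex selects. Python's re module is a stand-in here (a DFDL processor uses its own regex engine), and the starting position of 2 is my assumption that LatitudeDegrees has already been consumed:

```python
import re

# Same regex as the dfdl:lengthPattern on LatitudeMinutes.
LATITUDE_MINUTES = re.compile(r".*?(?=(N|S))")

data = "4221.63N-71003.57W"
pos = 2  # LatitudeDegrees (fixed length 2) has already been consumed

m = LATITUDE_MINUTES.match(data, pos)
minutes = m.group(0)
print(minutes)        # 21.63
print(data[m.end()])  # N
```

The lookahead (?=(N|S)) checks that an N or S comes next without consuming it, so the hemisphere letter is left for the LatitudeHemisphere part. The ? after .* makes the match lazy, so it stops at the first N or S rather than the last.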
As I stated earlier, Location is nillable with hyphen as the nil value.
Further, Location has a complexType. That is a problem. See section 2 for a
complete discussion of the problem with nillable complexTypes and how to deal
with it.
Here's the DFDL schema for the Location field:
<xs:element name="Location">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="LatitudeDegrees"
                  dfdl:lengthKind="explicit"
                  dfdl:length="2">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LatitudeMinutes"
                  dfdl:lengthKind="pattern"
                  dfdl:lengthPattern=".*?(?=(N|S))">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LatitudeHemisphere"
                  dfdl:lengthKind="explicit"
                  dfdl:length="1">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="N" />
            <xs:enumeration value="S" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="Hyphen"
                  dfdl:lengthKind="explicit"
                  dfdl:length="1">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="-" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LongitudeDegrees"
                  dfdl:lengthKind="explicit"
                  dfdl:length="3">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{3}" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LongitudeMinutes"
                  dfdl:lengthKind="pattern"
                  dfdl:lengthPattern=".*?(?=(E|W))">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
            <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LongitudeHemisphere">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="E" />
            <xs:enumeration value="W" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Notice that the last part (LongitudeHemisphere) has no DFDL properties added.
This is because I am assuming that it is followed by the delimiter for the
Location field.
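To see how the length properties work together, here is a rough Python simulation of consuming a Location value part by part. It is only a sketch under my own assumptions: the PARTS table and the parse_location name are hypothetical, Python's re stands in for the DFDL regex engine, the value is assumed to have been cut at the field delimiter already, and neither the nil value nor the xs:pattern/xs:enumeration facets are checked.

```python
import re

# (name, kind, argument): "explicit" takes a length, "pattern" a regex.
PARTS = [
    ("LatitudeDegrees", "explicit", 2),
    ("LatitudeMinutes", "pattern", r".*?(?=(N|S))"),
    ("LatitudeHemisphere", "explicit", 1),
    ("Hyphen", "explicit", 1),
    ("LongitudeDegrees", "explicit", 3),
    ("LongitudeMinutes", "pattern", r".*?(?=(E|W))"),
]

def parse_location(value):
    """Consume `value` part by part; the remainder is LongitudeHemisphere."""
    parts, pos = {}, 0
    for name, kind, arg in PARTS:
        if kind == "explicit":
            end = pos + arg
        else:
            m = re.compile(arg).match(value, pos)
            if m is None:
                raise ValueError(f"{name}: pattern did not match")
            end = m.end()
        parts[name] = value[pos:end]
        pos = end
    # Assumption: `value` has already been cut at the field delimiter,
    # so whatever is left belongs to the last part.
    parts["LongitudeHemisphere"] = value[pos:]
    return parts

print(parse_location("4221.63N-71003.57W"))
```

Each fixed length part advances the position by its dfdl:length, each variable length part advances it by whatever its dfdl:lengthPattern matched, and the last part takes the rest.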