Hi Roger,
Another great explainer!
Here’s my feedback:
This sentence is awkward: This section deals with composite fields containing
parts that are variable length and the field is nillable.
You are building up a single composite field which is nillable.
Maybe something like this would be less awkward: This section deals with a
nillable composite field containing parts that are variable length.
Can the regex for the decimal parts of the composite field be shortened to this?
<xs:simpleType>
    <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{2}" />
        <xs:pattern value="[0-9]{2}\.[0-9]{1,4}" />
    </xs:restriction>
</xs:simpleType>
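One way to check the question above is a small Python sketch that compares the two-pattern form with the original five-pattern form. It assumes Python's re module is a fair stand-in for XSD regular expressions for these simple patterns (fullmatch mimics XSD's implicit anchoring, and multiple xs:pattern facets in one restriction are ORed). The names LONG_FORM, SHORT_FORM, accepts, and forms_agree are made up for illustration, and the check is exhaustive only over a small alphabet and bounded length, so it is a sanity test rather than a proof.

```python
import re
from itertools import product

# Five-pattern form: one pattern per count of decimal digits.
LONG_FORM = [re.compile(p) for p in (
    r"[0-9]{2}",
    r"[0-9]{2}\.[0-9]{1}",
    r"[0-9]{2}\.[0-9]{2}",
    r"[0-9]{2}\.[0-9]{3}",
    r"[0-9]{2}\.[0-9]{4}",
)]

# Two-pattern form: the decimal cases folded into a {1,4} quantifier.
SHORT_FORM = [re.compile(p) for p in (
    r"[0-9]{2}",
    r"[0-9]{2}\.[0-9]{1,4}",
)]


def accepts(patterns, text):
    """True if any pattern matches the whole text (patterns are ORed)."""
    return any(p.fullmatch(text) for p in patterns)


def all_strings(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def forms_agree(alphabet="09.N", max_len=8):
    """True if both forms accept exactly the same candidate strings."""
    return all(
        accepts(LONG_FORM, s) == accepts(SHORT_FORM, s)
        for s in all_strings(alphabet, max_len)
    )
```

Calling forms_agree() compares the two forms on every candidate string; the longest valid minutes value has 7 characters, so a bound of 8 also covers values that are one character too long.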
Hope it helps,
Davin
From: Roger L Costello <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, September 19, 2022 at 2:39 PM
To: "[email protected]" <[email protected]>
Subject: Here is my writeup of category #11: Field with variable length,
nillable, composite, no choice
Hi Folks,
I am jumping around in my writeups.
As always, please let me know of anything that is unclear. /Roger
--------------------------------------------------------------------------------------
11. Variable length, nillable, composite, no choice
A composite field is one that is composed of parts. There is no separator
between the parts. The parts may be fixed length or variable length. The parts
are non-nillable, although the composite field itself may be nillable.
This section deals with composite fields containing parts that are variable
length and the field is nillable.
We will create a DFDL schema for a “Location” field that has a latitude and
longitude, separated by a dash. Here is a sample value:
2006N-05912E
That is one value with 7 parts:
The first two digits (20) represent the latitude in degrees.
The next two digits (06) represent the latitude in minutes.
The N indicates the latitude’s hemisphere.
The dash ( - ) separates the latitude values from the following longitude
values.
The 059 represents the longitude in degrees.
The 12 represents the longitude in minutes.
The E represents the longitude hemisphere.
In other words, the location is latitude 20 degrees, 6 minutes North, longitude
59 degrees, 12 minutes East.
Both the latitude minutes and the longitude minutes are variable length: each
is expressed as a two-digit integer or as a decimal value. If a decimal, there
may be 1-4 digits to the right of the decimal point. Here are Location values
with minute parts that have decimal values:
4221.6N-71003.5W
4221.63N-71003.57W
4221.630N-71003.576W
4221.6300N-71003.5760W
Here is one more example of a valid Location value:
-
That value means: no data was available to populate the field.
To re-emphasize, Location is a variable length, nillable, composite field.
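To make the description above concrete, here is a minimal Python sketch that splits a Location value into its seven parts and treats a lone hyphen as the nil value. It is not DFDL; it only illustrates the layout described so far. The group names are illustrative labels for the seven parts, and LOCATION, NIL_VALUE, and split_location are hypothetical names.

```python
import re

# One named group per part, in the order the parts appear in the value.
LOCATION = re.compile(
    r"(?P<LatitudeDegrees>[0-9]{2})"
    r"(?P<LatitudeMinutes>[0-9]{2}(?:\.[0-9]{1,4})?)"
    r"(?P<LatitudeHemisphere>[NS])"
    r"(?P<Hyphen>-)"
    r"(?P<LongitudeDegrees>[0-9]{3})"
    r"(?P<LongitudeMinutes>[0-9]{2}(?:\.[0-9]{1,4})?)"
    r"(?P<LongitudeHemisphere>[EW])"
)

# A lone hyphen means no data was available to populate the field.
NIL_VALUE = "-"


def split_location(text):
    """Return a dict of the seven parts, or None for the nil value."""
    if text == NIL_VALUE:
        return None
    match = LOCATION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid Location value: {text!r}")
    return match.groupdict()
```

For example, split_location("2006N-05912E") yields LatitudeDegrees "20", LatitudeMinutes "06", LatitudeHemisphere "N", Hyphen "-", LongitudeDegrees "059", LongitudeMinutes "12", and LongitudeHemisphere "E", while split_location("-") returns None.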
Here is an XML Schema declaration of Location, sans any DFDL properties:
<xs:element name="Location" nillable="true">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="LatitudeDegrees">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LatitudeMinutes">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LatitudeHemisphere">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="N" />
                        <xs:enumeration value="S" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="Hyphen">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="-" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeDegrees">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{3}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeMinutes">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeHemisphere">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="E" />
                        <xs:enumeration value="W" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>
These parts have fixed length: LatitudeDegrees, LatitudeHemisphere, Hyphen,
LongitudeDegrees, and LongitudeHemisphere.
These parts have variable length: LatitudeMinutes and LongitudeMinutes.
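As a quick arithmetic check on the fixed and variable parts, the following Python sketch computes the shortest and longest possible non-nil Location value. The lengths are read off the patterns and enumerations in the declaration above; FIXED_LENGTHS, MINUTES_LENGTHS, and location_length_range are illustrative names only.

```python
# Lengths of the fixed-length parts, taken from their patterns/enumerations.
FIXED_LENGTHS = {
    "LatitudeDegrees": 2,
    "LatitudeHemisphere": 1,
    "Hyphen": 1,
    "LongitudeDegrees": 3,
    "LongitudeHemisphere": 1,
}

# A minutes part is either 2 digits, or 2 digits + "." + 1 to 4 digits.
MINUTES_LENGTHS = [2] + [2 + 1 + decimals for decimals in range(1, 5)]


def location_length_range():
    """Return (shortest, longest) length of a non-nil Location value."""
    fixed = sum(FIXED_LENGTHS.values())
    shortest = fixed + 2 * min(MINUTES_LENGTHS)
    longest = fixed + 2 * max(MINUTES_LENGTHS)
    return shortest, longest
```

The fixed parts contribute 8 characters and each minutes part contributes between 2 and 7, so a non-nil value is between 12 characters (2006N-05912E) and 22 characters (4221.6300N-71003.5760W) long.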
For the fixed length parts, add these two DFDL properties:
dfdl:lengthKind="explicit"
dfdl:length="__"
For example, LatitudeDegrees has a fixed length of 2. Here is its declaration,
with the two DFDL properties added:
<xs:element name="LatitudeDegrees"
            dfdl:lengthKind="explicit"
            dfdl:length="2">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>
Use the same strategy for the other fixed length parts.
LatitudeMinutes is variable length. The part that follows it
(LatitudeHemisphere) has a fixed length (its value is either N or S). To
declare LatitudeMinutes, add these two DFDL properties:
dfdl:lengthKind="pattern"
dfdl:lengthPattern="regex"
In the regex, use a lookahead pattern. Here is LatitudeMinutes, extended with
the two DFDL properties:
<xs:element name="LatitudeMinutes"
            dfdl:lengthKind="pattern"
            dfdl:lengthPattern=".*?(?=(N|S))">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
Read that as: the content of LatitudeMinutes is the text up to, but not
including, the N or S.
Use the same regex lookahead strategy for LongitudeMinutes.
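The combination of explicit lengths and lookahead patterns can be imitated in a few lines of Python. This is only a sketch of the idea, not how a DFDL processor works internally: it assumes Python's re module behaves like the DFDL regex engine for these two patterns, it uses the analogous pattern .*?(?=(E|W)) for LongitudeMinutes, and it gives LongitudeHemisphere an explicit length of 1 for simplicity. PARTS and take_parts are hypothetical names.

```python
import re

# (part name, length rule):
#   int -> fixed length, like dfdl:lengthKind="explicit" with dfdl:length
#   str -> regex, like dfdl:lengthKind="pattern" with dfdl:lengthPattern
PARTS = [
    ("LatitudeDegrees", 2),
    ("LatitudeMinutes", r".*?(?=(N|S))"),
    ("LatitudeHemisphere", 1),
    ("Hyphen", 1),
    ("LongitudeDegrees", 3),
    ("LongitudeMinutes", r".*?(?=(E|W))"),
    ("LongitudeHemisphere", 1),  # assumption made for this sketch only
]


def take_parts(text):
    """Walk the value left to right, cutting off one part at a time."""
    parts = {}
    pos = 0
    for name, rule in PARTS:
        if isinstance(rule, int):
            end = pos + rule
            if end > len(text):
                raise ValueError(f"ran out of data while reading {name}")
        else:
            # The lookahead finds N/S (or E/W) without consuming it.
            match = re.compile(rule).match(text, pos)
            if match is None:
                raise ValueError(f"length pattern for {name} did not match")
            end = match.end()
        parts[name] = text[pos:end]
        pos = end
    return parts
```

In this sketch the length pattern only decides where a minutes part ends; it does not check the content. take_parts("20ABN-05912E") returns "AB" for LatitudeMinutes, and it is the xs:pattern facets that would flag that value as invalid.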
As I stated earlier, Location is nillable with hyphen as the nil value.
Further, Location has a complexType. That is a problem. See section 2 for a
complete discussion of the problem with nillable complexTypes and how to deal
with it.
Here’s the DFDL schema for the Location field:
<xs:element name="Location">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="LatitudeDegrees"
                        dfdl:lengthKind="explicit"
                        dfdl:length="2">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LatitudeMinutes"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern=".*?(?=(N|S))">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LatitudeHemisphere"
                        dfdl:lengthKind="explicit"
                        dfdl:length="1">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="N" />
                        <xs:enumeration value="S" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="Hyphen"
                        dfdl:lengthKind="explicit"
                        dfdl:length="1">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="-" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeDegrees"
                        dfdl:lengthKind="explicit"
                        dfdl:length="3">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{3}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeMinutes"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern=".*?(?=(E|W))">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeHemisphere">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="E" />
                        <xs:enumeration value="W" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>
Notice that the last part (LongitudeHemisphere) has no DFDL added. This is
because I am assuming that it is followed by the delimiter for the Location
field.