Hi Mike,

When I parse I am using the -V limited option.

With this erroneous input value:

no06N-05912E

I get this very helpful error message on parsing:

[error] Validation Error: LatitudeDegrees failed facet checks due to: facet 
pattern(s): [0-9]{2}

Isn’t that equivalent to what you are recommending with checkConstraints? If 
so, I would like to make the argument that using the -V limited option is 
superior because there is less DFDL stuff that must be added to the XSD. Do you 
agree with that argument?

/Roger

From: Mike Beckerle <[email protected]>
Sent: Monday, September 19, 2022 4:39 PM
To: [email protected]
Subject: [EXT] Re: Here is my writeup of category #11: Field with variable 
length, nillable, composite, no choice

You can improve the ability to clearly reject malformed data, and not just 
accept correct data.

Consider:

"nodatSTnodataW"

I think the above will give a bunch of validation errors about the data after a 
*successful* parse. Pretty sure that's not your intent. You want this to fail. 
Your facet patterns aren't just about validating the data. Those patterns are 
really about well-formedness of the data. They are the only place requiring the 
numeric strings to even be digits for example.

To fix that, I think you want to add assertions with dfdl:checkConstraints so 
your pattern facets get checked and affect the parse.

Most convenient way is just with a common type def:

<xs:simpleType name="validString">
    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
        </xs:appinfo>
    </xs:annotation>
    <xs:restriction base="xs:string"/>
</xs:simpleType>

Then use validString instead of plain xs:string everywhere. All your pattern 
facets will then be checked as part of this assert, which will be on every 
string.
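
As a rough sketch of what that could look like for one part, LatitudeDegrees might derive from validString instead of xs:string. This assumes validString is declared in the same schema and can be referenced without a namespace prefix; adjust the base name if your schema has a target namespace.

```xml
<!-- Sketch only: assumes validString (defined above) is in scope without a prefix -->
<xs:element name="LatitudeDegrees"
            dfdl:lengthKind="explicit"
            dfdl:length="2">
    <xs:simpleType>
        <!-- Deriving from validString is meant to pick up its dfdl:assert -->
        <xs:restriction base="validString">
            <xs:pattern value="[0-9]{2}" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>
```

With that in place, I would expect an input like no06N-05912E to fail the parse at LatitudeDegrees, rather than parse successfully and only be flagged by validation afterwards.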

Now here's what's a bit interesting....... normally I don't recommend 
checkConstraints(.) everywhere in schemas, but I think what we've learned is 
that's for typical schemas where numbers are converted from text to number, and 
date/time fields get converted to the date/time types.  Those type conversions 
enforce a lot of syntax rules on the data. If the data survives the conversion 
from text to type it is well formed. So you don't need checkConstraints() to be 
sure the data is well formed.

To get the same thing in your all-strings approach we really need to force the 
facet patterns to be checked during parsing per the above 
validString/checkConstraints() trick.

So my past advice not to use checkConstraints(.) everywhere really does depend 
on the facet patterns - are they validation, or are they about well-formedness 
of strings? If the latter, then you really do need to call checkConstraints() 
at parse time for those strings.

Minor added point: Your degrees regex allows latitudes of 91-99, longitudes of 
181-999, minutes (integer part) of 60-99. That might be ok if that's the 
de-facto data you need to handle, but you may also want to be tighter about 
that.
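
If you do want to tighten them, something like the following might work. The type names are made up for illustration, and the ranges (00-90, 000-180, 00-59) are my assumption about what counts as valid in your data. Note that this sketch still accepts 90 degrees of latitude (or 180 of longitude) combined with non-zero minutes.

```xml
<!-- Hypothetical type names; the ranges are assumptions about the data -->
<xs:simpleType name="LatitudeDegreesType">
    <xs:restriction base="xs:string">
        <!-- 00 through 89, or exactly 90 -->
        <xs:pattern value="[0-8][0-9]|90" />
    </xs:restriction>
</xs:simpleType>

<xs:simpleType name="LongitudeDegreesType">
    <xs:restriction base="xs:string">
        <!-- 000 through 099, 100 through 179, or exactly 180 -->
        <xs:pattern value="0[0-9]{2}|1[0-7][0-9]|180" />
    </xs:restriction>
</xs:simpleType>

<xs:simpleType name="MinutesType">
    <xs:restriction base="xs:string">
        <!-- Integer part 00 through 59, optionally followed by 1 to 4 decimal digits -->
        <xs:pattern value="[0-5][0-9](\.[0-9]{1,4})?" />
    </xs:restriction>
</xs:simpleType>
```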





On Mon, Sep 19, 2022 at 12:52 PM Roger L Costello 
<[email protected]> wrote:
Hi Folks,
I am jumping around in my writeups.
As always, please let me know of anything that is unclear.  /Roger
--------------------------------------------------------------------------------------
11. Variable length, nillable, composite, no choice

A composite field is one that is composed of parts. There is no separator 
between the parts. The parts may be fixed length or variable length. The parts 
are non-nillable, although the composite field itself may be nillable.
This section deals with composite fields containing parts that are variable 
length and the field is nillable.
We will create a DFDL schema for a “Location” field that has a latitude and 
longitude, separated by a dash. Here is a sample value:
2006N-05912E
That is one value with 7 parts:
The first two digits (20) represent the latitude in degrees.
The next two digits (06) represent the latitude in minutes.
The N indicates the latitude’s hemisphere.
The dash ( - ) separates the latitude values from the following longitude 
values.
The 059 represents the longitude in degrees.
The 12 represents the longitude in minutes.
The E represents the longitude hemisphere.
In other words, the location is latitude 20 degrees, 6 minutes North, longitude 
59 degrees, 12 minutes East.
Both the latitude minutes and the longitude minutes are variable length: each 
is expressed as a two-digit integer or as a decimal value. If a decimal, there 
may be 1-4 digits to the right of the decimal point. Here are Location values 
with minute parts that have decimal values:
4221.6N-71003.5W
4221.63N-71003.57W
4221.630N-71003.576W
4221.6300N-71003.5760W
Here is one more example of a valid Location value:
-
That value means: no data was available to populate the field.
To re-emphasize, Location is a variable length, nillable, composite field.
Here is an XML Schema declaration of Location, sans any DFDL properties:
<xs:element name="Location" nillable="true">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="LatitudeDegrees">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LatitudeMinutes">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LatitudeHemisphere">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="N" />
                        <xs:enumeration value="S" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="Hyphen">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="-" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeDegrees">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{3}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeMinutes">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeHemisphere">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="E" />
                        <xs:enumeration value="W" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>
These parts have fixed length: LatitudeDegrees, LatitudeHemisphere, Hyphen, 
LongitudeDegrees, and LongitudeHemisphere.
These parts have variable length: LatitudeMinutes and LongitudeMinutes.
For the fixed length parts, add these two DFDL properties:
dfdl:lengthKind="explicit"
dfdl:length="__"
For example, LatitudeDegrees has a fixed length of 2. Here is its declaration, 
with the DFDL properties added:
<xs:element name="LatitudeDegrees"
            dfdl:lengthKind="explicit"
            dfdl:length="2">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>
Use the same strategy for the other fixed fields.
LatitudeMinutes is variable length. The part that follows it 
(LatitudeHemisphere) has a fixed length (its value is either N or S). To 
declare LatitudeMinutes, add these two DFDL properties:
dfdl:lengthKind="pattern"
dfdl:lengthPattern="regex"
In the regex, use a lookahead pattern. Here is LatitudeMinutes, extended with 
the DFDL properties:
<xs:element name="LatitudeMinutes"
            dfdl:lengthKind="pattern"
            dfdl:lengthPattern=".*?(?=(N|S))">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
Read that as: the content of LatitudeMinutes is the text up to, but not 
including, N or S.
Use the same regex lookahead strategy for LongitudeMinutes.
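For reference, here is a sketch of LongitudeMinutes using that strategy, with E or W as the lookahead. As an optional simplification, which is an assumption on my part and not something the schema requires, the five pattern facets are collapsed into one. Multiple pattern facets in a single restriction are alternatives, so the single pattern should accept the same strings.

```xml
<!-- Sketch: lookahead stops at E or W; single pattern replaces the five facets -->
<xs:element name="LongitudeMinutes"
            dfdl:lengthKind="pattern"
            dfdl:lengthPattern=".*?(?=(E|W))">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <!-- Two digits, optionally followed by a point and 1 to 4 digits -->
            <xs:pattern value="[0-9]{2}(\.[0-9]{1,4})?" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>
```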
As I stated earlier, Location is nillable with hyphen as the nil value. 
Further, Location has a complexType. That is a problem. See section 2 for a 
complete discussion of the problem with nillable complexTypes and how to deal 
with it.
Here’s the DFDL schema for the Location field:
<xs:element name="Location">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="LatitudeDegrees"
                        dfdl:lengthKind="explicit"
                        dfdl:length="2">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LatitudeMinutes"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern=".*?(?=(N|S))">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LatitudeHemisphere"
                        dfdl:lengthKind="explicit"
                        dfdl:length="1">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="N" />
                        <xs:enumeration value="S" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="Hyphen"
                        dfdl:lengthKind="explicit"
                        dfdl:length="1">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="-" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeDegrees"
                        dfdl:lengthKind="explicit"
                        dfdl:length="3">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{3}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeMinutes"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern=".*?(?=(E|W))">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
                        <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeHemisphere">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="E" />
                        <xs:enumeration value="W" />
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>
Notice that the last part (LongitudeHemisphere) has no DFDL added. This is 
because I am assuming that it is followed by the delimiter for the Location 
field.
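To make the result concrete, parsing the sample value 2006N-05912E with this schema would be expected to produce an infoset along these lines. This is a hand-written sketch, not tool output. It assumes the enclosing format supplies the delimiter that ends LongitudeHemisphere, and namespaces are omitted.

```xml
<!-- Hand-written illustration of the expected infoset for 2006N-05912E -->
<Location>
    <LatitudeDegrees>20</LatitudeDegrees>
    <LatitudeMinutes>06</LatitudeMinutes>
    <LatitudeHemisphere>N</LatitudeHemisphere>
    <Hyphen>-</Hyphen>
    <LongitudeDegrees>059</LongitudeDegrees>
    <LongitudeMinutes>12</LongitudeMinutes>
    <LongitudeHemisphere>E</LongitudeHemisphere>
</Location>
```

Every part is kept as a string, so leading zeros such as the one in 059 are preserved.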
