This is really great, Roger. You've zeroed in on a set of very workable
design patterns that are great for MTF and all its variations.

What I particularly like is the separation of well-formed data (getting
where and how long the element is) from validity (getting values right).

At the top you say "If you don't have an XSD create one".

I suggest including advice to create a "very simple XSD" that sticks with
a quite minimal subset of XSD features.

That's because there's a lot of XSD that DFDL doesn't support:

* No attributes
* Only elements can be arrays or optional, i.e. have maxOccurs > 1 or
minOccurs < 1.
* Only a subset of the XSD simple types, and a subset of facets depending
on the type
* No "all" groups
* No wildcards
* No complex type derivation
* No substitution groups
* No list types
* No key/unique constraints
* Restrictions on union types - all must have same base type
* Daffodil (not DFDL) restricts use of multiple child element declarations
having the same name.
* ... there are a few more, but that's most of it.
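
To make the list above easier to act on, here is a rough Python sketch that scans an existing XSD for some of those constructs. It is not part of Daffodil; the function name find_unsupported, the lookup table, and the sample schema are all made up for illustration, and it only covers the items that can be spotted by element or attribute name.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# XSD element names (hypothetical lookup table) mapped to the list items above.
UNSUPPORTED = {
    "attribute": "attributes",
    "all": '"all" groups',
    "any": "wildcards",
    "anyAttribute": "wildcards",
    "complexContent": "complex type derivation",
    "list": "list types",
    "key": "key/unique constraints",
    "unique": "key/unique constraints",
}

def find_unsupported(xsd_text):
    """Return a sorted list of (list item, XSD name) pairs found in the schema."""
    found = set()
    for node in ET.fromstring(xsd_text).iter():
        if not node.tag.startswith("{" + XS + "}"):
            continue  # skip annotations and other non-XSD elements
        local = node.tag.split("}", 1)[1]
        if local in UNSUPPORTED:
            found.add((UNSUPPORTED[local], "xs:" + local))
        if local == "element" and "substitutionGroup" in node.attrib:
            found.add(("substitution groups", "substitutionGroup"))
    return sorted(found)

# Made-up schema that uses an "all" group and an attribute.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Msg">
    <xs:complexType>
      <xs:all>
        <xs:element name="A" type="xs:string"/>
      </xs:all>
      <xs:attribute name="id" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

for item, name in find_unsupported(SAMPLE):
    print(f"{name}: {item}")
```

An empty result does not prove the schema is usable with DFDL; the checks on simple types, facets, unions and same-named children are not attempted here.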




On Wed, Aug 10, 2022 at 2:30 PM Roger L Costello <[email protected]> wrote:

> Hi Folks,
>
>
>
> A lot of complexity got replaced with simplicity, thanks to Mike and Steve.
>
>
>
> Here’s the updated information. Lots of changes. If you find any errors,
> let me know.  /Roger
>
> ------------------------------------------
>
> Daffodil now supports the -V limited option. The -V limited option is a
> game changer. It totally changes the strategy for creating DFDL schemas.
> You use fewer DFDL properties and more XSD facets. This is huge!
>
>
>
> That said, what I am about to describe may or may not fit your DFDL work.
>
>
>
> For my work, there already exists an XML Schema (XSD). [If your work
> doesn’t already have an XSD, then create one!] The XSD is scaffolding and
> all I must do is add the appropriate DFDL properties to the scaffolding.
> All the leaf elements in the XSD are of type xs:string and are constrained
> using pattern or enumeration facets. Some data fields are nillable and so
> their corresponding XSD element declarations have nillable="true". Others
> are non-nillable. Some data fields have fixed length. Others have variable
> length. This message shows how to add appropriate DFDL properties to each
> type of leaf element.
>
>
>
> Before doing so, however, let’s see how the -V limited option changes
> DFDL schema development. Prior to the availability of the -V limited
> option I was using dfdl:lengthPattern="*regex*" to specify leaf elements.
> As a result, I had to:
>
>    - Convert each enumeration list in the XSD to a regex, where the
>    enumeration values became regex alternatives. Then I would sort the
>    alternatives longest-to-shortest. For fixed fields I would pad the
>    alternatives that weren’t of the required length. And then I would set the
>    sorted, padded regex as the value of dfdl:lengthPattern. Now, with the
>    -V limited option I leave the enumeration list as it is. I ditched
>    dfdl:lengthPattern. It’s not needed anymore.
>    - Convert pattern facets in the XSD to a single regex containing
>    alternatives. Then sort the alternatives longest-to-shortest. For fixed
>    fields I would pad the alternatives that weren’t of the required length.
>    And then I would set the sorted, padded regex as the value of
>    dfdl:lengthPattern. With the -V limited option, I no longer process
>    the pattern facet, I use it as is.
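
For anyone who never did the old conversion by hand, the steps in the two bullets above can be sketched roughly as follows. This is only an illustration in Python: the function name to_length_pattern is made up, the values are the TimeLabel enumeration from later in this message, and it assumes the values contain no regex metacharacters (no escaping is done).

```python
def to_length_pattern(values, fixed_length=None):
    """Build the regex that used to be placed in dfdl:lengthPattern."""
    # Sort alternatives longest-to-shortest so that a short alternative
    # cannot match the front of a longer one.
    ordered = sorted(values, key=len, reverse=True)
    if fixed_length is not None:
        # Fixed fields: pad short alternatives on the right with spaces.
        ordered = [v.ljust(fixed_length) for v in ordered]
    return "|".join(ordered)

time_labels = ["JUPT", "VENUSS", "MARSSS", "SUNNYY", "EAR"]
print(repr(to_length_pattern(time_labels, fixed_length=6)))
```

With -V limited none of this is needed; the enumeration stays in the XSD as written.
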
>
>
>
> The -V limited option means greater use of XSD facets and less need for
> DFDL properties. It means less processing: no more converting enumeration
> values into regex alternatives, no more converting pattern facets into
> regex alternatives, no more sorting regex alternatives in
> longest-to-shortest order, and for fixed fields no more padding
> alternatives.
>
>
>
> *Here is the Desired Parsing Behavior*: If data is well-formed and valid,
> I want parsing to produce XML and display no errors. If data is well-formed
> but not valid, I want parsing to produce XML and display errors. If data is
> not well-formed, I want parsing to not produce XML and display errors.
>
>
>
> I use the Daffodil -V limited option, as it results in the desired
> parsing behavior.
>
>
>
> As I said above, in my XSD the leaf elements are nillable or not, fixed
> length or not. In other words, there are four types of data fields:
>
>
>
> 1. Data field is fixed length, nillable
>
>
>
> The following element declaration shows how to specify fixed length,
> nillable fields.
>
>
>
> Field specification:
>
> >>  Fixed length (3)
>
> >>  Nillable, hyphen is the nil value, the hyphen may be positioned
> anywhere within the 3-character field
>
> >>  Values must be left-justified
>
> >>  Values shorter than 3 characters must be padded with spaces
>
>
>
> <xs:element name="RunwayStatus"
>             nillable="true"
>             dfdl:nilKind="literalValue"
>             dfdl:nilValue="%WSP*;-"
>             dfdl:lengthKind="explicit"
>             dfdl:length="3"
>             dfdl:textTrimKind="padChar"
>             dfdl:textPadKind="padChar"
>             dfdl:textStringPadCharacter="%SP;"
>             dfdl:textStringJustification="left">
>     <xs:simpleType>
>         <xs:restriction base="xs:string">
>             <xs:enumeration value="FLT"/>
>             <xs:enumeration value="GVL"/>
>             <xs:enumeration value="BRK"/>
>             <xs:enumeration value="GDD"/>
>         </xs:restriction>
>     </xs:simpleType>
> </xs:element>
>
>
>
> In this case all the enumeration values are of the required length (3).
> Suppose some were shorter; would you need to pad them with spaces? No,
> there is no need to pad enumeration values. The combination of
> dfdl:length="3" and dfdl:textStringPadCharacter="%SP;" means that parsing
> checks that the input field has length 3 and, if it contains a value
> shorter than 3, that the value is padded on the right with spaces. The
> dfdl:textStringJustification="left" property specifies that values must be
> left-justified. So, for a hypothetical 2-character value AB, this input is
> okay:
>
>
>
> …/AB /…
>
>
>
> but this is not:
>
>
>
> …/ AB/…
>
>
>
> If there is no input data available to populate the field, a hyphen is to
> be inserted. In other words, hyphen is the nil value. Of course, even
> with a nil value the field is still required to have length 3, so the
> hyphen must be padded with spaces. dfdl:nilValue="%WSP*;-" specifies that
> the hyphen may be positioned anywhere within the 3-character field.
>
>
>
> Let’s see how a DFDL processor parses the element. With the following input
> (note the spaces around the hyphen):
>
>
>
> …/ - /…
>
>
>
> parsing produces this output:
>
>
>
> <RunwayStatus xsi:nil="true"></RunwayStatus>
>
>
>
> and unparsing produces this output:
>
>
>
> …/-  /…
>
>
>
> Notice that unparsing results in moving the hyphen to the left side of the
> field.
>
>
>
> With this input:
>
>
>
> …/FLT/…
>
>
>
> parsing produces this output:
>
>
>
> <RunwayStatus>FLT</RunwayStatus>
>
>
>
> and unparsing produces this output:
>
>
>
> …/FLT/…
>
>
>
> If a pattern facet had been used instead of the enumeration facet:
>
>
>
> <xs:simpleType>
>     <xs:restriction base="xs:string">
>         <xs:pattern value="FLT|GVL|BRK|GDD"/>
>     </xs:restriction>
> </xs:simpleType>
>
>
>
> everything works the same. That is, the same set of DFDL properties is
> used.
>
>
>
> 2. Data field is fixed length, non-nillable
>
>
>
> The following element declaration shows how to specify fixed length,
> non-nillable fields.
>
>
>
> Field specification:
>
> >>  Fixed length (6)
>
> >>  Values must be left-justified
>
> >>  Values shorter than 6 characters must be padded with spaces
>
>
>
> <xs:element name="TimeLabel"
>             dfdl:lengthKind="explicit"
>             dfdl:length="6"
>             dfdl:textTrimKind="padChar"
>             dfdl:textPadKind="padChar"
>             dfdl:textStringPadCharacter="%SP;"
>             dfdl:textStringJustification="left">
>     <xs:simpleType>
>         <xs:restriction base="xs:string">
>             <xs:enumeration value="JUPT"/>
>             <xs:enumeration value="VENUSS"/>
>             <xs:enumeration value="MARSSS"/>
>             <xs:enumeration value="SUNNYY"/>
>             <xs:enumeration value="EAR"/>
>         </xs:restriction>
>     </xs:simpleType>
> </xs:element>
>
>
>
> Notice that some of the enumeration values have a length less than the
> required length (6). For example, EAR has a length of only 3. Does that
> mean we need to pad those values with length less than 6? No, there is no
> need to pad any enumeration value. The combination of dfdl:length="6" and
> dfdl:textStringPadCharacter="%SP;" means that parsing will check the input
> field to see that it has length 6 and if it contains a value that is
> shorter than 6, check that it is padded on the right with spaces. The
> dfdl:textStringJustification="left" property specifies that values must be
> left-justified. In other words, this input is okay:
>
>
>
> …/EAR   /…
>
>
>
> but this is not:
>
>
>
> …/   EAR/…
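
The okay / not-okay distinction above lines up with the well-formed vs. valid split. Here is a simplified Python model of it, under stated assumptions: this is not how Daffodil is implemented, the function name parse_fixed_field is invented, "well-formed" is reduced to "the field has the required length", and "valid" to "the right-trimmed value is in the enumeration".

```python
TIME_LABELS = {"JUPT", "VENUSS", "MARSSS", "SUNNYY", "EAR"}

def parse_fixed_field(raw, length, allowed, pad=" "):
    """Return (value, errors); value is None when the field is not well-formed."""
    if len(raw) != length:
        # Not well-formed: no output is produced, only an error.
        return None, [f"not well-formed: expected {length} characters, got {len(raw)}"]
    # Left-justified: pad characters are trimmed from the right only.
    value = raw.rstrip(pad)
    errors = []
    if value not in allowed:
        # Well-formed but not valid: output is produced along with an error.
        errors.append(f"not valid: {value!r} is not an allowed value")
    return value, errors

for raw in ["EAR   ", "   EAR", "EAR"]:
    print(repr(raw), parse_fixed_field(raw, 6, TIME_LABELS))
```

In this model the right-justified input still yields a value, because leading spaces are not trimmed; it is the enumeration check that rejects it.
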
>
>
>
> Let’s see how a DFDL processor parses the element. With the following
> input (notice the value is 4 characters, less than the required 6, so it
> is padded with 2 spaces):
>
>
>
> …/JUPT  /…
>
>
>
> parsing produces this output:
>
>
>
> <TimeLabel>JUPT</TimeLabel>
>
>
>
> and unparsing produces this output:
>
>
>
> …/JUPT  /…
>
>
>
> In our example, the enumeration facet is used. If a pattern facet had been
> used instead of the enumeration facet:
>
>
>
> <xs:simpleType>
>     <xs:restriction base="xs:string">
>         <xs:pattern value="JUPT|VENUSS|MARSSS|SUNNYY|EAR"/>
>     </xs:restriction>
> </xs:simpleType>
>
>
>
> everything works the same. That is, the same set of DFDL properties is
> used.
>
>
>
> 3. Data field is variable length, nillable
>
>
>
> The following element declaration shows how to specify variable length,
> nillable fields.
>
>
>
> Field specification:
>
> >>  Variable length (2-20 characters)
>
> >>  Nillable, hyphen is the nil value, if a hyphen is present, it is the
> only character in the field
>
>
>
> <xs:element name="MessageID"
>             nillable="true"
>             dfdl:nilKind="literalValue"
>             dfdl:nilValue="-">
>     <xs:simpleType>
>         <xs:restriction base="xs:string">
>             <xs:pattern value="[A-Z0-9 ]{2,20}"/>
>         </xs:restriction>
>     </xs:simpleType>
> </xs:element>
>
>
>
> Let’s see how a DFDL processor parses the element. With this input:
>
>
>
> …/-/…
>
>
>
> parsing produces this output:
>
>
>
> <MessageID xsi:nil="true"></MessageID>
>
>
>
> and unparsing produces this output:
>
>
>
> …/-/…
>
>
>
> With this input:
>
>
>
> …/XRAY/…
>
>
>
> parsing produces this output:
>
>
>
> <MessageID>XRAY</MessageID>
>
>
>
> and unparsing produces this output:
>
>
>
> …/XRAY/…
>
>
>
>
>
> 4. Data field is variable length, non-nillable
>
>
>
> The following element declaration shows how to specify variable length,
> non-nillable fields.
>
>
>
> Field specification:
>
> >>  Variable length (1-7 characters)
>
>
>
> <xs:element name="MessageNumber">
>     <xs:simpleType>
>         <xs:restriction base="xs:string">
>             <xs:pattern value="[A-Z0-9 ]{1,7}"/>
>         </xs:restriction>
>     </xs:simpleType>
> </xs:element>
>
>
>
> Let’s see how a DFDL processor parses the element. With this input:
>
>
>
> …/BRAVO/…
>
>
>
> parsing produces this output:
>
>
>
> <MessageNumber>BRAVO</MessageNumber>
>
>
>
> and unparsing produces this output:
>
>
>
> …/BRAVO/…
>
>
>
> The following table shows how to assign XSD and DFDL properties. The nil
> values and length values shown in the table are from the above examples.
> Obviously, for your data you need to replace them with your values.
>
>
>
> *Properties to add onto the XSD element declaration*
>
> Property                     | Fixed length, | Fixed length, | Variable length, | Variable length,
>                              | nillable      | non-nillable  | nillable         | non-nillable
> -----------------------------+---------------+---------------+------------------+-----------------
> nillable                     | true          | n/a           | true             | n/a
> dfdl:nilKind                 | literalValue  | n/a           | literalValue     | n/a
> dfdl:nilValue                | %WSP*;-       | n/a           | -                | n/a
> dfdl:lengthKind              | explicit      | explicit      | delimited        | delimited
> dfdl:length                  | 3             | 6             | n/a              | n/a
> dfdl:textTrimKind            | padChar       | padChar       | n/a              | n/a
> dfdl:textPadKind             | padChar       | padChar       | n/a              | n/a
> dfdl:textStringPadCharacter  | %SP;          | %SP;          | n/a              | n/a
> dfdl:textStringJustification | left          | left          | n/a              | n/a
>
>
>
> It should be possible to convert this table into a form that can be used
> to automate the adding of DFDL properties onto element declarations.
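
As a starting point for that automation, here is a minimal Python sketch that encodes the table as data and stamps the properties onto an element declaration. Treat it as an assumption-laden illustration: the function name add_dfdl_properties and its parameters are invented, and the property values are simply the ones from the table.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"
ET.register_namespace("xs", XS)
ET.register_namespace("dfdl", DFDL)

# Table columns reduced to two property sets (values taken from the table).
FIXED = {
    "lengthKind": "explicit",
    "textTrimKind": "padChar",
    "textPadKind": "padChar",
    "textStringPadCharacter": "%SP;",
    "textStringJustification": "left",
}
VARIABLE = {"lengthKind": "delimited"}

def add_dfdl_properties(element, fixed_length=None, nil_value=None):
    """Add the table's properties to one xs:element declaration, in place."""
    if fixed_length is not None:
        props = dict(FIXED, length=str(fixed_length))
    else:
        props = dict(VARIABLE)
    if nil_value is not None:
        element.set("nillable", "true")
        props.update(nilKind="literalValue", nilValue=nil_value)
    for name, value in props.items():
        element.set(f"{{{DFDL}}}{name}", value)
    return element

# Fixed length (3), nillable, as in the RunwayStatus example.
decl = ET.fromstring(f'<xs:element xmlns:xs="{XS}" name="RunwayStatus"/>')
add_dfdl_properties(decl, fixed_length=3, nil_value="%WSP*;-")
print(ET.tostring(decl, encoding="unicode"))
```

Deciding which column applies to each leaf element (fixed or variable, nillable or not) still has to come from the message specification; the sketch only handles the mechanical part.
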
>
>
>
