This is really great, Roger. You've zeroed in on a set of very workable design patterns that suit MTF and all its variations.
What I particularly like is the separation of well-formedness (finding where each element is and how long it is) from validity (getting the values right).

At the top you say "If you don't have an XSD create one". I suggest adding advice to create a "very simple XSD" that sticks to a minimal subset of XSD features, because there is a lot of XSD that DFDL doesn't have:

* No attributes
* Only elements can be arrays or optional: maxOccurs > 1 or minOccurs < 1
* Only a subset of the XSD simple types, and a subset of facets depending on the type
* No "all" groups
* No wildcards
* No complex type derivation
* No substitution groups
* No list types
* No key/unique constraints
* Restrictions on union types - all members must have the same base type
* Daffodil (not DFDL) restricts use of multiple child element declarations having the same name.
* .... there are a few more, but that's most of it.

On Wed, Aug 10, 2022 at 2:30 PM Roger L Costello <[email protected]> wrote:

> Hi Folks,
>
> A lot of complexity got replaced with simplicity, thanks to Mike and Steve.
>
> Here’s the updated information. Lots of changes. If you find any errors,
> let me know. /Roger
>
> ------------------------------------------
>
> Daffodil now supports the -V limited option. The -V limited option is a
> game changer. It totally changes the strategy for creating DFDL schemas.
> You use fewer DFDL properties and more XSD facets. This is huge!
>
> That said, what I am about to describe may or may not fit your DFDL work.
>
> For my work, there already exists an XML Schema (XSD). [If your work
> doesn’t already have an XSD, then create one!] The XSD is scaffolding and
> all I must do is add the appropriate DFDL properties to the scaffolding.
> All the leaf elements in the XSD are of type xs:string and are constrained
> using pattern or enumeration facets. Some data fields are nillable and so
> their corresponding XSD element declarations have nillable="true". Others
> are non-nillable.
> Some data fields have fixed length. Others have variable length. This
> message shows how to add appropriate DFDL properties to each type of
> leaf element.
>
> Before doing so, however, let’s see how the -V limited option changes
> DFDL schema development. Prior to the availability of the -V limited
> option I was using dfdl:lengthPattern="*regex*" to specify leaf elements.
> As a result, I had to:
>
> - Convert each enumeration list in the XSD to a regex, where the
>   enumeration values became regex alternatives. Then I would sort the
>   alternatives longest-to-shortest. For fixed fields I would pad the
>   alternatives that weren’t of the required length. And then I would set
>   the sorted, padded regex as the value of dfdl:lengthPattern. Now, with
>   the -V limited option I leave the enumeration list as it is. I ditched
>   dfdl:lengthPattern. It’s not needed anymore.
> - Convert pattern facets in the XSD to a single regex containing
>   alternatives. Then sort the alternatives longest-to-shortest. For fixed
>   fields I would pad the alternatives that weren’t of the required length.
>   And then I would set the sorted, padded regex as the value of
>   dfdl:lengthPattern. With the -V limited option, I no longer process
>   the pattern facet, I use it as is.
>
> The -V limited option means greater use of XSD facets and less need for
> DFDL properties. It means less processing: no more converting enumeration
> values into regex alternatives, no more converting pattern facets into
> regex alternatives, no more sorting regex alternatives in
> longest-to-shortest order, and for fixed fields no more padding
> alternatives.
>
> *Here is the Desired Parsing Behavior*: If data is well-formed and valid,
> I want parsing to produce XML and display no errors. If data is well-formed
> but not valid, I want parsing to produce XML and display errors. If data is
> not well-formed, I want parsing to not produce XML and display errors.
>
> I use the Daffodil -V limited option, as it results in the desired
> parsing behavior.
>
> As I said above, in my XSD the leaf elements are nillable or not, fixed
> length or not. In other words, there are four types of data fields:
>
> 1. Data field is fixed length, nillable
>
> The following element declaration shows how to specify fixed length,
> nillable fields.
>
> Field specification:
>
> - Fixed length (3)
> - Nillable, hyphen is the nil value, the hyphen may be positioned
>   anywhere within the 3-character field
> - Values must be left-justified
> - Values shorter than 3 characters must be padded with spaces
>
> <xs:element name="RunwayStatus"
>             nillable="true"
>             dfdl:nilKind="literalValue"
>             dfdl:nilValue="%WSP*;-"
>             dfdl:lengthKind="explicit"
>             dfdl:length="3"
>             dfdl:textTrimKind="padChar"
>             dfdl:textPadKind="padChar"
>             dfdl:textStringPadCharacter="%SP;"
>             dfdl:textStringJustification="left">
>     <xs:simpleType>
>         <xs:restriction base="xs:string">
>             <xs:enumeration value="FLT"/>
>             <xs:enumeration value="GVL"/>
>             <xs:enumeration value="BRK"/>
>             <xs:enumeration value="GDD"/>
>         </xs:restriction>
>     </xs:simpleType>
> </xs:element>
>
> In this case all the enumeration values are of the required length (3).
> Suppose some were shorter, would you need to pad them with spaces? No,
> there is no need to pad enumeration values. The combination of
> dfdl:length="3" and dfdl:textStringPadCharacter="%SP;" means that parsing
> will check that the input field has length 3 and, if it contains a value
> shorter than 3, that the value is padded on the right with spaces. The
> dfdl:textStringJustification="left" property specifies that values must
> be left-justified. Which means, this input is okay:
>
> …/AB /…
>
> but this is not:
>
> …/ AB/…
>
> If there is no input data available to populate the field, a hyphen is to
> be inserted. In other words, hyphen is the nil value.
> Of course, even with a nil value the field is still required to have
> length 3, so the hyphen must be padded with spaces.
> dfdl:nilValue="%WSP*;-" specifies that the hyphen may be positioned
> anywhere within the 3-character field.
>
> Let’s see how a DFDL processor parses the element. With the following
> input (note the spaces around the hyphen):
>
> …/ - /…
>
> parsing produces this output:
>
> <RunwayStatus xsi:nil="true"></RunwayStatus>
>
> and unparsing produces this output:
>
> …/-  /…
>
> Notice that unparsing results in moving the hyphen to the left side of the
> field.
>
> With this input:
>
> …/FLT/…
>
> parsing produces this output:
>
> <RunwayStatus>FLT</RunwayStatus>
>
> and unparsing produces this output:
>
> …/FLT/…
>
> If a pattern facet had been used instead of the enumeration facet:
>
> <xs:simpleType>
>     <xs:restriction base="xs:string">
>         <xs:pattern value="FLT|GVL|BRK|GDD" />
>     </xs:restriction>
> </xs:simpleType>
>
> everything works the same. That is, the same set of DFDL properties is
> used.
>
> 2. Data field is fixed length, non-nillable
>
> The following element declaration shows how to specify fixed length,
> non-nillable fields.
>
> Field specification:
>
> - Fixed length (6)
> - Values must be left-justified
> - Values shorter than 6 characters must be padded with spaces
>
> <xs:element name="TimeLabel"
>             dfdl:lengthKind="explicit"
>             dfdl:length="6"
>             dfdl:textTrimKind="padChar"
>             dfdl:textPadKind="padChar"
>             dfdl:textStringPadCharacter="%SP;"
>             dfdl:textStringJustification="left">
>     <xs:simpleType>
>         <xs:restriction base="xs:string">
>             <xs:enumeration value="JUPT"/>
>             <xs:enumeration value="VENUSS"/>
>             <xs:enumeration value="MARSSS"/>
>             <xs:enumeration value="SUNNYY"/>
>             <xs:enumeration value="EAR"/>
>         </xs:restriction>
>     </xs:simpleType>
> </xs:element>
>
> Notice that some of the enumeration values have a length less than the
> required length (6). For example, EAR has a length of only 3. Does that
> mean we need to pad those values with length less than 6? No, there is no
> need to pad any enumeration value. The combination of dfdl:length="6" and
> dfdl:textStringPadCharacter="%SP;" means that parsing will check that the
> input field has length 6 and, if it contains a value shorter than 6, that
> the value is padded on the right with spaces. The
> dfdl:textStringJustification="left" property specifies that values must
> be left-justified. In other words, this input is okay:
>
> …/EAR   /…
>
> but this is not:
>
> …/   EAR/…
>
> Let’s see how a DFDL processor parses the element. With the following
> input (notice the value is only 4 characters, so it is padded with 2
> spaces):
>
> …/JUPT  /…
>
> parsing produces this output:
>
> <TimeLabel>JUPT</TimeLabel>
>
> and unparsing produces this output:
>
> …/JUPT  /…
>
> In our example, the enumeration facet is used.
> If a pattern facet had been used instead of the enumeration facet:
>
> <xs:simpleType>
>     <xs:restriction base="xs:string">
>         <xs:pattern value="JUPT|VENUSS|MARSSS|SUNNYY|EAR" />
>     </xs:restriction>
> </xs:simpleType>
>
> everything works the same. That is, the same set of DFDL properties is
> used.
>
> 3. Data field is variable length, nillable
>
> The following element declaration shows how to specify variable length,
> nillable fields.
>
> Field specification:
>
> - Variable length (2-20 characters)
> - Nillable, hyphen is the nil value, if a hyphen is present, it is the
>   only character in the field
>
> <xs:element name="MessageID"
>             nillable="true"
>             dfdl:nilKind="literalValue"
>             dfdl:nilValue="-">
>     <xs:simpleType>
>         <xs:restriction base="xs:string">
>             <xs:pattern value="[A-Z0-9 ]{2,20}"></xs:pattern>
>         </xs:restriction>
>     </xs:simpleType>
> </xs:element>
>
> Let’s see how a DFDL processor parses the element. With this input:
>
> …/-/…
>
> parsing produces this output:
>
> <MessageID xsi:nil="true"></MessageID>
>
> and unparsing produces this output:
>
> …/-/…
>
> With this input:
>
> …/XRAY/…
>
> parsing produces this output:
>
> <MessageID>XRAY</MessageID>
>
> and unparsing produces this output:
>
> …/XRAY/…
>
> 4. Data field is variable length, non-nillable
>
> The following element declaration shows how to specify variable length,
> non-nillable fields.
>
> Field specification:
>
> - Variable length (1-7 characters)
>
> <xs:element name="MessageNumber">
>     <xs:simpleType>
>         <xs:restriction base="xs:string">
>             <xs:pattern value="[A-Z0-9 ]{1,7}" />
>         </xs:restriction>
>     </xs:simpleType>
> </xs:element>
>
> Let’s see how a DFDL processor parses the element.
> With this input:
>
> …/BRAVO/…
>
> parsing produces this output:
>
> <MessageNumber>BRAVO</MessageNumber>
>
> and unparsing produces this output:
>
> …/BRAVO/…
>
> The following table shows how to assign XSD and DFDL properties. The nil
> values and length values shown in the table are from the above examples.
> Obviously for your data you need to replace them with your values.
>
> *Properties to add onto the XSD element declaration*
>
> Property                      Fixed length,  Fixed length,  Variable length,  Variable length,
>                               nillable       non-nillable   nillable          non-nillable
> ----------------------------  -------------  -------------  ----------------  ----------------
> nillable                      true           n/a            true              n/a
> dfdl:nilKind                  literalValue   n/a            literalValue      n/a
> dfdl:nilValue                 %WSP*;-        n/a            -                 n/a
> dfdl:lengthKind               explicit       explicit       delimited         delimited
> dfdl:length                   3              6              n/a               n/a
> dfdl:textTrimKind             padChar        padChar        n/a               n/a
> dfdl:textPadKind              padChar        padChar        n/a               n/a
> dfdl:textStringPadCharacter   %SP;           %SP;           n/a               n/a
> dfdl:textStringJustification  left           left           n/a               n/a
>
> It should be possible to convert this table into a form that can be used
> to automate the adding of DFDL properties onto element declarations.
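
On the closing point about automating the table: here is a minimal Python sketch of one way it might be done, not a tested tool. It uses only the standard library's xml.etree.ElementTree. The function names dfdl_properties and annotate are made up for illustration, and it assumes the caller already knows which elements are fixed length and what that length is (the XSD alone does not say). The property values come straight from the table; the hyphen nil value is a default you would replace with your own.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"
ET.register_namespace("xs", XS)
ET.register_namespace("dfdl", DFDL)

# Properties shared by every fixed-length field in the table.
FIXED = {
    "lengthKind": "explicit",
    "textTrimKind": "padChar",
    "textPadKind": "padChar",
    "textStringPadCharacter": "%SP;",
    "textStringJustification": "left",
}
# Properties shared by every variable-length field in the table.
VARIABLE = {"lengthKind": "delimited"}


def dfdl_properties(fixed_length, nillable, nil_char="-"):
    """Return the table's DFDL properties for one kind of leaf field.

    fixed_length is an int for fixed-length fields, None for variable length.
    """
    props = {}
    if nillable:
        props["nilKind"] = "literalValue"
        # Fixed-length fields let the nil character float among whitespace.
        if fixed_length is not None:
            props["nilValue"] = "%WSP*;" + nil_char
        else:
            props["nilValue"] = nil_char
    if fixed_length is not None:
        props.update(FIXED)
        props["length"] = str(fixed_length)
    else:
        props.update(VARIABLE)
    return props


def annotate(element, fixed_length=None):
    """Add dfdl: attributes to one xs:element declaration, in place."""
    nillable = element.get("nillable") == "true"
    for name, value in dfdl_properties(fixed_length, nillable).items():
        element.set("{%s}%s" % (DFDL, name), value)
    return element


if __name__ == "__main__":
    xsd = (
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
        '<xs:element name="RunwayStatus" nillable="true"/>'
        '<xs:element name="MessageNumber"/>'
        "</xs:schema>"
    )
    root = ET.fromstring(xsd)
    lengths = {"RunwayStatus": 3}  # hypothetical: fixed lengths by element name
    for decl in root.iter("{%s}element" % XS):
        annotate(decl, lengths.get(decl.get("name")))
    print(ET.tostring(root, encoding="unicode"))
```

Keeping the table as two small dictionaries plus a nillable branch means a change to the table is a one-line change to the script. A real schema would also need its dfdl:format defaults and the simple type restrictions left untouched, which this sketch does by only setting attributes.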
