Technically, it is the dfdl:assert that specifies something to check immediately after the element is successfully parsed. In this case, the assert expression happens to call the dfdl:checkConstraints function, which validates what was parsed against the schema restrictions.
But yes, that's the right idea. The assert/checkConstraints only happens immediately after the parse. It is the other DFDL properties, like dfdl:lengthKind, that are first used to determine how to parse that field.

On 7/20/21 10:10 AM, Roger L Costello wrote:
> Thanks again Steve. To confirm my understanding: dfdl:checkConstraints
> specifies something to check *after parsing* has been performed. The DFDL
> schema must specify *how to parse*, which is why we need to specify
> dfdl:lengthKind="pattern" and dfdl:lengthPattern="...". Do I understand
> correctly?
>
> /Roger
>
> -----Original Message-----
> From: Steve Lawrence <[email protected]>
> Sent: Tuesday, July 20, 2021 9:49 AM
> To: [email protected]
> Subject: [EXT] Re: How to specify data with two fields, no delimiter,
> variable length?
>
> The enumeration + checkConstraints approach doesn't give Daffodil any
> information about the length of the field. Those are only used to validate
> the field *after* it has been parsed.
>
> So how is Daffodil determining the length of the field if you haven't
> specified a length? My guess is since the schema compiles, that probably
> means that your global dfdl:format has set lengthKind="delimited"--other
> values would probably fail to compile since additional properties are
> required.
>
> And with lengthKind="delimited" and no delimiters in scope, the length is
> just all the data up until the end-of-file is reached. So your item1 is going
> to be parsed as the entire contents of the file (including any newlines),
> which will fail the enumeration constraint.
>
> So even if you add the enumeration + checkConstraints, you still need the
> pattern length to tell Daffodil the length of the field (either of the ones I
> mentioned should work).
>
> On 7/20/21 9:34 AM, Roger L Costello wrote:
>> Thank you Steve. Terrific explanation.
>>
>> I tried the approach you described - dfdl:lengthKind="pattern"
>> dfdl:lengthPattern="ABC|AB|AC|A" - and it worked great.
>>
>> I also tried using enumeration facets coupled with
>> dfdl:checkConstraints within dfdl:assert
>>
>> <xs:element name="item1">
>>     <xs:annotation>
>>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>>             <dfdl:assert
>>                 test="{ dfdl:checkConstraints(.) }"
>>                 message="The value of item1 is not one of the allowable values"
>>             />
>>         </xs:appinfo>
>>     </xs:annotation>
>>     <xs:simpleType>
>>         <xs:restriction base="xs:string">
>>             <xs:enumeration value="A" />
>>             <xs:enumeration value="ABC" />
>>             <xs:enumeration value="AB" />
>>             <xs:enumeration value="AC" />
>>         </xs:restriction>
>>     </xs:simpleType>
>> </xs:element>
>>
>> But that did not work. Why does that not work?
>>
>> /Roger
>>
>> -----Original Message-----
>> From: Steve Lawrence <[email protected]>
>> Sent: Monday, July 12, 2021 2:39 PM
>> To: [email protected]
>> Subject: [EXT] Re: How to specify data with two fields, no delimiter,
>> variable length?
>>
>> In cases like these, you need to use dfdl:lengthKind="pattern" and a regular
>> expression to define the length of the first item.
>>
>> There are lots of different regexes depending on what kinds of infosets you
>> want to allow.
>>
>> For example, one approach for the first item is a very strict regex that
>> matches exactly one of the four values, e.g.
>>
>> <xs:element name="item" type="xs:string"
>>     dfdl:lengthKind="pattern" dfdl:lengthPattern="ABC|AB|AC|A" />
>>
>> With this approach, the item will get a non-zero length if it is one of
>> those items. Otherwise the item will be the empty string. And if you don't
>> want empty string to be allowed, you need to add an assert that the length
>> is greater than zero. Also, note that order in the regex matters so it
>> matches the longest possibility first.
>>
>> On the other end of the spectrum, you could instead model the first item to
>> match as many non-digits as possible:
>>
>> <xs:element name="item" type="xs:string"
>>     dfdl:lengthKind="pattern" dfdl:lengthPattern="[^0-9]*" />
>>
>> This will match any of the four allowed values, but will also match anything
>> else up to the first digit. So this could potentially produce infosets with
>> an item value of XYZ, for example. In some cases, you might actually want
>> this--we might consider the data to be "well-formed"
>> but not "valid". So you still get an infoset, it's just not "valid".
>> Whereas in the first case, you could only get a valid infoset.
>>
>> You'll probably also need to use regex length for matching the numeric item
>> if there's no delimiter after the number.
>>
>> So putting it together, and using the second approach for both items, you
>> might do something like this:
>>
>> <xs:sequence>
>>     <xs:element name="item1" type="xs:string"
>>         dfdl:lengthKind="pattern" dfdl:lengthPattern="[^0-9]*" />
>>     <xs:element name="item2" type="xs:int"
>>         dfdl:lengthKind="pattern" dfdl:lengthPattern="[0-9]*" />
>> </xs:sequence>
>>
>> So the first item is a string parsing as many non-digits as possible, and the
>> second is an int parsing as many digits as possible. Note that this approach
>> probably should have limits on the regex length in case the data is
>> bad/malformed. For example, if the data didn't contain numbers then item1
>> would just consume the entire data. So instead of *, you might instead want
>> to use something like "{0,10}" for both regexes.
>>
>> - Steve
>>
>> On 7/12/21 2:05 PM, Roger L Costello wrote:
>>> Hi Folks,
>>>
>>> I have a data field composed of two items.
>>>
>>> The values for the first item can be enumerated:
>>>
>>> A
>>> ABC
>>> AB
>>> AC
>>>
>>> The value of the second item is any integer 0-999.
>>>
>>> So, here is a sample data field:
>>>
>>> A250
>>>
>>> How do I parse that using DFDL? I reckon I'm stuck.
>>>
>>> /Roger
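
For reference, here is a rough sketch of how the two pieces discussed in this thread might be combined: the pattern length tells Daffodil how much data each item consumes, and the assert with dfdl:checkConstraints then validates item1 against the enumeration after it has been parsed. This is an untested fragment, not a complete schema. It assumes the xs and dfdl prefixes are bound as in the snippets above and that a global dfdl:format supplies the remaining required properties. The {0,10} bound is the illustrative limit suggested earlier in the thread, not a value taken from the data format.

```xml
<xs:sequence>
  <!-- How to parse: take up to 10 non-digit characters as item1 -->
  <xs:element name="item1"
      dfdl:lengthKind="pattern" dfdl:lengthPattern="[^0-9]{0,10}">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- What to check: evaluated only after item1 has been parsed -->
        <dfdl:assert
            test="{ dfdl:checkConstraints(.) }"
            message="The value of item1 is not one of the allowable values" />
      </xs:appinfo>
    </xs:annotation>
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="A" />
        <xs:enumeration value="ABC" />
        <xs:enumeration value="AB" />
        <xs:enumeration value="AC" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <!-- How to parse: take up to 10 digits as item2 -->
  <xs:element name="item2" type="xs:int"
      dfdl:lengthKind="pattern" dfdl:lengthPattern="[0-9]{0,10}" />
</xs:sequence>
```

With the sample data A250, the first pattern should stop at the first digit, giving item1 the value A, and the second pattern should take the remaining digits as item2. An item1 value such as XYZ would still be parsed by the pattern, but would then fail the assert because it is not in the enumeration.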
