Re: Steve's latest suggested approach, why not just use nilValue="%WSP*;-"?
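
As a sketch only (untested, and assuming every other property of Steve's element quoted further down is kept unchanged), the single-pattern nilValue would look like this:

```xml
<!-- Sketch, not verified against Daffodil: Steve's fixed-length element
     with the list of nil values replaced by one %WSP*; pattern. -->
<element name="field" type="xs:string" nillable="true"
         dfdl:lengthKind="explicit"
         dfdl:length="3"
         dfdl:textStringJustification="left"
         dfdl:textTrimKind="padChar"
         dfdl:textPadKind="padChar"
         dfdl:textStringPadCharacter="%SP;"
         dfdl:nilKind="literalValue"
         dfdl:nilValue="%WSP*;-" />
```

The only change from Steve's version is the dfdl:nilValue attribute, so the same single value would cover a field of any width.
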
This does risk one thing: erroneous data might have a line ending within
this whitespace. But that is going to fail validation no matter what, and
it eliminates the combinatorial explosion of every place the hyphen can
appear among N spaces as the width of the field increases.

There is a proposed feature for DFDL v2 which would add these new
character class entities:

  %LSP; %LSP*; %LSP+; and %SP*; %SP+;

These use LSP for "within a Line", i.e., spaces and tabs only, and the
obvious variants of zero-or-more (SP*) or one-or-more (SP+) spaces only.

JIRA ticket https://issues.apache.org/jira/browse/DAFFODIL-2720 is to
implement this feature.

On Wed, Aug 10, 2022 at 8:25 AM Steve Lawrence <[email protected]> wrote:

> I'm not sure I love this possible solution, and it doesn't scale very
> well, but what about something like this:
>
>   <element name="field" type="xs:string" nillable="true"
>     dfdl:lengthKind="explicit"
>     dfdl:length="3"
>     dfdl:textStringJustification="left"
>     dfdl:textTrimKind="padChar"
>     dfdl:textPadKind="padChar"
>     dfdl:textStringPadCharacter="%SP;"
>     dfdl:nilKind="literalValue"
>     dfdl:nilValue="- %SP;- %SP;%SP;-" />
>
> So the field is left-justified and right-padded with spaces. Left-padded
> spaces are not trimmed, so a field like " A " will show up in the
> infoset with the left space and fail validation. And the nilValue is set
> to all the combinations of the nil character preceded by spaces.
>
> Like I said, this doesn't scale because you need N nilValues for a
> string of length N. And this doesn't scale at all for delimited length
> fields where you don't know the length of the field, unless you just add
> a bunch of nilValues up to some size.
>
> If we had something like %SP*; (similar to how we have %WSP*;), then the
> nilValue could just be "%SP*;-" and this would scale without issue, and
> work for both fixed length and delimited length fields. I believe %SP*;
> has come up in the past, so this might be another argument to add it.
>
> On 8/10/22 7:54 AM, Roger L Costello wrote:
> > Thanks Mike. I implemented your approach. It fails to detect invalid
> > input. Let me explain.
> >
> > Input specifications:
> >
> >  * Fixed length field (3)
> >  * Nillable, hyphen is the nil value, the hyphen may be anywhere
> >    within the 3 character field
> >  * Values must be left-justified
> >
> > Here are examples of valid inputs:
> >
> > …/AB /…
> >
> > …/ABC/…
> >
> > …/- /…
> >
> > .../ - /…
> >
> > …/ -/…
> >
> > Your solution permits this input (I tested it, Daffodil gives no error
> > or warning):
> >
> > …/ AB/…
> >
> > Notice that the value is right-justified. That is invalid.
> >
> > /Roger
> >
> > From: Mike Beckerle <[email protected]>
> > Sent: Monday, August 8, 2022 3:58 PM
> > To: [email protected]
> > Subject: [EXT] Re: Conflicting requirements: fixed length field,
> > nillable, some enumeration values shorter than the required length
> >
> > So I think your requirements are this:
> >
> >  * fixed length 5
> >
> >  * the hyphen nil indicator may have spaces around it
> >
> >  * canonical form is left justified for "-" or any value.
> >
> > This is the best I could do. I had to surround the nillable element
> > with another element so as to get left-justification by way of filling
> > of the unused region of a complex type, with fillByte which is %SP;.
> >
> > If you want center justified hyphens for the nil case and
> > left-justified strings for the value case, then I think it's not
> > possible to model this without using separate elements for the nil and
> > value. (That solution not shown here.)
> >
> > <element name="Foo"
> >   dfdl:length="5"
> >   dfdl:lengthKind="explicit"
> >   dfdl:terminator="/"
> >   dfdl:fillByte="%SP;">
> >   <!--
> >     The above achieves canonical unparse
> >     as left-justified fixed length because
> >     the fillByte will be used to fill unused
> >     space on the right.
> >
> >     This only works for fixed length left-justified data.
> >     If this was right-justified, this trick would not work.
> >   -->
> >   <complexType>
> >     <sequence>
> >       <!--
> >         The below achieves trimming of spaces either side,
> >         but only when parsing. Nothing is added when unparsing.
> >       -->
> >       <element name="value" nillable="true"
> >         dfdl:nilValue="-"
> >         dfdl:lengthKind="delimited"
> >         dfdl:textStringJustification="center"
> >         dfdl:textTrimKind="padChar"
> >         dfdl:textPadKind="none">
> >         <simpleType>
> >           <restriction base="xs:string">
> >             <enumeration value="AB"/>
> >             <enumeration value="ABC"/>
> >           </restriction>
> >         </simpleType>
> >       </element>
> >     </sequence>
> >   </complexType>
> > </element>
> >
> > The TDML file I created for this has these tests in it showing that
> > this works:
> >
> > <parserTestCase name="foo1" root="Foo" model="s" roundTrip="onePass">
> >   <document>- /</document>
> >   <infoset>
> >     <dfdlInfoset>
> >       <ex:Foo xmlns=""><value xsi:nil="true"/></ex:Foo>
> >     </dfdlInfoset>
> >   </infoset>
> > </parserTestCase>
> >
> > <parserTestCase name="foo2" root="Foo" model="s" roundTrip="twoPass">
> >   <document> - /</document>
> >   <infoset>
> >     <dfdlInfoset>
> >       <ex:Foo xmlns=""><value xsi:nil="true"/></ex:Foo>
> >     </dfdlInfoset>
> >   </infoset>
> > </parserTestCase>
> >
> > <parserTestCase name="foo3" root="Foo" model="s" roundTrip="twoPass">
> >   <document> AB /</document>
> >   <infoset>
> >     <dfdlInfoset>
> >       <ex:Foo xmlns=""><value>AB</value></ex:Foo>
> >     </dfdlInfoset>
> >   </infoset>
> > </parserTestCase>
> >
> > <parserTestCase name="foo4" root="Foo" model="s" roundTrip="onePass">
> >   <document>AB /</document>
> >   <infoset>
> >     <dfdlInfoset>
> >       <ex:Foo xmlns=""><value>AB</value></ex:Foo>
> >     </dfdlInfoset>
> >   </infoset>
> > </parserTestCase>
> >
> > On Mon, Aug 8, 2022 at 10:22 AM Roger L Costello <[email protected]>
> > wrote:
> >
> > Hi Mike,
> >
> > I gave your suggested approach a try. It failed.
> >
> > With this input:
> >
> > …/AB /…
> >
> > it works.
> >
> > With this input:
> >
> > …/ - /…
> >
> > it fails, producing this error:
> >
> > [error] Validation Error: Foo failed facet checks due to: facet
> > enumeration(s): AB|ABC
> >
> > Further, even if the approach were to work with this example where the
> > field length is 3, it would be an untenable approach for longer fixed
> > fields. For example, if the field length was 10, then the nilValue
> > would need something like 10-factorial whitespace-separated values.
> >
> > Do you have another suggested approach?
> >
> > /Roger
> >
> > From: Mike Beckerle <[email protected]>
> > Sent: Monday, August 8, 2022 9:38 AM
> > To: [email protected]
> > Subject: [EXT] Re: Conflicting requirements: fixed length field,
> > nillable, some enumeration values shorter than the required length
> >
> > I would try making the nilValue "%SP;-%SP; -". That is two separate
> > possibilities for nilValue, one is space-hyphen-space, the other just
> > hyphen. (It's a whitespace-separated list of nil values tokens.)
> >
> > The first one will be used for unparsing. Both will be tried for
> > parsing.
> >
> > That along with justification left might work.
> >
> > On Mon, Aug 8, 2022 at 8:01 AM Roger L Costello <[email protected]>
> > wrote:
> >
> > Hi Folks,
> >
> > I have an input field that is fixed length (3). If there is no data,
> > the field is to be populated with a hyphen (of course, it must be
> > padded with spaces to the required length). The schema has a
> > simpleType with enumeration facets. Some enumeration values are less
> > than the required length.
> >
> > Here's how I specify the field:
> >
> > <xs:element name="Foo"
> >   nillable="true"
> >   dfdl:nilKind="literalValue"
> >   dfdl:nilValue="-"
> >   dfdl:lengthKind="explicit"
> >   dfdl:length="3"
> >   dfdl:textTrimKind="padChar"
> >   dfdl:textPadKind="padChar"
> >   dfdl:textStringPadCharacter="%SP;"
> >   dfdl:textStringJustification="center">
> >   <xs:simpleType>
> >     <xs:restriction base="xs:string">
> >       <xs:enumeration value="AB"/>
> >       <xs:enumeration value="ABC"/>
> >     </xs:restriction>
> >   </xs:simpleType>
> > </xs:element>
> >
> > Notice dfdl:textStringJustification="center" which is fine for the
> > nillable value (hyphen) but not for a regular value such as AB which
> > should be left justified. As the schema is, the input could contain
> > this (assume slash separators):
> >
> > .../ AB/...
> >
> > which is incorrect.
> >
> > So, there are conflicting requirements: the nillable value needs
> > dfdl:textStringJustification="center" whereas the normal values need
> > dfdl:textStringJustification="left". What to do about this?
> >
> > /Roger
