Re: Conflicting requirements: fixed length field, nillable, some enumeration values shorter than the required length

Mike Beckerle Wed, 10 Aug 2022 07:30:48 -0700

re: Steve's latest suggested approach, why not just use nilValue="%WSP*;-"


This does risk one thing: erroneous data might have a line-ending in this
whitespace, but that's going to fail validation no matter what. It also
eliminates the combinatorial explosion of every place the hyphen can appear
with N spaces as the width of the field increases.
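
For concreteness, here is a rough, untested sketch of what that might look
like: Steve's element from the message below with only the nilValue changed,
plus the AB/ABC enumeration from Roger's original schema so that a value like
" AB" has something to fail validation against. The xs: prefix and the
assumption that all remaining properties come from a default dfdl:format are
mine.

```xml
<!-- Sketch only, untested. Steve's element with the list of nil values
     replaced by a single %WSP*;- entry. The enumeration is copied from
     Roger's original schema. Assumes xs: and dfdl: are bound as usual. -->
<xs:element name="field" nillable="true"
  dfdl:lengthKind="explicit"
  dfdl:length="3"
  dfdl:textStringJustification="left"
  dfdl:textTrimKind="padChar"
  dfdl:textPadKind="padChar"
  dfdl:textStringPadCharacter="%SP;"
  dfdl:nilKind="literalValue"
  dfdl:nilValue="%WSP*;-">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="AB"/>
      <xs:enumeration value="ABC"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

On unparse I would expect %WSP*; to emit nothing, so a nil would be written
as "-" and right-padded to the fixed length, but I have not verified that.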

There is a proposed feature for DFDL v2 to add these new character class
entities: %LSP;, %LSP*;, %LSP+;, %SP*;, and %SP+;. The LSP entities mean
whitespace "within a Line", i.e., spaces and tabs only, and the SP entities
are the obvious variants matching zero-or-more (%SP*;) or one-or-more (%SP+;)
spaces only.

This is JIRA ticket https://issues.apache.org/jira/browse/DAFFODIL-2720 to
implement this feature.


On Wed, Aug 10, 2022 at 8:25 AM Steve Lawrence <[email protected]> wrote:

> I'm not sure I love this possible solution, and it doesn't scale very
> well, but what about something like this:
>
>    <element name="field" type="xs:string" nillable="true"
>      dfdl:lengthKind="explicit"
>      dfdl:length="3"
>      dfdl:textStringJustification="left"
>      dfdl:textTrimKind="padChar"
>      dfdl:textPadKind="padChar"
>      dfdl:textStringPadCharacter="%SP;"
>      dfdl:nilKind="literalValue"
>      dfdl:nilValue="- %SP;- %SP;%SP;-" />
>
> So the field is left-justified and right-padded with spaces. Leading
> spaces are not trimmed, so a field like " A " will show up in the
> infoset with the leading space and fail validation. And the nilValue is
> set to all the combinations of the nil character preceded by zero or more
> spaces.
>
> Like I said, this doesn't scale because you need N nilValues for a
> string of length N. And this doesn't scale at all for delimited length
> fields where you don't know the length of the field, unless you just add
> a bunch of nilValues up to some size.
>
> If we had something like %SP*; (similar to how we have %WSP*;), then the
> nilValue could just be "%SP*;-" and this would scale without issue, and
> work for both fixed length and delimited length fields. I believe %SP*;
> has come up in the past, so this might be another argument to add it.
>
>
> On 8/10/22 7:54 AM, Roger L Costello wrote:
> > Thanks Mike. I implemented your approach. It fails to detect invalid
> > input. Let me explain.
> >
> > Input specifications:
> >
> >    * Fixed length field (3)
> >    * Nillable, hyphen is the nil value, the hyphen may be anywhere
> >      within the 3 character field
> >    * Values must be left-justified
> >
> > Here are examples of valid inputs:
> >
> > …/AB /…
> >
> > …/ABC/…
> >
> > …/-  /…
> >
> > …/ - /…
> >
> > …/  -/…
> >
> > Your solution permits this input (I tested it, Daffodil gives no error
> > or warning):
> >
> > …/ AB/…
> >
> > Notice that the value is right-justified. That is invalid.
> >
> > /Roger
> >
> > *From:* Mike Beckerle <[email protected]>
> > *Sent:* Monday, August 8, 2022 3:58 PM
> > *To:* [email protected]
> > *Subject:* [EXT] Re: Conflicting requirements: fixed length field,
> > nillable, some enumeration values shorter than the required length
> >
> > So I think your requirements are this:
> >
> > * fixed length 5
> >
> > * the hyphen nil indicator may have spaces around it
> >
> > * canonical form is left justified for "-" or any value.
> >
> > This is the best I could do. I had to surround the nillable element with
> > another element so as to get left-justification by way of filling of the
> > unused region of a complex type, with fillByte which is %SP;.
> >
> > If you want center justified hyphens for the nil case and left-justified
> > strings for the value case, then I think it's not possible to model this
> > without using separate elements for the nil and value. (That solution not
> > shown here.)
> >
> > <element name="Foo"
> >   dfdl:length="5"
> >   dfdl:lengthKind="explicit"
> >   dfdl:terminator="/"
> >   dfdl:fillByte="%SP;">
> >   <!--
> >     The above achieves canonical unparse
> >     as left-justified fixed length because
> >     the fillByte will be used to fill unused
> >     space on the right.
> >
> >     This only works for fixed length left-justified data.
> >     If this was right-justified, this trick would not work.
> >     -->
> >   <complexType>
> >     <sequence>
> >       <!--
> >       The below achieves trimming of spaces either side,
> >       but only when parsing. Nothing is added when unparsing.
> >       -->
> >       <element name="value" nillable="true"
> >         dfdl:nilValue="-"
> >         dfdl:lengthKind="delimited"
> >         dfdl:textStringJustification="center"
> >         dfdl:textTrimKind="padChar"
> >         dfdl:textPadKind="none">
> >         <simpleType>
> >           <restriction base="xs:string">
> >             <enumeration value="AB"/>
> >             <enumeration value="ABC"/>
> >           </restriction>
> >         </simpleType>
> >       </element>
> >     </sequence>
> >   </complexType>
> > </element>
> >
> > The TDML file I created for this has these tests in it showing that this
> > works:
> >
> >     <parserTestCase name="foo1" root="Foo" model="s" roundTrip="onePass">
> >       <document>-    /</document>
> >       <infoset>
> >         <dfdlInfoset>
> >           <ex:Foo xmlns=""><value xsi:nil="true"/></ex:Foo>
> >         </dfdlInfoset>
> >       </infoset>
> >     </parserTestCase>
> >
> >     <parserTestCase name="foo2" root="Foo" model="s" roundTrip="twoPass">
> >       <document> -   /</document>
> >       <infoset>
> >         <dfdlInfoset>
> >           <ex:Foo xmlns=""><value xsi:nil="true"/></ex:Foo>
> >         </dfdlInfoset>
> >       </infoset>
> >     </parserTestCase>
> >
> >     <parserTestCase name="foo3" root="Foo" model="s" roundTrip="twoPass">
> >       <document> AB  /</document>
> >       <infoset>
> >         <dfdlInfoset>
> >           <ex:Foo xmlns=""><value>AB</value></ex:Foo>
> >         </dfdlInfoset>
> >       </infoset>
> >     </parserTestCase>
> >
> >     <parserTestCase name="foo4" root="Foo" model="s" roundTrip="onePass">
> >       <document>AB   /</document>
> >       <infoset>
> >         <dfdlInfoset>
> >           <ex:Foo xmlns=""><value>AB</value></ex:Foo>
> >         </dfdlInfoset>
> >       </infoset>
> >     </parserTestCase>
> >
> > On Mon, Aug 8, 2022 at 10:22 AM Roger L Costello <[email protected]> wrote:
> >
> >      Hi Mike,
> >
> >      I gave your suggested approach a try. It failed.
> >
> >      With this input:
> >
> >      …/AB /…
> >
> >      it works.
> >
> >      With this input:
> >
> >      …/ - /…
> >
> >      it fails, producing this error:
> >
> >      [error] Validation Error: Foo failed facet checks due to: facet
> >      enumeration(s): AB|ABC
> >
> >      Further, even if the approach were to work with this example where
> >      the field length is 3, it would be an untenable approach for longer
> >      fixed fields. For example, if the field length was 10, then the
> >      nilValue would need something like 10-factorial whitespace-separated
> >      values.
> >
> >      Do you have another suggested approach?
> >
> >      /Roger
> >
> >      *From:* Mike Beckerle <[email protected]>
> >      *Sent:* Monday, August 8, 2022 9:38 AM
> >      *To:* [email protected]
> >      *Subject:* [EXT] Re: Conflicting requirements: fixed length field,
> >      nillable, some enumeration values shorter than the required length
> >
> >      I would try making the nilValue "%SP;-%SP; -". That is two separate
> >      possibilities for nilValue, one is space-hyphen-space, the other
> >      just hyphen. (It's a whitespace-separated list of nil value tokens.)
> >
> >      The first one will be used for unparsing. Both will be tried for
> >      parsing.
> >
> >      That along with justification left might work.
> >
> >      On Mon, Aug 8, 2022 at 8:01 AM Roger L Costello <[email protected]> wrote:
> >
> >          Hi Folks,
> >
> >          I have an input field that is fixed length (3). If there is no
> >          data, the field is to be populated with a hyphen (of course, it
> >          must be padded with spaces to the required length). The schema
> >          has a simpleType with enumeration facets. Some enumeration
> >          values are shorter than the required length.
> >
> >          Here's how I specify the field:
> >
> >          <xs:element name="Foo"
> >               nillable="true"
> >               dfdl:nilKind="literalValue"
> >               dfdl:nilValue="-"
> >               dfdl:lengthKind="explicit"
> >               dfdl:length="3"
> >               dfdl:textTrimKind="padChar"
> >               dfdl:textPadKind="padChar"
> >               dfdl:textStringPadCharacter="%SP;"
> >               dfdl:textStringJustification="center">
> >               <xs:simpleType>
> >                   <xs:restriction base="xs:string">
> >                       <xs:enumeration value="AB"/>
> >                       <xs:enumeration value="ABC"/>
> >                   </xs:restriction>
> >               </xs:simpleType>
> >          </xs:element>
> >
> >          Notice dfdl:textStringJustification="center" which is fine for
> >          the nillable value (hyphen) but not for a regular value such as
> >          AB which should be left justified. As the schema is, the input
> >          could contain this (assume slash separators):
> >
> >          .../ AB/...
> >
> >          which is incorrect.
> >
> >          So, there are conflicting requirements: the nillable value needs
> >          dfdl:textStringJustification="center" whereas the normal values
> >          need dfdl:textStringJustification="left". What to do about this?
> >
> >          /Roger
> >
>
>
