Re: Conflicting requirements: fixed length field, nillable, some enumeration values shorter than the required length

Roger L Costello Wed, 10 Aug 2022 07:36:22 -0700

  *   why not just use nilValue="%WSP*;-"

Fantastic!


I just tested that solution and it works perfectly.

This is getting better and better. Now we have a super-simple solution to 
fixed-length, nillable fields. Write-up coming shortly.

/Roger

From: Mike Beckerle <[email protected]>
Sent: Wednesday, August 10, 2022 10:31 AM
To: [email protected]
Subject: [EXT] Re: Conflicting requirements: fixed length field, nillable, some 
enumeration values shorter than the required length

re: Steve's latest suggested approach, why not just use nilValue="%WSP*;-"

This does risk one thing which is that erroneous data might have a line-ending 
in this whitespace, but that's going to fail validation no matter what, and it 
eliminates the combinatorial explosion of everyplace the hyphen can appear with 
N spaces, as the width of the field increases.
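
To make the suggestion concrete, here is a minimal sketch: Steve's element (quoted below) with only the nilValue changed. The element name "field" and the unprefixed "element" form follow his example, and any other required DFDL properties are assumed to come from an enclosing format. This only illustrates the parse side; unparse behavior is not covered here.

```xml
<!-- Sketch only: Steve's fixed-length element with the enumerated
     nilValue list replaced by a single %WSP*; pattern.
     Assumption: all other properties stay exactly as in his example. -->
<element name="field" type="xs:string" nillable="true"
  dfdl:lengthKind="explicit"
  dfdl:length="3"
  dfdl:textStringJustification="left"
  dfdl:textTrimKind="padChar"
  dfdl:textPadKind="padChar"
  dfdl:textStringPadCharacter="%SP;"
  dfdl:nilKind="literalValue"
  dfdl:nilValue="%WSP*;-" />
```

Because %WSP*; matches zero or more whitespace characters, the one pattern is meant to cover "-  ", " - " and "  -" once the right-hand padding has been trimmed.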

There is a proposed feature for DFDL v2 to add these new character 
class entities: %LSP; %LSP*; %LSP+; and %SP*; %SP+;. These use LSP for "within a 
Line", i.e., spaces and tabs only, and the obvious variants of zero-or-more 
(SP*) or one-or-more (SP+) spaces only.

This is JIRA ticket https://issues.apache.org/jira/browse/DAFFODIL-2720 to 
implement this feature.


On Wed, Aug 10, 2022 at 8:25 AM Steve Lawrence <[email protected]> wrote:
I'm not sure I love this possible solution, and it doesn't scale very
well, but what about something like this:

   <element name="field" type="xs:string" nillable="true"
     dfdl:lengthKind="explicit"
     dfdl:length="3"
     dfdl:textStringJustification="left"
     dfdl:textTrimKind="padChar"
     dfdl:textPadKind="padChar"
     dfdl:textStringPadCharacter="%SP;"
     dfdl:nilKind="literalValue"
     dfdl:nilValue="- %SP;- %SP;%SP;-" />

So the field is left-justified and right-padded with spaces. Left padded
spaces are not trimmed, so a field like " A " will show up in the
infoset with the left space and fail validation. And the nilValue is set
to all the combinations of the nil character preceded with a space.

Like I said, this doesn't scale because you need N nilValues for a
string of length N. And this doesn't scale at all for delimited length
fields where you don't know the length of the field, unless you just add
a bunch of nilValues up to some size.
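
To illustrate the scaling problem, here is a hypothetical length-5 variant of the same element (the name "field5" is made up for this sketch). It needs one nilValue token for each possible number of leading spaces, zero through four:

```xml
<!-- Hypothetical length-5 variant: five nilValue tokens, one per
     possible count of leading spaces (0 to 4) before the hyphen.
     Trailing spaces are assumed to be trimmed as right-hand padding. -->
<element name="field5" type="xs:string" nillable="true"
  dfdl:lengthKind="explicit"
  dfdl:length="5"
  dfdl:textStringJustification="left"
  dfdl:textTrimKind="padChar"
  dfdl:textPadKind="padChar"
  dfdl:textStringPadCharacter="%SP;"
  dfdl:nilKind="literalValue"
  dfdl:nilValue="- %SP;- %SP;%SP;- %SP;%SP;%SP;- %SP;%SP;%SP;%SP;-" />
```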

If we had something like %SP*; (similar to how we have %WSP*;), then the
nilValue could just be "%SP*;-" and this would scale without issue, and
work for both fixed length and delimited length fields. I believe %SP*;
has come up in the past, so this might be another argument to add it.
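
As a sketch of the delimited-length case, and assuming %SP*; were available (it is NOT a DFDL 1.0 entity, so this would not load in a current processor), the element might look like the following. The "/" terminator is an assumption taken from the slash-separated examples in this thread, and other required properties are assumed to come from an enclosing format.

```xml
<!-- NOT valid DFDL 1.0: %SP*; is only a proposed entity.
     Hypothetical delimited-length variant; the "/" terminator is an
     assumption based on the slash-separated sample data. -->
<element name="field" type="xs:string" nillable="true"
  dfdl:lengthKind="delimited"
  dfdl:terminator="/"
  dfdl:textStringJustification="left"
  dfdl:textTrimKind="padChar"
  dfdl:textStringPadCharacter="%SP;"
  dfdl:nilKind="literalValue"
  dfdl:nilValue="%SP*;-" />
```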


On 8/10/22 7:54 AM, Roger L Costello wrote:
> Thanks Mike. I implemented your approach. It fails to detect invalid input.
> Let me explain.
>
> Input specifications:
>
>    * Fixed length field (3)
>    * Nillable, hyphen is the nil value, the hyphen may be anywhere within the
>      3 character field
>    * Values must be left-justified
>
> Here are examples of valid inputs:
>
> …/AB /…
>
> …/ABC/…
>
> …/-  /…
>
> .../ - /…
>
> …/  -/…
>
> Your solution permits this input (I tested it, Daffodil gives no error or
> warning):
>
> …/ AB/…
>
> Notice that the value is right-justified. That is invalid.
>
> /Roger
>
> *From:* Mike Beckerle <[email protected]>
> *Sent:* Monday, August 8, 2022 3:58 PM
> *To:* [email protected]
> *Subject:* [EXT] Re: Conflicting requirements: fixed length field, nillable,
> some enumeration values shorter than the required length
>
> So I think your requirements are this:
>
> * fixed length 5
>
> * the hyphen nil indicator may have spaces around it
>
> * canonical form is left justified for "-" or any value.
>
> This is the best I could do. I had to surround the nillable element with
> another element so as to get left-justification by way of filling of the
> unused region of a complex type, with fillByte which is %SP;.
>
> If you want center justified hyphens for the nil case and left-justified
> strings for the value case, then I think it's not possible to model this
> without using separate elements for the nil and value. (That solution not
> shown here.)
>
> <element name="Foo"
>   dfdl:length="5"
>   dfdl:lengthKind="explicit"
>   dfdl:terminator="/"
>   dfdl:fillByte="%SP;">
>   <!--
>     The above achieves canonical unparse
>     as left-justified fixed length because
>     the fillByte will be used to fill unused
>     space on the right.
>
>     This only works for fixed length left-justified data.
>     If this was right-justified, this trick would not work.
>     -->
>   <complexType>
>     <sequence>
>       <!--
>       The below achieves trimming of spaces either side,
>       but only when parsing. Nothing is added when unparsing.
>       -->
>       <element name="value" nillable="true"
>         dfdl:nilValue="-"
>         dfdl:lengthKind="delimited"
>         dfdl:textStringJustification="center"
>         dfdl:textTrimKind="padChar"
>         dfdl:textPadKind="none">
>         <simpleType>
>           <restriction base="xs:string">
>             <enumeration value="AB"/>
>             <enumeration value="ABC"/>
>           </restriction>
>         </simpleType>
>       </element>
>     </sequence>
>   </complexType>
> </element>
>
> The TDML file I created for this has these tests in it showing that this
> works:
>
>     <parserTestCase name="foo1" root="Foo" model="s" roundTrip="onePass">
>       <document>-    /</document>
>       <infoset>
>         <dfdlInfoset>
>           <ex:Foo xmlns=""><value xsi:nil="true"/></ex:Foo>
>         </dfdlInfoset>
>       </infoset>
>     </parserTestCase>
>
>     <parserTestCase name="foo2" root="Foo" model="s" roundTrip="twoPass">
>       <document> -   /</document>
>       <infoset>
>         <dfdlInfoset>
>           <ex:Foo xmlns=""><value xsi:nil="true"/></ex:Foo>
>         </dfdlInfoset>
>       </infoset>
>     </parserTestCase>
>
>     <parserTestCase name="foo3" root="Foo" model="s" roundTrip="twoPass">
>       <document> AB  /</document>
>       <infoset>
>         <dfdlInfoset>
>           <ex:Foo xmlns=""><value>AB</value></ex:Foo>
>         </dfdlInfoset>
>       </infoset>
>     </parserTestCase>
>
>     <parserTestCase name="foo4" root="Foo" model="s" roundTrip="onePass">
>       <document>AB   /</document>
>       <infoset>
>         <dfdlInfoset>
>           <ex:Foo xmlns=""><value>AB</value></ex:Foo>
>         </dfdlInfoset>
>       </infoset>
>     </parserTestCase>
>
> On Mon, Aug 8, 2022 at 10:22 AM Roger L Costello <[email protected]> wrote:
>
>      Hi Mike,
>
>      I gave your suggested approach a try. It failed.
>
>      With this input:
>
>      …/AB /…
>
>      it works.
>
>      With this input:
>
>      …/ - /…
>
>      it fails, producing this error:
>
>      [error] Validation Error: Foo failed facet checks due to: facet
>      enumeration(s): AB|ABC
>
>      Further, even if the approach were to work with this example where the
>      field length is 3, it would be an untenable approach for longer fixed
>      fields. For example, if the field length was 10, then the nilValue would
>      need something like 10-factorial whitespace-separated values.
>
>      Do you have another suggested approach?
>
>      /Roger
>
>      *From:* Mike Beckerle <[email protected]>
>      *Sent:* Monday, August 8, 2022 9:38 AM
>      *To:* [email protected]
>      *Subject:* [EXT] Re: Conflicting requirements: fixed length field,
>      nillable, some enumeration values shorter than the required length
>
>      I would try making the nilValue "%SP;-%SP; -". That is two separate
>      possibilities for nilValue, one is space-hyphen-space, the other just
>      hyphen. (It's a whitespace-separated list of nil value tokens.)
>
>      The first one will be used for unparsing. Both will be tried for parsing.
>
>      That along with justification left might work.
>
>      On Mon, Aug 8, 2022 at 8:01 AM Roger L Costello <[email protected]> wrote:
>
>          Hi Folks,
>
>          I have an input field that is fixed length (3). If there is no data,
>          the field is to be populated with a hyphen (of course, it must be
>          padded with spaces to the required length). The schema has a
>          simpleType with enumeration facets. Some enumeration values are
>          shorter than the required length.
>
>          Here's how I specify the field:
>
>          <xs:element name="Foo"
>               nillable="true"
>               dfdl:nilKind="literalValue"
>               dfdl:nilValue="-"
>               dfdl:lengthKind="explicit"
>               dfdl:length="3"
>               dfdl:textTrimKind="padChar"
>               dfdl:textPadKind="padChar"
>               dfdl:textStringPadCharacter="%SP;"
>               dfdl:textStringJustification="center">
>               <xs:simpleType>
>                   <xs:restriction base="xs:string">
>                       <xs:enumeration value="AB"/>
>                       <xs:enumeration value="ABC"/>
>                   </xs:restriction>
>               </xs:simpleType>
>          </xs:element>
>
>          Notice dfdl:textStringJustification="center" which is fine for the
>          nillable value (hyphen) but not for a regular value such as AB which
>          should be left justified. As the schema is, the input could contain
>          this (assume slash separators):
>
>          .../ AB/...
>
>          which is incorrect.
>
>          So, there are conflicting requirements: the nillable value needs
>          dfdl:textStringJustification="center" whereas the normal values need
>          dfdl:textStringJustification="left". What to do about this?
>
>          /Roger
>
