Re: optional int and unparse formatting

Theodore Toth Mon, 30 Aug 2021 21:22:05 -0700

The following worked for me although I don't know if it's the 'right'
way to do it. Reading the spec can give you a headache.


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="default-dfdl-properties" />
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="FOO"
              dfdl:initiator="FOO/"
              dfdl:lengthKind="implicit"
              dfdl:terminator="%NL;%WSP*;">

    <xs:complexType>
      <xs:sequence dfdl:sequenceKind="ordered"
                   dfdl:separator="/"
                   dfdl:separatorPosition="infix">

        <xs:element name="elem1">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:minLength value="1"/>
              <xs:maxLength value="14"/>
              <xs:pattern value="[A-Z0-9,:%#*\- ]+"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>

        <xs:element name="elem2">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="CAT|DOG|HORSE"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>

        <xs:element name="elem3" dfdl:textNumberPattern="#0000">
          <xs:simpleType>
            <xs:restriction base="xs:int">
              <xs:minInclusive value="1"/>
              <xs:maxInclusive value="99999"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>

        <xs:element name="elem4" minOccurs="0" maxOccurs="1">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:minLength value="1"/>
              <xs:maxLength value="20"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>

        <xs:sequence dfdl:separator="/" dfdl:terminator="/"
                     dfdl:separatorSuppressionPolicy="anyEmpty">
          <xs:element name="elem5" minOccurs="0" maxOccurs="1"
                      dfdl:textNumberPattern="000">
            <xs:simpleType>
              <xs:restriction base="xs:int">
                <xs:minInclusive value="1"/>
                <xs:maxInclusive value="999"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:element>
        </xs:sequence>

      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
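
The two dfdl:textNumberPattern values in this schema ("#0000" on elem3 and "000" on elem5) set a minimum number of digits on unparse. A rough Python model of that behaviour, as I understand it from the replies quoted below, is sketched here. The helper name format_min_digits is made up for illustration and this is not how Daffodil implements it; it only covers non-negative integers.

```python
def format_min_digits(value: int, min_digits: int) -> str:
    """Zero-pad a non-negative integer to at least min_digits digits."""
    if value < 0:
        raise ValueError("sketch only covers non-negative values")
    return str(value).zfill(min_digits)


# elem3, pattern "#0000": at least four digits, a fifth when the value needs it.
print(format_min_digits(1, 4))      # 0001
print(format_min_digits(12345, 4))  # 12345

# elem5, pattern "000": at least three digits.
print(format_min_digits(7, 3))      # 007

# Parse then unparse: the integer forgets how it was written,
# so "01" and "0001" both come back out as "0001".
for text in ("01", "0001"):
    print(text, "->", format_min_digits(int(text), 4))
```

The last loop is the point: once the text has become an integer, the original spelling is gone and only the pattern decides the output.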

On Tue, Aug 31, 2021 at 9:31 AM Theodore Toth
<[email protected]> wrote:
>
> Thanks for the response.
>
> On Tue, Aug 31, 2021 at 12:49 AM Beckerle, Mike
> <[email protected]> wrote:
> >
> > Good question.
> >
> > I think what is happening is this. elem5 fails to parse because it is an 
> > empty string, but then the parse backtracks, and here's the trick: that 
> > means it is putting back the separator before this array/optional element. 
> > Then your schema has nothing to absorb the final separator.
> >
> > Your schema has expressed an optional element, but what you want is a 
> > required separator, then an optional element after it.
> >
> > I think wrapping an xs:sequence around elem5 will fix this.
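
To see where the empty string comes from, here is a toy Python sketch that simply splits the test data from the original message (FOO/GONE FISHIN/DOG/0001///) on the separator. It is only an illustration of which text lands in which field; it does not model Daffodil's backtracking or separator handling.

```python
data = "FOO/GONE FISHIN/DOG/0001///"
initiator = "FOO/"

# Drop the initiator, then split what is left on the "/" separator.
body = data[len(initiator):]
fields = body.split("/")
print(fields)  # ['GONE FISHIN', 'DOG', '0001', '', '', '']

# Line the pieces up with the element names from the schema.
names = ["elem1", "elem2", "elem3", "elem4", "elem5"]
for name, text in zip(names, fields):
    print(f"{name}: {text!r}")

# elem5's text is empty, so an integer conversion has nothing to work with.
try:
    int(fields[4])
except ValueError as err:
    print("elem5 fails:", err)
```

The sixth, leftover empty piece corresponds to the final "/" that nothing in the original schema consumes.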
>
> So the required separator goes on the sequence?
>
> >
> > To be sure, I need to see the occursCountKind property, lengthKind 
> > property, etc. Basically I need to be able to reproduce your run.
> > I would need your default-dfdl-properties/defaults.dfdl.xsd file.
> >
> Here's my defaults that I pulled from the DFDL-part1 presentation:
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <schema xmlns="http://www.w3.org/2001/XMLSchema"
>         xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>         xmlns:xs="http://www.w3.org/2001/XMLSchema">
>
>   <xs:annotation>
>     <xs:appinfo source="http://www.ogf.org/dfdl/">
>       <dfdl:defineFormat name="default-dfdl-properties">
>         <dfdl:format
>             alignment="1"
>             alignmentUnits="bytes"
>             binaryFloatRep="ieee"
>             binaryNumberRep="binary"
>             bitOrder="mostSignificantBitFirst"
>             byteOrder="bigEndian"
>             calendarPatternKind="implicit"
>             documentFinalTerminatorCanBeMissing="yes"
>             emptyValueDelimiterPolicy="none"
>             encoding="ISO-8859-1"
>             encodingErrorPolicy="replace"
>             escapeSchemeRef=""
>             fillByte="f"
>             floating="no"
>             ignoreCase="no"
>             initiator=""
>             initiatedContent="no"
>             leadingSkip="0"
>             lengthKind="delimited"
>             lengthUnits="characters"
>             nilKind="literalValue"
>             nilValueDelimiterPolicy="none"
>             occursCountKind="implicit"
>             outputNewLine="%CR;%LF;"
>             representation="text"
>             separator=""
>             separatorPosition="infix"
>             separatorSuppressionPolicy="never"
>             sequenceKind="ordered"
>             terminator=""
>             textBidi="no"
>             textNumberCheckPolicy="strict"
>             textNumberPattern="#,##0.###;-#,##0.###"
>             textNumberRep="standard"
>             textNumberRounding="explicit"
>             textNumberRoundingIncrement="0"
>             textNumberRoundingMode="roundUnnecessary"
>             textOutputMinLength="0"
>             textPadKind="none"
>             textStandardBase="10"
>             textStandardExponentRep="E"
>             textStandardInfinityRep="Inf"
>             textStandardNaNRep="NaN"
>             textStandardZeroRep="0"
>             textStandardDecimalSeparator="."
>             textStandardGroupingSeparator=","
>             textTrimKind="none"
>             trailingSkip="0"
>             truncateSpecifiedLengthString="no"
>             utf16Width="fixed"/>
>       </dfdl:defineFormat>
>     </xs:appinfo>
>   </xs:annotation>
> </schema>
>
>
> > w.r.t your 0001 issue....
> >
> > Text number formats, such as leading zeros, are controlled by the 
> > dfdl:textNumberPattern property. I think you want different values for 
> > this property for your two integer-type elements if they are supposed to 
> > have different numbers of digits, as evidenced by their max values of 999 
> > and 99999.
> >
> > However, your request that 0001 be preserved is not consistent with either 
> > 999 or 99999 as max values. So I'm not sure what you are trying to achieve 
> > in this format.
>
> Just trying to teach an old dog some new tricks.
>
> >
> > DFDL does not "remember how the integer was presented". It parses it 
> > according to rules, creates an xs:int in the infoset, and at that point the 
> > leading zero information is gone. It then unparses according to rules. If 
> > you want 0001 to parse and unparse as 0001, you want 
> > dfdl:textNumberPattern="#0000". That always produces at least 4 digits, 
> > plus a fifth when the value needs one.
> >
> > But in this case, if you are first parsing, then unparsing data, then 
> > incoming "01" will also unparse as "0001". Using 
> > dfdl:textNumberPattern="#0000" means "canonical form for this data is at 
> > least 4 digits". If you parse the data using dfdl:lengthKind='delimited', 
> > then your schema has expressed "tolerate any number of digits, but always 
> > canonicalize to at least 4 digits".
>
> I'll play with this.
>
> >
> > If you want the text of these numbers preserved, not canonicalized, and 
> > your application does both parse and unparse, like data security apps often 
> > do, then you need to use strings, not numbers.
>
> If I were to use strings how would I then validate that the value was
> in some range?
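
One possible approach, sketched in Python and not taken from the thread: keep the element as a string so the text is preserved, and do the range check in the application after the parse. The helper name in_range is hypothetical. A pattern facet like the ones on elem1 and elem2 might be another option inside the schema, though writing a numeric range as a regular expression gets awkward quickly.

```python
import re

# ASCII digits only; str.isdigit() would also accept other Unicode digits.
DIGITS = re.compile(r"[0-9]+")


def in_range(text: str, low: int, high: int) -> bool:
    """True if text is all ASCII digits and its numeric value lies in [low, high]."""
    if DIGITS.fullmatch(text) is None:
        return False
    return low <= int(text) <= high


print(in_range("0001", 1, 99999))    # True, and the text "0001" is left untouched
print(in_range("100000", 1, 99999))  # False
print(in_range("", 1, 99999))        # False
```

The value is converted to an integer only for the comparison, so the leading zeros in the stored string survive a round trip.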
>
> >
> > Note, however, that preserving leading/trailing non-numerically significant 
> > zeros is a security hole - they can be used to carry covert channel data.
> > Canonicalization of data is fundamentally more secure.
> >
> > The usual reason people want preservation of data exactly, character for 
> > character, is to make test/QA easier. That's ok so long as you get that 
> > there is a loss of some data security when non-information-carrying things 
> > like leading/trailing zeros are preserved.
> >
> >
> >
> > ________________________________
> > From: Theodore Toth <[email protected]>
> > Sent: Sunday, August 29, 2021 2:45 AM
> > To: [email protected] <[email protected]>
> > Subject: optional int and unparse formatting
> >
> > I just started looking at daffodil and have a few questions about my
> > first experiment:
> > Here's my dfdl:
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <xs:schema
> >     xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
> >
> >   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
> >   <xs:annotation>
> >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> >       <dfdl:format ref="default-dfdl-properties" />
> >     </xs:appinfo>
> >   </xs:annotation>
> >
> >   <xs:element name="FOO"
> >               dfdl:initiator="FOO/"
> >               dfdl:lengthKind="implicit">
> > <!--
> >               dfdl:terminator="//%NL;%WSP*;">
> > -->
> >     <xs:complexType>
> >       <xs:sequence dfdl:sequenceKind="ordered"
> >                    dfdl:separator="/"
> >                    dfdl:separatorPosition="infix">
> >
> >         <xs:element name="elem1">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >               <xs:minLength value="1"/>
> >               <xs:maxLength value="14"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem2">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >               <xs:pattern value="CAT|DOG|HORSE"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem3">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:int">
> >               <xs:minInclusive value="1"/>
> >               <xs:maxInclusive value="99999"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >               <xs:minLength value="1"/>
> >               <xs:maxLength value="20"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem5" minOccurs="0" maxOccurs="1">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:int">
> >               <xs:minInclusive value="1"/>
> >               <xs:maxInclusive value="999"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >       </xs:sequence>
> >     </xs:complexType>
> >   </xs:element>
> >
> > </xs:schema>
> >
> > Here's some test data:
> > FOO/GONE FISHIN/DOG/0001///
> >
> > The parse fails with:
> > [error] Parse Error: Unable to parse xs:int from empty string
> > Schema context: elem5 Location line 59 column 10 in
> > file:/home/tedx/dfdl-test/test.dfdl.xsd
> > Data location was preceding byte 26
> >
> > Why does it fail when elem5 has minOccurs="0"? elem5 is optional.
> >
> > Then if I put a 0 before the last slash it generates:
> > <?xml version="1.0" encoding="UTF-8"?>
> > <FOO>
> >   <elem1>GONE FISHIN</elem1>
> >   <elem2>DOG</elem2>
> >   <elem3>1</elem3>
> >   <elem4></elem4>
> >   <elem5>0</elem5>
> > </FOO>
> >
> > and when I unparse it generates:
> > FOO/GONE FISHIN/DOG/1//0
> >
> > but I'd like it to output 0001 for elem3, how do I do that?
> >
> > Ted
