One case where the xs:string approach doesn't work is when your integer is allowed to contain commas or scientific notation. The xs:integer() constructor and other XPath functions don't know how to handle these and will fail to convert them.
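
As a rough sketch of that case, a count that allows grouping commas might be declared something like this. The pattern and separator values are my own assumptions for illustration, not part of your format, and the remaining text number properties are assumed to be in scope as in the integer schema quoted below:

```xml
<!-- Sketch only: intended to accept text such as "1,234" as the integer 1234.
     The pattern and separator values are illustrative assumptions. -->
<xs:element name="number-of-names" type="xs:integer"
            dfdl:textNumberRep="standard"
            dfdl:textNumberPattern="#,##0"
            dfdl:textStandardGroupingSeparator=","
            dfdl:textStandardDecimalSeparator="." />
```

With the string version, the same text would reach xs:integer(../number-of-names) as "1,234", and that conversion would fail.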

I think another case where you might want to use xs:int instead of xs:string is to control where a processing error occurs. Using a numeric type will cause a parse error when that field is parsed, whereas using a string will only cause an error when it is converted in an expression, which may never happen. With the string approach, a value that violates the pattern facet would only be a validation error--there might be cases where you need a processing error to force backtracking. You could add an assert with dfdl:checkConstraints() to cause backtracking, but at that point you might as well just use xs:int to keep the schema from becoming too verbose.
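
For reference, the assert mentioned above might look roughly like the following on the string version of the element. This is a minimal sketch; the message text is made up for illustration:

```xml
<xs:element name="number-of-names"
            dfdl:lengthKind="pattern"
            dfdl:lengthPattern=".*">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Turns a pattern-facet failure into a processing error,
           so the parser can backtrack at this point. -->
      <dfdl:assert test="{ dfdl:checkConstraints(.) }"
                   message="number-of-names must contain only digits" />
    </xs:appinfo>
  </xs:annotation>
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]+" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

That is noticeably more markup than a single element of type xs:int, which is the verbosity trade-off.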

Also, diagnostics will be different when the field isn't valid, and may be less helpful with the string approach. For example, with an int you would get an error like "number-of-names was not a valid integer". But with the string approach you would get an error like "expression could not be converted to an integer", so it might be less clear that the issue is that number-of-names isn't valid.

You also get canonicalization if you treat fields as numbers. For example, if your data started with 005, the infoset would contain only "5" with the leading zeros stripped off. You may or may not want this, depending on whether you want a canonicalized infoset.
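
To illustrate, here is a sketch of the two infoset fragments for a first line of 005. This is not actual tool output, and it assumes the text number properties in effect accept leading zeros:

```xml
<!-- number-of-names declared as a numeric type: canonical value -->
<number-of-names>5</number-of-names>

<!-- number-of-names declared as xs:string: original text preserved -->
<number-of-names>005</number-of-names>
```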

So there are definitely differences. Though the only case I can think of where it *really* is necessary is when numbers can have commas/scientific notation.


On 10/19/22 9:59 AM, Roger L Costello wrote:
Hi Folks,

Consider this data format:

5
Name: Tom
Name: Bill
Name: Jill
Name: Sara
Name: Bob

The first line (5) indicates the number of name lines that follow.

The first line can be specified using the integer datatype. See below.

The first line can also be specified using the string datatype. See below.

That is, both versions -- the integer version and the string version -- parse 
inputs identically.

My hypothesis is that every text data format that can be specified using 
specific datatypes (integer, date, float, etc.) can be specified using just the 
string datatype. Can you provide a counterexample to my hypothesis?  /Roger

DFDL Schema using the integer datatype:

<xs:element name="names">
     <xs:complexType>
         <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
             <xs:element name="number-of-names" type="xs:integer"
                        dfdl:textNumberCheckPolicy="strict"
                        dfdl:textStandardExponentRep="E"
                        dfdl:textNumberRounding="pattern"
                        dfdl:textStandardZeroRep="0"
                        dfdl:textStandardBase="10"
                        dfdl:textNumberRep="standard"
                        dfdl:textNumberPattern="#" />
             <xs:element name="entry" maxOccurs="unbounded"
                        dfdl:occursCount="{ ../number-of-names }"
                        dfdl:occursCountKind="expression">
                 <xs:complexType>
                     <xs:sequence dfdl:separator=":"
                                  dfdl:separatorPosition="infix">
                         <xs:element name="name" type="xs:string" />
                         <xs:element name="value" type="xs:string" />
                     </xs:sequence>
                 </xs:complexType>
             </xs:element>
         </xs:sequence>
     </xs:complexType>
</xs:element>

DFDL Schema using the string datatype:

<xs:element name="names">
     <xs:complexType>
         <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
             <xs:element name="number-of-names"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern=".*">
                 <xs:simpleType>
                     <xs:restriction base="xs:string">
                         <xs:pattern value="[0-9]+"/>
                     </xs:restriction>
                 </xs:simpleType>
             </xs:element>
             <xs:element name="entry" maxOccurs="unbounded"
                        dfdl:occursCount="{ xs:integer(../number-of-names) }"
                        dfdl:occursCountKind="expression">
                 <xs:complexType>
                     <xs:sequence dfdl:separator=":"
                                  dfdl:separatorPosition="infix">
                         <xs:element name="name" type="xs:string" />
                         <xs:element name="value" type="xs:string" />
                     </xs:sequence>
                 </xs:complexType>
             </xs:element>
         </xs:sequence>
     </xs:complexType>
</xs:element>
