One case where the xs:string approach doesn't work is when your integer is
allowed to contain commas or scientific notation. The xs:integer()
constructor and other XPath functions don't know how to handle these and
will fail to convert them.
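As an illustration, here is a sketch of how the integer version of number-of-names could accept a comma-grouped count such as "1,000". With the string approach, an expression like xs:integer(../number-of-names) would fail on that same text. The sketch assumes any other required text-number properties come from a dfdl:format defaults block; only the pattern and separator properties differ from the schema in the original question.

```xml
<!-- Sketch: an integer count that may contain a thousands separator, e.g. "1,000".
     Assumption: remaining text-number properties are supplied by dfdl:format defaults. -->
<xs:element name="number-of-names" type="xs:integer"
            dfdl:textNumberRep="standard"
            dfdl:textNumberCheckPolicy="strict"
            dfdl:textNumberPattern="#,##0"
            dfdl:textStandardGroupingSeparator=","
            dfdl:textStandardDecimalSeparator="."
            dfdl:textStandardBase="10" />
```

The grouping separator is handled by the number parser itself, so the infoset value is a plain integer that later expressions can use directly.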
I think another case where you might want to use xs:int instead of
xs:string is to control where a processing error occurs. Using a numeric
type will cause a parse error when that field is parsed, whereas using a
string will only cause an error when it is converted in an expression,
which may never happen. So with a string, a malformed value would only
be a validation error; there might be cases where you need a processing
error to force backtracking.
However, you could add an assert with dfdl:checkConstraints() to cause
backtracking, but at that point you might as well just use xs:int and
keep the schema from becoming too verbose.
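For comparison, a sketch of the string version of number-of-names with such an assert might look like the following. The dfdl:assert turns a violation of the pattern facet into a processing error (which can trigger backtracking) rather than a validation error; the message text is only illustrative.

```xml
<xs:element name="number-of-names"
            dfdl:lengthKind="pattern"
            dfdl:lengthPattern=".*">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Fail the parse (not just validation) if the value violates the facet below -->
      <dfdl:assert test="{ dfdl:checkConstraints(.) }"
                   message="number-of-names is not a valid integer" />
    </xs:appinfo>
  </xs:annotation>
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]+"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

This is noticeably more verbose than declaring type="xs:integer", which is the trade-off described above.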
Also, diagnostics will be different when the field isn't valid, and
possibly less helpful with the string approach. For example, with an int
you would get an error like "number-of-names was not a valid integer".
But with the string approach you would get an error like
"expression could not be converted to an integer", so it might be less
clear that the issue is that number-of-names isn't valid.
You also get canonicalization if you treat fields as numbers. For
example, if your data started with 005, the infoset would contain only
"5", with the leading zeros stripped off. You may or may not want this,
depending on whether you want a canonicalized infoset.
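For example, given the data line 005, the two schemas in the original question would produce different values for number-of-names in the infoset (shown here as XML fragments):

```xml
<!-- number-of-names declared as xs:integer: the value is canonicalized -->
<number-of-names>5</number-of-names>

<!-- number-of-names declared as xs:string: the original text is kept -->
<number-of-names>005</number-of-names>
```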
So there are definitely differences, though the only case I can think of
where a numeric type is *really* necessary is when numbers can contain
commas or scientific notation.
On 10/19/22 9:59 AM, Roger L Costello wrote:
Hi Folks,
Consider this data format:
5
Name: Tom
Name: Bill
Name: Jill
Name: Sara
Name: Bob
The first line (5) indicates the number of name lines that follow.
The first line can be specified using the integer datatype. See below.
The first line can also be specified using the string datatype. See below.
That is, both versions (the integer version and the string version) parse
inputs identically.
My hypothesis is that every text data format that can be specified using
specific datatypes (integer, date, float, etc.) can be specified using just the
string datatype. Can you provide a counterexample to my hypothesis? /Roger
DFDL Schema using the integer datatype:
<xs:element name="names">
  <xs:complexType>
    <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
      <xs:element name="number-of-names" type="xs:integer"
                  dfdl:textNumberCheckPolicy="strict"
                  dfdl:textStandardExponentRep="E"
                  dfdl:textNumberRounding="pattern"
                  dfdl:textStandardZeroRep="0"
                  dfdl:textStandardBase="10"
                  dfdl:textNumberRep="standard"
                  dfdl:textNumberPattern="#" />
      <xs:element name="entry" maxOccurs="unbounded"
                  dfdl:occursCount="{ ../number-of-names }"
                  dfdl:occursCountKind="expression">
        <xs:complexType>
          <xs:sequence dfdl:separator=":"
                       dfdl:separatorPosition="infix">
            <xs:element name="name" type="xs:string" />
            <xs:element name="value" type="xs:string" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
DFDL Schema using the string datatype:
<xs:element name="names">
  <xs:complexType>
    <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
      <xs:element name="number-of-names"
                  dfdl:lengthKind="pattern"
                  dfdl:lengthPattern=".*">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]+"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="entry" maxOccurs="unbounded"
                  dfdl:occursCount="{ xs:integer(../number-of-names) }"
                  dfdl:occursCountKind="expression">
        <xs:complexType>
          <xs:sequence dfdl:separator=":"
                       dfdl:separatorPosition="infix">
            <xs:element name="name" type="xs:string" />
            <xs:element name="value" type="xs:string" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>