Re: Can every text data format be specified using strings?

Steve Lawrence Wed, 19 Oct 2022 07:29:25 -0700

One case where the xs:string approach doesn't work is if your integer is allowed to contain commas or scientific notation. The xs:integer() and other XPath functions don't know how to handle these and will fail to convert them.
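
As a rough sketch of the numeric side of that case, a count written with grouping commas (e.g. "1,234") could be declared along these lines. The pattern and separator values are my assumptions for illustration, and the remaining text number properties are assumed to be in scope from a default format:

```xml
<!-- Sketch only: accepts counts such as "1,234".
     The pattern and separator values are illustrative assumptions. -->
<xs:element name="number-of-names" type="xs:integer"
            dfdl:textNumberRep="standard"
            dfdl:textNumberPattern="#,##0"
            dfdl:textStandardGroupingSeparator=","
            dfdl:textStandardDecimalSeparator="." />
```

With the string version, the expression xs:integer(../number-of-names) would be handed the text "1,234", which is not a valid lexical form for xs:integer, so the conversion fails.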

I think another case where you might want to use xs:int instead of xs:string is to control where a processing error occurs. Using a numeric type will cause a parse error when that field is parsed, whereas using a string will only cause an error when it is converted in an expression, which may never happen. So with the string approach, a value that fails the pattern would only be a validation error, and there might be cases where you need a processing error to force backtracking. You could add an assert with dfdl:checkConstraints() to cause backtracking, but at that point you might as well just use xs:int to keep the schema from becoming too verbose.
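
For illustration, the assert on the string version might look something like the following. This is a sketch based on the number-of-names element from the string schema quoted below; the message text is made up:

```xml
<xs:element name="number-of-names"
            dfdl:lengthKind="pattern"
            dfdl:lengthPattern=".*">
    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <!-- Turns a failed pattern facet into a processing error,
                 so the parser can backtrack. Message text is illustrative. -->
            <dfdl:assert test="{ dfdl:checkConstraints(.) }"
                         message="number-of-names must match [0-9]+" />
        </xs:appinfo>
    </xs:annotation>
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]+"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
```

That is a fair amount of extra markup compared to simply declaring the element as a numeric type.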

Also, diagnostics will be different when the field isn't valid, and maybe less helpful using the string approach. For example, with an int you would get an error like "number-of-names was not a valid integer". But with the string approach you would get an error like "expression could not be converted to an integer", so it might be less clear that the issue is that number-of-names isn't valid.

You also get canonicalization if you treat fields as numbers. For example, if your data started with 005, the infoset would contain only "5" with the leading zeros stripped off. You may or may not want this, depending on whether you want a canonicalized infoset.
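
To make that concrete, for input data beginning with 005 the relevant part of the infoset would look roughly like this for each version (only the one element is shown):

```xml
<!-- xs:integer version: the value is canonicalized -->
<number-of-names>5</number-of-names>

<!-- xs:string version: the text is kept exactly as it appeared in the data -->
<number-of-names>005</number-of-names>
```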

So there are definitely differences. Though the only case I can think of where it *really* is necessary is when numbers can have commas/scientific notation.



On 10/19/22 9:59 AM, Roger L Costello wrote:

Hi Folks,

Consider this data format:

5
Name: Tom
Name: Bill
Name: Jill
Name: Sara
Name: Bob

The first line (5) indicates the number of name lines that follow.

The first line can be specified using the integer datatype. See below.

The first line can also be specified using the string datatype. See below.

That is, both versions -- the integer version and the string version -- parse 
inputs identically.

My hypothesis is that every text data format that can be specified using 
specific datatypes (integer, date, float, etc.) can be specified using just the 
string datatype. Can you provide a counterexample to my hypothesis?  /Roger

DFDL Schema using the integer datatype:

<xs:element name="names">
     <xs:complexType>
         <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
             <xs:element name="number-of-names" type="xs:integer"
                        dfdl:textNumberCheckPolicy="strict"
                        dfdl:textStandardExponentRep="E"
                        dfdl:textNumberRounding="pattern"
                        dfdl:textStandardZeroRep="0"
                        dfdl:textStandardBase="10"
                        dfdl:textNumberRep="standard"
                        dfdl:textNumberPattern="#" />
             <xs:element name="entry" maxOccurs="unbounded"
                        dfdl:occursCount="{ ../number-of-names }"
                        dfdl:occursCountKind="expression">
                 <xs:complexType>
                     <xs:sequence dfdl:separator=":"
                                  dfdl:separatorPosition="infix">
                         <xs:element name="name" type="xs:string" />
                         <xs:element name="value" type="xs:string" />
                     </xs:sequence>
                 </xs:complexType>
             </xs:element>
         </xs:sequence>
     </xs:complexType>
</xs:element>

DFDL Schema using the string datatype:

<xs:element name="names">
     <xs:complexType>
         <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
             <xs:element name="number-of-names"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern=".*">
                 <xs:simpleType>
                     <xs:restriction base="xs:string">
                         <xs:pattern value="[0-9]+"/>
                     </xs:restriction>
                 </xs:simpleType>
             </xs:element>
             <xs:element name="entry" maxOccurs="unbounded"
                        dfdl:occursCount="{ xs:integer(../number-of-names) }"
                        dfdl:occursCountKind="expression">
                 <xs:complexType>
                     <xs:sequence dfdl:separator=":"
                                  dfdl:separatorPosition="infix">
                         <xs:element name="name" type="xs:string" />
                         <xs:element name="value" type="xs:string" />
                     </xs:sequence>
                 </xs:complexType>
             </xs:element>
         </xs:sequence>
     </xs:complexType>
</xs:element>
