Hi Folks,
My input contains a social security number (SSN), e.g.,
123-45-6789
If I declare the SSN element like this:
<xs:element name="SSN"
dfdl:lengthKind="explicit"
dfdl:length="11">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
then the parser will accept well-formed but invalid data such as this:
xxx-45-6789
If I want to be notified that the data is not valid, I can run the parser with
the -V limited option. The parser then both generates XML and reports that the
input is not valid.
If I add checkConstraints:
<xs:element name="SSN"
dfdl:lengthKind="explicit"
dfdl:length="11">
<xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
</xs:appinfo>
</xs:annotation>
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
then the parser no longer accepts well-formed but invalid data. The failed
assert is a parse error, so no XML is generated.
Lesson Learned: Don't use checkConstraints if you want parsing to accept
well-formed but invalid input.
But, but, but, ........
Things aren't that simple.
Suppose SSN is part of a choice with two branches. The first branch specifies
RealID, a space, then SSN. The second branch specifies SSN, a space, then RealID.
Consider this valid input:
123-45-6789 A12345678
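If the DFDL schema does not use checkConstraints, then the first branch parses
without error (the pattern facets are not checked during parsing), the second
branch is never tried, and this incorrect XML is generated: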
<PersonID>
<RealID>123-45-6789</RealID>
<Space> </Space>
<SSN>A12345678</SSN>
</PersonID>
Notice that the <RealID> value is the SSN and the <SSN> value is the RealID.
If we want to get correct XML, then we must use checkConstraints, so that the
first branch fails on this input and the parser falls back to the second branch.
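To make the choice concrete, here is a minimal sketch of how such a schema
could be written. The element names come from the XML above. Everything else
is an assumption made for illustration: the named types SSNType and RealIDType,
the RealID pattern (one letter followed by eight digits, guessed from the
single example), the lengthPattern that reads each value up to the space, and
a default dfdl:format (encoding, etc.) defined elsewhere in the schema.

```xml
<!-- Hypothetical fragment: these declarations sit inside an xs:schema that
     already defines a default dfdl:format. Types and patterns are assumed. -->
<xs:simpleType name="SSNType">
  <xs:restriction base="xs:string">
    <xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="RealIDType">
  <xs:restriction base="xs:string">
    <!-- Assumed format, inferred only from the example A12345678 -->
    <xs:pattern value="[A-Z][0-9]{8}"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="PersonID">
  <xs:complexType>
    <xs:choice>
      <!-- Branch 1: RealID space SSN -->
      <xs:sequence>
        <xs:element name="RealID" type="RealIDType"
                    dfdl:lengthKind="pattern" dfdl:lengthPattern="[^ ]+">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <!-- A failed assert is a parse error, so the parser backtracks -->
              <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
        <xs:element name="Space" type="xs:string"
                    dfdl:lengthKind="explicit" dfdl:length="1"/>
        <xs:element name="SSN" type="SSNType"
                    dfdl:lengthKind="pattern" dfdl:lengthPattern="[^ ]+">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
      <!-- Branch 2: SSN space RealID -->
      <xs:sequence>
        <xs:element name="SSN" type="SSNType"
                    dfdl:lengthKind="pattern" dfdl:lengthPattern="[^ ]+">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
        <xs:element name="Space" type="xs:string"
                    dfdl:lengthKind="explicit" dfdl:length="1"/>
        <xs:element name="RealID" type="RealIDType"
                    dfdl:lengthKind="pattern" dfdl:lengthPattern="[^ ]+">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
    </xs:choice>
  </xs:complexType>
</xs:element>
```

With the asserts in place, the input 123-45-6789 A12345678 fails the RealID
assert in the first branch, so the parser tries the second branch, which
succeeds. Take the asserts out and the first branch succeeds, since nothing
checks the patterns during parsing.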
Lesson Learned: Use checkConstraints if you want parsing to generate correct
XML.
Overall Lesson Learned: You can't have a DFDL schema that both accepts
well-formed but invalid data and always produces correct XML.
Do you agree?
/Roger