Thanks Mike and Steve!

I need to study carefully what Mike said. 

Steve, I think your suggestion is not correct. I did as you suggested and 
reversed the order of the branches in the choice. Now, with this input:

John Doe/2006N-05912E/Sally Smith

I get this XML:

<Test>
  <A>John Doe</A>
  <Origin_>2006N-05912E</Origin_>
  <B>Sally Smith</B>
</Test>

which is not correct.
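
My guess at why (an assumption on my part, not confirmed by anyone): with
Origin_ first, the choice tries a plain delimited xs:string, which accepts any
text up to the next /, so the Origin branch is never attempted for this input.
A sketch of the reordered choice, with the body of Origin abbreviated:

```xml
<xs:choice dfdl:choiceLengthKind="implicit">
    <!-- Tried first: a delimited xs:string, so "2006N-05912E" parses fine -->
    <xs:element name="Origin_" type="xs:string"
                nillable="true" dfdl:nilKind="literalValue" dfdl:nilValue="-"/>
    <!-- Only attempted if Origin_ fails, which it does not for this input -->
    <xs:element name="Origin">
        <xs:complexType>
            <xs:sequence dfdl:separator="">
                <!-- LatitudeDegrees through LongitudeHemisphere, unchanged -->
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:choice>
```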

I conclude that switching the order of the branches in the choice is not 
correct. Do you concur?

/Roger

-----Original Message-----
From: Steve Lawrence <[email protected]> 
Sent: Thursday, August 25, 2022 7:38 AM
To: [email protected]
Subject: [EXT] Re: Daffodil does not correctly parse variable length, nillable elements with complexType

Another option: put your nillable type as the first branch in the 
choice. This way Daffodil will attempt to parse the nillable type first, 
and will only attempt to parse the complex Origin if that fails.

You'll still likely want the validation that Mike suggests so that 
when something fails it fails immediately instead of happily continuing 
off the rails.


On 8/25/22 7:30 AM, Mike Beckerle wrote:
> I think I know what is happening.
> 
> In the battle of delimiters vs. nested explicit length, explicit wins.
> 
> So if you have abc/-/cef
> 
> but after parsing abc then finding the separator /, the next field is
> latitudeDegrees with explicit length 2, that "wins" and "-/" are the
> characters of that string.
> 
> Validation will then issue a validation warning because Daffodil's "limited"
> validation is done as the elements are parsed.
> 
> This does not cause backtracking, it's just a "warning" that the seemingly
> well-formed data is invalid.
> 
> Then latitudeMinutes is parsed, and that uses the ever problematic lengthKind
> pattern, which succeeds, with a zero-length string, which then also causes a
> validation error.
> 
> Again, this is a validation error because this now zero-length string
> doesn't look like the digits you expect.
> 
> Then it parses the hyphen element, which is just a string of length 1,
> 
> .... I'll stop here because things are clearly off the rails.
> 
> Here's my suggestion for how to fix this and get Daffodil to magically do what
> you want, which is to pay attention to the facets.
> 
> <!-- vString = 'validated string'. Facets are checked while parsing. -->
> <simpleType name="vString">
>      <annotation><appinfo source="http://www.ogf.org/dfdl/">
>          <dfdl:assert message="Invalid value">{ dfdl:checkConstraints(.) }</dfdl:assert>
>      </appinfo></annotation>
>      <restriction base="xs:string"/>
> </simpleType>
> 
> Define all your strings with vString as your type, and it should behave much
> more like you expect.
> 
> Now normally I tell people not to call checkConstraints(.) on everything
> because it fails to distinguish well-formed data from invalid data, and often
> one wants the parse to succeed even if the data is invalid.
> 
> In your case things are different. You have not provided enough information in
> the DFDL properties to parse this data. The facets are necessary information to
> successfully parse it.
> 
> You will want to complement vString with use of discriminators. For example I
> think your schema should have a discriminator after the latitudeDegrees element
> because if you successfully parse that element, backtracking to the nilled case
> no longer makes sense.
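> 
> A rough, untested sketch of one way that could look. The placement on the
> element itself and the test expression are my assumptions about your schema,
> not something I have run; here the discriminator does the facet check and,
> if it passes, commits the choice to the Origin branch:
> 
> ```xml
> <xs:element name="LatitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="2">
>     <xs:annotation>
>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>             <!-- Assumed placement: fails (and backtracks) on "-/", commits on two digits -->
>             <dfdl:discriminator message="Not a latitude">{ dfdl:checkConstraints(.) }</dfdl:discriminator>
>         </xs:appinfo>
>     </xs:annotation>
>     <xs:simpleType>
>         <xs:restriction base="xs:string">
>             <xs:pattern value="[0-9]{2}"/>
>         </xs:restriction>
>     </xs:simpleType>
> </xs:element>
> ```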
> 
> 
> 
> 
> On Thu, Aug 25, 2022 at 7:01 AM Roger L Costello <[email protected]
> <mailto:[email protected]>> wrote:
> 
>      Hi Folks,
> 
>      Here are two sample inputs:
> 
>      John Doe/2006N-05912E/Sally Smith
>      John Doe/-/Sally Smith
> 
>      It is the field in the middle that is of interest.
> 
>      The field is a composite field, i.e., it consists of a series of parts: lat
>      degrees, lat minutes, lat hemisphere, hyphen, long degrees, long minutes,
>      long hemisphere. No separator between the parts.
> 
>      The field is nillable and the hyphen is the nil value.
> 
>      The first input shown above succeeds, the second fails to parse.
> 
>      What we have here is a variable length, nillable element with a complexType
>      and the nil value is not %ES;. As we have determined in previous posts,
>      Daffodil does not support this. So, the workaround is to place the element
>      in a choice, where the first branch of the choice is the element minus the
>      nillable stuff and the second branch is a plain string element that is
>      nillable. Well, I implemented that and Daffodil complains:
> 
>      [error] Parse Error: Failed to parse infix separator. Cause: Parse Error:
>      Separator '/' not found
> 
>      When I use the -V limited parse option I get a completely different set of
>      error messages, e.g.:
> 
>      [error] Validation Error: LatitudeMinutes failed facet checks due to: facet
>      pattern(s):
>      [0-9]{2}|[0-9]{2}\.[0-9]{1}|[0-9]{2}\.[0-9]{2}|[0-9]{2}\.[0-9]{3}|[0-9]{2}\.[0-9]{4}
> 
>      Am I doing something wrong in my DFDL schema (shown below) or is this a bug
>      in Daffodil?  /Roger
> 
>      <?xml version="1.0" encoding="UTF-8"?>
>      <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>                 xmlns:xs="http://www.w3.org/2001/XMLSchema">
>           <xs:annotation>
>               <xs:appinfo source="http://www.ogf.org/dfdl/">
>                   <dfdl:format
>                       alignment="1"
>                       alignmentUnits="bytes"
>                       emptyValueDelimiterPolicy="none"
>                       encoding="ASCII"
>                       encodingErrorPolicy="replace"
>                       escapeSchemeRef=""
>                       fillByte="%SP;"
>                       floating="no"
>                       ignoreCase="yes"
>                       initiatedContent="no"
>                       initiator=""
>                       leadingSkip="0"
>                       lengthKind="delimited"
>                       lengthUnits="characters"
>                       nilValueDelimiterPolicy="none"
>                       occursCountKind="implicit"
>                       outputNewLine="%CR;%LF;"
>                       representation="text"
>                       separator=""
>                       separatorSuppressionPolicy="anyEmpty"
>                       sequenceKind="ordered"
>                       textBidi="no"
>                       textPadKind="none"
>                       textTrimKind="none"
>                       trailingSkip="0"
>                       truncateSpecifiedLengthString="no"
>                       terminator=""
>                       textNumberRep="standard"
>                       textStandardBase="10"
>                       textStandardZeroRep="0"
>                       textNumberRounding="pattern"
>                       textStandardExponentRep="E"
>                       textNumberCheckPolicy="strict"/>
>               </xs:appinfo>
>           </xs:annotation>
>           <xs:element name="Test">
>               <xs:complexType>
>                   <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
>                       <xs:element name="A" type="xs:string"/>
>                       <xs:choice dfdl:choiceLengthKind="implicit">
>                           <xs:element name="Origin">
>                               <xs:complexType>
>                                   <xs:sequence dfdl:separator="">
>                                       <xs:element name="LatitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="2">
>                                           <xs:simpleType>
>                                               <xs:restriction base="xs:string">
>                                                   <xs:pattern value="[0-9]{2}"/>
>                                               </xs:restriction>
>                                           </xs:simpleType>
>                                       </xs:element>
>                                       <xs:element name="LatitudeMinutes" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(N|S))">
>                                           <xs:simpleType>
>                                               <xs:restriction base="xs:string">
>                                                   <xs:pattern value="[0-9]{2}"/>
>                                                   <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
>                                                   <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
>                                                   <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
>                                                   <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
>                                               </xs:restriction>
>                                           </xs:simpleType>
>                                       </xs:element>
>                                       <xs:element name="LatitudeHemisphere" dfdl:lengthKind="explicit" dfdl:length="1">
>                                           <xs:simpleType>
>                                               <xs:restriction base="xs:string">
>                                                   <xs:enumeration value="N"/>
>                                                   <xs:enumeration value="S"/>
>                                               </xs:restriction>
>                                           </xs:simpleType>
>                                       </xs:element>
>                                       <xs:element name="Hyphen" dfdl:lengthKind="explicit" dfdl:length="1">
>                                           <xs:simpleType>
>                                               <xs:restriction base="xs:string">
>                                                   <xs:enumeration value="-"/>
>                                               </xs:restriction>
>                                           </xs:simpleType>
>                                       </xs:element>
>                                       <xs:element name="LongitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="3">
>                                           <xs:simpleType>
>                                               <xs:restriction base="xs:string">
>                                                   <xs:pattern value="[0-9]{3}"/>
>                                               </xs:restriction>
>                                           </xs:simpleType>
>                                       </xs:element>
>                                       <xs:element name="LongitudeMinutes" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(E|W))">
>                                           <xs:simpleType>
>                                               <xs:restriction base="xs:string">
>                                                   <xs:pattern value="[0-9]{2}"/>
>                                                   <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
>                                                   <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
>                                                   <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
>                                                   <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
>                                               </xs:restriction>
>                                           </xs:simpleType>
>                                       </xs:element>
>                                       <xs:element name="LongitudeHemisphere">
>                                           <xs:simpleType>
>                                               <xs:restriction base="xs:string">
>                                                   <xs:enumeration value="E"/>
>                                                   <xs:enumeration value="W"/>
>                                               </xs:restriction>
>                                           </xs:simpleType>
>                                       </xs:element>
>                                   </xs:sequence>
>                               </xs:complexType>
>                           </xs:element>
>                           <xs:element name="Origin_" type="xs:string" nillable="true" dfdl:nilKind="literalValue" dfdl:nilValue="-"/>
>                       </xs:choice>
>                       <xs:element name="B" type="xs:string"/>
>                   </xs:sequence>
>               </xs:complexType>
>           </xs:element>
>      </xs:schema>
> 
