This is a good example of where dfdl:checkConstraints(.) is properly used.


We have lots of experience of dfdl:checkConstraints being a bad idea because it 
makes the parser fail on well-formed but invalid data, which is often 
undesirable: you get no infoset at all, so you cannot even discuss the 
validity of the data. Parsing really should stop at deciding whether the data 
is "well formed".


But positive examples where dfdl:checkConstraints is properly used have been 
rare. This is a good example of that.
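
To make the distinction concrete, here is a minimal sketch. The element names 
codeValidated and codeAsserted are hypothetical and do not come from the 
thread, and the dfdl:format properties (lengthKind, encoding, and so on) are 
assumed to be supplied elsewhere, so this will not run as-is.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <!-- Assumption: a dfdl:format annotation with the usual text properties
       would go here; it is omitted to keep the sketch short. -->

  <!-- Facet only: data such as "abc" still parses, so an infoset is
       produced. The pattern is checked only if validation is enabled,
       and a mismatch is reported as a validation error. -->
  <xs:element name="codeValidated">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]+" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <!-- Facet plus assert: the same mismatch is now a parse error, so the
       parser backtracks or fails and no infoset is produced for it. -->
  <xs:element name="codeAsserted">
    <xs:simpleType>
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:assert test="{ dfdl:checkConstraints(.) }" />
        </xs:appinfo>
      </xs:annotation>
      <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]+" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

</xs:schema>
```

The second form is only appropriate when a facet failure really means "this 
is not that element", as it does for B in the thread below.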

________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Friday, May 10, 2019 11:55:46 AM
To: users@daffodil.apache.org
Subject: Re: Why am I getting this error message: Failed to parse infix 
separator. Cause: Parse Error: Separator '%NL;' not found.

dfdl:occursCountKind="implicit" just says to parse somewhere between
minOccurs and maxOccurs elements. There's no concept of lookahead or
smarts about how many elements might appear after it. It literally just
keeps trying to parse B elements until either we reach maxOccurs of them
or one of them fails to parse. The assert was used to cause it to fail
to parse when it reached something that didn't look like a B.

And yeah, my schema is just plain wrong. An assert with testKind="pattern"
matches against the data stream, not the parsed value, so the newline
after "100" can satisfy [^0-9]. My intention was to match the parsed
value. The assert pattern could probably be changed, but I think it's a
bit clearer to put a pattern restriction on the B element and change the
assert to call checkConstraints. So something like this:

  <xs:element name="B" maxOccurs="50" dfdl:occursCountKind="implicit">
    <xs:simpleType>
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:assert test="{ dfdl:checkConstraints(.) }" />
        </xs:appinfo>
      </xs:annotation>
      <xs:restriction base="xs:string">
        <xs:pattern value=".*[^0-9].*" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

So each B is parsed, then we assert that the parsed value validates
according to the pattern value. When a value doesn't validate, that's
how we know we have reached the C elements.
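
For reference, a sketch of how the revised B might slot into the full schema 
from the quoted message below. It has not been run against Daffodil. It 
assumes the xs and dfdl prefixes are bound and that a dfdl:format supplying 
the text and delimited-length properties is declared in the enclosing schema, 
as in the original.

```xml
<xs:element name="input">
  <xs:complexType>
    <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
      <!-- Exactly three integers; the first one gives the count of C. -->
      <xs:element name="A" type="xs:integer" minOccurs="3"
        maxOccurs="3" dfdl:occursCountKind="fixed" />
      <!-- Keep parsing B until a value has no non-digit character. -->
      <xs:element name="B" maxOccurs="50" dfdl:occursCountKind="implicit">
        <xs:simpleType>
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:assert test="{ dfdl:checkConstraints(.) }" />
            </xs:appinfo>
          </xs:annotation>
          <xs:restriction base="xs:string">
            <xs:pattern value=".*[^0-9].*" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <!-- The number of C elements comes from the first A. -->
      <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
        dfdl:occursCountKind="expression"
        dfdl:occursCount="{ ../A[1] }" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```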

- Steve

On 5/10/19 11:36 AM, Costello, Roger L. wrote:
> Hi Steve,
>
> I guess that I don't understand dfdl:occursCountKind="implicit". I thought it 
> means: "Hey Daffodil, figure out the appropriate occurrences of B elements by 
> inferring from the occurrence needs of its following elements." In this case, 
> C's are the following elements and the number of occurrences of C is equal to 
> the value of the first A element. That is, the occurrence needs of C are 
> easily determined, so the occurrence needs of B should be easily inferred. 
> That is, it seems to me that Daffodil should be able to recognize that these 
> values:
>
> 100
> 200
> 300
> 400
> 500
> 600
>
> are for the C element and the declaration for the B element should not need 
> an assert to specify, "Give me only strings up till the point where digits 
> are encountered." By adding dfdl:assert to the schema it is effectively 
> neutering the dfdl:occursCountKind="implicit". I am confused.
>
> Second question: I modified the schema as you suggested. See below. However, 
> I now get this error message:
>
> [error] Parse Error: Failed to populate C[2]. Missing infix separator. Cause: 
> Parse Error: Separator '%NL;' not found.
>
> <xs:element name="input">
>     <xs:complexType>
>         <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>             <xs:element name="A" type="xs:integer" minOccurs="3"
>                maxOccurs="3" dfdl:occursCountKind="fixed" />
>             <xs:element name="B" type="xs:string" maxOccurs="50"
>                dfdl:occursCountKind="implicit">
>                 <xs:annotation>
>                     <xs:appinfo source="http://www.ogf.org/dfdl/">
>                         <dfdl:assert testKind="pattern"
>                            testPattern=".*[^0-9].*" />
>                     </xs:appinfo>
>                 </xs:annotation>
>             </xs:element>
>             <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
>                dfdl:occursCountKind="expression"
>                dfdl:occursCount="{ ../A[1] }" />
>         </xs:sequence>
>     </xs:complexType>
> </xs:element>
>
> -----Original Message-----
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Friday, May 10, 2019 9:08 AM
> To: users@daffodil.apache.org
> Subject: [EXT] Re: Why am I getting this error message: Failed to parse infix 
> separator. Cause: Parse Error: Separator '%NL;' not found.
>
> The issue is that element B can be 50 or fewer strings. And although 100, 
> 200, etc. look like numbers, they are also completely valid strings. So 
> Daffodil will just keep consuming every line after the first three numbers as 
> B elements. Daffodil still expects a separator followed by some C's, but we 
> hit the end of the data and error out saying we were looking for that 
> separator.
>
> So we need to somehow tell Daffodil to stop looking for B's. One solution 
> here is to add an assertion to test that each B element does not look like a 
> number. The DFDL expression language doesn't have a good way to test if 
> a string is a number or not, but a regex pattern test could work:
>
>   <xs:element name="B" type="xs:string" maxOccurs="50"
>     dfdl:occursCountKind="implicit">
>     <xs:annotation>
>       <xs:appinfo source="http://www.ogf.org/dfdl/">
>         <dfdl:assert testKind="pattern" testPattern=".*[^0-9].*" />
>       </xs:appinfo>
>     </xs:annotation>
>   </xs:element>
>
> This regular expression says that every B element must contain at least one 
> character that is not a numeric digit. So when Daffodil gets to "100", the 
> assertion will fail since it is all digits, and we'll stop parsing B's and 
> start looking for C's.
>
> - Steve
>
>
> On 5/10/19 8:00 AM, Costello, Roger L. wrote:
>> Hello DFDL community,
>>
>> My input file consists of exactly 3 integers, each on a new line,
>> followed by an arbitrary number of strings, again, each on a new line,
>> followed by a number of integers, the number being determined by the first 
>> integer in the file. For example:
>>
>> 6
>> 1
>> 2
>> Banana
>> Orange
>> Apple
>> Grape
>> 100
>> 200
>> 300
>> 400
>> 500
>> 600
>>
>> Below is my DFDL schema. It generates this error:
>>
>> *[error] Parse Error: Failed to parse infix separator. Cause: Parse Error:
>> Separator '%NL;' not found.*
>>
>> Why is that error being generated? How do I fix the DFDL schema?
>> /Roger
>>
>> <xs:element name="input">
>>     <xs:complexType>
>>         <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>>             <xs:element name="A" type="xs:integer"
>>                          minOccurs="3" maxOccurs="3"
>>                          dfdl:occursCountKind="fixed" />
>>             <xs:element name="B" type="xs:string" maxOccurs="50"
>>                          dfdl:occursCountKind="implicit" />
>>             <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
>>                          dfdl:occursCountKind="expression"
>>                          dfdl:occursCount="{ ../A[1] }" />
>>         </xs:sequence>
>>     </xs:complexType>
>> </xs:element>
>>
>
