Hi Steve,

Aren't these statements at odds with each other:

> ... the number of occurrences is to be established
> using speculative parsing

versus

> There's no concept of lookahead or smarts about
> how many values might appear after the element.

I honestly don't know what "speculative parsing" is, but it sounds like it 
would involve lookahead and smarts about how many values might appear after the 
element. No?

/Roger

-----Original Message-----
From: Steve Lawrence <[email protected]> 
Sent: Friday, May 10, 2019 1:57 PM
To: [email protected]
Subject: [EXT] Re: Why am I getting this error message: Failed to parse infix 
separator. Cause: Parse Error: Separator '%NL;' not found.

I think that's right, but might be a bit of an oversimplification. For example, 
it doesn't talk about defaults, only sort of implies the concept of speculative 
parsing, doesn't talk about what happens if M occurrences aren't found, etc. A 
more exact, but maybe a bit more complex description is section 16.1.2 in the 
spec [1]:

> The enum 'implicit' should be used when the number of occurrences is to be 
> established using speculative parsing, and there are lower and upper bounds 
> to control the speculation. The bounds are provided by the XSDL minOccurs and 
> XSDL maxOccurs properties.
> 
> When parsing, up to maxOccurs occurrences are expected in the data. It is a 
> processing error if less than minOccurs occurrences are found or defaulted. 
> The parser stops looking for occurrences when either minOccurs have been 
> found or defaulted and speculative parsing does not find another occurrence, 
> or maxOccurs have been found or defaulted.
> 
> When unparsing, up to maxOccurs occurrences are expected in the infoset. It 
> is a processing error if less than minOccurs occurrences are found or 
> defaulted, or if more than maxOccurs occurrences are found.

[1] https://daffodil.apache.org/docs/dfdl/#_Toc398030791
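
To make "speculative parsing" versus "lookahead" a bit more concrete, here is a rough Python sketch of the loop described above. It is not Daffodil's actual code: the names `parse_implicit`, `parse_input`, and `B_PATTERN` are made up for illustration, and it ignores defaults and separators entirely. The point is only that each occurrence is attempted on its own, a failed attempt is backed out, and nothing in the loop looks at what the following elements need.

```python
import re

# Hypothetical stand-in for the pattern on B: a value is an acceptable B
# only if it contains at least one non-digit character.
B_PATTERN = re.compile(r".*[^0-9].*")


def parse_implicit(lines, start, min_occurs, max_occurs, accepts):
    """Greedy loop for one occursCountKind='implicit' array (illustrative).

    Attempts one occurrence at a time. A failed attempt is backed out
    (the position is not advanced) and ends the array. Nothing here
    inspects the elements that come after the array.
    """
    values = []
    pos = start
    while len(values) < max_occurs and pos < len(lines):
        candidate = lines[pos]
        if not accepts(candidate):
            break  # speculation failed: stop, leaving pos unchanged
        values.append(candidate)
        pos += 1
    if len(values) < min_occurs:
        raise ValueError("fewer than minOccurs occurrences found")
    return values, pos


def parse_input(lines, accepts_b):
    """Parse 3 A's, then implicit B's, then A[1] C's (illustrative)."""
    a = [int(x) for x in lines[:3]]  # A: fixed, exactly 3
    # minOccurs is not given on B in the schema, so the XSD default of 1 applies.
    b, pos = parse_implicit(lines, 3, 1, 50, accepts_b)
    count = a[0]  # C: occursCount="{ ../A[1] }"
    rest = lines[pos:pos + count]
    if len(rest) < count:
        raise ValueError("ran out of data while parsing C")
    c = [int(x) for x in rest]
    return a, b, c


DATA = "6 1 2 Banana Orange Apple Grape 100 200 300 400 500 600".split()

# With the pattern check, B stops at "100" and the six C's are found.
a, b, c = parse_input(DATA, lambda s: B_PATTERN.fullmatch(s) is not None)

# Without it, every remaining line is a valid string, so B swallows them
# all and parsing C fails:
#   parse_input(DATA, lambda s: True)  # raises ValueError
```

So the "speculation" is just "try one more, and undo it if it fails". Whether the rest of the data would satisfy C never enters into the decision.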



On 5/10/19 1:48 PM, Costello, Roger L. wrote:
> Excellent! Thank you Steve.
> 
> Is the following an accurate description of dfdl:occursCountKind='implicit'?
> 
> Suppose an element declaration has dfdl:occursCountKind='implicit' 
> with minOccurs="M" and maxOccurs="N"... This instructs Daffodil to 
> consume between M and N values. There's no concept of lookahead or 
> smarts about how many values might appear after the element. Daffodil 
> just keeps consuming values until either it consumes N values or one 
> of the values fails to parse (i.e., the value fails to meet the element's 
> requirements).
> 
> -----Original Message-----
> From: Steve Lawrence <[email protected]>
> Sent: Friday, May 10, 2019 11:56 AM
> To: [email protected]
> Subject: [EXT] Re: Why am I getting this error message: Failed to 
> parse infix separator. Cause: Parse Error: Separator '%NL;' not found.
> 
> dfdl:occursCountKind="implicit" just says to parse somewhere between 
> minOccurs and maxOccurs elements. There's no concept of lookahead or 
> smarts about how many elements might appear after it. It literally 
> just keeps trying to parse B elements until either we reach maxOccurs of them 
> or one of them fails to parse.
> The assert was used to cause it to fail to parse when it reached 
> something that didn't look like a B.
> 
> And yeah, my schema is just plain wrong. Assert pattern matches the 
> data stream, but my intention was to match the parsed value. The 
> assert pattern could probably be changed, but I think it's a bit more 
> clear to put a pattern restriction on the B element and change the 
> assert to call checkConstraints. So something like this:
> 
>    <xs:element name="B" maxOccurs="50" dfdl:occursCountKind="implicit">
>      <xs:simpleType>
>        <xs:annotation>
>          <xs:appinfo source="http://www.ogf.org/dfdl/">
>            <dfdl:assert test="{ dfdl:checkConstraints(.) }" />
>          </xs:appinfo>
>        </xs:annotation>
>        <xs:restriction base="xs:string">
>          <xs:pattern value=".*[^0-9].*" />
>        </xs:restriction>
>      </xs:simpleType>
>    </xs:element>
> 
> So each B is parsed, then we assert that the parsed value validates 
> according to the pattern value. When a value doesn't validate, that's 
> how we know we have reached the C elements.
> 
> - Steve
> 
> On 5/10/19 11:36 AM, Costello, Roger L. wrote:
> 
>  > Hi Steve,
> 
>  >
> 
>  > I guess that I don't understand dfdl:occursCountKind="implicit". I
>  > thought it means: "Hey Daffodil, figure out the appropriate occurrences
>  > of B elements by inferring from the occurrence needs of its following
>  > elements." In this case, C's are the following elements and the number
>  > of occurrences of C is equal to the value of the first A element. That
>  > is, the occurrence needs for C are easily determined, so the occurrence
>  > needs of B should be easily inferred. That is, it seems to me that
>  > Daffodil should be able to recognize that these values:
> 
>  >
> 
>  > 100
> 
>  > 200
> 
>  > 300
> 
>  > 400
> 
>  > 500
> 
>  > 600
> 
>  >
> 
>  > are for the C element and the declaration for the B element should
>  > not need an assert to specify, "Give me only strings up till the point
>  > where digits are encountered." By adding dfdl:assert to the schema it
>  > is effectively neutering the dfdl:occursCountKind="implicit". I am confused.
> 
>  >
> 
>  > Second question: I modified the schema as you suggested. See below.
>  > However, I now get this error message:
>  >
>  > [error] Parse Error: Failed to populate C[2]. Missing infix separator.
>  > Cause: Parse Error: Separator '%NL;' not found.
> 
>  >
> 
>  > <xs:element name="input">
>  >     <xs:complexType>
>  >         <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>  >             <xs:element name="A" type="xs:integer" minOccurs="3"
>  >                             maxOccurs="3" dfdl:occursCountKind="fixed" />
>  >             <xs:element name="B" type="xs:string" maxOccurs="50"
>  >                             dfdl:occursCountKind="implicit">
>  >                 <xs:annotation>
>  >                     <xs:appinfo source="http://www.ogf.org/dfdl/">
>  >                         <dfdl:assert testKind="pattern" testPattern=".*[^0-9].*" />
>  >                     </xs:appinfo>
>  >                 </xs:annotation>
>  >             </xs:element>
>  >             <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
>  >                             dfdl:occursCountKind="expression"
>  >                             dfdl:occursCount="{ ../A[1] }" />
>  >         </xs:sequence>
>  >     </xs:complexType>
>  > </xs:element>
> 
>  >
> 
>  > -----Original Message-----
> 
>  > From: Steve Lawrence <[email protected]>
>  > Sent: Friday, May 10, 2019 9:08 AM
>  > To: [email protected]
>  > Subject: [EXT] Re: Why am I getting this error message: Failed to
>  > parse infix separator. Cause: Parse Error: Separator '%NL;' not found.
> 
>  >
> 
>  > The issue is that element B can be 50 or fewer strings. And
>  > although 100, 200, etc. look like numbers, they are also completely
>  > valid strings. So Daffodil will just keep consuming every line after the
>  > first three numbers as B elements. Daffodil still expects a separator
>  > followed by some C's, but we hit the end of the data and error out
>  > saying we were looking for that separator.
> 
>  >
> 
>  > So we need to somehow tell Daffodil to stop looking for B's. One
>  > solution here is to add an assertion to test that each B element does
>  > not look like a number. The DFDL expression language doesn't have a
>  > good way to test if a string is a number or not, but a regex pattern
>  > test could work:
> 
>  >
> 
>  >   <xs:element name="B" type="xs:string" maxOccurs="50"
>  >     dfdl:occursCountKind="implicit">
>  >     <xs:annotation>
>  >       <xs:appinfo source="http://www.ogf.org/dfdl/">
>  >         <dfdl:assert testKind="pattern" testPattern=".*[^0-9].*" />
>  >       </xs:appinfo>
>  >     </xs:annotation>
>  >   </xs:element>
> 
>  >
> 
>  > This regular expression says that every B element must contain at
>  > least one character that is not a numeric digit. So when Daffodil gets
>  > to "100", the assertion will fail since it is all digits, and we'll
>  > stop parsing B's and start looking for C's.
> 
>  >
> 
>  > - Steve
> 
>  >
> 
>  >
> 
>  > On 5/10/19 8:00 AM, Costello, Roger L. wrote:
> 
>  >> Hello DFDL community,
> 
>  >>
> 
>  >> My input file consists of exactly 3 integers, each on a new line,
>  >> followed by an arbitrary number of strings, again, each on a new
>  >> line, followed by a number of integers, the number being
>  >> determined by the first integer in the file. For example:
> 
>  >>
> 
>  >> 6
> 
>  >> 1
> 
>  >> 2
> 
>  >> Banana
> 
>  >> Orange
> 
>  >> Apple
> 
>  >> Grape
> 
>  >> 100
> 
>  >> 200
> 
>  >> 300
> 
>  >> 400
> 
>  >> 500
> 
>  >> 600
> 
>  >>
> 
>  >> Below is my DFDL schema. It generates this error:
> 
>  >>
> 
>  >> *[error] Parse Error: Failed to parse infix separator. Cause: Parse Error:
> 
>  >> Separator '%NL;' not found.*
> 
>  >>
> 
>  >> Why is that error being generated? How do I fix the DFDL schema?
> 
>  >> /Roger
> 
>  >>
> 
>  >> <xs:element name="input">
>  >>     <xs:complexType>
>  >>         <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>  >>             <xs:element name="A" type="xs:integer"
>  >>                          minOccurs="3" maxOccurs="3"
>  >>                          dfdl:occursCountKind="fixed"/>
>  >>             <xs:element name="B" type="xs:string" maxOccurs="50"
>  >>                          dfdl:occursCountKind="implicit"/>
>  >>             <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
>  >>                          dfdl:occursCountKind="expression"
>  >>                          dfdl:occursCount="{ ../A[1] }"/>
>  >>         </xs:sequence>
>  >>     </xs:complexType>
>  >> </xs:element>
> 
>  >>
> 
>  >
> 
