I think that's right, but it might be a bit of an oversimplification. For
example, it doesn't mention defaults, only loosely implies the concept of
speculative parsing, and doesn't say what happens if M occurrences aren't
found. A more exact, though somewhat more complex, description is in
section 16.1.2 of the spec [1]:

> The enum 'implicit' should be used when the number of occurrences is to be 
> established using speculative parsing, and there are lower and upper bounds 
> to control the speculation. The bounds are provided by the XSDL minOccurs and 
> XSDL maxOccurs properties.
> 
> When parsing, up to maxOccurs occurrences are expected in the data. It is a 
> processing error if less than minOccurs occurrences are found or defaulted. 
> The parser stops looking for occurrences when either minOccurs have been 
> found or defaulted and speculative parsing does not find another occurrence, 
> or maxOccurs have been found or defaulted.
> 
> When unparsing, up to maxOccurs occurrences are expected in the infoset. It 
> is a processing error if less than minOccurs occurrences are found or 
> defaulted, or if more than maxOccurs occurrences are found.

[1] https://daffodil.apache.org/docs/dfdl/#_Toc398030791
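
In case it helps, here is a rough Python sketch of the parsing rules quoted
above. It is only an illustration, not how Daffodil is implemented: the names
parse_implicit and looks_like_b are made up for this example, defaulting is
ignored entirely, and "parsing an occurrence" is reduced to a yes/no check on
one line of input. minOccurs is taken as 1 for B because that is the XSD
default when the attribute is omitted.

```python
import re


def parse_implicit(items, parses_ok, min_occurs, max_occurs):
    """Consume leading items as occurrences of one element.

    Stops when max_occurs have been found or when an item fails to parse.
    Returns (found, remaining). Raises ValueError if fewer than min_occurs
    occurrences were found. Defaulting is not modelled (assumption).
    """
    found = []
    for item in items:
        if len(found) == max_occurs:
            break  # upper bound reached, stop looking
        if not parses_ok(item):
            break  # speculative attempt failed, leave the item unconsumed
        found.append(item)
    if len(found) < min_occurs:
        raise ValueError(
            f"found {len(found)} occurrences, need at least {min_occurs}"
        )
    return found, items[len(found):]


def looks_like_b(text):
    # Hypothetical stand-in for the pattern restriction on B:
    # the whole value must contain at least one non-digit character.
    return re.fullmatch(r".*[^0-9].*", text) is not None


# The lines that follow the three A values in Roger's sample data.
lines = ["Banana", "Orange", "Apple", "Grape",
         "100", "200", "300", "400", "500", "600"]

# With the constraint, B stops at the first all-digit line.
bs, rest = parse_implicit(lines, looks_like_b, 1, 50)

# With no constraint, every line is a valid string, so B takes them all
# and nothing is left for C.
greedy, leftover = parse_implicit(lines, lambda text: True, 1, 50)
```

Here bs holds the four fruit names and rest holds the six numeric lines, while
the unconstrained call leaves leftover empty, which corresponds to the original
"separator not found" failure when Daffodil went looking for C.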



On 5/10/19 1:48 PM, Costello, Roger L. wrote:
> Excellent! Thank you Steve.
> 
> Is the following an accurate description of dfdl:occursCountKind='implicit'?
> 
> Suppose an element declaration has dfdl:occursCountKind='implicit' with 
> minOccurs="M" and maxOccurs="N"… This instructs Daffodil to consume between M 
> and N values. There's no concept of lookahead or smarts about how many values 
> might appear after the element. Daffodil just keeps consuming values until 
> either it consumes N values or one of the values fails to parse (i.e., the 
> value fails to meet the element’s requirements).
> 
> -----Original Message-----
> From: Steve Lawrence <[email protected]>
> Sent: Friday, May 10, 2019 11:56 AM
> To: [email protected]
> Subject: [EXT] Re: Why am I getting this error message: Failed to parse infix 
> separator. Cause: Parse Error: Separator '%NL;' not found.
> 
> dfdl:occursCountKind="implicit" just says to parse somewhere between minOccurs 
> and maxOccurs elements. There's no concept of lookahead or smarts about how 
> many elements might appear after it. It literally just keeps trying to parse B 
> elements until either we reach maxOccurs of them or one of them fails to 
> parse. The assert was used to cause it to fail to parse when it reached 
> something that didn't look like a B.
> 
> And yeah, my schema is just plain wrong. An assert pattern matches against the 
> data stream, but my intention was to match the parsed value. The assert 
> pattern could probably be changed, but I think it's a bit clearer to put a 
> pattern restriction on the B element and change the assert to call 
> checkConstraints. So something like this:
> 
>    <xs:element name="B" maxOccurs="50" dfdl:occursCountKind="implicit">
>      <xs:simpleType>
>        <xs:annotation>
>          <xs:appinfo source="http://www.ogf.org/dfdl/">
>            <dfdl:assert test="{ dfdl:checkConstraints(.) }" />
>          </xs:appinfo>
>        </xs:annotation>
>        <xs:restriction base="xs:string">
>          <xs:pattern value=".*[^0-9].*" />
>        </xs:restriction>
>      </xs:simpleType>
>    </xs:element>
> 
> So each B is parsed, then we assert that the parsed value validates according 
> to the pattern value. When a value doesn't validate, that's how we know we 
> have reached the C elements.
> 
> - Steve
> 
> On 5/10/19 11:36 AM, Costello, Roger L. wrote:
> 
>  > Hi Steve,
> 
>  >
> 
>  > I guess that I don't understand dfdl:occursCountKind="implicit". I thought it
>  > means: "Hey Daffodil, figure out the appropriate occurrences of B elements by
>  > inferring from the occurrence needs of its following elements." In this case,
>  > C's are the following elements and the number of occurrences of C is equal to
>  > the value of the first A element. That is, the occurrence needs of C are easily
>  > determined, so the occurrence needs of B should be easily inferred. That is, it
>  > seems to me that Daffodil should be able to recognize that these values:
> 
>  >
> 
>  > 100
> 
>  > 200
> 
>  > 300
> 
>  > 400
> 
>  > 500
> 
>  > 600
> 
>  >
> 
>  > are for the C element and the declaration for the B element should not need
>  > an assert to specify, "Give me only strings up till the point where digits are
>  > encountered." By adding dfdl:assert to the schema it is effectively neutering
>  > the dfdl:occursCountKind="implicit". I am confused.
> 
>  >
> 
>  > Second question: I modified the schema as you suggested. See below. However,
>  > I now get this error message:
> 
>  >
> 
>  > [error] Parse Error: Failed to populate C[2]. Missing infix separator. Cause:
>  > Parse Error: Separator '%NL;' not found.
> 
>  >
> 
>  > <xs:element name="input">
>  >     <xs:complexType>
>  >         <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>  >             <xs:element name="A" type="xs:integer" minOccurs="3"
>  >                             maxOccurs="3" dfdl:occursCountKind="fixed" />
>  >             <xs:element name="B" type="xs:string" maxOccurs="50"
>  >                             dfdl:occursCountKind="implicit">
>  >                 <xs:annotation>
>  >                     <xs:appinfo source="http://www.ogf.org/dfdl/">
>  >                         <dfdl:assert testKind="pattern" testPattern=".*[^0-9].*" />
>  >                     </xs:appinfo>
>  >                 </xs:annotation>
>  >             </xs:element>
>  >             <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
>  >                             dfdl:occursCountKind="expression"
>  >                             dfdl:occursCount="{ ../A[1] }" />
>  >         </xs:sequence>
>  >     </xs:complexType>
>  > </xs:element>
> 
>  >
> 
>  > -----Original Message-----
> 
>  > From: Steve Lawrence <[email protected]>
>  > Sent: Friday, May 10, 2019 9:08 AM
>  > To: [email protected]
>  > Subject: [EXT] Re: Why am I getting this error message: Failed to parse infix
>  > separator. Cause: Parse Error: Separator '%NL;' not found.
> 
>  >
> 
>  > The issue is that element B can be 50 or fewer strings. And although 100,
>  > 200, etc. look like numbers, they are also completely valid strings. So Daffodil
>  > will just keep consuming every line after the first three numbers as B elements.
>  > Daffodil still expects a separator followed by some C's, but we hit the end of
>  > the data and error out saying we were looking for that separator.
> 
>  >
> 
>  > So we need to somehow tell Daffodil to stop looking for B's. One solution
>  > here is to add an assertion to test that each B element does not look like a
>  > number. The DFDL expression language doesn't have a good way to test if a
>  > string is a number or not, but a regex pattern test could work:
> 
>  >
> 
>  >   <xs:element name="B" type="xs:string" maxOccurs="50"
>  >     dfdl:occursCountKind="implicit">
>  >     <xs:annotation>
>  >       <xs:appinfo source="http://www.ogf.org/dfdl/">
>  >         <dfdl:assert testKind="pattern" testPattern=".*[^0-9].*" />
>  >       </xs:appinfo>
>  >     </xs:annotation>
>  >   </xs:element>
> 
>  >
> 
>  > This regular expression says that every B element must contain at least one
>  > character that is not a numeric digit. So when Daffodil gets to "100", the
>  > assertion will fail since it is all digits, and we'll stop parsing B's and
>  > start looking for C's.
> 
>  >
> 
>  > - Steve
> 
>  >
> 
>  >
> 
>  > On 5/10/19 8:00 AM, Costello, Roger L. wrote:
> 
>  >> Hello DFDL community,
> 
>  >>
> 
>  >> My input file consists of exactly 3 integers, each on a new line,
>  >> followed by an arbitrary number of strings, again, each on a new
>  >> line, followed by a number of integers, the number being determined by the
>  >> first integer in the file. For example:
> 
>  >>
> 
>  >> 6
> 
>  >> 1
> 
>  >> 2
> 
>  >> Banana
> 
>  >> Orange
> 
>  >> Apple
> 
>  >> Grape
> 
>  >> 100
> 
>  >> 200
> 
>  >> 300
> 
>  >> 400
> 
>  >> 500
> 
>  >> 600
> 
>  >>
> 
>  >> Below is my DFDL schema. It generates this error:
> 
>  >>
> 
>  >> *[error] Parse Error: Failed to parse infix separator. Cause: Parse Error:
> 
>  >> Separator '%NL;' not found.*
> 
>  >>
> 
>  >> Why is that error being generated? How do I fix the DFDL schema?
> 
>  >> /Roger
> 
>  >>
> 
>  >> <xs:element name="input">
>  >>   <xs:complexType>
>  >>     <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>  >>       <xs:element name="A" type="xs:integer"
>  >>                   minOccurs="3" maxOccurs="3"
>  >>                   dfdl:occursCountKind="fixed"/>
>  >>       <xs:element name="B" type="xs:string" maxOccurs="50"
>  >>                   dfdl:occursCountKind="implicit"/>
>  >>       <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
>  >>                   dfdl:occursCountKind="expression"
>  >>                   dfdl:occursCount="{ ../A[1] }"/>
>  >>     </xs:sequence>
>  >>   </xs:complexType>
>  >> </xs:element>
> 
>  >>
> 
>  >
> 
