By lookahead, I was referring to looking at future elements (like C) and trying to figure out how many occurrences of those should exist. So maybe that was a poor choice of words.
Speculative parsing is the idea that when parsing data there are points of uncertainty (PoUs) where we aren't sure what the next bytes of data represent. To resolve a point of uncertainty, Daffodil will try to parse the data one way, and if something goes wrong it will back up to where the point of uncertainty started and try to parse those same bytes another way. This continues until either one of the options succeeds or all of the options fail.

So in this schema, we reach a point of uncertainty where we're not sure if the data represents a B or a C. Daffodil will speculatively try parsing B's. If something goes wrong and we fail to parse a B, Daffodil will back up and try parsing C's. In this case, we use the assert to tell Daffodil that it speculated that something was a B, but it was wrong, and it should try the next option in the PoU.

- Steve

On 5/10/19 2:23 PM, Costello, Roger L. wrote:
> Hi Steve,
>
> Aren't these statements at odds with each other:
>
>> ... the number of occurrences is to be established
>> using speculative parsing
>
> versus
>
>> There's no concept of lookahead or smarts about
>> how many values might appear after the element.
>
> I honestly don't know what "speculative parsing" is, but it sounds like it
> would involve lookahead and smarts about how many values might appear after
> the element. No?
>
> /Roger
>
> -----Original Message-----
> From: Steve Lawrence <[email protected]>
> Sent: Friday, May 10, 2019 1:57 PM
> To: [email protected]
> Subject: [EXT] Re: Why am I getting this error message: Failed to parse infix
> separator. Cause: Parse Error: Separator '%NL;' not found.
>
> I think that's right, but might be a bit of an oversimplification. For
> example, it doesn't talk about defaults, only sort of implies the concept of
> speculative parsing, doesn't talk about what happens if M occurrences aren't
> found, etc.
> A more exact, but maybe a bit more complex description is section
> 16.1.2 in the spec [1]:
>
>> The enum 'implicit' should be used when the number of occurrences is to be
>> established using speculative parsing, and there are lower and upper bounds
>> to control the speculation. The bounds are provided by the XSDL minOccurs
>> and XSDL maxOccurs properties.
>>
>> When parsing, up to maxOccurs occurrences are expected in the data. It is a
>> processing error if less than minOccurs occurrences are found or defaulted.
>> The parser stops looking for occurrences when either minOccurs have been
>> found or defaulted and speculative parsing does not find another occurrence,
>> or maxOccurs have been found or defaulted.
>>
>> When unparsing, up to maxOccurs occurrences are expected in the infoset. It
>> is a processing error if less than minOccurs occurrences are found or
>> defaulted, or if more than maxOccurs occurrences are found.
>
> [1] https://daffodil.apache.org/docs/dfdl/#_Toc398030791
>
> On 5/10/19 1:48 PM, Costello, Roger L. wrote:
>> Excellent! Thank you Steve.
>>
>> Is the following an accurate description of dfdl:occursCountKind='implicit'?
>>
>> Suppose an element declaration has dfdl:occursCountKind='implicit'
>> with minOccurs="M" and maxOccurs="N"... This instructs Daffodil to
>> consume between M and N values. There's no concept of lookahead or
>> smarts about how many values might appear after the element. Daffodil
>> just keeps consuming values until either it consumes N values or one
>> of the values fails to parse (i.e., the value fails to meet the element's
>> requirements).
>>
>> -----Original Message-----
>> From: Steve Lawrence <[email protected]>
>> Sent: Friday, May 10, 2019 11:56 AM
>> To: [email protected]
>> Subject: [EXT] Re: Why am I getting this error message: Failed to
>> parse infix separator. Cause: Parse Error: Separator '%NL;' not found.
>>
>> dfdl:occursCountKind="implicit" just says to parse somewhere between
>> minOccurs and maxOccurs elements. There's no concept of lookahead or
>> smarts about how many elements might appear after it. It literally
>> just keeps trying to parse B elements until either we reach maxOccurs of
>> them or one of them fails to parse.
>> The assert was used to cause it to fail to parse when it reached
>> something that didn't look like a B.
>>
>> And yeah, my schema is just plain wrong. An assert pattern is matched
>> against the data stream, but my intention was to match the parsed value.
>> The assert pattern could probably be changed, but I think it's a bit more
>> clear to put a pattern restriction on the B element and change the
>> assert to call checkConstraints. So something like this:
>>
>> <xs:element name="B" maxOccurs="50"
>>             dfdl:occursCountKind="implicit">
>>   <xs:simpleType>
>>     <xs:annotation>
>>       <xs:appinfo source="http://www.ogf.org/dfdl/">
>>         <dfdl:assert test="{ dfdl:checkConstraints(.) }" />
>>       </xs:appinfo>
>>     </xs:annotation>
>>     <xs:restriction base="xs:string">
>>       <xs:pattern value=".*[^0-9].*" />
>>     </xs:restriction>
>>   </xs:simpleType>
>> </xs:element>
>>
>> So each B is parsed, then we assert that the parsed value validates
>> according to the pattern value. When a value doesn't validate, that's
>> how we know we have reached the C elements.
>>
>> - Steve
>>
>> On 5/10/19 11:36 AM, Costello, Roger L. wrote:
>> > Hi Steve,
>> >
>> > I guess that I don't understand dfdl:occursCountKind="implicit". I
>> > thought it means: "Hey Daffodil, figure out the appropriate occurrences
>> > of B elements by inferring from the occurrence needs of its following
>> > elements." In this case, C's are the following elements and the number
>> > of occurrences of C is equal to the value of the first A element. That
>> > is, the occurrence needs for C is easily determined, so the occurrence
>> > needs of B should be easily inferred.
>> > That is, it seems to me that Daffodil
>> > should be able to recognize that these values:
>> >
>> > 100
>> > 200
>> > 300
>> > 400
>> > 500
>> > 600
>> >
>> > are for the C element and the declaration for the B element should
>> > not need an assert to specify, "Give me only strings up till the point
>> > where digits are encountered." By adding dfdl:assert to the schema it
>> > is effectively neutering the dfdl:occursCountKind="implicit". I am confused.
>> >
>> > Second question: I modified the schema as you suggested. See below.
>> > However, I now get this error message:
>> >
>> > [error] Parse Error: Failed to populate C[2]. Missing infix separator.
>> > Cause: Parse Error: Separator '%NL;' not found.
>> >
>> > <xs:element name="input">
>> >   <xs:complexType>
>> >     <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>> >       <xs:element name="A" type="xs:integer" minOccurs="3"
>> >                   maxOccurs="3" dfdl:occursCountKind="fixed" />
>> >       <xs:element name="B" type="xs:string" maxOccurs="50"
>> >                   dfdl:occursCountKind="implicit">
>> >         <xs:annotation>
>> >           <xs:appinfo source="http://www.ogf.org/dfdl/">
>> >             <dfdl:assert testKind="pattern" testPattern=".*[^0-9].*" />
>> >           </xs:appinfo>
>> >         </xs:annotation>
>> >       </xs:element>
>> >       <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
>> >                   dfdl:occursCountKind="expression"
>> >                   dfdl:occursCount="{ ../A[1] }" />
>> >     </xs:sequence>
>> >   </xs:complexType>
>> > </xs:element>
>> >
>> > -----Original Message-----
>> > From: Steve Lawrence <[email protected]>
>> > Sent: Friday, May 10, 2019 9:08 AM
>> > To: [email protected]
>> > Subject: [EXT] Re: Why am I getting this error message: Failed to
>> > parse infix separator. Cause: Parse Error: Separator '%NL;' not found.
>> >
>> > The issue is that element B can be 50 or fewer strings.
>> > And although 100, 200, etc. look like numbers, they are also completely
>> > valid strings. So Daffodil will just keep consuming every line after the
>> > first three numbers as B elements.
>> > Daffodil still expects a separator followed by some C's, but we hit
>> > the end of the data and error out saying we were looking for that separator.
>> >
>> > So we need to somehow tell Daffodil to stop looking for B's. One
>> > solution here is to add an assertion to test that each B element does
>> > not look like a number. The DFDL expression language doesn't
>> > have a good way to test if a string is a number or not, but a regex pattern
>> > test could work:
>> >
>> > <xs:element name="B" type="xs:string" maxOccurs="50"
>> >             dfdl:occursCountKind="implicit">
>> >   <xs:annotation>
>> >     <xs:appinfo source="http://www.ogf.org/dfdl/">
>> >       <dfdl:assert testKind="pattern" testPattern=".*[^0-9].*" />
>> >     </xs:appinfo>
>> >   </xs:annotation>
>> > </xs:element>
>> >
>> > This regular expression says that every B element must contain at
>> > least one character that is not a numeric digit. So when Daffodil gets
>> > to "100", the assertion will fail since it is all digits, and we'll
>> > stop parsing B's and start looking for C's.
>> >
>> > - Steve
>> >
>> > On 5/10/19 8:00 AM, Costello, Roger L. wrote:
>> >> Hello DFDL community,
>> >>
>> >> My input file consists of exactly 3 integers, each on a new line,
>> >> followed by an arbitrary number of strings, again, each on a new
>> >> line, followed by a number of integers, the number being
>> >> determined by the first integer in the file. For example:
>> >>
>> >> 6
>> >> 1
>> >> 2
>> >> Banana
>> >> Orange
>> >> Apple
>> >> Grape
>> >> 100
>> >> 200
>> >> 300
>> >> 400
>> >> 500
>> >> 600
>> >>
>> >> Below is my DFDL schema. It generates this error:
>> >>
>> >> *[error] Parse Error: Failed to parse infix separator.
>> >> Cause: Parse Error: Separator '%NL;' not found.*
>> >>
>> >> Why is that error being generated? How do I fix the DFDL schema?
>> >> /Roger
>> >>
>> >> <xs:element name="input">
>> >>   <xs:complexType>
>> >>     <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>> >>       <xs:element name="A" type="xs:integer"
>> >>                   minOccurs="3" maxOccurs="3"
>> >>                   dfdl:occursCountKind="fixed" />
>> >>       <xs:element name="B" type="xs:string" maxOccurs="50"
>> >>                   dfdl:occursCountKind="implicit" />
>> >>       <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
>> >>                   dfdl:occursCountKind="expression"
>> >>                   dfdl:occursCount="{ ../A[1] }" />
>> >>     </xs:sequence>
>> >>   </xs:complexType>
>> >> </xs:element>
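
For anyone who finds the back-up-and-retry behaviour discussed in this thread easier to follow as code, here is a rough Python sketch of it. It is only a toy model of the A/B/C schema above, not how Daffodil is implemented: the names parse_implicit, parse_input, has_non_digit, and ParseError are made up for illustration, each input line stands in for one separated value, and minOccurs handling and defaults are left out for brevity. The accept callback plays the role of the checkConstraints assert on B.

```python
import re

# Same pattern as the restriction on B in the thread. XSD patterns are
# implicitly anchored, so fullmatch() is used when testing a value.
NON_NUMERIC = re.compile(r".*[^0-9].*")


class ParseError(Exception):
    """Hypothetical error type: a required element could not be parsed."""


def parse_implicit(lines, pos, max_occurs, accept):
    """Toy model of occursCountKind="implicit" (minOccurs ignored).

    Each iteration is one point of uncertainty: speculate that the next
    line is another occurrence; if `accept` rejects it, back up and stop.
    """
    values = []
    while len(values) < max_occurs and pos < len(lines):
        start = pos              # where this point of uncertainty begins
        candidate = lines[pos]   # speculate: treat the next line as a B
        pos += 1
        if not accept(candidate):
            pos = start          # speculation failed: back up ...
            break                # ... and let the caller try the next option
        values.append(candidate)
    return values, pos


def parse_input(lines, accept_b):
    """Parse three A's, then up to 50 B's, then A[1] C's."""
    if len(lines) < 3:
        raise ParseError("expected three A values")
    a = [int(text) for text in lines[:3]]          # occursCountKind="fixed"
    b, pos = parse_implicit(lines, 3, 50, accept_b)
    count = a[0]                 # DFDL's ../A[1] is 1-based, Python is 0-based
    c = []
    for i in range(count):       # occursCountKind="expression"
        if pos >= len(lines):
            raise ParseError(f"failed to populate C[{i + 1}]")
        c.append(int(lines[pos]))
        pos += 1
    return {"A": a, "B": b, "C": c}


def has_non_digit(text):
    """Stand-in for the assert: true if the value is not all digits."""
    return NON_NUMERIC.fullmatch(text) is not None


DATA = "6 1 2 Banana Orange Apple Grape 100 200 300 400 500 600".split()
print(parse_input(DATA, has_non_digit))
```

Passing an accept function that always returns True models the original schema with no assert: every remaining line is taken as a B, nothing is left for C, and the sketch raises ParseError, which is roughly the situation behind the original "Separator '%NL;' not found" error.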
