Hi Steve,

Aren't these statements at odds with each other:
> ... the number of occurrences is to be established
> using speculative parsing

versus

> There's no concept of lookahead or smarts about
> how many values might appear after the element.

I honestly don't know what "speculative parsing" is, but it sounds like it would involve lookahead and smarts about how many values might appear after the element. No?

/Roger

-----Original Message-----
From: Steve Lawrence <[email protected]>
Sent: Friday, May 10, 2019 1:57 PM
To: [email protected]
Subject: [EXT] Re: Why am I getting this error message: Failed to parse infix separator. Cause: Parse Error: Separator '%NL;' not found.

I think that's right, but might be a bit of an oversimplification. For example, it doesn't talk about defaults, only sort of implies the concept of speculative parsing, doesn't talk about what happens if M occurrences aren't found, etc. A more exact, but maybe a bit more complex description is section 16.1.2 in the spec [1]:

> The enum 'implicit' should be used when the number of occurrences is to be
> established using speculative parsing, and there are lower and upper bounds
> to control the speculation. The bounds are provided by the XSDL minOccurs and
> XSDL maxOccurs properties.
>
> When parsing, up to maxOccurs occurrences are expected in the data. It is a
> processing error if less than minOccurs occurrences are found or defaulted.
> The parser stops looking for occurrences when either minOccurs have been
> found or defaulted and speculative parsing does not find another occurrence,
> or maxOccurs have been found or defaulted.
>
> When unparsing, up to maxOccurs occurrences are expected in the infoset. It
> is a processing error if less than minOccurs occurrences are found or
> defaulted, or if more than maxOccurs occurrences are found.

[1] https://daffodil.apache.org/docs/dfdl/#_Toc398030791

On 5/10/19 1:48 PM, Costello, Roger L. wrote:
> Excellent! Thank you Steve.
>
> Is the following an accurate description of dfdl:occursCountKind='implicit'?
>
> Suppose an element declaration has dfdl:occursCountKind='implicit'
> with minOccurs="M" and maxOccurs="N"... This instructs Daffodil to
> consume between M and N values. There's no concept of lookahead or
> smarts about how many values might appear after the element. Daffodil
> just keeps consuming values until either it consumes N values or one
> of the values fails to parse (i.e., the value fails to meet the element's
> requirements).
>
> -----Original Message-----
> From: Steve Lawrence <[email protected]>
> Sent: Friday, May 10, 2019 11:56 AM
> To: [email protected]
> Subject: [EXT] Re: Why am I getting this error message: Failed to
> parse infix separator. Cause: Parse Error: Separator '%NL;' not found.
>
> dfdl:occursCountKind="implicit" just says to parse somewhere between
> minOccurs and maxOccurs elements. There's no concept of lookahead or
> smarts about how many elements might appear after it. It literally
> just keeps trying to parse B elements until either we reach maxOccurs of them
> or one of them fails to parse.
> The assert was used to cause it to fail to parse when it reached
> something that didn't look like a B.
>
> And yeah, my schema is just plain wrong. Assert pattern matches the
> data stream, but my intention was to match the parsed value. The
> assert pattern could probably be changed, but I think it's a bit more
> clear to put a pattern restriction on the B element and change the
> assert to call checkConstraints. So something like this:
>
>   <xs:element name="B" maxOccurs="50" dfdl:occursCountKind="implicit">
>     <xs:simpleType>
>       <xs:annotation>
>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>           <dfdl:assert test="{ dfdl:checkConstraints(.)
>             }" />
>         </xs:appinfo>
>       </xs:annotation>
>       <xs:restriction base="xs:string">
>         <xs:pattern value=".*[^0-9].*" />
>       </xs:restriction>
>     </xs:simpleType>
>   </xs:element>
>
> So each B is parsed, then we assert that the parsed value validates
> according to the pattern value. When a value doesn't validate, that's
> how we know we have reached the C elements.
>
> - Steve
>
> On 5/10/19 11:36 AM, Costello, Roger L. wrote:
> > > Hi Steve,
> > >
> > > I guess that I don't understand dfdl:occursCountKind="implicit". I
> > > thought it means: "Hey Daffodil, figure out the appropriate
> > > occurrences of B elements by inferring from the occurrence needs of
> > > its following elements." In this case, C's are the following
> > > elements and the number of occurrences of C is equal to the value of
> > > the first A element. That is, the occurrence needs for C is easily
> > > determined, so the occurrence needs of B should be easily inferred.
> > > That is, it seems to me that Daffodil should be able to recognize
> > > that these values:
> > >
> > > 100
> > > 200
> > > 300
> > > 400
> > > 500
> > > 600
> > >
> > > are for the C element and the declaration for the B element should
> > > not need an assert to specify, "Give me only strings up till the
> > > point where digits are encountered." By adding dfdl:assert to the
> > > schema it is effectively neutering the
> > > dfdl:occursCountKind="implicit". I am confused.
> > >
> > > Second question: I modified the schema as you suggested. See below.
> > > However, I now get this error message:
> > >
> > > [error] Parse Error: Failed to populate C[2]. Missing infix separator.
> > > Cause: Parse Error: Separator '%NL;' not found.
> > >
> > > <xs:element name="input">
> > >   <xs:complexType>
> > >     <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
> > >       <xs:element name="A" type="xs:integer" minOccurs="3"
> > >                   maxOccurs="3" dfdl:occursCountKind="fixed" />
> > >       <xs:element name="B" type="xs:string" maxOccurs="50"
> > >                   dfdl:occursCountKind="implicit">
> > >         <xs:annotation>
> > >           <xs:appinfo source="http://www.ogf.org/dfdl/">
> > >             <dfdl:assert testKind="pattern" testPattern=".*[^0-9].*" />
> > >           </xs:appinfo>
> > >         </xs:annotation>
> > >       </xs:element>
> > >       <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
> > >                   dfdl:occursCountKind="expression"
> > >                   dfdl:occursCount="{ ../A[1] }" />
> > >     </xs:sequence>
> > >   </xs:complexType>
> > > </xs:element>
> > >
> > > -----Original Message-----
> > > From: Steve Lawrence <[email protected]>
> > > Sent: Friday, May 10, 2019 9:08 AM
> > > To: [email protected]
> > > Subject: [EXT] Re: Why am I getting this error message: Failed to
> > > parse infix separator. Cause: Parse Error: Separator '%NL;' not found.
> > >
> > > The issue is that element B can be 50 or fewer strings. And although
> > > 100, 200, etc. look like numbers, they are also completely valid
> > > strings. So Daffodil will just keep consuming every line after the
> > > first three numbers as B elements. Daffodil still expects a separator
> > > followed by some C's, but we hit the end of the data and error out
> > > saying we were looking for that separator.
> > >
> > > So we need to somehow tell Daffodil to stop looking for B's. One
> > > solution here is to add an assertion to test that each B element does
> > > not look like a number.
> > > The DFDL expression language doesn't have a good way to test if a
> > > string is a number or not, but a regex pattern test could work:
> > >
> > > <xs:element name="B" type="xs:string" maxOccurs="50"
> > >             dfdl:occursCountKind="implicit">
> > >   <xs:annotation>
> > >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> > >       <dfdl:assert testKind="pattern" testPattern=".*[^0-9].*" />
> > >     </xs:appinfo>
> > >   </xs:annotation>
> > > </xs:element>
> > >
> > > This regular expression says that every B element must contain at
> > > least one character that is not a numeric digit. So when Daffodil gets
> > > to "100", the assertion will fail since it is all numbers, and we'll
> > > stop parsing B's and start looking for C's.
> > >
> > > - Steve
> > >
> > >
> > > On 5/10/19 8:00 AM, Costello, Roger L. wrote:
> > >> Hello DFDL community,
> > >>
> > >> My input file consists of exactly 3 integers, each on a new line,
> > >> followed by an arbitrary number of strings, again, each on a new
> > >> line, followed by a number of integers, the number being determined
> > >> by the first integer in the file. For example:
> > >>
> > >> 6
> > >> 1
> > >> 2
> > >> Banana
> > >> Orange
> > >> Apple
> > >> Grape
> > >> 100
> > >> 200
> > >> 300
> > >> 400
> > >> 500
> > >> 600
> > >>
> > >> Below is my DFDL schema. It generates this error:
> > >>
> > >> *[error] Parse Error: Failed to parse infix separator. Cause: Parse Error:
> > >> Separator '%NL;' not found.*
> > >>
> > >> Why is that error being generated? How to fix the DFDL schema?
> > >> /Roger
> > >>
> > >> <xs:element name="input">
> > >>   <xs:complexType>
> > >>     <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
> > >>       <xs:element name="A" type="xs:integer"
> > >>                   minOccurs="3" maxOccurs="3"
> > >>                   dfdl:occursCountKind="fixed" />
> > >>       <xs:element name="B" type="xs:string" maxOccurs="50"
> > >>                   dfdl:occursCountKind="implicit" />
> > >>       <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
> > >>                   dfdl:occursCountKind="expression"
> > >>                   dfdl:occursCount="{ ../A[1] }" />
> > >>     </xs:sequence>
> > >>   </xs:complexType>
> > >> </xs:element>
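
For anyone reading this thread later, the behaviour described above can be sketched in a few lines of Python. This is only a rough model and not Daffodil code: every function name here (parse_implicit, any_string, string_with_non_digit, parse_input) is made up for illustration, separators and defaulting are ignored, and the checkConstraints assert is modelled as a plain regex check on the parsed value. It shows why the unconstrained B swallows the numeric lines and why rejecting all-digit values lets the parse move on to C.

```python
import re

# Sample data from the original post, one value per line.
DATA = ["6", "1", "2", "Banana", "Orange", "Apple", "Grape",
        "100", "200", "300", "400", "500", "600"]

def parse_implicit(lines, start, parse_one, min_occurs=1, max_occurs=50):
    """Hypothetical model of occursCountKind="implicit": try one occurrence
    at a time and stop at the first failure or at max_occurs. It never looks
    at what the following elements might need."""
    values = []
    pos = start
    while len(values) < max_occurs and pos < len(lines):
        value = parse_one(lines[pos])
        if value is None:  # speculative attempt failed, so stop here
            break
        values.append(value)
        pos += 1
    if len(values) < min_occurs:
        raise ValueError("fewer than min_occurs occurrences found")
    return values, pos

def any_string(line):
    # Original schema: every line, including "100", is a valid xs:string.
    return line

NON_DIGIT = re.compile(r".*[^0-9].*")

def string_with_non_digit(line):
    # Models the pattern restriction plus checkConstraints assert:
    # the parsed value must contain at least one non-digit character.
    return line if NON_DIGIT.fullmatch(line) else None

def parse_input(lines, parse_b):
    a = [int(x) for x in lines[:3]]             # A: exactly 3 integers
    b, pos = parse_implicit(lines, 3, parse_b)  # B: greedy, up to 50
    count = a[0]                                # C: count taken from A[1]
    rest = lines[pos:pos + count]
    if len(rest) < count:
        raise ValueError("ran out of data while looking for C")
    return a, b, [int(x) for x in rest]
```

Calling parse_input(DATA, any_string) raises, because B has already consumed every remaining line; calling parse_input(DATA, string_with_non_digit) stops B at "100" and leaves the six integers for C.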
