I can't reproduce this issue with Daffodil 2.3.0:

  $ daffodil parse -s test.dfdl.xsd test.txt
  <ex:input xmlns:ex="http://example.com">
    <A>6</A>
    <A>1</A>
    <A>2</A>
    <B>Banana</B>
    <B>Orange</B>
    <B>Apple</B>
    <B>Grape</B>
    <C>100</C>
    <C>200</C>
    <C>300</C>
    <C>400</C>
    <C>500</C>
    <C>600</C>
  </ex:input>
  $ daffodil parse -s test.dfdl.xsd test.txt | daffodil unparse -s test.dfdl.xsd
  6
  1
  2
  Banana
  Orange
  Apple
  Grape
  100
  200
  300
  400
  500
  600

Maybe you're using an old Daffodil version?

- Steve

On 5/13/19 7:15 AM, Costello, Roger L. wrote:
> Thanks again Steve. Now my input file is parsing correctly, but unparsing is
> losing data:
>
> Below is my DFDL schema. Why am I losing 100, 200, …, 600? /Roger
>
> <xs:element name="input">
>   <xs:complexType>
>     <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>       <xs:element name="A" type="xs:integer"
>                   minOccurs="3" maxOccurs="3" dfdl:occursCountKind="fixed"/>
>       <xs:element name="B" maxOccurs="50" dfdl:occursCountKind="implicit">
>         <xs:simpleType>
>           <xs:annotation>
>             <xs:appinfo source="http://www.ogf.org/dfdl/">
>               <dfdl:assert test="{ dfdl:checkConstraints(.) }"/>
>             </xs:appinfo>
>           </xs:annotation>
>           <xs:restriction base="xs:string">
>             <xs:pattern value=".*[^0-9].*"/>
>           </xs:restriction>
>         </xs:simpleType>
>       </xs:element>
>       <xs:element name="C" type="xs:integer"
>                   maxOccurs="unbounded" dfdl:occursCountKind="expression"
>                   dfdl:occursCount="{ ../A[1] }"/>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> -----Original Message-----
> From: Steve Lawrence <[email protected]>
> Sent: Friday, May 10, 2019 2:37 PM
> To: [email protected]
> Subject: [EXT] Re: Why am I getting this error message: Failed to parse infix
> separator. Cause: Parse Error: Separator '%NL;' not found.
>
> By lookahead, I was referring to looking at future elements (like C) and
> trying to figure out how many occurrences of those should exist. So maybe
> that was a poor choice of words.
>
> Speculative parsing is the idea that when parsing data there are points of
> uncertainty where we aren't sure what the next bytes of data represent.
> To resolve these points of uncertainty, Daffodil will try to parse the data
> one way, and if something goes wrong it will back up to where the point of
> uncertainty started and try to parse those same bytes another way. This
> continues until either one of the options succeeds or all of the options fail.
>
> So in this schema, we reach a point of uncertainty where we're not sure if
> the data represents a B or a C. So Daffodil will speculatively try parsing
> B's. If something goes wrong and we fail to parse a B, Daffodil will back up
> and try parsing C's. In this case, we use the assert to tell Daffodil that it
> speculated that something was a B, but it was wrong, and it should try the
> next option in the PoU.
>
> - Steve
>
> On 5/10/19 2:23 PM, Costello, Roger L. wrote:
> > > Hi Steve,
> > >
> > > Aren't these statements at odds with each other:
> > >
> > >> ... the number of occurrences is to be established using speculative
> > >> parsing
> > >
> > > versus
> > >
> > >> There's no concept of lookahead or smarts about how many values might
> > >> appear after the element.
> > >
> > > I honestly don't know what "speculative parsing" is, but it sounds like it
> > > would involve lookahead and smarts about how many values might appear
> > > after the element. No?
> > >
> > > /Roger
> > >
> > > -----Original Message-----
> > > From: Steve Lawrence <[email protected]>
> > > Sent: Friday, May 10, 2019 1:57 PM
> > > To: [email protected]
> > > Subject: [EXT] Re: Why am I getting this error message: Failed to parse
> > > infix separator. Cause: Parse Error: Separator '%NL;' not found.
> > >
> > > I think that's right, but might be a bit of an oversimplification. For
> > > example, it doesn't talk about defaults, only sort of implies the concept
> > > of speculative parsing, doesn't talk about what happens if M occurrences
> > > aren't found, etc.
> > > A more exact, but maybe a bit more complex description is section
> > > 16.1.2 in the spec [1]:
> > >
> > >> The enum 'implicit' should be used when the number of occurrences is to
> > >> be established using speculative parsing, and there are lower and upper
> > >> bounds to control the speculation. The bounds are provided by the XSDL
> > >> minOccurs and XSDL maxOccurs properties.
> > >>
> > >> When parsing, up to maxOccurs occurrences are expected in the data. It is
> > >> a processing error if less than minOccurs occurrences are found or
> > >> defaulted. The parser stops looking for occurrences when either minOccurs
> > >> have been found or defaulted and speculative parsing does not find
> > >> another occurrence, or maxOccurs have been found or defaulted.
> > >>
> > >> When unparsing, up to maxOccurs occurrences are expected in the infoset.
> > >> It is a processing error if less than minOccurs occurrences are found or
> > >> defaulted, or if more than maxOccurs occurrences are found.
> > >
> > > [1] https://daffodil.apache.org/docs/dfdl/#_Toc398030791
> > >
> > > On 5/10/19 1:48 PM, Costello, Roger L. wrote:
> > >> Excellent! Thank you Steve.
> > >>
> > >> Is the following an accurate description of
> > >> dfdl:occursCountKind='implicit'?
> > >>
> > >> Suppose an element declaration has dfdl:occursCountKind='implicit'
> > >> with minOccurs="M" and maxOccurs="N"... This instructs Daffodil to
> > >> consume between M and N values. There's no concept of lookahead or
> > >> smarts about how many values might appear after the element. Daffodil
> > >> just keeps consuming values until either it consumes N values or one
> > >> of the values fails to parse (i.e., the value fails to meet the element's
> > >> requirements).
> > >>
> > >> -----Original Message-----
> > >> From: Steve Lawrence <[email protected]>
> > >> Sent: Friday, May 10, 2019 11:56 AM
> > >> To: [email protected]
> > >> Subject: [EXT] Re: Why am I getting this error message: Failed to
> > >> parse infix separator. Cause: Parse Error: Separator '%NL;' not found.
> > >>
> > >> dfdl:occursCountKind="implicit" just says to parse somewhere between
> > >> minOccurs and maxOccurs elements. There's no concept of lookahead or
> > >> smarts about how many elements might appear after it. It literally
> > >> just keeps trying to parse B elements until either we reach maxOccurs of
> > >> them or one of them fails to parse. The assert was used to cause it to
> > >> fail to parse when it reached something that didn't look like a B.
> > >>
> > >> And yeah, my schema is just plain wrong. The assert pattern matches
> > >> against the data stream, but my intention was to match the parsed value.
> > >> The assert pattern could probably be changed, but I think it's a bit more
> > >> clear to put a pattern restriction on the B element and change the
> > >> assert to call checkConstraints. So something like this:
> > >>
> > >> <xs:element name="B" maxOccurs="50"
> > >>             dfdl:occursCountKind="implicit">
> > >>   <xs:simpleType>
> > >>     <xs:annotation>
> > >>       <xs:appinfo source="http://www.ogf.org/dfdl/">
> > >>         <dfdl:assert test="{ dfdl:checkConstraints(.) }" />
> > >>       </xs:appinfo>
> > >>     </xs:annotation>
> > >>     <xs:restriction base="xs:string">
> > >>       <xs:pattern value=".*[^0-9].*" />
> > >>     </xs:restriction>
> > >>   </xs:simpleType>
> > >> </xs:element>
> > >>
> > >> So each B is parsed, then we assert that the parsed value validates
> > >> according to the pattern value. When a value doesn't validate, that's
> > >> how we know we have reached the C elements.
> > >>
> > >> - Steve
> > >>
> > >> On 5/10/19 11:36 AM, Costello, Roger L. wrote:
> > >> > Hi Steve,
> > >> >
> > >> > I guess that I don't understand dfdl:occursCountKind="implicit". I
> > >> > thought it means: "Hey Daffodil, figure out the appropriate occurrences
> > >> > of B elements by inferring from the occurrence needs of its following
> > >> > elements." In this case, C's are the following elements and the
> > >> > number of occurrences of C is equal to the value of the first A
> > >> > element. That is, the occurrence needs for C are easily determined, so
> > >> > the occurrence needs of B should be easily inferred. That is, it seems
> > >> > to me that Daffodil should be able to recognize that these values:
> > >> >
> > >> > 100
> > >> > 200
> > >> > 300
> > >> > 400
> > >> > 500
> > >> > 600
> > >> >
> > >> > are for the C element and the declaration for the B element should
> > >> > not need an assert to specify, "Give me only strings up till the
> > >> > point where digits are encountered." By adding dfdl:assert to the
> > >> > schema it is effectively neutering the dfdl:occursCountKind="implicit".
> > >> > I am confused.
> > >> >
> > >> > Second question: I modified the schema as you suggested. See below.
> > >> > However, I now get this error message:
> > >> >
> > >> > [error] Parse Error: Failed to populate C[2]. Missing infix separator.
> > >> > Cause: Parse Error: Separator '%NL;' not found.
> > >> >
> > >> > <xs:element name="input">
> > >> >   <xs:complexType>
> > >> >     <xs:sequence dfdl:separator="%NL;"
> > >> >                  dfdl:separatorPosition="infix">
> > >> >       <xs:element name="A" type="xs:integer" minOccurs="3"
> > >> >                   maxOccurs="3" dfdl:occursCountKind="fixed" />
> > >> >       <xs:element name="B" type="xs:string" maxOccurs="50"
> > >> >                   dfdl:occursCountKind="implicit">
> > >> >         <xs:annotation>
> > >> >           <xs:appinfo source="http://www.ogf.org/dfdl/">
> > >> >             <dfdl:assert testKind="pattern"
> > >> >                          testPattern=".*[^0-9].*" />
> > >> >           </xs:appinfo>
> > >> >         </xs:annotation>
> > >> >       </xs:element>
> > >> >       <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
> > >> >                   dfdl:occursCountKind="expression"
> > >> >                   dfdl:occursCount="{ ../A[1] }" />
> > >> >     </xs:sequence>
> > >> >   </xs:complexType>
> > >> > </xs:element>
> > >> >
> > >> > -----Original Message-----
> > >> > From: Steve Lawrence <[email protected]>
> > >> > Sent: Friday, May 10, 2019 9:08 AM
> > >> > To: [email protected]
> > >> > Subject: [EXT] Re: Why am I getting this error message: Failed to
> > >> > parse infix separator. Cause: Parse Error: Separator '%NL;' not found.
> > >> >
> > >> > The issue is that element B can be 50 or fewer strings. And
> > >> > although 100, 200, etc. look like numbers, they are also completely
> > >> > valid strings. So Daffodil will just keep consuming every line after
> > >> > the first three numbers as B elements. Daffodil still expects a
> > >> > separator followed by some C's, but we hit the end of the data and
> > >> > error out saying we were looking for that separator.
> > >> >
> > >> > So we need to somehow tell Daffodil to stop looking for B's.
> > >> > One solution here is to add an assertion to test that each B element
> > >> > does not look like a number. The DFDL expression language doesn't
> > >> > have a good way to test if a string is a number or not, but a regex
> > >> > pattern test could work:
> > >> >
> > >> > <xs:element name="B" type="xs:string" maxOccurs="50"
> > >> >             dfdl:occursCountKind="implicit">
> > >> >   <xs:annotation>
> > >> >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> > >> >       <dfdl:assert testKind="pattern" testPattern=".*[^0-9].*" />
> > >> >     </xs:appinfo>
> > >> >   </xs:annotation>
> > >> > </xs:element>
> > >> >
> > >> > This regular expression says that every B element must contain at
> > >> > least one character that is not a numeric digit. So when Daffodil
> > >> > gets to "100", the assertion will fail since it is all numbers, and
> > >> > we'll stop parsing B's and start looking for C's.
> > >> >
> > >> > - Steve
> > >> >
> > >> > On 5/10/19 8:00 AM, Costello, Roger L. wrote:
> > >> >> Hello DFDL community,
> > >> >>
> > >> >> My input file consists of exactly 3 integers, each on a new line,
> > >> >> followed by an arbitrary number of strings, again, each on a new
> > >> >> line, followed by a number of integers, the number being
> > >> >> determined by the first integer in the file. For example:
> > >> >>
> > >> >> 6
> > >> >> 1
> > >> >> 2
> > >> >> Banana
> > >> >> Orange
> > >> >> Apple
> > >> >> Grape
> > >> >> 100
> > >> >> 200
> > >> >> 300
> > >> >> 400
> > >> >> 500
> > >> >> 600
> > >> >>
> > >> >> Below is my DFDL schema. It generates this error:
> > >> >>
> > >> >> *[error] Parse Error: Failed to parse infix separator.
> > >> >> Cause: Parse Error: Separator '%NL;' not found.*
> > >> >>
> > >> >> Why is that error being generated? How do I fix the DFDL schema?
> > >> >> /Roger
> > >> >>
> > >> >> <xs:element name="input">
> > >> >>   <xs:complexType>
> > >> >>     <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
> > >> >>       <xs:element name="A" type="xs:integer"
> > >> >>                   minOccurs="3" maxOccurs="3"
> > >> >>                   dfdl:occursCountKind="fixed"/>
> > >> >>       <xs:element name="B" type="xs:string" maxOccurs="50"
> > >> >>                   dfdl:occursCountKind="implicit"/>
> > >> >>       <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
> > >> >>                   dfdl:occursCountKind="expression"
> > >> >>                   dfdl:occursCount="{ ../A[1] }"/>
> > >> >>     </xs:sequence>
> > >> >>   </xs:complexType>
> > >> >> </xs:element>
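
For anyone reading this thread later, the speculative parsing behaviour
discussed above can be modelled with a small Python toy. This is only a sketch
under stated assumptions and is not how Daffodil is implemented: the names
parse_input, parse_b, parse_int, ParseError and the check_constraints flag are
made up for illustration, the %NL; separator is modelled by splitting on "\n",
and the xs:pattern facet is modelled with re.fullmatch. It shows why B swallows
the numeric lines when nothing makes a B fail, and how a failing constraint
makes the parser back up and move on to C.

```python
import re

# Toy model of the thread's A/B/C format. NOT Daffodil's implementation.
B_PATTERN = re.compile(r".*[^0-9].*")  # same regex as the xs:pattern facet

SAMPLE = "6\n1\n2\nBanana\nOrange\nApple\nGrape\n100\n200\n300\n400\n500\n600"


class ParseError(Exception):
    """Raised when one element fails to parse (hypothetical helper)."""


def parse_int(lines, pos):
    """Parse one integer line at pos; return (value, next_pos)."""
    if pos >= len(lines) or not re.fullmatch(r"-?[0-9]+", lines[pos]):
        raise ParseError(f"expected an integer at line {pos + 1}")
    return int(lines[pos]), pos + 1


def parse_b(lines, pos, check_constraints):
    """Parse one string line; optionally reject values failing the pattern."""
    if pos >= len(lines):
        raise ParseError("no more data")
    value = lines[pos]
    if check_constraints and not B_PATTERN.fullmatch(value):
        raise ParseError(f"{value!r} does not satisfy the B pattern")
    return value, pos + 1


def parse_input(text, check_constraints=True, max_b=50):
    lines = text.split("\n")
    pos = 0

    a = []
    for _ in range(3):  # occursCountKind="fixed": exactly 3 A's
        value, pos = parse_int(lines, pos)
        a.append(value)

    b = []
    while len(b) < max_b:  # occursCountKind="implicit": speculate on each B
        saved = pos  # point of uncertainty
        try:
            value, pos = parse_b(lines, pos, check_constraints)
        except ParseError:
            pos = saved  # back up; stop trying B's and let C have the data
            break
        b.append(value)

    c = []
    for _ in range(a[0]):  # occursCountKind="expression": count is ../A[1]
        value, pos = parse_int(lines, pos)
        c.append(value)

    return {"A": a, "B": b, "C": c}
```

With the constraint enabled, parse_input(SAMPLE) stops collecting B's at "100"
and then reads six C's. With check_constraints=False every remaining line is a
valid string, so B consumes them all and the first C raises ParseError, which
mirrors the "Separator '%NL;' not found" failure from the original question.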
