Re: Why am I getting this error message: Failed to parse infix separator. Cause: Parse Error: Separator '%NL;' not found.

Steve Lawrence Tue, 14 May 2019 04:44:10 -0700

I can't reproduce this issue with Daffodil 2.3.0:

$ daffodil parse -s test.dfdl.xsd test.txt
<ex:input xmlns:ex="http://example.com">
  <A>6</A>
  <A>1</A>
  <A>2</A>
  <B>Banana</B>
  <B>Orange</B>
  <B>Apple</B>
  <B>Grape</B>
  <C>100</C>
  <C>200</C>
  <C>300</C>
  <C>400</C>
  <C>500</C>
  <C>600</C>
</ex:input>


$ daffodil parse -s test.dfdl.xsd test.txt | daffodil unparse -s test.dfdl.xsd
6
1
2
Banana
Orange
Apple
Grape
100
200
300
400
500
600

Maybe you're using an old daffodil version?

- Steve

On 5/13/19 7:15 AM, Costello, Roger L. wrote:
> Thanks again, Steve. Now my input file is parsing correctly, but unparsing is
> losing data:
> 
> Below is my DFDL schema. Why am I losing 100, 200, …, 600?
> 
> /Roger
> 
> <xs:element name="input">
>   <xs:complexType>
>     <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>       <xs:element name="A" type="xs:integer"
>                   minOccurs="3" maxOccurs="3" dfdl:occursCountKind="fixed"/>
>       <xs:element name="B" maxOccurs="50" dfdl:occursCountKind="implicit">
>         <xs:simpleType>
>           <xs:annotation>
>             <xs:appinfo source="http://www.ogf.org/dfdl/">
>               <dfdl:assert test="{ dfdl:checkConstraints(.) }"/>
>             </xs:appinfo>
>           </xs:annotation>
>           <xs:restriction base="xs:string">
>             <xs:pattern value=".*[^0-9].*"/>
>           </xs:restriction>
>         </xs:simpleType>
>       </xs:element>
>       <xs:element name="C" type="xs:integer"
>                   maxOccurs="unbounded" dfdl:occursCountKind="expression"
>                   dfdl:occursCount="{ ../A[1] }"/>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
> 
> -----Original Message-----
> From: Steve Lawrence <[email protected]>
> Sent: Friday, May 10, 2019 2:37 PM
> To: [email protected]
> Subject: [EXT] Re: Why am I getting this error message: Failed to parse infix
> separator. Cause: Parse Error: Separator '%NL;' not found.
> 
> By lookahead, I was referring to looking at future elements (like C) and
> trying to figure out how many occurrences of those should exist. So maybe that
> was a poor choice of words.
> 
> Speculative parsing is the idea that when parsing data there are points of
> uncertainty where we aren't sure what the next bytes of data represent. To
> resolve these points of uncertainty, Daffodil will try to parse the data one
> way, and if something goes wrong it will back up to where the point of
> uncertainty started and try to parse those same bytes another way. This
> continues until either one of the options succeeds or all of the options fail.
> 
> So in this schema, we reach a point of uncertainty where we're not sure if the
> data represents a B or a C. So Daffodil will speculatively try parsing B's. If
> something goes wrong and we fail to parse a B, Daffodil will back up and try
> parsing C's. In this case, we use the assert to tell Daffodil that it
> speculated that something was a B, but it was wrong, and it should try the
> next option in the point of uncertainty (PoU).
> 
> - Steve
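A minimal Python sketch of the back-up-and-retry behaviour described above. It assumes a toy model in which the data has already been split on newlines into a list of fields and the "position" is an index into that list; `Cursor`, `parse_one_b`, and `parse_bs_speculatively` are hypothetical names for illustration, not Daffodil APIs, and the regex check stands in for the dfdl:assert. Note that the failed attempt has already moved the cursor past "100" before the check fails; resetting to `mark` is what lets C re-read that field.

```python
import re


class Cursor:
    """Hypothetical input position: newline-separated fields plus an index.
    This is a toy model, not a Daffodil API."""

    def __init__(self, fields):
        self.fields = fields
        self.pos = 0

    def read_field(self):
        if self.pos >= len(self.fields):
            raise ValueError("end of data")
        value = self.fields[self.pos]
        self.pos += 1  # reading consumes the field
        return value


def parse_one_b(cur):
    value = cur.read_field()
    # Stands in for the dfdl:assert: a B must contain a non-digit character.
    if re.fullmatch(r".*[^0-9].*", value) is None:
        raise ValueError(f"assertion failed for {value!r}")
    return value


def parse_bs_speculatively(cur, max_occurs):
    bs = []
    while len(bs) < max_occurs:
        mark = cur.pos  # start of this point of uncertainty
        try:
            bs.append(parse_one_b(cur))
        except ValueError:
            cur.pos = mark  # back up so the next option (C) sees the same field
            break
    return bs


cur = Cursor(["Banana", "Orange", "Apple", "Grape", "100", "200"])
print(parse_bs_speculatively(cur, 50))  # ['Banana', 'Orange', 'Apple', 'Grape']
print(cur.fields[cur.pos])  # 100
```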
> 
> On 5/10/19 2:23 PM, Costello, Roger L. wrote:
> 
>  > Hi Steve,
>  >
>  > Aren't these statements at odds with each other:
>  >
>  >> ... the number of occurrences is to be established using speculative
>  >> parsing
>  >
>  > versus
>  >
>  >> There's no concept of lookahead or smarts about how many values might
>  >> appear after the element.
>  >
>  > I honestly don't know what "speculative parsing" is, but it sounds like it
>  > would involve lookahead and smarts about how many values might appear after
>  > the element. No?
>  >
>  > /Roger
>  >
>  > -----Original Message-----
>  > From: Steve Lawrence <[email protected]>
>  > Sent: Friday, May 10, 2019 1:57 PM
>  > To: [email protected]
>  > Subject: [EXT] Re: Why am I getting this error message: Failed to parse
>  > infix separator. Cause: Parse Error: Separator '%NL;' not found.
>  >
>  > I think that's right, but it might be a bit of an oversimplification. For
>  > example, it doesn't talk about defaults, only sort of implies the concept of
>  > speculative parsing, doesn't talk about what happens if M occurrences aren't
>  > found, etc. A more exact, but maybe a bit more complex, description is
>  > section 16.1.2 in the spec [1]:
>  >
>  >> The enum 'implicit' should be used when the number of occurrences is to be
>  >> established using speculative parsing, and there are lower and upper bounds
>  >> to control the speculation. The bounds are provided by the XSDL minOccurs
>  >> and XSDL maxOccurs properties.
>  >>
>  >> When parsing, up to maxOccurs occurrences are expected in the data. It is a
>  >> processing error if less than minOccurs occurrences are found or defaulted.
>  >> The parser stops looking for occurrences when either minOccurs have been
>  >> found or defaulted and speculative parsing does not find another
>  >> occurrence, or maxOccurs have been found or defaulted.
>  >>
>  >> When unparsing, up to maxOccurs occurrences are expected in the infoset.
>  >> It is a processing error if less than minOccurs occurrences are found or
>  >> defaulted, or if more than maxOccurs occurrences are found.
>  >
>  > [1] https://daffodil.apache.org/docs/dfdl/#_Toc398030791
>  >
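To make the parse and unparse rules in that spec excerpt concrete, here is a small Python sketch. `parse_implicit`, `check_unparse_count`, and `ProcessingError` are hypothetical names; `try_parse` is assumed to return `None` when speculative parsing finds no further occurrence, and defaulting is left out entirely.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def parse_implicit(try_parse, min_occurs, max_occurs):
    """Collect occurrences until max_occurs is reached or try_parse returns
    None (speculation found no further occurrence). Defaults are ignored."""
    items = []
    while len(items) < max_occurs:
        item = try_parse()
        if item is None:
            break
        items.append(item)
    if len(items) < min_occurs:
        raise ProcessingError(
            f"found {len(items)} occurrences, need at least {min_occurs}")
    return items


def check_unparse_count(items, min_occurs, max_occurs):
    """Unparse side: the infoset must hold between min_occurs and max_occurs."""
    n = len(items)
    if n < min_occurs or n > max_occurs:
        raise ProcessingError(
            f"{n} occurrences in infoset, allowed {min_occurs}..{max_occurs}")


values = iter(["Banana", "Orange", "Apple", "Grape"])
print(parse_implicit(lambda: next(values, None), 0, 50))
```

The two bounds play different roles: on parse, maxOccurs only caps the speculation, while on unparse both bounds are hard checks against what is already in the infoset.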
>  > On 5/10/19 1:48 PM, Costello, Roger L. wrote:
>  >
>  >> Excellent! Thank you, Steve.
>  >>
>  >> Is the following an accurate description of dfdl:occursCountKind='implicit'?
>  >>
>  >> Suppose an element declaration has dfdl:occursCountKind='implicit'
>  >> with minOccurs="M" and maxOccurs="N"... This instructs Daffodil to
>  >> consume between M and N values. There's no concept of lookahead or
>  >> smarts about how many values might appear after the element. Daffodil
>  >> just keeps consuming values until either it consumes N values or one
>  >> of the values fails to parse (i.e., the value fails to meet the element's
>  >> requirements).
>  >>
>  >> -----Original Message-----
>  >> From: Steve Lawrence <[email protected]>
>  >> Sent: Friday, May 10, 2019 11:56 AM
>  >> To: [email protected]
>  >> Subject: [EXT] Re: Why am I getting this error message: Failed to
>  >> parse infix separator. Cause: Parse Error: Separator '%NL;' not found.
>  >>
>  >> dfdl:occursCountKind="implicit" just says to parse somewhere between
>  >> minOccurs and maxOccurs elements. There's no concept of lookahead or
>  >> smarts about how many elements might appear after it. It literally
>  >> just keeps trying to parse B elements until either we reach maxOccurs of
>  >> them or one of them fails to parse. The assert was used to cause it to
>  >> fail to parse when it reached something that didn't look like a B.
>  >>
>  >> And yeah, my schema is just plain wrong. A pattern assert matches against
>  >> the data stream, but my intention was to match the parsed value. The
>  >> assert pattern could probably be changed, but I think it's a bit clearer
>  >> to put a pattern restriction on the B element and change the assert to
>  >> call checkConstraints. So something like this:
>  >>
>  >>    <xs:element name="B" maxOccurs="50" dfdl:occursCountKind="implicit">
>  >>      <xs:simpleType>
>  >>        <xs:annotation>
>  >>          <xs:appinfo source="http://www.ogf.org/dfdl/">
>  >>            <dfdl:assert test="{ dfdl:checkConstraints(.) }" />
>  >>          </xs:appinfo>
>  >>        </xs:annotation>
>  >>        <xs:restriction base="xs:string">
>  >>          <xs:pattern value=".*[^0-9].*" />
>  >>        </xs:restriction>
>  >>      </xs:simpleType>
>  >>    </xs:element>
>  >>
>  >> So each B is parsed, then we assert that the parsed value validates
>  >> according to the pattern value. When a value doesn't validate, that's
>  >> how we know we have reached the C elements.
>  >>
>  >> - Steve
>  >>
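The difference between matching the data stream and matching the parsed value can be sketched in Python. This toy model assumes (1) a testKind="pattern" assert is tried at the element's starting position against the remaining data, which still contains the newline separators, modelled here with `re.match` (anchored only at the start), and (2) the file has no trailing newline. Both are assumptions for illustration, and `count_bs`, `stream_check`, and `value_check` are hypothetical helpers. Under this model the stream-based check accepts "100" through "500", because the newline that follows each of them counts as a non-digit, leaving only "600" for C. If the model is right, that would fit the "Failed to populate C[2]" error Roger reported.

```python
import re

PATTERN = r".*[^0-9].*"

# The lines that follow the three A values in Roger's example.
LINES = ["Banana", "Orange", "Apple", "Grape",
         "100", "200", "300", "400", "500", "600"]


def stream_check(lines, i):
    # Assumed model of testKind="pattern": match at the current position of
    # the remaining data, which still contains the '\n' separators.
    remaining = "\n".join(lines[i:])
    return re.match(PATTERN, remaining) is not None


def value_check(lines, i):
    # Model of checkConstraints: an XSD pattern must match the whole value.
    return re.fullmatch(PATTERN, lines[i]) is not None


def count_bs(lines, check, max_occurs=50):
    """Count leading lines accepted as B before the first rejection."""
    count = 0
    while count < min(max_occurs, len(lines)) and check(lines, count):
        count += 1
    return count


print(count_bs(LINES, stream_check))  # 9
print(count_bs(LINES, value_check))   # 4
```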
>  >> On 5/10/19 11:36 AM, Costello, Roger L. wrote:
>  >>
>  >>  > Hi Steve,
>  >>  >
>  >>  > I guess that I don't understand dfdl:occursCountKind="implicit". I
>  >>  > thought it means: "Hey Daffodil, figure out the appropriate occurrences
>  >>  > of B elements by inferring from the occurrence needs of its following
>  >>  > elements." In this case, C's are the following elements and the number
>  >>  > of occurrences of C is equal to the value of the first A element. That
>  >>  > is, the occurrence needs for C are easily determined, so the occurrence
>  >>  > needs of B should be easily inferred. That is, it seems to me that
>  >>  > Daffodil should be able to recognize that these values:
>  >>  >
>  >>  > 100
>  >>  > 200
>  >>  > 300
>  >>  > 400
>  >>  > 500
>  >>  > 600
>  >>  >
>  >>  > are for the C element and the declaration for the B element should
>  >>  > not need an assert to specify, "Give me only strings up till the
>  >>  > point where digits are encountered." By adding dfdl:assert to the
>  >>  > schema it is effectively neutering the dfdl:occursCountKind="implicit".
>  >>  > I am confused.
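For comparison, the inference described above is easy to hand-code when a program can see the whole input at once: take the C count from the first A value, count that many lines back from the end, and treat everything in between as B. This Python sketch (`split_by_known_tail` is a hypothetical helper, not anything Daffodil provides) only illustrates that idea; as Steve explains in his reply, dfdl:occursCountKind="implicit" does not look ahead like this.

```python
def split_by_known_tail(text):
    """Split the example format using the C count from the first A value.
    Hypothetical helper: it needs the whole input up front."""
    lines = text.split("\n")
    a = [int(x) for x in lines[:3]]
    c_count = a[0]
    if c_count < 0 or len(lines) < 3 + c_count:
        raise ValueError("not enough lines for the C values")
    split_at = len(lines) - c_count  # first line that belongs to C
    b = lines[3:split_at]
    c = [int(x) for x in lines[split_at:]]
    return a, b, c


sample = "6\n1\n2\nBanana\nOrange\nApple\nGrape\n100\n200\n300\n400\n500\n600"
print(split_by_known_tail(sample))
```

Because it counts from the end, this split would even accept a numeric-looking B value, something no per-value assert can distinguish.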
>  >>  >
>  >>  > Second question: I modified the schema as you suggested. See below.
>  >>  > However, I now get this error message:
>  >>  >
>  >>  > [error] Parse Error: Failed to populate C[2]. Missing infix separator.
>  >>  > Cause: Parse Error: Separator '%NL;' not found.
>  >>  >
>  >>  > <xs:element name="input">
>  >>  >     <xs:complexType>
>  >>  >         <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>  >>  >             <xs:element name="A" type="xs:integer" minOccurs="3"
>  >>  >                             maxOccurs="3" dfdl:occursCountKind="fixed" />
>  >>  >             <xs:element name="B" type="xs:string" maxOccurs="50"
>  >>  >                             dfdl:occursCountKind="implicit">
>  >>  >                 <xs:annotation>
>  >>  >                     <xs:appinfo source="http://www.ogf.org/dfdl/">
>  >>  >                         <dfdl:assert testKind="pattern"
>  >>  >                                      testPattern=".*[^0-9].*" />
>  >>  >                     </xs:appinfo>
>  >>  >                 </xs:annotation>
>  >>  >             </xs:element>
>  >>  >             <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
>  >>  >                             dfdl:occursCountKind="expression"
>  >>  >                             dfdl:occursCount="{ ../A[1] }" />
>  >>  >         </xs:sequence>
>  >>  >     </xs:complexType>
>  >>  > </xs:element>
>  >>  >
>  >>  > -----Original Message-----
>  >>  > From: Steve Lawrence <[email protected]>
>  >>  > Sent: Friday, May 10, 2019 9:08 AM
>  >>  > To: [email protected]
>  >>  > Subject: [EXT] Re: Why am I getting this error message: Failed to
>  >>  > parse infix separator. Cause: Parse Error: Separator '%NL;' not found.
>  >>  >
>  >>  > The issue is that element B can be 50 or fewer strings. And although
>  >>  > 100, 200, etc. look like numbers, they are also completely valid
>  >>  > strings. So Daffodil will just keep consuming every line after the
>  >>  > first three numbers as B elements. Daffodil still expects a separator
>  >>  > followed by some C's, but we hit the end of the data and error out
>  >>  > saying we were looking for that separator.
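A toy Python model of that failure, assuming the input is simply split on newlines and that B, with no assert, accepts any line. `parse_without_assert` is a hypothetical helper for illustration, not Daffodil's algorithm; it only shows why B swallows every remaining line and leaves nothing for C.

```python
def parse_without_assert(text, max_b=50):
    """Toy model of the original schema: B is a plain xs:string with no
    assert. Hypothetical helper for illustration only."""
    fields = text.split("\n")
    a = [int(f) for f in fields[:3]]
    pos = 3
    b = []
    # Every remaining line is a valid xs:string, so B takes them all
    # (up to maxOccurs).
    while len(b) < max_b and pos < len(fields):
        b.append(fields[pos])
        pos += 1
    c = []
    for _ in range(a[0]):
        if pos >= len(fields):
            # Nothing is left after the last B, so the separator before C is missing.
            raise ValueError("Separator '%NL;' not found")
        c.append(int(fields[pos]))
        pos += 1
    return a, b, c


sample = "6\n1\n2\nBanana\nOrange\nApple\nGrape\n100\n200\n300\n400\n500\n600"
try:
    parse_without_assert(sample)
except ValueError as err:
    print(err)  # Separator '%NL;' not found
```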
>  >>  >
>  >>  > So we need to somehow tell Daffodil to stop looking for B's. One
>  >>  > solution here is to add an assertion to test that each B element does
>  >>  > not look like a number. The DFDL expression language doesn't have a
>  >>  > good way to test if a string is a number or not, but a regex pattern
>  >>  > test could work:
>  >>  >
>  >>  >   <xs:element name="B" type="xs:string" maxOccurs="50"
>  >>  >     dfdl:occursCountKind="implicit">
>  >>  >     <xs:annotation>
>  >>  >       <xs:appinfo source="http://www.ogf.org/dfdl/">
>  >>  >         <dfdl:assert testKind="pattern" testPattern=".*[^0-9].*" />
>  >>  >       </xs:appinfo>
>  >>  >     </xs:annotation>
>  >>  >   </xs:element>
>  >>  >
>  >>  > This regular expression says that every B element must contain at
>  >>  > least one character that is not a numeric digit. So when Daffodil
>  >>  > gets to "100", the assertion will fail since it is all digits, and
>  >>  > we'll stop parsing B's and start looking for C's.
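A quick Python check of what this regular expression accepts when applied to a single parsed value. `looks_like_b` is a hypothetical helper; using `fullmatch` assumes whole-value matching, as an XSD pattern facet does, and Python's regex syntax happens to agree with XSD's for this particular pattern. Per Steve's later message, a testKind="pattern" assert is matched against the data stream rather than the parsed value, so the schema's actual behaviour can differ from this.

```python
import re

B_PATTERN = re.compile(r".*[^0-9].*")


def looks_like_b(value):
    # fullmatch models an XSD pattern facet, which must match the whole value.
    return B_PATTERN.fullmatch(value) is not None


for v in ["Banana", "100", "Route66", ""]:
    print(repr(v), looks_like_b(v))
```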
>  >>  >
>  >>  > - Steve
>  >>  >
>  >>  > On 5/10/19 8:00 AM, Costello, Roger L. wrote:
>  >>  >
>  >>  >> Hello DFDL community,
>  >>  >>
>  >>  >> My input file consists of exactly 3 integers, each on a new line,
>  >>  >> followed by an arbitrary number of strings, again, each on a new
>  >>  >> line, followed by a number of integers, the number being
>  >>  >> determined by the first integer in the file. For example:
>  >>  >>
>  >>  >> 6
>  >>  >> 1
>  >>  >> 2
>  >>  >> Banana
>  >>  >> Orange
>  >>  >> Apple
>  >>  >> Grape
>  >>  >> 100
>  >>  >> 200
>  >>  >> 300
>  >>  >> 400
>  >>  >> 500
>  >>  >> 600
>  >>  >>
>  >>  >> Below is my DFDL schema. It generates this error:
>  >>  >>
>  >>  >> *[error] Parse Error: Failed to parse infix separator. Cause: Parse Error:
>  >>  >> Separator '%NL;' not found.*
>  >>  >>
>  >>  >> Why is that error being generated? How do I fix the DFDL schema?
>  >>  >>
>  >>  >> /Roger
>  >>  >>
>  >>  >> <xs:element name="input">
>  >>  >>   <xs:complexType>
>  >>  >>     <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>  >>  >>       <xs:element name="A" type="xs:integer"
>  >>  >>                   minOccurs="3" maxOccurs="3"
>  >>  >>                   dfdl:occursCountKind="fixed"/>
>  >>  >>       <xs:element name="B" type="xs:string" maxOccurs="50"
>  >>  >>                   dfdl:occursCountKind="implicit"/>
>  >>  >>       <xs:element name="C" type="xs:integer" maxOccurs="unbounded"
>  >>  >>                   dfdl:occursCountKind="expression"
>  >>  >>                   dfdl:occursCount="{ ../A[1] }"/>
>  >>  >>     </xs:sequence>
>  >>  >>   </xs:complexType>
>  >>  >> </xs:element>
