Hi Mike,
Thank you very much! Based on your excellent information I succeeded in getting
the schema to work. See below for the working schema.
The key was the addition of dfdl:assert:
<xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:assert>{ fn:string-length(.) gt 0 }</dfdl:assert>
</xs:appinfo>
</xs:annotation>
Without it, I get the "infinite loop" error.
I don't understand why the dfdl:assert should be necessary. After all, the plus
sign ( + ) in the regex
dfdl:lengthPattern="[\x20-\x7F]+?(?=\x00)"
specifies that the string must contain at least one character. Can you describe
a bit more why the dfdl:assert is needed, please?
Happy New Year! /Roger
<xs:element name="input">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="String" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="value" type="xs:string"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern="[\x20-\x7F]+?(?=\x00)"
                        dfdl:representation="text"
                        dfdl:encoding="ASCII">
              <xs:annotation>
                <xs:appinfo source="http://www.ogf.org/dfdl/">
                  <dfdl:assert>{ fn:string-length(.) gt 0 }</dfdl:assert>
                </xs:appinfo>
              </xs:annotation>
            </xs:element>
            <xs:sequence dfdl:hiddenGroupRef="hidden_nulls_Group" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:group name="hidden_nulls_Group">
  <xs:sequence>
    <xs:element name="Hidden_nulls" type="xs:hexBinary"
                dfdl:lengthKind="pattern"
                dfdl:lengthUnits="bytes"
                dfdl:lengthPattern="[\x00]+?(?=([^\x00]|$))"
                dfdl:outputValueCalc='{ . }' />
  </xs:sequence>
</xs:group>
From: Beckerle, Mike <[email protected]>
Sent: Monday, December 31, 2018 12:50 PM
To: [email protected]
Subject: [EXT] Re: Why am I getting an "infinite loop" error message?
I have hit what I think is this problem a bunch of times.
I have come to think of it as a flaw in DFDL.
The problem is lengthKind pattern, and what it means when there is no match.
Intuitively we expect no match to cause a failure and backtracking, but what
it actually means is that the length is "however much is matched", so no match
means length zero. I.e., no match is a successful parse, producing zero length.
Seriously, I think DFDL may need a new length kind of patternMatch where it
must positively match, where failure to match is a true failure aka parse
error.
You can simulate this by adding a dfdl:assert to the string element insisting
that its length is greater than 0.
E.g.,
<xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:assert>{ fn:string-length(.) gt 0 }</dfdl:assert>
</xs:appinfo> </xs:annotation>
This will force failure and therefore backtracking if the regex match length is
actually zero, which it should never be in your case.
What I think is happening is that at some point your match fails, which
results in zero length for the element, so that occurrence of your repeating
element has zero length. A zero-length occurrence of a repeating element with
maxOccurs="unbounded" is an error, because it would result in an infinite loop.
As for what's causing your match to fail, I'm less sure. Just some ideas here.
Keep in mind that in a regex for lengthKind pattern, those \xHH escapes match
character code points, not bytes. The correspondence of character code point
to byte is only 1 to 1 for every byte value if you specify iso-8859-1.
I think that even though your hidden group element is hexBinary, there may be
some Daffodil bug there. I suggest you try making the hidden group element not
hidden (for testing), and making it a string with encoding iso-8859-1 rather
than a hexBinary.
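For illustration, that test setup might look roughly like the sketch below. It puts a visible string element where the hiddenGroupRef sequence was, so the matched nulls show up in the infoset. Dropping dfdl:lengthUnits and dfdl:outputValueCalc is an assumption made here to keep a parse-only test small, and the sketch assumes the rest of the schema (namespaces, default format) is unchanged. It has not been run.

```xml
<xs:element name="List_of_strings" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="string" type="xs:string"
                  dfdl:lengthKind="pattern"
                  dfdl:lengthPattern="[\x01-\xFF]+?(?=\x00)"
                  dfdl:representation="text"
                  dfdl:encoding="ISO-8859-1"/>
      <!-- Test only: visible string element in place of the hidden hexBinary one -->
      <xs:element name="Hidden_null" type="xs:string"
                  dfdl:lengthKind="pattern"
                  dfdl:lengthPattern="[\x00]+?(?=([^\x00]|$))"
                  dfdl:representation="text"
                  dfdl:encoding="ISO-8859-1"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

If the visible element comes back with the expected run of nulls, the hexBinary handling becomes the more likely suspect.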
Your regex might be simplified. Really it's just [\x00]+ I think, i.e., match
as many nulls as possible. I don't think you need the added complexity of
telling it to match reluctantly up until a non-null or end of data. I'm not
sure what that added stuff achieves.
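A sketch of that simplification, changing only the dfdl:lengthPattern of the hidden element and leaving everything else as in the original schema. A greedy [\x00]+ should consume the same run of nulls as the reluctant pattern with the lookahead, but that is untested here.

```xml
<xs:group name="hidden_null_Group">
  <xs:sequence>
    <xs:element name="Hidden_null" type="xs:hexBinary"
                dfdl:lengthKind="pattern"
                dfdl:lengthUnits="bytes"
                dfdl:lengthPattern="[\x00]+"
                dfdl:outputValueCalc='{ . }' />
    <!-- Only the lengthPattern differs from the original group -->
  </xs:sequence>
</xs:group>
```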
I don't know whether this is your error, but a common mistake is to forget that
ASCII is 7 bits only. So, for example, \xFF will never be a valid ASCII
character, and if the byte 0xFF is found in the data it will be decoded as a
replacement character, and that replacement character will NOT match your
regex. So the encoding really matters. If you are using \xFF as a byte, you
need iso-8859-1 encoding for sure.
I hope that all helps
Happy New Year
Mike Beckerle
Tresys Technology.
From: Costello, Roger L.
Sent: Monday, December 31, 11:30 AM
Subject: Why am I getting an "infinite loop" error message?
To: [email protected]
Hello DFDL community,
I have a binary input file containing:
string null(s) string null(s) ....
Here is my input file:
[Image]
Notice that each string is followed by one or more null symbols.
One way to characterize the input is that there is a list of:
string followed by one or more nulls
The schema below is my attempt to faithfully implement that characterization.
However, when I execute the schema, I get this "infinite loop" error message:
[error] Parse Error: Repeating or Optional Element -
No forward progress at byte 47. Attempt to parse
List_of_strings succeeded but consumed no data.
Please re-examine your schema to correct this infinite loop.
I do not understand where the infinite loop is occurring. Would you explain,
please? How to fix it? /Roger
<xs:element name="input">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="List_of_strings" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="string" type="xs:string"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern="[\x01-\xFF]+?(?=\x00)"
                        dfdl:representation="text"
                        dfdl:encoding="ISO-8859-1"/>
            <xs:sequence dfdl:hiddenGroupRef="hidden_null_Group" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:group name="hidden_null_Group">
  <xs:sequence>
    <xs:element name="Hidden_null" type="xs:hexBinary"
                dfdl:lengthKind="pattern"
                dfdl:lengthUnits="bytes"
                dfdl:lengthPattern="[\x00]+?(?=([^\x00]|$))"
                dfdl:outputValueCalc='{ . }' />
  </xs:sequence>
</xs:group>