The regex "[A-Z]{2,20}" says to match between 2 and 20 characters where
only A-Z characters are allowed. Using this regex, Daffodil will scan
the data and stop at the colon character since it does not match A-Z. So
the length of the Identifier element according to the regex is 4 (the
length of "TYPE").
Since the value of Identifier is "TYPE", it does not fail the nilled or
empty-string assertion, so there is no parse error and the first choice
branch succeeds. Because there are no more elements to parse, the
remaining data (i.e. the colon and "TEL") is never parsed and is reported
as left over data.
When Description is moved to the first branch of the choice, it
successfully parses the "TYPE:" initiator, the regex then matches
everything after that (i.e. "TEL"), and it works as expected.
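For reference, a sketch of the reordered choice, reusing the non-zero-length-string type from this thread unchanged. The branch with the distinguishing initiator goes first: on "TYPE:TEL" it consumes the whole field, and on input without the "TYPE:" initiator it fails, so the choice falls back to Identifier.

```xml
<!-- Sketch: Description (initiated by "TYPE:") is tried first, so input such
     as "TYPE:TEL" is consumed entirely. Input lacking the "TYPE:" initiator
     fails this branch, and the choice falls back to Identifier. -->
<xs:choice dfdl:choiceLengthKind="implicit">
  <xs:element name="Description" type="non-zero-length-string"
              dfdl:lengthPattern="[A-Z]{1,56}" dfdl:initiator="TYPE:"/>
  <xs:element name="Identifier" type="non-zero-length-string"
              dfdl:lengthPattern="[A-Z]{2,20}"/>
</xs:choice>
```

The general rule of thumb: when one branch would accept a prefix of what another branch accepts, try the more specific (longer or initiated) branch first.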
On 5/11/22 7:59 AM, Roger L Costello wrote:
Another thing that causes the dreaded left over data error message.
I have input containing this field:
TYPE:TEL
That is, the field is initiated by TYPE:
The field has a choice of values: either a string of 2-20 uppercase letters, or
a string of 1-56 uppercase letters initiated by TYPE:
Here’s the DFDL schema I used:
<xs:choice dfdl:choiceLengthKind="implicit">
  <xs:element name="Identifier" type="non-zero-length-string"
              dfdl:lengthPattern="[A-Z]{2,20}"/>
  <xs:element name="Description" type="non-zero-length-string"
              dfdl:lengthPattern="[A-Z]{1,56}" dfdl:initiator="TYPE:"/>
</xs:choice>
With that choice and the above input, Daffodil doesn’t process the field and
reports left over data. As best I can tell, Daffodil uses the first branch of
the choice, notices that the regex doesn’t contain a colon, and then gives up. I
think.
If I reverse the element declarations, then Daffodil successfully processes the
input.
I guess that I really don’t understand why one works while the other doesn’t.
Would you explain why Daffodil reports left over data with the first but not the
second, please?
For completeness, here is the simpleType:
<xs:simpleType name="non-zero-length-string" dfdl:lengthKind="pattern">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:assert test="{ fn:nilled(.) or . ne '' }"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:restriction base="xs:string"/>
</xs:simpleType>
/Roger
*From:* Mike Beckerle <[email protected]>
*Sent:* Tuesday, May 3, 2022 6:32 PM
*To:* [email protected]
*Subject:* [EXT] Re: Catalog the causes of the dreaded “left over data” error
message
Here is a trick used in one schema I've seen:
<xs:group name="requireNoDataLeft">
  <xs:sequence>
    <xs:element name="data" type="tns:tIntField" dfdl:length="1"
                minOccurs="0"/>
    <xs:sequence>
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:assert test="{ fn:not(fn:exists(data)) }"
                       message="Data found where none was expected."/>
        </xs:appinfo>
      </xs:annotation>
    </xs:sequence>
  </xs:sequence>
</xs:group>
So a group reference to "requireNoDataLeft" states "There cannot be any more
data available."
This mostly is for the case where there is a surrounding "box" of data such as
an element with lengthKind 'explicit' and you expect the described contents to
use up everything in that box.
So if your first choice branch ends with a group ref to "requireNoDataLeft", then
it must consume all available data, and it will fail (and the choice will
backtrack to the next branch) if any data remains after it.
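Applied to the MilitaryDayTime/DateTimeGroup choice quoted further down, the suggestion would look roughly like this sketch (unchanged declarations are elided as comments). Two assumptions: the schema has no targetNamespace, matching the unprefixed type references in that schema, so the group is referenced without a prefix (add the prefix if yours has one); and tns:tIntField must be able to parse whatever character follows TimeZone. If it cannot (for example, a text integer facing the first letter of MonthName), the optional "data" element is simply treated as absent, the assert passes, and the left over data remains, so a type that accepts any single character would be needed instead.

```xml
<xs:choice>
  <xs:element name="MilitaryDayTime">
    <xs:complexType>
      <xs:sequence dfdl:separator="">
        <!-- Day, HourTime, and MinuteTime exactly as before -->
        <xs:element name="TimeZone" type="non-zero-length-string"
                    dfdl:lengthPattern="..."/>
        <!-- If any data follows TimeZone, the assert inside the group fails,
             this branch fails, and the choice backtracks to DateTimeGroup.
             Assumes the schema has no targetNamespace (unprefixed ref). -->
        <xs:group ref="requireNoDataLeft"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- DateTimeGroup element exactly as before -->
</xs:choice>
```

As noted above, this only makes sense when the field is expected to end the data (or its surrounding box); otherwise the check would also reject legitimate fields that follow it.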
On Tue, May 3, 2022 at 1:52 PM Roger L Costello <[email protected]> wrote:
The “left over data” error can occur when there is a choice where the first
branch matches the same data as the second branch and the second branch
matches a bit more. Input data that matches the second branch fails because
the first branch parses the input, then stops, and Daffodil reports left
over data.
See example below.
Is there a workaround (without manually shuffling the order of the
branches in the choice)?
<xs:choice>
  <xs:element name="MilitaryDayTime">
    <xs:complexType>
      <xs:sequence dfdl:separator="">
        <xs:element name="Day" type="non-zero-length-string"
                    dfdl:lengthPattern="[0-9]{2}"/>
        <xs:element name="HourTime" type="non-zero-length-string"
                    dfdl:lengthPattern="[0-9]{2}"/>
        <xs:element name="MinuteTime" type="non-zero-length-string"
                    dfdl:lengthPattern="[0-9]{2}"/>
        <xs:element name="TimeZone" type="non-zero-length-string"
                    dfdl:lengthPattern="..."/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="DateTimeGroup">
    <xs:complexType>
      <xs:sequence dfdl:separator="">
        <xs:element name="Day" type="non-zero-length-string"
                    dfdl:lengthPattern="[0-9]{2}"/>
        <xs:element name="HourTime" type="non-zero-length-string"
                    dfdl:lengthPattern="[0-9]{2}"/>
        <xs:element name="MinuteTime" type="non-zero-length-string"
                    dfdl:lengthPattern="[0-9]{2}"/>
        <xs:element name="TimeZone" type="non-zero-length-string"
                    dfdl:lengthPattern="..."/>
        <xs:element name="MonthName" type="non-zero-length-string"
                    dfdl:lengthPattern="…"/>
        <xs:element name="Year" type="non-zero-length-string"
                    dfdl:lengthPattern="[0-9]{4}"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:choice>
*From:* Mike Beckerle <[email protected]>
*Sent:* Monday, May 2, 2022 10:02 AM
*To:* [email protected]
*Subject:* [EXT] Re: Catalog the causes of the dreaded “left over data”
error message
I first encountered left-over-data with a dead-simple file format: just a
top-level element named "records" containing a minOccurs="0"
maxOccurs="unbounded" array of elements named "record".
Due to minOccurs="0", such a schema is very happy to "successfully" parse
zero records and tell you the entire file contents are "left over data".
I learned one often wants minOccurs="1" to force it to succeed on at
least one record.
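A minimal sketch of that schema with the fix applied. The type name recordType is hypothetical, and the format defaults (e.g. dfdl:occursCountKind) are assumed to come from the schema's dfdl:format. With minOccurs="1", a first record that fails to parse produces that record's parse error instead of an empty "successful" parse. Only the first record is forced, though: a failure in a later record still ends the array early and can still surface as left over data.

```xml
<!-- recordType is a hypothetical name used only for illustration. -->
<xs:element name="records">
  <xs:complexType>
    <xs:sequence>
      <!-- minOccurs="1": the first record is required, so a broken first
           record is reported as a parse error rather than as an empty
           array followed by the whole file as left over data. -->
      <xs:element name="record" type="recordType"
                  minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```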
On Fri, Apr 15, 2022 at 9:48 AM Roger L Costello <[email protected]> wrote:
Hi Folks,
Have you encountered the “left over data” error message? If you’ve
worked with Daffodil for more than 5 minutes, you undoubtedly have.
The problem with that error message is it gives you absolutely no clue
what’s causing the problem.
Perhaps if we start cataloging the things that triggered the error
message, then the Daffodil team will be able to provide better
diagnostics. Here’s my contribution to said catalog.
-----------------------
In recent weeks I have encountered the dreaded “left over data” error
message twice. After enormous effort I was able to figure out what the
problems were in my DFDL schema. First I need to describe my DFDL
schema.
My DFDL schema consists of a series of element declarations and within
each element are declarations of subelements:
A
    A.1
    A.2
    …
B
    B.1
    B.2
    …
…
Each subelement is of type string and uses a regex to describe the
subelement’s data (i.e., the subelements use dfdl:lengthKind="pattern"
and dfdl:lengthPattern="regex").
The first time that I got the “left over data” error message, I found
the cause was this bug in my DFDL schema: a dfdl:lengthPattern listed
the regex alternatives in the wrong order (shortest to longest instead
of longest to shortest). The error message said that Daffodil stopped
consuming input at element G. The actual element containing the regex
in the wrong order was element G.2 (Daffodil stopped consuming input
pretty near the problem).
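To illustrate why the order matters: regex alternation in a lengthPattern is ordered, not longest-match (as in Java's regex engine, which Daffodil is built on), so the first alternative that matches at the current position wins. The element name Year and the two-or-four-digit format below are hypothetical; the two declarations are alternatives, not meant to coexist in one schema.

```xml
<!-- Wrong (hypothetical): alternatives ordered shortest to longest. On input
     "2022" the first alternative already matches "20", so the element value
     is "20" and the trailing "22" is left for whatever comes next. -->
<xs:element name="Year" type="xs:string" dfdl:lengthKind="pattern"
            dfdl:lengthPattern="[0-9]{2}|[0-9]{4}"/>

<!-- Right: longest alternative first, so "2022" is consumed in full. -->
<xs:element name="Year" type="xs:string" dfdl:lengthKind="pattern"
            dfdl:lengthPattern="[0-9]{4}|[0-9]{2}"/>
```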
After I fixed that bug, I immediately got another “left over data” error
at element J. After much more effort I found the bug: a regex
erroneously had spaces in it. In this case, the error message said that
Daffodil stopped consuming input at element J. The actual element
containing the regex with spaces was element K.5 (Daffodil stopped
consuming input pretty far from the problem).
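A hypothetical example of that kind of bug: outside a character class, a space in a lengthPattern is an ordinary literal character, so the data must contain a space at that position. The element name Code and the three-letters-plus-two-digits format are assumptions for illustration only; again, the two declarations are alternatives.

```xml
<!-- Buggy (hypothetical): meant to match three letters then two digits,
     e.g. "ABC12". The stray space is matched literally, so this pattern only
     matches data such as "ABC 12" and does not consume "ABC12" as intended. -->
<xs:element name="Code" type="xs:string" dfdl:lengthKind="pattern"
            dfdl:lengthPattern="[A-Z]{3} [0-9]{2}"/>

<!-- Fixed: no space, so "ABC12" matches in full. -->
<xs:element name="Code" type="xs:string" dfdl:lengthKind="pattern"
            dfdl:lengthPattern="[A-Z]{3}[0-9]{2}"/>
```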
/Roger