I do think there is a bug, but it is perhaps easier to deal with than Steve
said.

First, Roger said: "Daffodil reports that there are more remaining bits:
5,376 --> 46,160" when it also reports more were consumed.

Daffodil diagnostics in the first instance state that there are "at least
5376 bits remaining" and in the second case "at least 46,160 bits
remaining". Both those statements are true if there are 500,000 bits
remaining.
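
To make the "at least" behavior concrete, here is a toy Python sketch. It is
not Daffodil code; the chunk size and the function name are made up for
illustration. It reads input in fixed-size chunks and reports only the bits
it has buffered beyond the consumed position, which is a lower bound on what
is really left.

```python
import io

CHUNK_BYTES = 8  # hypothetical chunk size, kept tiny for illustration


def at_least_remaining_bits(stream, consumed_bytes):
    """Read chunks until `consumed_bytes` are buffered, then report only
    the buffered bits past that point (a lower bound, not the true total)."""
    buffered = b""
    while len(buffered) < consumed_bytes:
        chunk = stream.read(CHUNK_BYTES)
        if not chunk:  # real end of input
            break
        buffered += chunk
    return max(0, len(buffered) - consumed_bytes) * 8


data = bytes(100)  # 800 bits of input in total
print(at_least_remaining_bits(io.BytesIO(data), 3))   # 40 (776 truly remain)
print(at_least_remaining_bits(io.BytesIO(data), 10))  # 48 (720 truly remain)
```

Consuming more input makes the reported number go up (40 to 48) even though
the true remainder goes down, and both reports are still correct lower
bounds. That is the same effect Roger saw.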

What Daffodil is reporting is correct once you emphasize that the messages
say "at least".

So Daffodil is not lying to you.

The bug is just that this is *completely unintuitive*, and assumes data
streaming as the use case when most users at least start with parsing whole
files or whole data buffers.

It seems like a bug to me because we're just not using Daffodil's internal
bitLimit0b when we should be. Instead we're asking the I/O layer how many
bits are left, and it is giving us what info it has, when it really can't
answer the question usefully when it's trying hard not to pull data into
memory unnecessarily.

The only way to get the right diagnostic here is to special-case file
parsing and be sure we set bitLimit0b appropriately in that case. Then the
diagnostics need to use bitLimit0b rather than asking the I/O layer how
many bits are "at least" known.
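
As a rough sketch of that special case, in Python rather than Daffodil's
own code: when the input is a regular file, its size gives an exact bit
limit up front, so the remaining count is a simple subtraction. The function
name is hypothetical, and `bit_limit` here only stands in for the role
bitLimit0b would play.

```python
import os


def remaining_bits_for_file(path, consumed_bits):
    """Exact remaining bits for a regular file, using its size as the limit."""
    # Assumption: the whole file is the input, so its size is the bit limit.
    bit_limit = os.path.getsize(path) * 8
    return bit_limit - consumed_bits
```

This only works when the size is known in advance. For a true stream there
is no such limit, and the "at least" wording is still the best available.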

I think a bug that may be related to this area has been fixed since Daffodil
3.4.0. I remember seeing "Not enough data: 72 bits required but only 1082
are available" (which is completely silly) and reporting that bug. In that
case we were also taking I/O layer information and ignoring Daffodil's
internal bitLimit0b, which was bound to a very small bit limit in that
specific situation. That is why I think this issue is related, i.e., it
may already be fixed.

-mike beckerle

On Thu, Apr 13, 2023 at 12:37 PM Steve Lawrence <[email protected]>
wrote:

> The underlying cause is that Daffodil doesn't read all the data in at
> once. Instead it only reads and parses data in small-ish chunks and
> discards chunks once it is done with them. The benefit to this approach
> is that it allows Daffodil to use smaller amounts of memory than might
> otherwise be needed if it had read the entire file in all at once. In
> fact, it even allows Daffodil to parse files that are larger than could
> fit in JVM memory.
>
> However, this means that when a parse fails, Daffodil might not have
> read the entire file, and so it doesn't actually know how much is really
> left. All it knows about is the size of the chunks that still remain.
>
> If we wanted to fix this, once parsing completes we could try to consume
> data until we hit EOF, which would give us an accurate number of
> remaining bits. But if the input is coming from a stream, then EOF could
> take a while or not actually happen, and Daffodil would appear to hang.
> So instead we just bail and report as much as we know about.
>
> Alternatively we could check if the input is a file vs a stream and then
> do a simple file size calculation, but thus far it hasn't been a high
> priority for the extra complication.
>
>
> On 2023-04-13 12:10 PM, Roger L Costello wrote:
> > I am gradually adding to my DFDL schema. I expect there to be "Left over
> > data". But it would be nice if Daffodil accurately told me how much left
> > over data there is. Or at least, it would be nice if Daffodil didn't
> > (apparently) make things up. Let me explain.
> >
> > I ran my DFDL schema and got this message:
> >
> > [error] Left over data. Consumed 109767424 bit(s) with at least 5376 bit(s) remaining.
> >
> > Okay, so I have about 5,000 bits remaining to be parsed.
> >
> > I added more stuff into my DFDL schema. The schema now gobbles up more of
> > the input. I expect the number of bits consumed to increase and the number
> > of left-over bits to decrease. Here's what Daffodil gives:
> >
> > [error] Left over data. Consumed 191712176 bit(s) with at least 46160 bit(s) remaining.
> >
> > Daffodil reports that more bits were consumed: 109,767,424 --> 191,712,176
> >
> > Good. Makes sense.
> >
> > Daffodil reports that there are more remaining bits: 5,376 --> 46,160
> >
> > Huh? That's crazy.
> >
> > Why can't Daffodil accurately tell the number of remaining bits?
> >
> > /Roger
>
>
