Steve wrote:

> I think it would be reasonable to 
> ignore this warning.

But, but, but, ...

Mike said (paraphrasing) that it is unwise to officially publish a DFDL schema 
that produces warnings on valid data.

It appears that it is impossible to avoid getting a warning message (for the 
CSV data format where the last record of a CSV file may or may not have a 
newline) until dfdl:documentFinalTerminatorCanBeMissing="yes" is implemented. 
Do you agree?

/Roger

-----Original Message-----
From: Steve Lawrence <[email protected]> 
Sent: Sunday, November 10, 2019 9:32 AM
To: [email protected]
Subject: [EXT] Re: Is it okay to officially publish a DFDL schema that produces 
warnings on valid input data?

When unparsing a choice, we use the infoset to determine which branch of the 
choice to unparse. For example, say we had this choice:

  <xs:choice>
    <xs:element name="A" type="xs:string" ... />
    <xs:element name="B" type="xs:int" ... />
  </xs:choice>

If the infoset contained the "A" element, then we would unparse the first 
branch of the choice. If the infoset contained the "B" element, then we would 
unparse the second.

However, in this new choice you have, both branches contain only a sequence, 
and sequences have no representation in the infoset. So when unparsing we don't 
know which branch to take.

That warning is trying to alert you that Daffodil will just have to pick one, 
and that it might not be the one you expected. Daffodil will currently always 
unparse the first of the ambiguous branches.

So this warning is actually normal and expected in this case. I think it would 
be reasonable to ignore this warning.
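
To make that concrete, here is the suggested choice annotated with what I'd 
expect on unparse. This is only a sketch: the xs: prefix is assumed, and the 
comments restate the first-branch behavior described above rather than 
anything checked against a particular Daffodil release.

  <xs:choice>
    <!-- Branch 1: first of the ambiguous branches, so this is the one
         taken on unparse; its initiator is written to the output. -->
    <xs:sequence dfdl:initiator="%NL;" />
    <!-- Branch 2: empty; taken on parse when there is no final
         newline, but not chosen on unparse. -->
    <xs:sequence />
  </xs:choice>

In other words, with this ordering the unparsed file would always end with a 
newline, whether or not the original did.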


On 11/10/19 8:54 AM, Costello, Roger L. wrote:
> Mike wrote:
> 
> I suggest adding this
> 
> <choice>
>    <sequence dfdl:initiator="%NL;" />
>    <sequence />
> </choice>
> 
> At the end of the schema after the repeating row element.
> 
> This will absorb and discard any final newline.
> 
> Oh! That is a wicked cool idea! I gave it a try. Daffodil doesn't seem to 
> like it:
> 
> [warning] Schema Definition Warning: Multiple choice branches are 
> associated with the end of element {}csv.
> 
> Note that elements with dfdl:outputValueCalc cannot be used to 
> distinguish choice branches.
> 
> Note that choice branches with entirely optional content are not allowed.
> 
> What does that message mean? How do I fix it?
> 
> /Roger
> 
> *From:* Beckerle, Mike <[email protected]>
> *Sent:* Sunday, November 10, 2019 7:56 AM
> *To:* [email protected]
> *Subject:* [EXT] Re: Is it okay to officially publish a DFDL schema 
> that produces warnings on valid input data?
> 
> I would avoid this.
> 
> One thing you need to take a position on is whether, on unparsing, you 
> generate this final newline, omit it, or try to preserve whatever the file 
> had originally.
> 
> Choosing to always generate it, or to always omit it, is canonicalization.
> 
> I suggest adding this
> 
> <choice>
>    <sequence dfdl:initiator="%NL;" />
>    <sequence />
> </choice>
> 
> At the end of the schema after the repeating row element.
> 
> This will absorb and discard any final newline.
> 
> If you want to preserve the final newline, then you have to model it as 
> data: change the first branch of the choice above into an element named 
> 'finalNewLine' that has the initiator and is of type string with explicit 
> length 0.
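> 
> A rough sketch of that variant, assuming an xs: prefix and the 
> dfdl:lengthKind/dfdl:length properties as the way to express an explicit 
> length of 0 (untested):
> 
>   <xs:choice>
>     <!-- Appears in the infoset only when the data ended with a newline -->
>     <xs:element name="finalNewLine" type="xs:string"
>                 dfdl:initiator="%NL;"
>                 dfdl:lengthKind="explicit" dfdl:length="0" />
>     <xs:sequence />
>   </xs:choice>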
> 
> --------------------------------------------------------------------------------
> 
> *From:* Costello, Roger L. <[email protected]>
> *Sent:* Saturday, November 9, 2019 8:05:19 AM
> *To:* [email protected]
> *Subject:* Is it okay to officially publish a DFDL schema that 
> produces warnings on valid input data?
> 
> Hi Folks,
> 
> Suppose you are creating the official, standard DFDL schema for a data 
> format. 
> Would you be okay with officially releasing a schema that generates 
> warnings on data that is valid?
> 
> Here's an example. The RFC for CSV (RFC 4180) says that CSV files 
> consist of records separated by newlines. Each record consists of 
> fields separated by commas. The last record may or may not have a newline.
> 
> Suppose the last record of a CSV file has a newline. My DFDL schema 
> generates this warning:
> 
> *[warning] Left over data. Consumed 1680 bit(s) with at least 16 
> bit(s) remaining.*
> 
> I am thinking that that warning is okay. Why? Because when the last 
> record has a newline, then the file /really does/ have left over data 
> - the newline on the last record. So, a warning is not unreasonable.
> 
> Well, that's what I think. I might be thinking wrongly. What do you 
> think? Would you ever officially release a DFDL schema that generates 
> warnings on valid input data?
> 
> /Roger
> 
