Re: EDIFACT missing new line

Steve Lawrence Tue, 03 Mar 2020 05:10:53 -0800

Smooks looks really interesting! Nice to hear that Daffodil is working
out well, please keep us updated!

The issue you are seeing is caused by the way that EDI is described in
the schema (note that I'm not too familiar with the EDI format, but am
familiar with the schema).

The EDIFACT-SupplyChain-Messages-D.03B.xsd file describes the UNA
section like this:

  <xsd:element dfdl:terminator="%WSP*;" type="srv:UNA" ... />

The %WSP*; is a special character class that, when parsing data, will
match zero or more whitespace characters (spaces, tabs, newlines, etc.).
So this says that the UNA can be terminated by any whitespace characters
when parsing.

The problem with this is that when Daffodil creates an infoset from a
parse, it does not include which terminator was found at the end of the
data. It could have been a newline, but it also could have been a space,
or nothing at all.

So on unparsing, since the infoset doesn't contain information about
that matched terminator, Daffodil must make a decision on what value to
unparse. The rule for unparsing a WSP* is that Daffodil just unparses
the empty string, which is valid for WSP*. Although the unparsed data
might differ from the original, it is still valid EDI and semantically
the same according to the schema.

If you only wanted to support data with newlines, and thus force
unparsing to create a newline, you could change that %WSP*; to %NL;
which is the special character class to match newlines. Or
alternatively, if you wanted to support both newlines OR any whitespace
characters, but always wanted to unparse as a newline, you could provide
two terminators, e.g.:

  dfdl:terminator="%NL;%WSP*; %WSP*;"

On parsing, that will accept either a newline followed by zero or more
whitespace characters, or just zero or more whitespace characters. It
will always unparse as a newline, though, since the newline alternative
appears first in the list.
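
For illustration, here is roughly how that change would look on the UNA
element declaration quoted earlier. This is only a sketch: the "..."
stands for the attributes the real declaration carries, which are left
elided here exactly as above, so the fragment is not well-formed XML on
its own. Only the dfdl:terminator value changes. I am also assuming that
%NL; unparses as whatever dfdl:outputNewLine is set to in the schema.

```xml
<!-- Before: any run of whitespace (including none) terminates UNA,
     and the terminator unparses as the empty string. -->
<xsd:element dfdl:terminator="%WSP*;" type="srv:UNA" ... />

<!-- After: two alternatives, separated by a space. Parsing accepts
     either one; unparsing uses the first, so a newline is written. -->
<xsd:element dfdl:terminator="%NL;%WSP*; %WSP*;" type="srv:UNA" ... />
```

The trailing %WSP*; in the first alternative still unparses as the empty
string, so the only thing written for the terminator is the newline.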

Note that something similar happens in EDIFACT-Service-Segments-4.1.xsd
for the definition of SegmentTerminator. Fortunately, it has a comment
on what to do if you want newlines to appear when unparsing, which is
similar to the above suggestion. Also, the default value for SegmentTerm
(defined in IBM_EDI_Format.xsd) is "%WSP*; %NL;%WSP*;", which matches
either zero or more whitespace characters or a newline followed by zero
or more whitespace characters. To unparse a newline, swap the order of
those two alternatives (as in the terminator above) so that the NL one
comes first and is used for unparsing.
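
As a rough sketch of that swap, assuming SegmentTerm is declared with
dfdl:defineVariable and a defaultValue attribute: the attribute set,
the type, and the namespace prefixes below are my assumptions and are
not copied from IBM_EDI_Format.xsd, so check them against the real file
before editing. Only the order of the two alternatives matters.

```xml
<!-- Hypothetical declaration; the real one may carry other attributes. -->

<!-- Default as described above: the %WSP*; alternative comes first,
     so unparsing writes nothing after the segment. -->
<dfdl:defineVariable name="SegmentTerm" type="xsd:string"
                     defaultValue="%WSP*; %NL;%WSP*;" />

<!-- Swapped: the %NL; alternative comes first, so unparsing writes
     a newline. Parsing accepts the same data either way. -->
<dfdl:defineVariable name="SegmentTerm" type="xsd:string"
                     defaultValue="%NL;%WSP*; %WSP*;" />
```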

- Steve

On 3/3/20 4:02 AM, Claude Mamo wrote:
> Hello Daffodil team,
> 
> Thank you for this fantastic open source library. We've integrated Daffodil
> with Smooks (https://github.com/smooks/smooks-dfdl-cartridge) and so far it
> looks awesome. At the moment, we're implementing support for EDIFACT. We're
> very close, but not quite there yet. In particular, the EDIFACT schemas,
> available at
> https://github.com/DFDLSchemas/EDIFACT/tree/daffodil-dev/src/main/resources,
> don't seem to behave as expected when it comes to new lines.
> 
> When unparsing the infoset
> https://raw.githubusercontent.com/DFDLSchemas/EDIFACT/daffodil-dev/src/test/resources/EDIFACT-SupplyChain-D03B/TestInfosets/INVOIC_D.03B_Interchange_with_UNA.xml,
> the new line is missing between the "UNA" and "UNB" segments:
> https://gist.github.com/claudemamo/6e381738bb1fa21fd7e14c6867380308. I've
> played around with the expression setting the "ibmEdiFmt:SegmentTerm"
> variable but didn't have much luck
> (https://github.com/claudemamo/smooks-edifact-cartridge/blob/master/schemas/src/main/resources/EDIFACT-Common/EDIFACT-Service-Segments-4.1.dfdl.xsd#L128-L139).
> 
> 
> Any advice?
> 
> Claude
>
