Re: EDIFACT missing new line

Claude Mamo Wed, 04 Mar 2020 15:31:23 -0800

Steve, thank you for the detailed explanation. Much appreciated.

Claude


On Tue, Mar 3, 2020 at 2:10 PM Steve Lawrence <[email protected]> wrote:

> Smooks looks really interesting! Nice to hear that Daffodil is working
> out well, please keep us updated!
>
> The issue you are seeing is caused by the way that EDI is described in
> the schema (note that I'm not too familiar with the EDI format, but am
> familiar with the schema).
>
> The EDIFACT-SupplyChain-Messages-D.03B.xsd file describes the UNA
> section like this:
>
>   <xsd:element dfdl:terminator="%WSP*;" type="srv:UNA" ... />
>
> The %WSP*; is a special character class that, when parsing data, will
> match zero or more whitespace characters (spaces, tabs, newlines, etc.).
> So this says that the UNA can be terminated by any whitespace characters
> when parsing.
>
> The problem with this is that when Daffodil creates an infoset from a
> parse, it does not include which terminator was found at the end of the
> data. It could have been a newline but it also could have been a space,
> or nothing at all.
>
> So on unparsing, since the infoset doesn't contain information about
> that matched terminator, Daffodil must make a decision on what value to
> unparse. The rule for unparsing a WSP* is that Daffodil just unparses
> the empty string, which is valid for WSP*. Although the unparsed data
> might differ from the original, it is still valid EDI and semantically
> the same according to the schema.
>
> If you only wanted to support data with newlines, and thus force
> unparsing to create a newline, you could change that %WSP*; to %NL;
> which is the special character class to match newlines. Or
> alternatively, if you wanted to support both newlines OR any whitespace
> characters, but always wanted to unparse as a newline, you could provide
> two terminators, e.g.:
>
>   dfdl:terminator="%NL;%WSP*; %WSP*;"
>
> On parsing, that will accept either a newline followed by zero or more
> whitespace chars, or just zero or more whitespace characters, but it
> will always unparse as a newline since that appears first in the list.
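>
> As a rough sketch of the two options above (attributes other than
> dfdl:terminator are elided exactly as in the excerpt, and this is
> untested against the actual schema):
>
>   <!-- Option 1: accept only a newline; always unparse a newline -->
>   <xsd:element dfdl:terminator="%NL;" type="srv:UNA" ... />
>
>   <!-- Option 2: accept a newline plus optional whitespace, or any
>        whitespace; the first entry (%NL;%WSP*;) is used on unparse -->
>   <xsd:element dfdl:terminator="%NL;%WSP*; %WSP*;" type="srv:UNA" ... />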
>
> Note that something similar happens in EDIFACT-Service-Segments-4.1.xsd
> for the definition of SegmentTerminator. Fortunately, it has a comment
> on what to do if you want newlines to appear when unparsing, which is
> similar to the above suggestion. Also, the default value for SegmentTerm
> (defined in IBM_EDI_Format.xsd) is "%WSP*; %NL;%WSP*;", which matches
> either zero or more whitespace characters or a newline followed by zero
> or more whitespace characters. To unparse a newline, swap the order of
> those (like the above terminator) so that the NL is first and will be
> used for unparsing.
>
> - Steve
>
>
> On 3/3/20 4:02 AM, Claude Mamo wrote:
> > Hello Daffodil team,
> >
> > Thank you for this fantastic open source library. We've integrated
> > Daffodil with Smooks (https://github.com/smooks/smooks-dfdl-cartridge)
> > and so far it looks awesome. At the moment, we're implementing support
> > for EDIFACT. We're very close, but not quite there yet. In particular,
> > the EDIFACT schemas, available at
> > https://github.com/DFDLSchemas/EDIFACT/tree/daffodil-dev/src/main/resources,
> > don't seem to behave as expected when it comes to new lines.
> >
> > When unparsing the infoset
> > https://raw.githubusercontent.com/DFDLSchemas/EDIFACT/daffodil-dev/src/test/resources/EDIFACT-SupplyChain-D03B/TestInfosets/INVOIC_D.03B_Interchange_with_UNA.xml,
> > the new line is missing between the "UNA" and "UNB" segments:
> > https://gist.github.com/claudemamo/6e381738bb1fa21fd7e14c6867380308.
> > I've played around with the expression setting the
> > "ibmEdiFmt:SegmentTerm" variable but didn't have much luck
> > (https://github.com/claudemamo/smooks-edifact-cartridge/blob/master/schemas/src/main/resources/EDIFACT-Common/EDIFACT-Service-Segments-4.1.dfdl.xsd#L128-L139).
> >
> > Any advice?
> >
> > Claude
> >
>
>
