Steve, thank you for the detailed explanation. Much appreciated.

Claude

On Tue, Mar 3, 2020 at 2:10 PM Steve Lawrence <[email protected]> wrote:

> Smooks looks really interesting! Nice to hear that Daffodil is working
> out well, please keep us updated!
>
> The issue you are seeing is caused by the way that EDI is described in
> the schema (note that I'm not too familiar with the EDI format, but am
> familiar with the schema).
>
> The EDIFACT-SupplyChain-Messages-D.03B.xsd file describes the UNA
> section like this:
>
>   <xsd:element dfdl:terminator="%WSP*;" type="srv:UNA" ... />
>
> The %WSP*; is a special character class that, when parsing data, will
> match zero or more whitespace characters (spaces, tabs, newlines, etc.).
> So this says that the UNA can be terminated by any whitespace characters
> when parsing.
>
> The problem with this is that when Daffodil creates an infoset from a
> parse, it does not include what terminator was found at the end of the
> data. It could have been a newline but it also could have been a space,
> or nothing at all.
>
> So on unparsing, since the infoset doesn't contain information about
> that matched terminator, Daffodil must make a decision on what value to
> unparse. The rule for unparsing a WSP* is that Daffodil just unparses
> the empty string, which is valid for WSP*. Although the unparsed data
> might differ from the original, it is still valid EDI and semantically
> the same according to the schema.
>
> If you only wanted to support data with newlines, and thus force
> unparsing to create a newline, you could change that %WSP*; to %NL;
> which is the special character class to match newlines. Or
> alternatively, if you wanted to support both newlines OR any whitespace
> characters, but always wanted to unparse as a newline, you could provide
> two terminators, e.g.:
>
>   dfdl:terminator="%NL;%WSP*; %WSP*;"
>
> On parsing, that will accept either a newline followed by zero or more
> whitespace chars, or just zero or more whitespace characters, but will
> always unparse as a newline since it appears first in the list.
>
> Note that something similar happens in EDIFACT-Service-Segments-4.1.xsd
> for the definition of SegmentTerminator. Fortunately, it has a comment
> on what to do if you want newlines to appear when unparsing, which is
> similar to the above suggestion. Also, the default value for SegmentTerm
> (defined in IBM_EDI_Format.xsd) is "%WSP*; %NL;%WSP*;", so either zero or
> more whitespace characters or a newline followed by zero or more
> whitespace characters. To unparse a newline, swap the order of those
> (like the above terminator) so that the NL is first and will be used for
> unparsing.
>
> - Steve
>
>
> On 3/3/20 4:02 AM, Claude Mamo wrote:
> > Hello Daffodil team,
> >
> > Thank you for this fantastic open source library. We've integrated
> > Daffodil with Smooks (https://github.com/smooks/smooks-dfdl-cartridge)
> > and so far it looks awesome. At the moment, we're implementing support
> > for EDIFACT. We're very close, but not quite there yet. In particular,
> > the EDIFACT schemas, available at
> > https://github.com/DFDLSchemas/EDIFACT/tree/daffodil-dev/src/main/resources,
> > don't seem to behave as expected when it comes to new lines.
> >
> > When unparsing the infoset
> > https://raw.githubusercontent.com/DFDLSchemas/EDIFACT/daffodil-dev/src/test/resources/EDIFACT-SupplyChain-D03B/TestInfosets/INVOIC_D.03B_Interchange_with_UNA.xml,
> > the new line is missing between the "UNA" and "UNB" segments:
> > https://gist.github.com/claudemamo/6e381738bb1fa21fd7e14c6867380308.
> > I've played around with the expression setting the
> > "ibmEdiFmt:SegmentTerm" variable but didn't have much luck
> > (https://github.com/claudemamo/smooks-edifact-cartridge/blob/master/schemas/src/main/resources/EDIFACT-Common/EDIFACT-Service-Segments-4.1.dfdl.xsd#L128-L139).
> >
> > Any advice?
> >
> > Claude
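
For anyone following the thread, the parse/unparse asymmetry described in the quoted reply can be illustrated with a small toy model in Python. This is only a sketch and not Daffodil's implementation or API: the names CLASSES, parse_una and unparse_una are hypothetical, the two character classes are simplified, the UNA segment is assumed to be a fixed nine characters, and %NL; is assumed to unparse as "\n" (in real DFDL that is controlled by other properties).

```python
import re

# Toy stand-ins for two DFDL character class entities (simplified assumptions).
# Each entry maps to (regex used when parsing, text emitted when unparsing).
CLASSES = {
    "%NL;": (r"(?:\r\n|\n|\r)", "\n"),  # one newline; assumed to unparse as "\n"
    "%WSP*;": (r"[ \t\r\n]*", ""),      # zero or more whitespace; unparses as ""
}
ENTITY = re.compile(r"%[A-Z]+\*?;")
UNA_LENGTH = 9  # assumption: "UNA" plus six service characters


def alternative_regex(alternative):
    """Build a regex for one terminator alternative, e.g. '%NL;%WSP*;'."""
    return re.compile("".join(CLASSES[e][0] for e in ENTITY.findall(alternative)))


def parse_una(data, terminator):
    """Return (UNA content, remaining data); the matched terminator is dropped."""
    content, rest = data[:UNA_LENGTH], data[UNA_LENGTH:]
    matches = (alternative_regex(a).match(rest) for a in terminator.split())
    lengths = [m.end() for m in matches if m is not None]
    if not lengths:
        raise ValueError("no terminator alternative matched")
    # Consume the longest match; which text matched is not recorded anywhere.
    return content, rest[max(lengths):]


def unparse_una(content, terminator):
    """Emit the content followed by the first alternative's canonical text."""
    first = terminator.split()[0]
    return content + "".join(CLASSES[e][1] for e in ENTITY.findall(first))


original = "UNA:+.? '\nUNB+UNOC:3"
for terminator in ("%WSP*;", "%NL;%WSP*; %WSP*;"):
    content, _ = parse_una(original, terminator)
    print(repr(terminator), "->", repr(unparse_una(content, terminator)))
```

With the single %WSP*; terminator the round trip drops the newline after the UNA content, because nothing in the parsed result remembers it. With the newline alternative listed first, the same input parses identically but unparses with a trailing newline, and input that has no newline at all is still accepted through the second alternative.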
