Smooks looks really interesting! Nice to hear that Daffodil is working out well, please keep us updated!
The issue you are seeing is caused by the way that EDI is described in the schema (note that I'm not too familiar with the EDI format, but am familiar with the schema). The EDIFACT-SupplyChain-Messages-D.03B.xsd file describes the UNA section like this:

  <xsd:element dfdl:terminator="%WSP*;" type="srv:UNA" ... />

The %WSP*; is a special character class that, when parsing data, will match zero or more whitespace characters (spaces, tabs, newlines, etc.). So this says that, when parsing, the UNA can be terminated by any run of whitespace characters, including none at all.

The problem with this is that when Daffodil creates an infoset from a parse, it does not include which terminator was found at the end of the data. It could have been a newline, but it could also have been a space, or nothing at all. So on unparsing, since the infoset doesn't contain information about the matched terminator, Daffodil must decide what value to unparse. The rule for unparsing a WSP* is that Daffodil just unparses the empty string, which is valid for WSP*. Although the unparsed data might differ from the original, it is still valid EDI and semantically the same according to the schema.

If you only wanted to support data with newlines, and thus force unparsing to create a newline, you could change that %WSP*; to %NL;, which is the special character class that matches newlines. Alternatively, if you wanted to support both newlines and any other whitespace characters, but always wanted to unparse a newline, you could provide two terminators, e.g.:

  dfdl:terminator="%NL;%WSP*; %WSP*;"

On parsing, that will accept either a newline followed by zero or more whitespace characters, or just zero or more whitespace characters. But it will always unparse as a newline, since that terminator appears first in the list.

Note that something similar happens in EDIFACT-Service-Segments-4.1.xsd for the definition of SegmentTerminator.
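
For reference, the two-terminator variant of the UNA declaration discussed above might look like the following. This is only a sketch: it assumes the rest of the declaration stays as it is in EDIFACT-SupplyChain-Messages-D.03B.xsd, and the "..." stands for the attributes elided above, so it is not a complete element.

```xml
<!-- Sketch only: the remaining attributes of the real declaration are elided ("..."). -->
<!-- Parsing: accepts a newline followed by optional whitespace, or just optional whitespace. -->
<!-- Unparsing: the first terminator in the list is used, so a newline is written. -->
<xsd:element dfdl:terminator="%NL;%WSP*; %WSP*;" type="srv:UNA" ... />
```

The order of the list is what matters for unparsing; either alternative is accepted when parsing.
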
Fortunately, the SegmentTerminator definition in EDIFACT-Service-Segments-4.1.xsd has a comment on what to do if you want newlines to appear when unparsing, which is similar to the above suggestion.

Also, the default value for SegmentTerm (defined in IBM_EDI_Format.xsd) is "%WSP*; %NL;%WSP*;", so either zero or more whitespace characters, or a newline followed by zero or more whitespace characters. To unparse a newline, swap the order of those two (like the above terminator) so that the NL alternative is first and will be used for unparsing.

- Steve

On 3/3/20 4:02 AM, Claude Mamo wrote:
> Hello Daffodil team,
>
> Thank you for this fantastic open source library. We've integrated Daffodil
> with Smooks (https://github.com/smooks/smooks-dfdl-cartridge) and so far it
> looks awesome. At the moment, we're implementing support for EDIFACT. We're
> very close, but not quite there yet. In particular, the EDIFACT schemas,
> available at
> https://github.com/DFDLSchemas/EDIFACT/tree/daffodil-dev/src/main/resources,
> don't seem to behave as expected when it comes to new lines.
>
> When unparsing the infoset
> https://raw.githubusercontent.com/DFDLSchemas/EDIFACT/daffodil-dev/src/test/resources/EDIFACT-SupplyChain-D03B/TestInfosets/INVOIC_D.03B_Interchange_with_UNA.xml,
> the new line is missing between the "UNA" and "UNB" segments:
> https://gist.github.com/claudemamo/6e381738bb1fa21fd7e14c6867380308. I've
> played around with the expression setting the "ibmEdiFmt:SegmentTerm"
> variable but didn't have much luck
> (https://github.com/claudemamo/smooks-edifact-cartridge/blob/master/schemas/src/main/resources/EDIFACT-Common/EDIFACT-Service-Segments-4.1.dfdl.xsd#L128-L139).
>
> Any advice?
>
> Claude
