Yup. I agree. I was surprised to see it not escape your NL in the fields.

Numerous tests of escape schemes that don't unparse properly are documented in https://issues.apache.org/jira/browse/DAFFODIL-1556

I added discussion of your CSV case to that ticket.

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Monday, November 18, 2019 9:32 AM
To: [email protected] <[email protected]>
Subject: Re: Is there such a thing as "in-scope delimiters"?

Thanks Mike. Then that convinces me there is a bug in Daffodil. Wrapping a CSV field value in double quotes should result in both commas and newlines within the double quotes being escaped. See my post on Saturday.

/Roger

From: Beckerle, Mike <[email protected]>
Sent: Monday, November 18, 2019 9:18 AM
To: [email protected]
Subject: [EXT] Re: Is there such a thing as "in-scope delimiters"?

Yes. The term "in-scope delimiters" refers to the delimiters, mostly terminating delimiters, that surround and relate to a given element or model-group in DFDL: its terminators and separators, as determined by DFDL's scoping rules for properties and by the nesting of elements and model-groups in the schema.

In your schema, for the "field" element, both the comma and newline are in-scope delimiters.

Some people will call this the "in-scope terminating markup", but I just searched the DFDL spec and did not find the term "markup" used in this way, which is good. I've never liked referring to delimiters as "markup".

One clarification perhaps: if an element has a terminator and length kind delimited, the surrounding group's separator is still considered to be in-scope and must be escaped. DFDL didn't have to be defined this way. We could have gone with a rule where a terminator, if specified, is the only in-scope markup, but that was not the decision. Even if an element has a terminator, the enclosing model-group's separator/terminator are still considered to be in-scope.

E.g., consider this unusual example:

    <sequence dfdl:terminator="#" dfdl:separator="$" dfdl:separatorPosition="postfix">
      <!-- we have both a terminator above, AND a postfix separator -->
      <element name="foo" type="xs:string" dfdl:terminator="%"/>
      <!-- and another terminator -->
    </sequence>

For the "foo" element, the in-scope terminating delimiters include %, $, and #.
DFDL specifies that the "foo" element must be terminated by a "%", but the escape-scheme rules indicate that if the "foo" content contains any of %, $, or #, those characters must be protected via an escape scheme.

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Monday, November 18, 2019 6:52 AM
To: [email protected] <[email protected]>
Subject: Is there such a thing as "in-scope delimiters"?

Hi Folks,

Is there such a thing as in-scope delimiters?

At the field element in the below DFDL schema, what are the in-scope delimiters? Comma and newline?

Notice that the field element references a block escapeScheme, which specifies that the double quote symbol is used to escape a block of text. If a field’s value is escaped (via double quotes), then what delimiters are escaped? All in-scope delimiters – comma and newline?

/Roger

    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:defineEscapeScheme name='Quotes'>
          <dfdl:escapeScheme escapeKind='escapeBlock'
                             escapeBlockStart='"'
                             escapeBlockEnd='"'
                             escapeEscapeCharacter='"'
                             extraEscapedCharacters=''
                             generateEscapeBlock='whenNeeded'/>
        </dfdl:defineEscapeScheme>
        <dfdl:format ref="default-dfdl-properties"/>
      </xs:appinfo>
    </xs:annotation>

    <xs:element name="csv">
      <xs:complexType>
        <xs:sequence>
          <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
            <xs:element name="record" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
                  <xs:element name="field" maxOccurs="unbounded" type="xs:string"
                              dfdl:escapeSchemeRef="Quotes"
                              dfdl:occursCountKind="implicit">
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
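
For readers following the thread, here is a minimal Python sketch of the unparse behavior argued for above. It is not Daffodil code and does not use any Daffodil API. The function names are hypothetical, %NL; is approximated by "\r\n", "\n", and "\r", and the rule that a block is generated when the value contains an in-scope delimiter or begins with the block-start string is an assumption about how generateEscapeBlock='whenNeeded' is meant to work.

```python
def in_scope_delimiters(scopes):
    """Collect terminating delimiters from the innermost scope outward.

    `scopes` is a list of delimiter lists: the element's own terminator
    first, then the separators/terminators of each enclosing group.
    (Hypothetical helper; a real DFDL processor derives this from the schema.)
    """
    found = []
    for scope in scopes:
        for delim in scope:
            if delim not in found:
                found.append(delim)
    return found


def escape_block(value, delimiters, start='"', end='"', escape_escape='"'):
    """Wrap `value` in an escape block only when needed.

    Assumption: a block is needed when the value contains any in-scope
    delimiter or begins with the block-start string. Occurrences of the
    block-end string inside the value are prefixed with `escape_escape`.
    """
    needed = value.startswith(start) or any(d in value for d in delimiters)
    if not needed:
        return value
    protected = value.replace(end, escape_escape + end)
    return start + protected + end


# In-scope delimiters for the "field" element of the CSV schema:
# the record's comma separator, then the enclosing newline separator.
CSV_DELIMS = in_scope_delimiters([[","], ["\r\n", "\n", "\r"]])

# In-scope delimiters for "foo" in the unusual example:
# its own terminator, then the enclosing group's separator and terminator.
FOO_DELIMS = in_scope_delimiters([["%"], ["$", "#"]])
```

Under these assumptions, a field containing either a comma or a newline is wrapped in double quotes, while a field containing neither is written unchanged.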
