Yup. I agree. I was surprised to see it not escape your NL in the fields.

Numerous escape-scheme tests that don't unparse properly are documented in
https://issues.apache.org/jira/browse/DAFFODIL-1556
I added a discussion of your CSV case to that ticket.
________________________________
From: Costello, Roger L. <[email protected]>
Sent: Monday, November 18, 2019 9:32 AM
To: [email protected] <[email protected]>
Subject: Re: Is there such a thing as "in-scope delimiters"?


Thanks Mike. Then that convinces me there is a bug in Daffodil. Wrapping a CSV 
field value in double quotes should result in both commas and newlines within 
the double quotes being escaped. See my post on Saturday.



/Roger



From: Beckerle, Mike <[email protected]>
Sent: Monday, November 18, 2019 9:18 AM
To: [email protected]
Subject: [EXT] Re: Is there such a thing as "in-scope delimiters"?



Yes. "In-scope delimiters" refers to the delimiters, largely terminating 
delimiters (terminators and separators), that surround and relate to a given 
element or model group in DFDL, as determined by DFDL's scoping rules for 
properties and by the nesting of elements and model groups in the schema.



In your schema, for "field" element, both the comma and newline are in-scope 
delimiters.



Some people will call this the "in-scope terminating markup", but I just 
searched the DFDL spec and did not find the term "markup" used in this way, 
which is good. I've never liked referring to delimiters as "markup".



One clarification perhaps: if an element has a terminator and lengthKind 
"delimited", the surrounding group's separator is still considered to be 
in scope and must be escaped. DFDL didn't have to be defined this way; we could 
have gone with a rule where a terminator, if specified, is the only in-scope 
delimiter, but that was not the decision. Even if an element has a terminator, 
the enclosing model group's separator and terminator are still considered to be 
in scope.



E.g., consider this unusual example:



<sequence dfdl:terminator="#" dfdl:separator="$"
    dfdl:separatorPosition="postfix">
   <!-- we have both a terminator above, AND a postfix separator -->
   <element name="foo" type="xs:string" dfdl:terminator="%"/>
   <!-- and another terminator -->
</sequence>



For the "foo" element, the in-scope terminating delimiters include %, $, and 
#. DFDL specifies that the "foo" element must be terminated by a "%", but the 
escape-scheme rules mean that if the "foo" content contains any of %, $, or 
#, those characters must be protected via an escape scheme.
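
To make the rule concrete, here is a hedged sketch of the same sequence with a 
character escape scheme attached. The scheme name "Backslash" and the sample 
value below are assumptions made up for illustration; the xs: and dfdl: 
prefixes are written out as in Roger's schema.

<xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- hypothetical escape scheme, not part of the original example -->
        <dfdl:defineEscapeScheme name="Backslash">
            <dfdl:escapeScheme escapeKind="escapeCharacter"
                escapeCharacter="\"
                escapeEscapeCharacter="\"
                extraEscapedCharacters=""/>
        </dfdl:defineEscapeScheme>
    </xs:appinfo>
</xs:annotation>

<xs:sequence dfdl:terminator="#" dfdl:separator="$"
    dfdl:separatorPosition="postfix">
    <!-- all three of %, $, and # are in scope for "foo" -->
    <xs:element name="foo" type="xs:string" dfdl:terminator="%"
        dfdl:escapeSchemeRef="Backslash"/>
</xs:sequence>

Under the rules described above, a "foo" value of a%b$c#d would be expected to 
unparse as a\%b\$c\#d%$# : the three delimiter characters inside the content 
are escaped, and the content is then followed by the element terminator, the 
postfix separator, and the sequence terminator.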













________________________________

From: Costello, Roger L. <[email protected]>
Sent: Monday, November 18, 2019 6:52 AM
To: [email protected] <[email protected]>
Subject: Is there such a thing as "in-scope delimiters"?



Hi Folks,



Is there such a thing as in-scope delimiters?



At the field element in the below DFDL schema, what are the in-scope 
delimiters? Comma and newline?



Notice that the field element references a block escapeScheme, which specifies 
that the double quote symbol is used to escape a block of text. If a field’s 
value is escaped (via double quotes), then what delimiters are escaped? All 
in-scope delimiters – comma and newline?  /Roger



<xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:defineEscapeScheme name='Quotes'>
            <dfdl:escapeScheme escapeKind='escapeBlock'
                escapeBlockStart='"'
                escapeBlockEnd='"'
                escapeEscapeCharacter='"'
                extraEscapedCharacters=''
                generateEscapeBlock='whenNeeded'/>
        </dfdl:defineEscapeScheme>
        <dfdl:format ref="default-dfdl-properties"/>
    </xs:appinfo>
</xs:annotation>

<xs:element name="csv">
    <xs:complexType>
        <xs:sequence>
            <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
                <xs:element name="record" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:sequence dfdl:separator=","
                            dfdl:separatorPosition="infix">
                            <xs:element name="field" maxOccurs="unbounded"
                                type="xs:string"
                                dfdl:escapeSchemeRef="Quotes"
                                dfdl:occursCountKind="implicit">
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:sequence>
    </xs:complexType>
</xs:element>
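
As a hedged illustration of what the discussion in this thread implies for the 
schema above (the field values are made up for the example), consider an 
infoset such as:

<csv>
    <record>
        <!-- this value contains the in-scope comma separator -->
        <field>a,b</field>
        <field>plain</field>
    </record>
</csv>

Because the first field contains the comma separator, which is in scope, 
generateEscapeBlock='whenNeeded' would be expected to wrap it in the escape 
block on unparse, giving:

"a,b",plain

By the same reasoning, a field value containing a line break should also be 
wrapped in double quotes, since the newline separator of the enclosing 
sequence is in scope as well.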

