Wow!

That is a fantastically clear explanation.

Thank you Steve!

/Roger

-----Original Message-----
From: Steve Lawrence <[email protected]> 
Sent: Tuesday, November 20, 2018 11:30 AM
To: [email protected]; Costello, Roger L. <[email protected]>
Subject: Re: I don't understand the difference between 
generateEscapeBlock="whenNeeded" and generateEscapeBlock="always"

The generateEscapeBlock property only applies when escapeKind="escapeBlock", and 
only when unparsing. That explains why you don't see any difference when 
escapeKind="escapeCharacter".
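To make that concrete, here is a rough Python model (not Daffodil code) of how 
an escapeCharacter scheme like yours might unparse a field. It assumes your 
scheme (escapeCharacter="\") with ":" as the only in-scope delimiter. The 
function name is made up for illustration. Note that nothing in it consults 
generateEscapeBlock:

```python
# Hypothetical model of unparsing with escapeKind="escapeCharacter".
# Assumptions: escapeCharacter="\", single-character delimiters only, and
# literal escape characters in the data are NOT handled (a real
# escapeEscapeCharacter would deal with those).
# generateEscapeBlock is never consulted, which is why it has no effect here.

ESCAPE_CHAR = "\\"  # escapeCharacter="\"


def unparse_with_escape_character(value: str, delimiters: list[str]) -> str:
    """Prefix each in-scope delimiter in the value with the escape character."""
    out = []
    for ch in value:
        if ch in delimiters:
            out.append(ESCAPE_CHAR + ch)
        else:
            out.append(ch)
    return "".join(out)


fields = ["foo", "a:b", "c"]
print(":".join(unparse_with_escape_character(f, [":"]) for f in fields))
# foo:a\:b:c
```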

As an example, let's say we have the following:

  escapeKind="escapeBlock"
  escapeBlockStart="&quot;"
  escapeBlockEnd="&quot;"

And assume there's one in-scope delimiter that is:

  separator=","

This defines a format similar to CSV, where each field is separated by a comma, 
and fields that contain commas must start and end with quotes to signify that 
those commas are part of the data and not a delimiter. Note that a field can 
still be quoted even if it doesn't contain a comma. So your data could look 
like this:

   foo,"bar",baz,"qaz,maz"

So 'foo' and 'baz' are unquoted, 'bar' is quoted unnecessarily since it does 
not contain a comma, and 'qaz,maz' is quoted, which is required because the 
data contains a comma.

In this case, the data might parse to something like this:

   <field>foo</field>
   <field>bar</field>
   <field>baz</field>
   <field>qaz,maz</field>

Note that the escape block quotes have all been stripped off leaving only the 
data, and the last field contains a comma in the data.
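If it helps, here is a simplified Python sketch of that parse step. It is only 
a model, with assumptions: '"' is both the escape block start and end, "," is 
the only in-scope delimiter, an escape block can only begin at the start of a 
field, and there is no escapeEscapeCharacter (so the data itself cannot contain 
a quote). The parse_record name is hypothetical:

```python
# Hypothetical model of parsing the CSV-like format with an escape block.
BLOCK_START = '"'
BLOCK_END = '"'
SEPARATOR = ","


def parse_record(text: str) -> list[str]:
    """Split a record on SEPARATOR, treating quoted fields as escape blocks."""
    fields = []
    i = 0
    while True:
        if text.startswith(BLOCK_START, i):
            # Inside an escape block, separators are data until BLOCK_END.
            end = text.index(BLOCK_END, i + len(BLOCK_START))
            fields.append(text[i + len(BLOCK_START):end])  # markers stripped
            i = end + len(BLOCK_END)
        else:
            # Unescaped field: runs until the next separator or end of text.
            end = text.find(SEPARATOR, i)
            if end == -1:
                end = len(text)
            fields.append(text[i:end])
            i = end
        if i >= len(text):
            return fields
        if not text.startswith(SEPARATOR, i):
            raise ValueError(f"expected {SEPARATOR!r} at position {i}")
        i += len(SEPARATOR)


print(parse_record('foo,"bar",baz,"qaz,maz"'))
# ['foo', 'bar', 'baz', 'qaz,maz']
```

The quotes around 'bar' are simply discarded, just as in the infoset.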

Now, let's assume we want to unparse the XML. This is where generateEscapeBlock 
plays a role. Since the escape block characters are not in the infoset, we need 
a way to determine when we should create the escape block quote characters.

One option is generateEscapeBlock="always", which means every field will be 
unparsed with the escape block characters, regardless of whether they are 
needed. So the above would become this:

  "foo","bar","baz","qaz,maz"

Every field now has escape block quotes, even though they aren't all necessary.
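As a rough Python model of "always" (not how Daffodil implements it), assuming 
'"' for both escape block markers, "," as the separator, and no quotes in the 
data; unparse_always is a made-up name:

```python
# Hypothetical model of generateEscapeBlock="always": wrap every field.
BLOCK_START = '"'
BLOCK_END = '"'
SEPARATOR = ","


def unparse_always(fields: list[str]) -> str:
    """Surround every field with the escape block markers, needed or not."""
    return SEPARATOR.join(BLOCK_START + f + BLOCK_END for f in fields)


print(unparse_always(["foo", "bar", "baz", "qaz,maz"]))
# "foo","bar","baz","qaz,maz"
```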

The other option is generateEscapeBlock="whenNeeded". In this case, Daffodil 
inspects each field before unparsing and determines if it contains any in-scope 
delimiters. If it does, only then will it add the escape block quotes. With 
"whenNeeded", the data unparses to this:

  foo,bar,baz,"qaz,maz"

Note that only "qaz,maz" has quotes because only its field contains an in-scope 
delimiter (the comma separator). Also note that "bar" does not have quotes even 
though the original data did have quotes. This is because the quotes are not 
necessary, and the infoset does not record whether a field originally had 
quotes.
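The "whenNeeded" decision can be modeled in Python along the same lines. This 
sketch only checks for in-scope delimiters, as described above; the helper 
names are hypothetical, and it assumes the same '"' markers and "," separator 
with no quotes in the data:

```python
# Hypothetical model of generateEscapeBlock="whenNeeded".
BLOCK_START = '"'
BLOCK_END = '"'
SEPARATOR = ","
IN_SCOPE_DELIMITERS = (SEPARATOR,)


def needs_escape_block(field: str, delimiters: tuple[str, ...]) -> bool:
    """True if the field contains any in-scope delimiter."""
    return any(d in field for d in delimiters)


def unparse_when_needed(fields: list[str],
                        delimiters: tuple[str, ...] = IN_SCOPE_DELIMITERS) -> str:
    """Wrap only the fields that would otherwise be ambiguous."""
    out = []
    for f in fields:
        if needs_escape_block(f, delimiters):
            out.append(BLOCK_START + f + BLOCK_END)
        else:
            out.append(f)
    return SEPARATOR.join(out)


print(unparse_when_needed(["foo", "bar", "baz", "qaz,maz"]))
# foo,bar,baz,"qaz,maz"
```

Since 'bar' arrives from the infoset with no memory of its original quotes, 
nothing in this model (or in the infoset) could restore them.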

- Steve



On 11/20/18 10:58 AM, Costello, Roger L. wrote:
> Hello DFDL community,
> 
> I have an input file that uses a colon to separate (delimit) fields.
> 
> The backslash symbol is used to escape the colon.
> 
> Below is how I define the escape symbol. Everything makes sense to me 
> except generateEscapeBlock. I get the same behavior regardless of 
> whether I use generateEscapeBlock="whenNeeded" or 
> generateEscapeBlock="always". I read the DFDL specification 
> description of generateEscapeBlock. Honestly, that didn't help in my 
> understanding of the difference between whenNeeded and always. Would 
> someone please explain the differences in simple, layman terms? When 
> do I use one versus the other? When would I see a difference in 
> behavior?  /Roger
> 
> <dfdl:defineEscapeScheme name="Backslash">
>     <dfdl:escapeScheme
>         escapeKind="escapeCharacter"
>         escapeCharacter="\"
>         escapeEscapeCharacter="\"
>         extraEscapedCharacters=""
>         generateEscapeBlock="whenNeeded" />
> </dfdl:defineEscapeScheme>
> 
