The generateEscapeBlock property only applies when
escapeKind="escapeBlock", and only when unparsing. That explains why you
don't see any difference when escapeKind="escapeCharacter".
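To illustrate why the property has no effect in your case, here is a simplified Python model of escapeCharacter unparsing. It assumes the escape character is written before each separator in the data and the escapeEscapeCharacter before each literal escape character (extraEscapedCharacters and other delimiters are ignored). The function name is hypothetical, not Daffodil's API. Note that nothing in it ever consults generateEscapeBlock:

```python
def unparse_escape_character(fields: list[str], sep: str = ":",
                             esc: str = "\\", esc_esc: str = "\\") -> str:
    """Simplified model of escapeKind="escapeCharacter" unparsing (hypothetical)."""
    out = []
    for field in fields:
        # Escape literal escape characters first, so the escapes we add
        # for separators below are not themselves doubled.
        escaped = field.replace(esc, esc_esc + esc)
        # Then put the escape character in front of every separator.
        escaped = escaped.replace(sep, esc + sep)
        out.append(escaped)
    return sep.join(out)


print(unparse_escape_character(["a:b", "c"]))  # prints a\:b:c
```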

As an example, let's say we have the following:

  escapeKind="escapeBlock"
  escapeBlockStart="&quot;"
  escapeBlockEnd="&quot;"

And assume there's one in-scope delimiter that is:

  separator=","

This defines a format similar to CSV, where fields are separated by
commas, and fields that contain commas must start and end with quotes to
signify that those commas are part of the data and not delimiters. Note
that a field may still be quoted even if its data doesn't contain a
comma. So your data could look like this:

   foo,"bar",baz,"qaz,maz"

So 'foo' and 'baz' are unquoted, 'bar' is quoted unnecessarily since it
does not contain a comma, and 'qaz,maz' is quoted, which is required
since the data contains a comma.
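To make the parse side concrete, here is a minimal Python sketch of splitting such a record into field values while stripping the escape-block quotes. It is a simplified model, not how Daffodil is implemented: a field counts as an escape block only if it begins with the quote character, and quotes inside a block (which DFDL would handle with escapeEscapeCharacter) are not supported. The function name parse_fields is hypothetical:

```python
def parse_fields(record: str, sep: str = ",", quote: str = '"') -> list[str]:
    """Split one record into field values, stripping escape-block quotes."""
    fields = []
    i, n = 0, len(record)
    while True:
        if record.startswith(quote, i):
            # Escape block: everything up to the closing quote is data,
            # including any separators.
            end = record.index(quote, i + len(quote))
            fields.append(record[i + len(quote):end])
            i = end + len(quote)
        else:
            # Unescaped field: data runs to the next separator (or the end).
            end = record.find(sep, i)
            if end == -1:
                end = n
            fields.append(record[i:end])
            i = end
        if i >= n:
            break
        i += len(sep)  # step over the separator between fields
    return fields


print(parse_fields('foo,"bar",baz,"qaz,maz"'))
# prints ['foo', 'bar', 'baz', 'qaz,maz']
```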

When parsed, this data might produce an infoset something like this:

   <field>foo</field>
   <field>bar</field>
   <field>baz</field>
   <field>qaz,maz</field>

Note that the escape block quotes have all been stripped off, leaving
only the data, and the last field contains a comma in its data.

Now, let's assume we want to unparse the XML. This is where
generateEscapeBlock plays a role. Since the escape block characters are
not in the infoset, we need a way to determine when we should create the
escape block quote characters.

One option is generateEscapeBlock="always", which means every field will
be unparsed with the escape block characters, regardless of whether they
are needed. So the above would become this:

  "foo","bar","baz","qaz,maz"

Every field now has escape block quotes, even though they aren't all
necessary.
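The "always" behavior can be sketched in a couple of lines of Python. This is a hypothetical helper that only models the effect on the output text, assuming a single comma separator and a double-quote escape block:

```python
def unparse_always(fields: list[str], sep: str = ",", quote: str = '"') -> str:
    # Models generateEscapeBlock="always": wrap every field in the escape
    # block start/end, whether or not its data contains a delimiter.
    return sep.join(quote + field + quote for field in fields)


print(unparse_always(["foo", "bar", "baz", "qaz,maz"]))
# prints "foo","bar","baz","qaz,maz"
```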

The other option is generateEscapeBlock="whenNeeded". In this case,
Daffodil inspects each field before unparsing and determines if it
contains any in-scope delimiters. If it does, only then will it add the
escape block quotes. With "whenNeeded", the data unparses to this:

  foo,bar,baz,"qaz,maz"

Note that only "qaz,maz" has quotes because only that field contains an
in-scope delimiter (the comma separator). Also note that "bar" does not
have quotes even though the original data did. This is because the
quotes are not necessary, and the infoset does not record whether a
field originally had quotes.
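Here is a similar Python sketch of the "whenNeeded" decision. It is a simplified model with a hypothetical function name: it only checks the delimiter strings it is given (by default just the separator), whereas real in-scope delimiters can also include terminators from enclosing constructs:

```python
from typing import Optional, Sequence


def unparse_when_needed(fields: Sequence[str], sep: str = ",", quote: str = '"',
                        in_scope_delimiters: Optional[Sequence[str]] = None) -> str:
    # Models generateEscapeBlock="whenNeeded": add the escape block only
    # to fields whose data contains an in-scope delimiter.
    if in_scope_delimiters is None:
        in_scope_delimiters = [sep]
    out = []
    for field in fields:
        if any(d in field for d in in_scope_delimiters):
            out.append(quote + field + quote)
        else:
            out.append(field)
    return sep.join(out)


print(unparse_when_needed(["foo", "bar", "baz", "qaz,maz"]))
# prints foo,bar,baz,"qaz,maz"
```

Because the infoset only holds "bar", not whether it was quoted, there is nothing for the "bar" field to trigger a block on.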

- Steve



On 11/20/18 10:58 AM, Costello, Roger L. wrote:
> Hello DFDL community,
> 
> I have an input file that uses a colon to separate (delimit) fields.
> 
> The backslash symbol is used to escape the colon.
> 
> Below is how I define the escape symbol. Everything makes sense to me except
> generateEscapeBlock. I get the same behavior regardless of whether I use
> generateEscapeBlock="whenNeeded" or generateEscapeBlock="always". I read the
> DFDL specification description of generateEscapeBlock. Honestly, that didn't
> help in my understanding of the difference between whenNeeded and always.
> Would someone please explain the differences in simple, layman terms, please?
> When do I use one versus the other? When would I see a difference in
> behavior?  /Roger
> 
> <dfdl:defineEscapeScheme name="Backslash">
>     <dfdl:escapeScheme
>         escapeKind="escapeCharacter"
>         escapeCharacter="\"
>         escapeEscapeCharacter="\"
>         extraEscapedCharacters=""
>         generateEscapeBlock="whenNeeded" />
> </dfdl:defineEscapeScheme>
> 
