If you can send your entire schema, I can reproduce this. Given that we're 
exploring variations on the theme, that would help there too.

From what I see, I agree that this array should not end at those adjacent 
commas, but should construct an empty-string element value.

There are some more properties that could be playing a role here:

One is dfdl:separatorSuppressionPolicy.

I am curious whether you have this set to "anyEmpty", "trailingEmpty", or 
"trailingEmptyStrict". I am not sure this should matter for parsing your 
example, however.
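For concreteness, a sketch of where this property goes: it sits on the sequence, next to the separator. This assumes Roger's sequence of names (quoted below) in its emptyValueDelimiterPolicy="none" variant; "anyEmpty" is only one of the allowed values, picked for illustration rather than as a recommendation, and the other required format properties are assumed to come from an enclosing dfdl:format that is not shown.

```xml
<!-- Sketch only: Roger's names sequence with separatorSuppressionPolicy made
     explicit. Allowed values: never, anyEmpty, trailingEmpty,
     trailingEmptyStrict. Remaining format properties (encoding, lengthKind,
     occursCountKind, ...) are assumed to come from a dfdl:format not shown. -->
<xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
             dfdl:separatorSuppressionPolicy="anyEmpty">
    <xs:element name="name" type="xs:string" maxOccurs="unbounded"
        dfdl:initiator="(" dfdl:terminator=")"
        dfdl:emptyValueDelimiterPolicy="none" />
</xs:sequence>
```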

Another property is the new dfdlx:emptyElementParsePolicy. In Daffodil, this 
currently defaults to "treatAsEmpty". It can be set to "treatAsAbsent" in case 
people really hate empty elements and want them always treated as absent.


Property Name: emptyElementParsePolicy

Type: Enum

Valid values: "treatAsAbsent" or "treatAsEmpty"

Description: This property describes the behavior of the DFDL processor for 
occurrences of elements of any type that have the empty representation.

When 'treatAsEmpty', if an occurrence of an element has the empty 
representation when parsed, the behaviour is as stated in section 9 for an 
occurrence with empty representation. Consequently, default values or empty 
strings may be added to the infoset.

When 'treatAsAbsent', if an occurrence of an element has the empty 
representation when parsed, the behaviour is as stated in section 9 for an 
absent occurrence. Consequently, default values or empty strings are never 
added to the infoset.

Annotation: dfdl:element, dfdl:simpleType
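As an illustration only, this is roughly how the extension property would be set on the names element from Roger's example below. The dfdlx prefix has to be bound on the enclosing xs:schema element; the namespace URI in the comment is the one Daffodil uses for its extensions, but it is worth checking against the Daffodil version in use.

```xml
<!-- Assumes this binding on the enclosing xs:schema element:
       xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions" -->
<xs:element name="name" type="xs:string" maxOccurs="unbounded"
    dfdl:initiator="(" dfdl:terminator=")"
    dfdl:emptyValueDelimiterPolicy="both"
    dfdlx:emptyElementParsePolicy="treatAsAbsent" />
<!-- Per the description above: with treatAsAbsent, an empty occurrence such
     as () is handled as an absent occurrence, so no empty-string name is
     added to the infoset. -->
```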

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Tuesday, February 18, 2020 3:05 PM
To: [email protected] <[email protected]>
Subject: Re: Need an example of using emptyValueDelimiterPolicy


Thank you Mike. That is very helpful.

I made a slight modification to your example: now the input is a series of 
comma-separated names. To prohibit consecutive commas, we wrap each name in 
parentheses and specify emptyValueDelimiterPolicy=both:

<xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
    <xs:element name="name" type="xs:string" maxOccurs="unbounded"
        dfdl:initiator="(" dfdl:terminator=")"
        dfdl:emptyValueDelimiterPolicy="both" />
</xs:sequence>

So, this is how the input would look when there is no value for the second name:

(John),(),(Bill),(Linda)

That works great.

Next, suppose that when there is no value for a name, we don’t want the 
initiator or terminator (consecutive commas are okay, we decide). We would 
specify emptyValueDelimiterPolicy=none, right? The input should look like this:

(John),,(Bill),(Linda)

Right?
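For reference, the schema being described here would presumably be the one above with only the policy changed. This is a reconstruction, since the modified schema itself was not posted:

```xml
<xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
    <!-- Same as before except the policy: an empty name is now expected to
         appear with no "(" or ")" at all, i.e. as two adjacent commas. -->
    <xs:element name="name" type="xs:string" maxOccurs="unbounded"
        dfdl:initiator="(" dfdl:terminator=")"
        dfdl:emptyValueDelimiterPolicy="none" />
</xs:sequence>
```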



I tried that but got this message:

[warning] Left over data. Consumed 48 bit(s) with at least 128 bit(s) remaining.

This is the output I got:

<input>
  <name>John</name>
</input>
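The bit counts in the warning line up with that output, assuming a single-byte encoding such as US-ASCII or UTF-8 (8 bits per character for this data) and no trailing newline in the input:

```latex
\begin{aligned}
\text{consumed: } & \text{(John)} = 6 \text{ chars} \times 8 = 48 \text{ bits} \\
\text{remaining: } & \text{,,(Bill),(Linda)} = 16 \text{ chars} \times 8 = 128 \text{ bits}
\end{aligned}
```

So the parse stopped right after the first name, at the adjacent commas, which matches the single name element in the output.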



Why is this happening? What happened to the other names?

/Roger

From: Beckerle, Mike <[email protected]>
Sent: Tuesday, February 18, 2020 2:40 PM
To: [email protected]
Subject: [EXT] Re: Need an example of using emptyValueDelimiterPolicy



emptyValueDelimiterPolicy is certainly a squirrelly area of DFDL and Daffodil.

It is made more complicated by the fact that default values aren't fully 
implemented in either Daffodil or IBM DFDL.

What you've expressed thus far doesn't motivate any need for 
emptyValueDelimiterPolicy.

Your element is an integer and has no default value, so there is nothing to 
create if an "empty" syntax (which would be "()" in your case) is detected. 
Hence empty isn't allowed, which is why you got the message saying that 
emptyValueDelimiterPolicy was ignored.

Furthermore, your element has minOccurs="0", so it is "optional", and no 
defaulting would ever be done anyway; instead, nothing would be added to the 
infoset on parsing. But that only applies if "empty" is even a concept for your 
element type. In the case of an integer, it either needs to find "(8)" with 
some value like 8, or it needs to find nothing at all (perhaps the next 
separator). Finding "()" should cause an error saying that the integer can't be 
parsed from the empty string.

For emptyValueDelimiterPolicy to be useful on a numeric type, the element must 
be required (i.e., a scalar element, or minOccurs >= 1 with an appropriate 
occursCountKind), and must either have a default value or be nillable with 
dfdl:useNilForDefault="yes", which makes being nilled the default value.

Daffodil's support for default values is also only partial. I am not sure that 
the above, such as making your integer nillable, would not also result in 
diagnostic messages about features not being supported.
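For contrast with Roger's original schema, a minimal sketch of the first option: the same integer, made required by dropping minOccurs="0", with the parentheses moved onto the element itself and a default added. The default of 0 is an arbitrary choice for illustration, and per the caveat above, Daffodil may answer with a not-supported diagnostic rather than use it.

```xml
<!-- Hypothetical sketch: a required (scalar) integer with a default value.
     If "()" is found, the default 0 is what would be placed in the infoset,
     subject to Daffodil's partial support for defaulting. The nillable
     variant is not shown. -->
<xs:element name="num" type="xs:integer" default="0"
    dfdl:initiator="(" dfdl:terminator=")"
    dfdl:emptyValueDelimiterPolicy="both" />
```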



The type xs:string, however, is fully supported because, well, empty strings are 
a legitimate value for strings. So you can use emptyValueDelimiterPolicy to 
control, for example, whether you want an explicit indication that the string 
value is the empty string, or not.

E.g., suppose you have a comma-separated format with 4 scalar elements, each of 
which is a choice of either an integer or a string.

For example, here's some data:

1,2,foo,bar

What if we want the strings to be allowed to be empty strings, but we don't 
want this allowed:

1,2,,bar

because we consider those two adjacent commas to be an evil, confusing thing. 
We want you to have to put something in the field.

So we can instead require the strings to have initiators and terminators, like 
so:

1,2,(foo),(bar)

But... depending on emptyValueDelimiterPolicy, the evil adjacent commas might 
still be allowed. If we want to disallow such evil, we must choose 
emptyValueDelimiterPolicy='both', so that

1,2,(),(bar)



is what is required to get an empty string value for the 3rd element.
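A minimal schema sketch of this four-field example, with each field a choice whose string branch is delimited by parentheses. It is not from the thread: every name below is hypothetical, and it assumes no targetNamespace plus a dfdl:format (not shown) with text representation and delimited lengths.

```xml
<!-- Hypothetical names: intOrString, record, field1..field4, int, str. -->
<xs:complexType name="intOrString">
  <xs:choice>
    <!-- Tried first: a bare integer such as 1 or 2. -->
    <xs:element name="int" type="xs:integer"/>
    <!-- Otherwise a parenthesized string. With 'both', an empty string
         must be written as (), never as nothing between two commas. -->
    <xs:element name="str" type="xs:string"
        dfdl:initiator="(" dfdl:terminator=")"
        dfdl:emptyValueDelimiterPolicy="both"/>
  </xs:choice>
</xs:complexType>

<xs:element name="record">
  <xs:complexType>
    <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
      <xs:element name="field1" type="intOrString"/>
      <xs:element name="field2" type="intOrString"/>
      <xs:element name="field3" type="intOrString"/>
      <xs:element name="field4" type="intOrString"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

With this shape, 1,2,(),(bar) should give field3 an empty-string str, while 1,2,,bar should fail to parse, because the empty third field matches neither branch.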



Not sure that helps, but this is the sort of thing emptyValueDelimiterPolicy is 
for.

________________________________

From: Costello, Roger L. <[email protected]>
Sent: Tuesday, February 18, 2020 12:17 PM
To: [email protected] <[email protected]>
Subject: Need an example of using emptyValueDelimiterPolicy



Hi Folks,

Suppose the input is an integer that is initiated by a left parenthesis and 
terminated by a right parenthesis, e.g.,

(44)

I thought that I would use emptyValueDelimiterPolicy for that input, using this 
schema:

<xs:sequence dfdl:initiator="(" dfdl:terminator=")" >
    <xs:element  name="num"
                 type="xs:integer"
                 minOccurs="0"
                 dfdl:emptyValueDelimiterPolicy="both" />
</xs:sequence>

Question #1: Is that a legitimate scenario for using emptyValueDelimiterPolicy?

Question #2: Does Daffodil support emptyValueDelimiterPolicy? This message 
seems to suggest that Daffodil does not support it:

[warning] Schema Definition Warning: DFDL property was ignored: 
emptyValueDelimiterPolicy="both"

/Roger
