Re: Any difference between specifying a nil value as part of a regex versus using all the DFDL nil properties?

Mike Beckerle Tue, 02 Aug 2022 11:36:42 -0700

The seeming equivalence here comes from considering primarily the Cyberian
(aka Cyber-security) use case of parse and then unparse.

In that case, it's hard to see any important preference here. If Cyberia is
the only use case for the schema, then perhaps expediency and the smallest
number of lines of schema are best.

But if you are trying to describe the format fully, so as to enable all
sorts of applications using that format, then yes there start to be
distinctions that suggest using nillable elements.

Using nillable elements can hide the representation issues around nilled
elements from the application creating the data. Instead of having to know
to use "-" for a particular element, one just uses the standard way of
making an XML element nilled: <Foo xsi:nil='true'/>.  As an example, suppose
one part of the format uses "-" to indicate nil, and other places in the
same format use "N/A" to indicate nil. (This kind of inconsistency
happens a lot.)  That representation issue can be hidden from the
application by using nillable elements for both cases; they just have
different DFDL nil values.
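
A rough sketch of that situation, assuming a second element named Bar
(hypothetical, it is not part of Roger's format) and leaving out the length
and other representation properties a real schema would need:

```xml
<!-- Foo and Bar both become nillable elements; only the DFDL nil value
     differs. Bar is a hypothetical name used for illustration. -->
<xs:element name="Foo" type="xs:string" nillable="true"
        dfdl:nilKind="literalValue"
        dfdl:nilValue="-"/>

<xs:element name="Bar" type="xs:string" nillable="true"
        dfdl:nilKind="literalValue"
        dfdl:nilValue="N/A"/>
```

The application creating the data would then write <Foo xsi:nil='true'/>
and <Bar xsi:nil='true'/> without needing to know which literal each one
turns into in the data.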

Consider an application that starts by unparsing the data, i.e., it is
creating the data. Then dfdl:lengthKind 'pattern' does basically
nothing for you, because the patterns aren't used when unparsing.
So data like <Foo>quijibo</Foo> will happily unparse. If you don't want
that to be allowed, define the DFDL schema so that this XML is
invalid, i.e., use pattern facets or enumeration facets to allow only ABC,
DEF, GHI as values. Then the element can only be nilled or hold one of the
legal values, and this holds at the XML Infoset level: you can
point at the XML and say that it is valid or invalid without reference
to any DFDL properties. An XSD-aware XML editor will flag these
invalidities for you.
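
As a sketch only, the enumeration-facet version of Foo might look something
like this. The nil properties are copied from Roger's second example; the
length and other representation properties are left out here and would
still be needed:

```xml
<!-- Sketch: Foo may be nilled or hold one of the three legal values.
     Length and other representation properties are omitted. -->
<xs:element name="Foo" nillable="true"
        dfdl:nilKind="literalValue"
        dfdl:nilValue="-">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="ABC"/>
      <xs:enumeration value="DEF"/>
      <xs:enumeration value="GHI"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

With that declaration, <Foo>ABC</Foo> and <Foo xsi:nil='true'/> are valid
against the XSD, while <Foo>quijibo</Foo> is not. Note that this is a
validation check; whether it gets enforced when unparsing depends on
validation actually being run on the XML.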

On Tue, Aug 2, 2022 at 1:16 PM Roger L Costello <[email protected]> wrote:

> Hi Folks,
>
> I have a data format with a field whose value is one of these strings: ABC
> or DEF or GHI.
>
> If no data is available to populate the field, the field must contain a
> single hyphen.
>
> I am using a regex to specify the field's content.
>
> Question: What's the difference between specifying the field this way:
>
> <xs:element name="Foo"
>         dfdl:lengthKind="pattern"
>         dfdl:lengthPattern="(ABC|DEF|GHI)|-" ...
>
> Notice that all properties dealing with nil are omitted. The nil value is
> part of the regex.
>
> versus specifying the field this way:
>
> <xs:element name="Foo"
>         nillable="true"
>         dfdl:nilKind="literalValue"
>         dfdl:nilValue="-"
>         dfdl:lengthKind="pattern"
>         dfdl:lengthPattern="ABC|DEF|GHI"
>         ...
>
> Notice the regex doesn't specify a hyphen and the nil value is specified
> using the nil properties.
>
> Do the two ways mean the same thing? I am guessing that technically they
> do not mean the same thing; but in practice they both result in
> constraining the field to the same set of allowable values, and therefore
> may be considered equivalent. Do you agree?
>
> Do the two ways behave the same for parsing and unparsing?
>
> If they are equivalent, which way is preferred?
>
> /Roger
>
