Mike Beckerle wrote:
* Design the DFDL schema to reject malformed data, not just accept correct
data.
Oh, oh, yea!
I like it!
Not sure how to do that, however. Would you help me work through this, please?
Mike points out, with this input:
* Foobar
* OPER/something not allowed//
* Barfoo
*
* Parsing the OPER line will fail, but then it will try parsing it as an
EXER line, which will also fail, so it will leave the whole wrapper element
out, and it will continue to try to parse the OPER line instead of failing.
Is this the behavior we desire:
If an input line starts (is initiated by) OPER, then process the rest of the
input line using the DFDL description of OPER. If, during the processing of the
OPER field, an error arises, then the parser should display an error message,
abandon the input line, proceed to the next input line and the element
following the wrapper element.
Is that the behavior we desire?
Mike said that the solution is to:
* Use dfdl:discriminator with testKind='pattern'
I don’t think that I’ve ever used that combination, so I did some experimenting.
Suppose the legal value for the field following EXER is TANGO (all uppercase)
and the legal value for the field following OPER is XRAY (all uppercase).
Is this how to declare the wrapper element:
<xs:element name="OPER-EXER-wrapper" minOccurs="0">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:discriminator testKind="pattern"
                          testPattern="(OPER/XRAY)|(EXER/TANGO)|"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:complexType>
    <!-- OPER and EXER declarations -->
  </xs:complexType>
</xs:element>
Is that correct?
This is great stuff. Once I grok this, my IQ will have increased another 10
points.
/Roger
From: Mike Beckerle <[email protected]>
Sent: Tuesday, September 26, 2023 4:49 PM
To: [email protected]
Subject: [EXT] Re: DFDL can increase your IQ by 10 points!
There is another detail which will further improve your schema.
What if the data contains an OPER line, but after the OPER characters there is
some defect in the data of the OPER line?
foobar
OPER/something not allowed
barfoo
Parsing the OPER line will fail, but then it will try parsing it as an EXER
line, which will also fail, so it will leave the whole wrapper element out, and
it will continue to try to parse the OPER line instead of failing. Your
optional element gave it a way to suppress the error and parse differently.
If the schema after this OPER/EXER element is, say, just a string, then
"OPER/something not allowed" will be taken as the value of that string, and ...
it's possible the parse will succeed and just produce an infoset that is
perfectly valid according to the schema, but clearly the schema is allowing a
solution we want to disallow.
The fix here is that your optional element needs a discriminator. The
discriminator you need on the optional element checks only that the data
starts with OPER or EXER.
(use dfdl:discriminator with testKind='pattern').
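A minimal sketch of what that could look like on the wrapper element follows. The exact pattern is an assumption (for example, whether it should also require the slash after OPER or EXER is not stated in this thread), and the body of the complex type is left as a placeholder:

```xml
<xs:element name="OPER-EXER-wrapper" minOccurs="0">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Assumed pattern: commit to the wrapper as soon as the data
           starts with OPER or EXER, without checking the rest of the line -->
      <dfdl:discriminator testKind="pattern" testPattern="OPER|EXER"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:complexType>
    <!-- OPER and EXER declarations -->
  </xs:complexType>
</xs:element>
```

With a discriminator like this, a line such as "OPER/something not allowed" should no longer be able to fall back to "the wrapper is absent": once the pattern matches, a later failure inside the wrapper becomes a parse error. The DFDL specification appears to treat a zero-length pattern match as a failed test, so a pattern that can match the empty string would add nothing here.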
This issue is a matter of precision. It's the difference between:
1. It's either a fully correct OPER line, or a fully correct EXER line, or
it isn't present.
2. It's either a line that starts with OPER or a line that starts with EXER
or it isn't present.
That distinction is designing the schema to properly reject malformed data, not
just accept correct data.
See in (1) above, it allows for faulty OPER or EXER lines to be correctly
parsed as "it isn't present". The decision really should NOT depend on any more
than the OPER or EXER characters being there.
I find it hard to remember to do this. But most decisions in the schema need
discriminators. I have to revisit every decision point in the schema one by one
to make sure there are discriminators everywhere there can be.
On Tue, Sep 26, 2023 at 10:13 AM Roger L Costello <[email protected]> wrote:
Hi Folks,
I think DFDL is awesome. Think about it: DFDL is a standard language for
describing (describe, not parse) just about any data format. Again, I emphasize
that it's not about how to parse the data format, it's about describing the
data format. Given a description a DFDL processor can figure out how to parse
instances of the data format. Wow!
But there's another reason that DFDL is awesome: it forces you to be very
precise in your description. It forces you to think very logically. It forces
you to understand the implications of your description decisions. Let me give
you an example of the latter.
I am dealing with a data format that consists of a sequence of lines. Here's a
sample instance:
John Doe
OPER/XRAY//
Sally Smith
The first and last lines are just strings. Not interesting. The second line is
the interesting one. Here's another instance:
John Doe
EXER/TANGO//
Sally Smith
As you can see, the second line starts with either OPER or EXER and terminates
with //. The second line is also optional. That is, the second line is either
OPER, EXER, or neither. That leads one to this description:
choice
    OPER (optional)
    EXER (optional)
However, DFDL doesn't allow branches of a choice to be optional. So, the
correct description is:
choice
    sequence
        OPER (optional)
    sequence
        EXER (optional)
Slick, aye?
But not correct.
Let's think about this. Suppose the input is this:
John Doe
EXER/TANGO//
Sally Smith
While processing the second line, you would think that the DFDL processor would
find that the first branch of the choice (the OPER branch) doesn't match and
therefore the processor would process the line using the second branch. Ha! Not
correct!
The first branch is optional. That is key! Since the second line doesn't start
with OPER, the DFDL processor thinks, "Oh, there must be no occurrences of the
OPER line." So, the processor moves on to the description following the choice.
Do you see it? Do you see the problem? I hope so. This is wicked cool. As I
worked through this example, it forced me to think very, very clearly about the
implication of an optional OPER line. So, what's the solution? Make OPER and
EXER mandatory:
choice
    sequence
        OPER (mandatory)
    sequence
        EXER (mandatory)
And, place the choice inside an optional wrapper element:
OPER-EXER-wrapper (optional)
    choice
        sequence
            OPER (mandatory)
        sequence
            EXER (mandatory)
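As a rough sketch, the outline above might translate into a DFDL schema fragment like the one below. The element types, the dfdl:initiator and dfdl:terminator values, and the presence of a dfdl:format elsewhere in the schema that sets the remaining properties (for example delimited text) are all assumptions made for illustration:

```xml
<!-- Optional wrapper: absent when the line is neither OPER nor EXER -->
<xs:element name="OPER-EXER-wrapper" minOccurs="0">
  <xs:complexType>
    <xs:choice>
      <xs:sequence>
        <!-- Mandatory inside its branch; assumed to be initiated by "OPER/" -->
        <xs:element name="OPER" type="xs:string"
                    dfdl:initiator="OPER/" dfdl:terminator="//"/>
      </xs:sequence>
      <xs:sequence>
        <!-- Mandatory inside its branch; assumed to be initiated by "EXER/" -->
        <xs:element name="EXER" type="xs:string"
                    dfdl:initiator="EXER/" dfdl:terminator="//"/>
      </xs:sequence>
    </xs:choice>
  </xs:complexType>
</xs:element>
```

The optionality lives only on the wrapper (minOccurs="0"), so each branch of the choice either matches fully or fails.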
Now, with this input:
John Doe
EXER/TANGO//
Sally Smith
The processor will try the first branch of the choice, it fails, so it tries
the second branch and succeeds.
With this input:
John Doe
Sally Smith
The processor will try the first branch of the choice, it fails, so it tries
the second branch, which also fails, so there is no value for the wrapper
element.
This blows my mind. I feel like this example alone boosted my IQ by 10 points.
/Roger