Hi Folks,
I think DFDL is awesome. Think about it: DFDL is a standard language for
describing (describe, not parse) just about any data format. Again, I emphasize
that it's not about how to parse the data format, it's about describing the
data format. Given a description, a DFDL processor can figure out how to parse
instances of the data format. Wow!
But there's another reason that DFDL is awesome: it forces you to be very
precise in your description. It forces you to think very logically. It forces
you to understand the implications of your description decisions. Let me give
you an example of the latter.
I am dealing with a data format that consists of a sequence of lines. Here's a
sample instance:
John Doe
OPER/XRAY//
Sally Smith
The first and last lines are just strings. Not interesting. The second line is
the interesting one. Here's another instance:
John Doe
EXER/TANGO//
Sally Smith
As you can see, the second line starts with either OPER or EXER and terminates
with //. The second line is also optional. That is, the second line is either
an OPER line, an EXER line, or absent. That leads one to this description:
choice
    OPER (optional)
    EXER (optional)
However, DFDL doesn't allow branches of a choice to be optional. So, each
optional element gets wrapped in its own sequence, and the description becomes:
choice
    sequence
        OPER (optional)
    sequence
        EXER (optional)
Slick, eh?
But not correct.
Let's think about this. Suppose the input is this:
John Doe
EXER/TANGO//
Sally Smith
While processing the second line, you would think that the DFDL processor would
find that the first branch of the choice (the OPER branch) doesn't match and
therefore the processor would process the line using the second branch. Ha! Not
correct!
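The first branch is optional. That is key! Since the second line doesn't start
with OPER, the DFDL processor thinks, "Oh, there must be no occurrences of the
OPER line." Zero occurrences is a perfectly good result for an optional item,
so the first branch succeeds without consuming anything and the choice is
resolved. The EXER branch is never tried, and the processor moves on to the
description following the choice.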
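To convince myself, here is a toy model of that behavior in Python. It is only
a sketch: the helper names (line, optional, choice) are made up for
illustration, and a real DFDL processor is far more involved. The one
assumption that matters is that an optional item never fails; it just reports
zero occurrences.

```python
# Toy model only: these helpers are hypothetical, not part of any DFDL tool.
# Every parser takes (lines, pos) and returns (new_pos, value), or None on failure.

def line(tag):
    """One mandatory line such as OPER/XRAY// or EXER/TANGO//."""
    def parse(lines, pos):
        if pos < len(lines):
            text = lines[pos]
            if text.startswith(tag + "/") and text.endswith("//"):
                return pos + 1, text
        return None
    return parse

def optional(parser):
    """Zero or one occurrence. Assumption: this never fails."""
    def parse(lines, pos):
        result = parser(lines, pos)
        if result is None:
            return pos, None  # zero occurrences, nothing consumed
        return result
    return parse

def choice(*branches):
    """Pick the first branch that succeeds; later branches are not tried."""
    def parse(lines, pos):
        for index, branch in enumerate(branches):
            result = branch(lines, pos)
            if result is not None:
                new_pos, value = result
                return new_pos, (index, value)
        return None
    return parse

message = ["John Doe", "EXER/TANGO//", "Sally Smith"]
flawed = choice(optional(line("OPER")), optional(line("EXER")))
outcome = flawed(message, 1)  # start at the second line
print(outcome)  # (1, (0, None))
```

In this model the choice settles on branch 0 with nothing consumed, so the
EXER line is still sitting there, unparsed, when the processor moves on.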
Do you see it? Do you see the problem? I hope so. This is wicked cool. As I
worked through this example, it forced me to think very, very clearly about the
implication of an optional OPER line. So, what's the solution? Make OPER and
EXER mandatory:
choice
    sequence
        OPER (mandatory)
    sequence
        EXER (mandatory)
And, place the choice inside an optional wrapper element:
OPER-EXER-wrapper (optional)
    choice
        sequence
            OPER (mandatory)
        sequence
            EXER (mandatory)
Now, with this input:
John Doe
EXER/TANGO//
Sally Smith
The processor tries the first branch of the choice. It fails, so the processor
tries the second branch, which succeeds.
With this input:
John Doe
Sally Smith
The processor tries the first branch of the choice, which fails, then tries the
second branch, which also fails. The choice as a whole fails, so the optional
wrapper element has zero occurrences.
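A toy model in Python shows why the wrapper works. This is just a sketch with
made-up helper names (line, optional, choice), not how any real DFDL processor
is implemented. It assumes a mandatory line fails outright when it doesn't
match, and that an optional item turns a failure into zero occurrences.

```python
# Toy model only: these helpers are hypothetical, not part of any DFDL tool.
# Every parser takes (lines, pos) and returns (new_pos, value), or None on failure.

def line(tag):
    """One mandatory line such as OPER/XRAY// or EXER/TANGO//."""
    def parse(lines, pos):
        if pos < len(lines):
            text = lines[pos]
            if text.startswith(tag + "/") and text.endswith("//"):
                return pos + 1, text
        return None
    return parse

def optional(parser):
    """Zero or one occurrence: a failure inside becomes 'absent'."""
    def parse(lines, pos):
        result = parser(lines, pos)
        if result is None:
            return pos, None  # zero occurrences, nothing consumed
        return result
    return parse

def choice(*branches):
    """Pick the first branch that succeeds; fail if every branch fails."""
    def parse(lines, pos):
        for index, branch in enumerate(branches):
            result = branch(lines, pos)
            if result is not None:
                new_pos, value = result
                return new_pos, (index, value)
        return None
    return parse

# Optional wrapper around a choice of mandatory branches.
fixed = optional(choice(line("OPER"), line("EXER")))

with_exer = fixed(["John Doe", "EXER/TANGO//", "Sally Smith"], 1)
without = fixed(["John Doe", "Sally Smith"], 1)
print(with_exer)  # (2, (1, 'EXER/TANGO//'))
print(without)    # (1, None)
```

Because the branches are mandatory, a mismatch is a real failure, and the
choice gets to try the next branch. Only the wrapper is allowed to say "absent".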
This blows my mind. I feel like this example alone boosted my IQ by 10 points.
/Roger