I decided to write this up for the user list for posterity.
Often in DFDL schemas there are error cases where the data is not handled by
the schema, but the length of the data (e.g., of a message) can still be
determined.
In that case it is common for users to want to tolerate the erroneous data by
capturing it as unrecognized data, rather than having the parse fail.
The technique to achieve this starts by finding the top-level primary choice of
the schema. This choice usually selects among the recognized message types by
way of dfdl:choiceDispatchKey, with each alternative or "branch" carrying a
dfdl:choiceBranchKey.
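For illustration, here is a minimal sketch of what such a primary choice might look like. The element names (msgType, position, status) and the tns types are hypothetical, and the representation properties are assumed to come from a default dfdl:format defined elsewhere in the schema. It also assumes this sequence is the content of the enclosing message element, so that ./msgType resolves to the field parsed just before the choice.

```xml
<xs:sequence>
  <!-- hypothetical field identifying the message type -->
  <xs:element name="msgType" type="xs:unsignedByte"/>
  <!-- the dispatch key must be a string, hence the xs:string() conversion -->
  <xs:choice dfdl:choiceDispatchKey="{ xs:string(./msgType) }">
    <!-- a branch is selected when its branch key equals the dispatch key -->
    <xs:element name="position" type="tns:PositionType" dfdl:choiceBranchKey="1"/>
    <xs:element name="status" type="tns:StatusType" dfdl:choiceBranchKey="2"/>
  </xs:choice>
</xs:sequence>
```

With a dispatch like this, a msgType value that matches no branch key is a processing error, which is exactly the case the rest of this note deals with.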
To add a "default branch" that is selected if none of the others are, nest this
primary choice inside another choice. This encapsulating choice has two
branches. The first branch is the primary choice as it exists. The second branch
is the "unrecognized" case, which constructs an element to capture that data.
So putting that all together:
<xs:choice>
  <xs:choice dfdl:choiceDispatchKey="{ .... }">
    ... branches for recognized messages....
  </xs:choice>
  <xs:element name="unrecognized" type="xs:hexBinary"
    dfdl:lengthKind="explicit"
    dfdl:length="{ ....determine the length ... }"/>
</xs:choice>
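How the length is determined depends entirely on the format. As one hypothetical illustration, suppose each message begins with a header element containing a payloadLength field that gives the number of bytes following the header. Both names are assumptions made up for this sketch, not taken from any particular schema.

```xml
<!-- assumes a previously parsed sibling element "header" with a child
     "payloadLength" giving the remaining length in bytes (hypothetical names) -->
<xs:element name="unrecognized" type="xs:hexBinary"
  dfdl:lengthKind="explicit"
  dfdl:lengthUnits="bytes"
  dfdl:length="{ ../header/payloadLength }"/>
```

The path starts with ../ because the length expression of an element is evaluated relative to that element itself, so its siblings are reached through the parent.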
I've been advocating that people use a DFDL variable to control whether their
schema causes an error on unrecognized data, or captures it in the style shown
above. That way one schema can be used both ways.
To let the variable control error vs. capture of unrecognized messages, you
just add an assert to the element above. The assert fails if the variable is
set to cause errors, and passes if it is set to capture the data as elements.
Here's the whole thing:
<xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl">
  <dfdl:defineVariable name="captureUnrecognizedMessages" type="xs:boolean"
    defaultValue="true" external="true"/>
</xs:appinfo></xs:annotation>
<xs:choice>
  <xs:choice dfdl:choiceDispatchKey="{ .... }">
    ... branches for recognized messages....
  </xs:choice>
  <xs:sequence> <!-- handle unrecognized message -->
    <xs:sequence>
      <!--
        Discriminator is true, so we lock in that this branch *is* going
        to be selected.
        Subsequently, if the assert below fails, its specific diagnostic
        message will be issued, rather than this choice backtracking and
        issuing some nondescript "all choices failed" message.
      -->
      <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl">
        <dfdl:discriminator>{ fn:true() }</dfdl:discriminator>
      </xs:appinfo></xs:annotation>
    </xs:sequence>
    <!--
      Element to capture unrecognized data. Either it captures the data,
      or the assert fails with a diagnostic message.
    -->
    <xs:element name="unrecognized" type="xs:hexBinary"
      dfdl:lengthKind="explicit"
      dfdl:length="{ ....determine the length ... }">
      <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl">
        <!--
          This assert passes if we're capturing unrecognized messages.
          Otherwise it fails and issues the diagnostic message.
          Note that the message can be an expression, which could include
          the identifying message ID. You just have to be certain that
          the message expression will always evaluate successfully.
        -->
        <dfdl:assert message="unrecognized message type">{
          $tns:captureUnrecognizedMessages
        }</dfdl:assert>
      </xs:appinfo></xs:annotation>
    </xs:element>
  </xs:sequence>
</xs:choice>
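As a sketch of the message-as-expression idea mentioned in the comment on the assert, the fixed message string could be replaced by something like the following. It assumes a hypothetical msgType element that is a sibling of the unrecognized element and has already been parsed by the time the assert runs, which is what keeps the message expression from failing to evaluate.

```xml
<!-- assumes a previously parsed sibling element "msgType" (hypothetical name) -->
<dfdl:assert
  message="{ fn:concat('unrecognized message type: ', xs:string(../msgType)) }">{
  $tns:captureUnrecognizedMessages
}</dfdl:assert>
```

Referring only to fields that are parsed before the choice is the simplest way to be sure the message expression always evaluates.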
Keep in mind that if you define a DFDL schema to "recognize" unhandled messages
and parse them into hexBinary elements of undifferentiated bytes, then as far
as the DFDL schema is concerned that data is valid. So you need some separate
capability in the system to flag when these unhandled elements are being
created, so they don't pass through your entire system as if they were valid.
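One possible form for that separate capability, offered only as a sketch, is a Schematron rule run over the parsed XML infoset that reports every captured element. It assumes the element is named unrecognized and is in no namespace. A namespace-qualified element would also need an sch:ns declaration and a prefixed context.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- fires for every element named "unrecognized" in the infoset -->
    <sch:rule context="unrecognized">
      <sch:report test="true()">
        Unrecognized message captured as hexBinary; flag for review.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Any downstream check that looks for the unrecognized element would serve the same purpose. The point is only that the flagging happens outside the DFDL parse itself.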