We recently added a feature intended to solve exactly this problem:
including XML payloads in the resulting infoset as XML rather than as a
string. It does require a custom InfosetInputter and InfosetOutputter,
though, and those have not been written yet.
The proposal is here:
https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Runtime+Properties
The idea is that your payload element is just a normal xs:string, and
you annotate it with a custom runtime property like
treatStringAsXML=true. Then you can write a custom InfosetOutputter that
uses this annotation and outputs the string as XML during parse, and a
custom InfosetInputter that converts that XML back to a string during
unparse.
The Example Implementation discusses this exact use case and gives an
idea of how one might implement the custom InfosetInputter/Outputter.
This example uses Scala XML Nodes for simplicity, but could be done with
the standard text inputter/outputters as well.
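
To make the round trip concrete, here is a minimal sketch of the idea in
plain Python. It does not use the Daffodil API at all: output_simple and
input_simple are hypothetical stand-ins for what a custom
InfosetOutputter and InfosetInputter would do for a simple element, and
the props dictionary is an assumed stand-in for the runtime properties
attached to that element.

```python
import xml.etree.ElementTree as ET

# Property name taken from the email; how Daffodil exposes it is not shown here.
XML_PROP = "treatStringAsXML"


def output_simple(parent, name, value, props):
    """Parse direction: add a simple element named `name` under `parent`."""
    elem = ET.SubElement(parent, name)
    if props.get(XML_PROP) == "true":
        # The string value is itself XML, so embed it as a child element.
        elem.append(ET.fromstring(value))
    else:
        # Ordinary simple element: the value is plain text content.
        elem.text = value
    return elem


def input_simple(elem, props):
    """Unparse direction: recover the string value of a simple element."""
    if props.get(XML_PROP) == "true":
        # Serialize the embedded XML back into the string to be unparsed.
        return ET.tostring(elem[0], encoding="unicode")
    return elem.text or ""


# Build a tiny infoset: one ordinary field and one XML payload field.
message = ET.Element("message")
output_simple(message, "version", "1", {})
payload = output_simple(
    message, "payload", "<order><id>42</id></order>", {XML_PROP: "true"}
)

infoset_text = ET.tostring(message, encoding="unicode")
round_tripped = input_simple(payload, {XML_PROP: "true"})
```

Note that re-serializing XML is not guaranteed to reproduce the original
bytes in general (whitespace, quoting, and any XML declaration may
differ), so a real implementation would need to decide how much
round-trip fidelity it requires.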
One thing to point out though is that to Daffodil and its internals,
this payload element is still a string. Daffodil has no knowledge about
what the InfosetInputter/Outputters are doing, so Daffodil cannot
reference the XML payload in DFDL expressions, or validate the XML
against a schema. For validation, you would need to pipe the resulting
infoset to some other tool with a modified schema that does not treat
this payload as a string.
Since this is the second time I've come across this requirement, it
might be worth considering if this will be a more common technique, and
if maybe we should add some built-in mechanism to DFDL, one that would
work with both DFDL expressions and validation...
- Steve
On 9/22/21 11:58 AM, Ballard, Tom - US wrote:
All,
I have a complex data format I am trying to implement a DFDL schema for, but
don’t believe it’s possible without support for either recursion decomposition
and/or layering. The format in question has a subset of messages which consist
of a binary “header” followed by an XML payload. The messages begin with a
handful of binary metadata fields, followed by a binary length field, and then
an XML payload (whose length is given by the length field). In some cases
there may be binary data subsequent to the XML payload as well. I assume I can
pull the XML payload in as an opaque string blob, but the problem is I also need
to validate that XML against a schema.
I know recursion and layering are on the project wish list, but is there a way
to accomplish full parsing and validation of “hybrid” messages like the ones I
described without them?
V/R,
Tom Ballard