Hello,
I am working on the XProc 3.x specification for two steps, p:dfdl-parse and
p:dfdl-unparse. The specification needs to be DFDL processor neutral as much as
possible. I fully expect implementers of XProc 3.x will use Apache Daffodil
since it is free and open source.
Here are some of the advantages of XProc 3.0 over XProc 1.0:
• Greatly simplified syntax, addition of AVTs
• Multiple documents may be output from a port (e.g. from Daffodil using
-stream)
• Supports inputs and outputs other than XML (e.g. JSON, binary)
The structure follows the earlier XProc spec, but with some modifications. The
original XProc 1.0 step looked like this:
<declare-step type="dfdl:parse">
  <input port="source" />
  <output port="result" />
  <option name="schema" required="true" />
  <option name="root" /> <!-- (QName) -->
</declare-step>
The 3.x one currently looks like this:
<p:declare-step type="p:dfdl-parse">
  <p:input port="schema" content-types="xml"/>
  <p:input port="source" primary="true" content-types="any"/>
  <p:output port="result" sequence="true" content-types="any"/>
  <p:option name="parameters" as="map(xs:QName, item()*)?"/>
  <p:option name="fail-on-error" as="xs:boolean" select="true()"/>
  <p:option name="stream" as="xs:boolean" select="false()"/>
  <p:option name="root" as="xs:QName" />
</p:declare-step>
Some notes:
• The result port accepts any content type, so users can pick the one they
want. DFDL does not specify required serialization outputs, but practically
speaking most XProc users will want an XML infoset.
• I expect parameters will map to Daffodil variables. How this mapping
occurs will be implementation-defined.
• stream controls the -stream or -nostream parameter; the default is
-nostream
o If stream is true, multiple documents may appear on the result port
(see sequence="true")
• root maps to the parameter of the same name; it must be formatted as
an xs:QName
• fail-on-error indicates whether or not processing should continue if a
recoverable error is encountered. I’m not sure what would count as a
recoverable error here.
• There is no explicit support for parser files. I assume these are
representations proprietary to Daffodil that cannot interoperate with other
DFDL implementations.
• There is no separate p:parse-file step. XProc 3.0 supports conveying
non-XML data over its ports.
• It is possible, although implementation-defined, that an XProc 3.0
processor will accept a Daffodil configuration file (i.e. an instance of
dafext.xsd). For example, MorganaXProc currently accepts external configuration
files for Saxon.
• A PSVI should become available after a successful parse
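For illustration, a pipeline calling the step as declared above might look
like the sketch below. The file names data.bin and format.dfdl.xsd, the ex
namespace, and the ex:record root element are hypothetical, and the step itself
is only a proposal, so no processor should be assumed to run this yet.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns/format"
                version="3.0">
  <!-- sequence="true" because stream="true" may yield several documents -->
  <p:output port="result" sequence="true"/>

  <!-- Proposed step; root and stream are the options declared above.
       ex:record is a hypothetical root element in the DFDL schema. -->
  <p:dfdl-parse root="ex:record" stream="true">
    <!-- Non-XML input, loaded as a binary document (hypothetical file) -->
    <p:with-input port="source">
      <p:document href="data.bin" content-type="application/octet-stream"/>
    </p:with-input>
    <!-- The DFDL schema arrives on its own port as XML (hypothetical file) -->
    <p:with-input port="schema" href="format.dfdl.xsd"/>
  </p:dfdl-parse>
</p:declare-step>
```

With stream left at its default of false, the same call would be expected to
put a single document on the result port.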
A p:dfdl-unparse has not been sketched out but will likely look mostly the same.
On timing, the XProc group wants to get a new version of XProc out relatively
soon, so I will need to put together a formal proposal fairly quickly. Any
feedback is greatly appreciated!
Regards,
John Dziurlaj
-----Original Message-----
From: Steve Lawrence <[email protected]>
Sent: Wednesday, August 14, 2024 7:26 AM
To: [email protected]
Subject: Re: DFDL and XProc
That sounds great! If you need any help creating or reviewing the proposal let
us know. We'd be happy to lend a hand.
On 2024-08-13 12:22 PM, John Dziurlaj wrote:
> I am a heavy user of XProc 3.0. DFDL has an XProc step implementation,
> but it’s for the XProc 1.0 version of Calabash. The XProc people have
> a GitHub <https://github.com/xproc/3.0-steps/issues> repository where
> interested parties can create proposals for implementation (via
> issues). I am happy to create a proposal, likely based off the
> existing Calabash one, but with some modifications that make it more
> idiomatic for XProc 3.0.
>
> Because Apache Daffodil comes with EXI, I may write an EXI parsing step as
> well.
>
> Regards,
>
> John Dziurłaj /d͡ʑurwaj/
>
> Sr. Solutions Architect, The Turnout