Hello,

I am working on the XProc 3.x specification for two steps, p:dfdl-parse and 
p:dfdl-unparse. The specification needs to be as DFDL-processor-neutral as 
possible, although I fully expect implementers of XProc 3.x to use Apache 
Daffodil since it is free and open source.

Here are some of the advantages of XProc 3.0 over XProc 1.0:
•       Greatly simplified syntax, addition of AVTs
•       Multiple documents may be output from a port (e.g. from Daffodil using 
-stream)
•       Supports inputs and outputs other than XML (e.g. JSON, binary)

The structure follows the earlier XProc spec, but with some modifications. The 
original XProc 1.0 step looked like this:

  <declare-step type="dfdl:parse">
      <input port="source" />
      <output port="result" />
      <option name="schema" required="true" />
      <option name="root" />          <!-- (QName) -->
  </declare-step>

The 3.x one currently looks like this:

<p:declare-step type="p:dfdl-parse">
  <p:input port="schema" content-types="xml"/>
  <p:input port="source" primary="true" content-types="any"/>
  <p:output port="result" sequence="true" content-types="any"/>
  <p:option name="parameters" as="map(xs:QName, item()*)?"/>
  <p:option name="fail-on-error" as="xs:boolean" select="true()"/>
  <p:option name="stream" as="xs:boolean" select="false()"/>
  <!-- Optional with no default, so the type must allow the empty sequence -->
  <p:option name="root" as="xs:QName?"/>
</p:declare-step>

Some notes:

•       The result document may be of any content type; users can pick which 
they want. DFDL does not specify required serialization outputs, but 
practically speaking most XProc users will want an XML infoset.
•       I expect parameters will map to Daffodil variables. How this mapping 
occurs will be implementation-defined.
•       stream controls the -stream or -nostream parameter; the default is 
-nostream.
o       If stream is specified, multiple documents may appear on the 
result port (see sequence="true").
•       root maps to the parameter of the same name; it must be formatted as 
an xs:QName.
•       fail-on-error indicates whether processing should continue if a 
recoverable error is encountered. I'm not sure yet what would count as a 
recoverable error.
•       There is no explicit support for parser files. I assume these are 
representations proprietary to Daffodil that cannot interoperate with other 
DFDL implementations.
•       There is no separate p:parse-file step. XProc 3.0 supports conveying 
non-XML data over its ports.
•       It is possible, although implementation-defined, that an XProc 3.0 
processor will accept a Daffodil configuration file (i.e. an instance of 
dafext.xsd). For example, MorganaXProc currently accepts external configuration 
files for Saxon.
•       A PSVI should become available after a successful parse.
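
For illustration, a pipeline calling the proposed step might look like the 
sketch below. It assumes the 3.x declaration above is adopted unchanged. The 
file names, the ex namespace, and the maxRecords variable are hypothetical, 
and how parameters reach the DFDL processor remains implementation-defined.

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns/record"
                version="3.0">
  <!-- sequence="true" because streaming may yield several documents -->
  <p:output port="result" sequence="true"/>

  <!-- p:dfdl-parse is the proposed step, not part of any released library -->
  <p:dfdl-parse root="ex:record" stream="true">
    <!-- Hypothetical DFDL schema describing the binary layout -->
    <p:with-input port="schema" href="record.dfdl.xsd"/>
    <!-- Hypothetical non-XML input, loaded as a binary document -->
    <p:with-input port="source">
      <p:document href="records.bin"
                  content-type="application/octet-stream"/>
    </p:with-input>
    <!-- Hypothetical variable; its mapping is implementation-defined -->
    <p:with-option name="parameters"
        select="map { QName('http://example.com/ns/record', 'maxRecords'): 100 }"/>
  </p:dfdl-parse>
</p:declare-step>

With stream="true", each parsed record would be expected to arrive as a 
separate document on the result port; with the default stream="false", a 
single document would be expected.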

A p:dfdl-unparse step has not been sketched out yet, but it will likely look 
mostly the same.
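
Purely as an assumption, mirroring the parse declaration, one possible shape 
for it is shown below. Nothing here has been agreed: in particular, dropping 
the root and stream options and emitting a single result document are guesses 
that the formal proposal would need to confirm or replace.

<!-- Hypothetical sketch only; not a settled declaration -->
<p:declare-step type="p:dfdl-unparse">
  <p:input port="schema" content-types="xml"/>
  <!-- The infoset to unparse -->
  <p:input port="source" primary="true" content-types="any"/>
  <!-- The unparsed data, typically non-XML -->
  <p:output port="result" content-types="any"/>
  <p:option name="parameters" as="map(xs:QName, item()*)?"/>
  <p:option name="fail-on-error" as="xs:boolean" select="true()"/>
</p:declare-step>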

On timing, the XProc group wants to get a new version of XProc out relatively 
soon, so I will need to put together a formal proposal fairly quickly. Any 
feedback is greatly appreciated!

Regards,

John Dziurlaj

-----Original Message-----
From: Steve Lawrence <[email protected]> 
Sent: Wednesday, August 14, 2024 7:26 AM
To: [email protected]
Subject: Re: DFDL and XProc

That sounds great! If you need any help creating or reviewing the proposal let 
us know. We'd be happy to lend a hand.

On 2024-08-13 12:22 PM, John Dziurlaj wrote:
> I am a heavy user of XProc 3.0. DFDL has an XProc step implementation, 
> but it’s for the XProc 1.0 version of Calabash. The XProc people have 
> a GitHub <https://github.com/xproc/3.0-steps/issues> repository where 
> interested parties can create proposals for implementation (via 
> issues). I am happy to create a proposal, likely based off the 
> existing Calabash one, but with some modifications that make it more 
> idiomatic for XProc 3.0.
> 
> Because Apache Daffodil comes with EXI, I may write an EXI parsing step as 
> well.
> 
> Regards,
> 
> John Dziurłaj /d͡ʑurwaj/
> 
> Sr. Solutions Architect, The Turnout
> 
> e: [email protected] <mailto:[email protected]>
> 
> s: +1 (330) 714-8935
> x: @dziurlaj
> work hours: 7am-3pm ET
> 
> http://turnout.rocks <http://turnout.rocks/>
> 
> @turnoutrocks
> 
