Re: Questions from a Daffodil newbie

Steve Lawrence Mon, 13 Mar 2023 05:28:08 -0700

Here are some high-level answers. If you need more details on anything, let us know.

1. Yep, we call this feature "layers". You can create a custom layer plugin that receives data (as defined by the DFDL schema). Your layer code transforms that data (e.g. uncompresses it) and outputs the result, and then Daffodil parses that output as defined by the DFDL schema.
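To make the data flow concrete, here is a rough sketch of what a layer does conceptually: wrap the raw bytes of a region in a transforming stream so that whatever reads next sees the decoded bytes. `HypotheticalLayer`, `GzipLayer` and `wrapForParse` are made-up names for illustration only; this is not Daffodil's real layer API (see the links below for that). Only the JDK gzip classes are real.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class LayerSketch {
    // HYPOTHETICAL stand-in for "a layer"; NOT Daffodil's layer API.
    // It receives the raw bytes of a region and hands back transformed bytes.
    public interface HypotheticalLayer {
        InputStream wrapForParse(InputStream raw) throws IOException;
    }

    // Example transform: the region on the wire is gzip-compressed, so uncompress it.
    public static final class GzipLayer implements HypotheticalLayer {
        @Override
        public InputStream wrapForParse(InputStream raw) throws IOException {
            return new GZIPInputStream(raw);
        }
    }

    // Helper that builds gzip-compressed test data.
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } // closing the stream writes the gzip trailer
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = gzip("id=42,temp=21.5".getBytes(StandardCharsets.UTF_8));
        HypotheticalLayer layer = new GzipLayer();
        try (InputStream in = layer.wrapForParse(new ByteArrayInputStream(wire))) {
            // In Daffodil, the DFDL schema would describe these decompressed bytes.
            String decoded = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            System.out.println(decoded); // prints id=42,temp=21.5
        }
    }
}
```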

Here are the implementations of the layers included with Daffodil for gzip, base64, line folding, and byte swapping:


https://github.com/apache/daffodil/tree/main/daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1

And they are pluggable using Java service loaders, e.g.:

https://github.com/apache/daffodil/blob/main/daffodil-runtime1-layers/src/main/resources/META-INF/services/org.apache.daffodil.runtime1.layers.LayerCompiler

So you can create the layer outside of Daffodil, build a jar with the right services file, put it on the classpath, and Daffodil will be able to find and use it.
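For anyone unfamiliar with Java service loaders, here is a small sketch of the JDK mechanism itself. Daffodil does this discovery for you; the code only shows what "put it on the classpath" relies on. A jar registers an implementation by containing a text file under `META-INF/services/` named after the interface's binary name, listing fully qualified implementation class names one per line. `HypotheticalLayerPlugin` is a made-up interface for illustration; Daffodil's real one is named by the services file linked above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class PluginDiscovery {
    // HYPOTHETICAL plugin interface, for illustration only.
    public interface HypotheticalLayerPlugin {
        String name();
    }

    // Returns every implementation of `type` registered in a
    // META-INF/services file visible on the current classpath.
    public static <T> List<T> discover(Class<T> type) {
        List<T> found = new ArrayList<>();
        for (T provider : ServiceLoader.load(type)) {
            found.add(provider); // each provider is instantiated via its public no-arg constructor
        }
        return found;
    }

    public static void main(String[] args) {
        List<HypotheticalLayerPlugin> plugins = discover(HypotheticalLayerPlugin.class);
        for (HypotheticalLayerPlugin p : plugins) {
            System.out.println("found plugin: " + p.name());
        }
        // With no services file on the classpath this prints "providers found: 0".
        System.out.println("providers found: " + plugins.size());
    }
}
```

The nice property of this design is that the host application never names the plugin classes; adding or removing a jar is enough.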

And here is the design proposal for the feature, with more details and links to related design pages:


https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Dynamically+loading+Layer+Transformations

2. I don't think we have any documentation on this, but we have a number of examples of how to define custom charsets. For example, here's a fairly small IBM037 charset that we include in Daffodil, which is just a lookup table:


https://github.com/apache/daffodil/blob/main/daffodil-io/src/main/scala/org/apache/daffodil/io/processors/charset/IBM037.scala

You essentially just need to implement BitsCharsetDefinition, which returns a "BitsCharset" that can create a BitsCharsetEncoder/Decoder. Depending on the complexity of your charset, you may be able to use existing base classes (e.g. BitsCharsetJava) that do a lot of the heavy lifting.
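As a rough illustration of the lookup-table approach and the definition, charset, and encoder/decoder roles described above, here is a plain-Java sketch. `HypotheticalCharsetDefinition`, `HypotheticalCharset`, `ToyEbcdicDefinition` and the name `X-TOY-EBCDIC` are all invented for this example and are not Daffodil's BitsCharset API. The table is a deliberately partial EBCDIC-style mapping (space, A to I, 0 to 9), not the real IBM037 definition.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LookupCharsetSketch {
    // HYPOTHETICAL roles mirroring "definition -> charset -> encoder/decoder".
    // NOT Daffodil's BitsCharsetDefinition / BitsCharset interfaces.
    interface HypotheticalCharsetDefinition {
        String name();
        HypotheticalCharset charset();
    }

    static final class HypotheticalCharset {
        private final char[] decodeTable; // byte value (0-255) -> char
        private final Map<Character, Byte> encodeTable = new HashMap<>();

        HypotheticalCharset(char[] decodeTable) {
            this.decodeTable = decodeTable.clone();
            // Build the reverse table from every mapped byte value.
            for (int b = 0; b < 256; b++) {
                if (decodeTable[b] != '\uFFFD') encodeTable.put(decodeTable[b], (byte) b);
            }
        }

        String decode(byte[] bytes) {
            StringBuilder sb = new StringBuilder(bytes.length);
            for (byte b : bytes) sb.append(decodeTable[b & 0xFF]); // mask to an unsigned index
            return sb.toString();
        }

        byte[] encode(String s) {
            byte[] out = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                Byte b = encodeTable.get(s.charAt(i));
                if (b == null) throw new IllegalArgumentException("unmappable char: " + s.charAt(i));
                out[i] = b;
            }
            return out;
        }
    }

    // Toy, partial EBCDIC-style table: only space, A-I and 0-9 are mapped.
    static final class ToyEbcdicDefinition implements HypotheticalCharsetDefinition {
        public String name() { return "X-TOY-EBCDIC"; }

        public HypotheticalCharset charset() {
            char[] table = new char[256];
            Arrays.fill(table, '\uFFFD'); // unmapped bytes decode to the replacement char
            table[0x40] = ' ';
            for (int i = 0; i < 9; i++) table[0xC1 + i] = (char) ('A' + i);
            for (int i = 0; i < 10; i++) table[0xF0 + i] = (char) ('0' + i);
            return new HypotheticalCharset(table);
        }
    }

    public static void main(String[] args) {
        HypotheticalCharset cs = new ToyEbcdicDefinition().charset();
        byte[] wire = {(byte) 0xC8, (byte) 0xC9, 0x40, (byte) 0xF4, (byte) 0xF2};
        System.out.println(cs.decode(wire)); // prints HI 42
    }
}
```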


Note that these are also loaded using Java service loaders, e.g.:

https://github.com/apache/daffodil/blob/main/daffodil-io/src/main/resources/META-INF/services/org.apache.daffodil.io.processors.charset.BitsCharsetDefinition

3. Not at the moment. If you wanted only a subset of fields, you would need to post-process the fields and extract the parts you need yourself. Languages like XSLT/XQuery could probably do this without too much effort.
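Here is a small post-processing sketch. It uses plain XPath via the JDK's `javax.xml.xpath` rather than XSLT/XQuery, simply because it needs no extra libraries. The infoset XML is a made-up example without namespaces; a real Daffodil infoset may use a target namespace, in which case you would need a namespace-aware parser and a `NamespaceContext`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InfosetPostProcess {
    // Parses infoset XML and returns the text content of every node matching xpathExpr.
    static List<String> select(String infosetXml, String xpathExpr) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); // not namespace-aware
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(infosetXml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(xpathExpr, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical infoset, standing in for Daffodil's XML output.
        String xml = "<record><header><id>42</id><version>3</version></header>"
                + "<payload><temp>21.5</temp><temp>22.0</temp></payload></record>";
        System.out.println(select(xml, "/record/payload/temp")); // prints [21.5, 22.0]
    }
}
```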

Another alternative would be to create a custom InfosetOutputter that ignores the infoset events you don't care about and keeps the ones you do. You could use your own logic to determine which fields are important, or you could use dfdlx:runtimeProperties to annotate the schema and have your custom InfosetOutputter use those annotations. Here's the design information on runtime properties:


https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Runtime+Properties

Here's a small example of a custom InfosetOutputter we use for testing, which just captures all events and stores them in a list. You could imagine doing some sort of filtering instead, capturing only the fields you want and outputting them to a custom data structure rather than XML, for example.


https://github.com/apache/daffodil/blob/main/daffodil-japi/src/test/java/org/apache/daffodil/example/TestInfosetOutputter.java
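A minimal sketch of that filtering idea follows. It tracks the current element path from start/end events and keeps only simple values whose path is in a wanted set. The event interface (`HypotheticalInfosetEvents` with `startComplex`, `endComplex`, `simple`) is invented for illustration and does not match Daffodil's actual InfosetOutputter method signatures; the `wantedPaths` set stands in for whatever selection logic you use, e.g. values read from dfdlx:runtimeProperties.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

public class FilteringOutputterSketch {
    // HYPOTHETICAL event callbacks, loosely modeled on an infoset outputter.
    // NOT Daffodil's InfosetOutputter API.
    interface HypotheticalInfosetEvents {
        void startComplex(String name);
        void endComplex(String name);
        void simple(String name, String value);
    }

    static final class FilteringOutputter implements HypotheticalInfosetEvents {
        final Set<String> wantedPaths;
        final Deque<String> path = new ArrayDeque<>(); // root at the head, current element at the tail
        final List<String> captured = new ArrayList<>();

        FilteringOutputter(Set<String> wantedPaths) { this.wantedPaths = wantedPaths; }

        private String pathTo(String leaf) {
            StringBuilder sb = new StringBuilder();
            for (String p : path) sb.append('/').append(p); // iterates head (root) to tail
            return sb.append('/').append(leaf).toString();
        }

        public void startComplex(String name) { path.addLast(name); }

        public void endComplex(String name) { path.removeLast(); }

        public void simple(String name, String value) {
            String p = pathTo(name);
            if (wantedPaths.contains(p)) captured.add(p + "=" + value); // everything else is dropped
        }
    }

    public static void main(String[] args) {
        FilteringOutputter out = new FilteringOutputter(Set.of("/record/header/id", "/record/payload/temp"));
        // Simulated event stream for a small record.
        out.startComplex("record");
        out.startComplex("header");
        out.simple("id", "42");
        out.simple("version", "3");
        out.endComplex("header");
        out.startComplex("payload");
        out.simple("temp", "21.5");
        out.endComplex("payload");
        out.endComplex("record");
        System.out.println(out.captured); // prints [/record/header/id=42, /record/payload/temp=21.5]
    }
}
```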

4. I haven't personally done a lot of DFDL schema generation, though I know other Daffodil devs have; they may be able to chime in with helpful tips. But I don't think there's anything unique about it. I think mostly what they do is get a machine-readable specification of the data format, load that into some model, and then iterate over the model and output strings to a file. We're very familiar with Scala, so we tend to write DFDL schema generators in it, which is also nice since it has language support for XML, so XML templates are sort of built into the language. But any template language would probably work fine.
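Here's a bare-bones sketch of that "load a model, iterate, output strings" approach in Java with a `StringBuilder` in place of Scala XML literals. The `Field` model is hypothetical, and the output is only an element fragment: a real schema also needs the `xs:schema` wrapper, namespace declarations, and a `dfdl:format` block supplying defaults such as `lengthUnits` and `representation`. Names are assumed to be XML-safe, so no escaping is done.

```java
import java.util.List;

public class SchemaGeneratorSketch {
    // HYPOTHETICAL field model, as if loaded from a machine-readable format spec.
    static final class Field {
        final String name;
        final String xsdType;
        final int length;

        Field(String name, String xsdType, int length) {
            this.name = name;
            this.xsdType = xsdType;
            this.length = length;
        }
    }

    // Emits one fixed-length DFDL element per field inside a sequence.
    static String generate(String rootName, List<Field> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("<xs:element name=\"").append(rootName).append("\">\n");
        sb.append("  <xs:complexType>\n    <xs:sequence>\n");
        for (Field f : fields) {
            sb.append("      <xs:element name=\"").append(f.name)
              .append("\" type=\"").append(f.xsdType)
              .append("\" dfdl:lengthKind=\"explicit\" dfdl:length=\"").append(f.length)
              .append("\"/>\n");
        }
        sb.append("    </xs:sequence>\n  </xs:complexType>\n</xs:element>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Field> model = List.of(new Field("id", "xs:int", 4), new Field("temp", "xs:float", 4));
        System.out.print(generate("record", model));
    }
}
```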


- Steve



On 2023-03-13 06:36 AM, Roded Bahat wrote:

Hi all,
I'm looking into integrating Apache Daffodil into our product and have several questions for which I could not find answers in the documentation or issues.
1. Is it currently possible to extend Daffodil with custom types? For example, could I create a custom field type for a field compressed with a custom compression and have Daffodil call my own code for further parsing of the original field value?
2. The DFDL spec states that additional implementation-defined encoding names can be defined. How would a custom encoding be defined in the DFDL specification?
3. Is it currently possible to parse an input stream but output only a set of fields from the specification? For example, could an XPath be specified to determine which nodes in the specification Daffodil will output?
4. Is there a recommended way of dynamically creating a DFDL specification XSD? Or should I just use general tooling?
Any pointers and help would be much appreciated.
Thanks!

Roded
