Here are some high-level answers. If you need more details on anything,
let us know.
1. Yep, we call this feature "layers". You can create a custom layer
plugin that receives data (as defined by the DFDL schema), transforms it
(e.g. uncompresses it), and outputs the result, which Daffodil then
parses as defined by the DFDL schema.
Here are implementations of the layers included with Daffodil for gzip,
base64, line folding, and byte swapping:
https://github.com/apache/daffodil/tree/main/daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1
And they are pluggable using Java service loaders, e.g.:
https://github.com/apache/daffodil/blob/main/daffodil-runtime1-layers/src/main/resources/META-INF/services/org.apache.daffodil.runtime1.layers.LayerCompiler
So you can create the layer outside of Daffodil, package it in a jar
with the right services file, and put it on the classpath; Daffodil will
then be able to find and use it.
And here is the design proposal of the feature with more details and
links to related design pages:
https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Dynamically+loading+Layer+Transformations
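To make the layer idea a bit more concrete, here's a minimal plain-Java
sketch of what a decompressing layer does conceptually: it wraps the raw
input in a transforming stream so whatever reads downstream only sees the
decoded bytes. Note that HypotheticalLayer and GzipLayerSketch are made-up
names for illustration; this is NOT Daffodil's actual layer API, so take
the real interfaces and signatures from the layer sources linked above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// HYPOTHETICAL interface for illustration only; this is NOT Daffodil's layer API.
interface HypotheticalLayer {
    // Wrap the raw (encoded) stream; callers read decoded bytes from the result.
    InputStream wrapInput(InputStream raw) throws IOException;
}

// A gzip "layer": whatever reads from the wrapped stream sees uncompressed bytes.
class GzipLayerSketch implements HypotheticalLayer {
    @Override
    public InputStream wrapInput(InputStream raw) throws IOException {
        return new GZIPInputStream(raw); // decompresses incrementally as it is read
    }

    // Demo helper: gzip-compress a byte array in memory.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // gz is closed (and finished) at this point
    }

    // Demo helper: push compressed bytes through the layer and decode them as
    // UTF-8, standing in for the parser that would consume the layer's output.
    static String decodeThroughLayer(byte[] compressed) {
        HypotheticalLayer layer = new GzipLayerSketch();
        try (InputStream in = layer.wrapInput(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Wrapping the stream, rather than decompressing everything up front, lets
the data be decoded as the consumer reads it.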
2. I don't think we have any documentation, but we have a number of
examples of how to define custom charsets. For example, here's a fairly
small IBM037 charset that we include in Daffodil, which is just a lookup
table:
https://github.com/apache/daffodil/blob/main/daffodil-io/src/main/scala/org/apache/daffodil/io/processors/charset/IBM037.scala
You essentially just need to implement BitsCharsetDefinition, which
returns a "BitsCharset" that can create a BitsCharsetEncoder/Decoder.
Depending on the complexity of your charset, you may be able to use
existing base classes (e.g. BitsCharsetJava) that do a lot of the heavy
lifting.
Note that these are also loaded using Java service loaders, e.g.:
https://github.com/apache/daffodil/blob/main/daffodil-io/src/main/resources/META-INF/services/org.apache.daffodil.io.processors.charset.BitsCharsetDefinition
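To illustrate the "just a lookup table" idea, here's a minimal plain-Java
sketch of a table-driven single-byte decoder. It fills in only a partial
subset of IBM037 (space, uppercase letters, digits) and maps everything
else to U+FFFD. It deliberately does not use Daffodil's
BitsCharsetDefinition/BitsCharset classes; TableCharsetSketch is a
hypothetical name, and the real signatures should come from IBM037.scala.

```java
import java.util.Arrays;

// HYPOTHETICAL table-driven decoder; PARTIAL IBM037 table, for illustration only.
final class TableCharsetSketch {
    private static final char UNMAPPED = '\uFFFD'; // Unicode replacement character
    private static final char[] TABLE = new char[256];

    static {
        Arrays.fill(TABLE, UNMAPPED);
        TABLE[0x40] = ' ';
        fillRange(0xC1, 'A', 9);  // A-I
        fillRange(0xD1, 'J', 9);  // J-R
        fillRange(0xE2, 'S', 8);  // S-Z
        fillRange(0xF0, '0', 10); // 0-9
    }

    // Map `count` consecutive byte values to consecutive characters.
    private static void fillRange(int firstByte, char firstChar, int count) {
        for (int i = 0; i < count; i++) {
            TABLE[firstByte + i] = (char) (firstChar + i);
        }
    }

    // Decode each byte independently via the table.
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append(TABLE[b & 0xFF]); // mask so the byte is treated as unsigned
        }
        return sb.toString();
    }
}
```

An encoder would simply be the inverse table (char to byte).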
3. Not at the moment. If you wanted only a subset of fields, you would
need to post-process the infoset and extract the parts you need
yourself. Languages like XSLT/XQuery could probably do this without too
much effort.
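As a small sketch of that kind of post-processing, here's a closely
related approach using the JDK's built-in XPath support (javax.xml.xpath)
rather than XSLT/XQuery. It assumes a hypothetical infoset with
unqualified element names; a namespace-qualified infoset would also need
a NamespaceContext. InfosetPostProcessSketch is a made-up name.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// HYPOTHETICAL helper: pull one field out of an XML infoset after parsing.
final class InfosetPostProcessSketch {
    // Evaluate an XPath expression against the infoset XML and return its string value.
    static String extract(String infosetXml, String xpathExpr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(xpathExpr, new InputSource(new StringReader(infosetXml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML for: " + xpathExpr, e);
        }
    }
}
```

This still materializes the full XML infoset first, which is the cost
the InfosetOutputter approach avoids.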
Another alternative would be to create a custom InfosetOutputter that
ignores infoset events you don't care about and keeps those you do. You
could use your own logic to determine which fields are important, or you
could use dfdlx:runtimeProperties to annotate the schema and have your
custom InfosetOutputter use those annotations. Here's the design
information on runtime properties:
https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Runtime+Properties
Here's a small example of a custom InfosetOutputter we use for testing,
which just captures all events and stores them in a list. You could
imagine doing some sort of filtering, capturing only the fields you want
and outputting them to a custom data structure instead of XML, for
example.
https://github.com/apache/daffodil/blob/main/daffodil-japi/src/test/java/org/apache/daffodil/example/TestInfosetOutputter.java
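The filtering idea can be sketched in plain Java as an event consumer
that tracks the current element path and keeps only values at wanted
paths. HypotheticalInfosetEvents and FilteringCollector are made-up
names; this is NOT Daffodil's InfosetOutputter API (see the linked
TestInfosetOutputter.java for the real method set), just the shape of
the logic you might put inside one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// HYPOTHETICAL event interface, loosely modelled on infoset events.
// It is NOT Daffodil's InfosetOutputter API.
interface HypotheticalInfosetEvents {
    void startElement(String name);
    void simpleValue(String value);
    void endElement(String name);
}

// Keeps only simple values whose slash-separated path is in the wanted set.
final class FilteringCollector implements HypotheticalInfosetEvents {
    private final Set<String> wantedPaths;
    private final Deque<String> stack = new ArrayDeque<>(); // current element path
    private final List<String> captured = new ArrayList<>();

    FilteringCollector(Set<String> wantedPaths) {
        this.wantedPaths = wantedPaths;
    }

    @Override
    public void startElement(String name) {
        stack.addLast(name);
    }

    @Override
    public void endElement(String name) {
        stack.removeLast();
    }

    @Override
    public void simpleValue(String value) {
        // ArrayDeque iterates first-to-last, i.e. root to current element.
        String path = "/" + String.join("/", stack);
        if (wantedPaths.contains(path)) {
            captured.add(path + "=" + value);
        }
    }

    List<String> captured() {
        return captured;
    }
}
```

The path set here could just as well be derived from schema annotations
such as dfdlx:runtimeProperties.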
4. I haven't personally done a lot of DFDL schema generation, though I
know other Daffodil devs have, so they may be able to chime in with
helpful tips. But I don't think there's anything unique about it. I
think mostly what they do is get a machine-readable specification of the
data format, load that into some model, then iterate over the model and
output strings to a file. We're very familiar with Scala, so we tend to
write DFDL schema generators in it, which is also nice since it has
language support for XML, so XML templates are essentially built into
the language. But any template language would probably work fine.
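Here's a minimal sketch of that "load a model, iterate, output strings"
approach, in Java rather than Scala. FieldSpec is a hypothetical model of
a fixed-length binary field, and the output is only an xs:sequence
fragment: it omits the enclosing xs:schema, namespace declarations, and
the dfdl:format defaults (e.g. binary representation) a real schema needs,
and it doesn't validate or escape names.

```java
import java.util.List;

// HYPOTHETICAL generator: emits a DFDL xs:sequence fragment from a simple model.
final class DfdlSchemaGenSketch {
    // Hypothetical model of one fixed-length field from a machine-readable spec.
    static final class FieldSpec {
        final String name;
        final String xsdType;
        final int lengthBits;

        FieldSpec(String name, String xsdType, int lengthBits) {
            this.name = name;
            this.xsdType = xsdType;
            this.lengthBits = lengthBits;
        }
    }

    // Emit one explicit-length element declaration per field, in model order.
    static String generateSequence(List<FieldSpec> fields) {
        StringBuilder sb = new StringBuilder("<xs:sequence>\n");
        for (FieldSpec f : fields) {
            sb.append("  <xs:element name=\"").append(f.name)
              .append("\" type=\"").append(f.xsdType)
              .append("\" dfdl:lengthKind=\"explicit\" dfdl:lengthUnits=\"bits\" dfdl:length=\"")
              .append(f.lengthBits)
              .append("\"/>\n");
        }
        sb.append("</xs:sequence>\n");
        return sb.toString();
    }
}
```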
- Steve
On 2023-03-13 06:36 AM, Roded Bahat wrote:
Hi all,
I'm looking into integrating Apache Daffodil into our product and have
several questions for which I could not find answers in the
documentation or issues.
1. Is it currently possible to extend Daffodil with custom types? For
example, could I create a custom field type for a field compressed with
a custom compression and have Daffodil call my own code for further
parsing of the original field value?
2. The DFDL spec states that additional implementation-defined encoding
names can be defined. How would a custom encoding be defined in the DFDL
specification?
3. Is it currently possible to parse an input stream but output only a
subset of fields from the specification? For example, could an XPath be
specified to determine which nodes in the specification Daffodil will
output?
4. Is there a recommended way of dynamically creating a DFDL
specification XSD, or should I just use general tooling?
Any pointers and help would be much appreciated.
Thanks!
Roded