Here's a slide I use about performance to give people some expectations of what they should be able to achieve if calling Daffodil with a pre-compiled DFDL schema. I did this test using the Daffodil command-line tool.
This is for dense binary data: mostly binary integers, flags, and short strings. For verbose textual data the Daffodil overhead will be higher, but the point here is that if something is taking something like 2 seconds extra, that's not per-parse/unparse overhead; that's some expensive initialization being done repeatedly when it doesn't need to be.

[Inline image: performance slide]

________________________________
From: Sloane, Brandon <[email protected]>
Sent: Friday, August 30, 2019 10:47 AM
To: [email protected] <[email protected]>
Subject: Re: How to speed up DFDL processing?

Serializing data to XML (or JSON) will never give you optimal performance, and Daffodil has not been heavily optimized. Having said that, there are 2 common sources of slowness that users can avoid:

1) Schema compilation. There is an ongoing effort to improve in this regard. Users can mitigate this concern by precompiling schemas using `daffodil save-parser` and `daffodil parse -P`. If using Daffodil as a library, you can also compile once on initialization and then reuse the compiled schema throughout the program's lifetime.

2) JVM startup time. Not much Daffodil can do about this one. There are a couple of options for users:
   a) Use the --stream option, which allows a single instance of Daffodil to parse a stream of messages
   b) Use Daffodil as a library from a long-lived process
   c) Use a third-party tool to speed up JVM startup time

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Friday, August 30, 2019 9:55 AM
To: [email protected] <[email protected]>
Subject: How to speed up DFDL processing?

Hello DFDL community,

A project that is using DFDL reported this to me:

A comment was made about the latency of using Daffodil. One of the developers that is implementing DFDL said that they were seeing a 2 second increase in the latency for the dataflow using Daffodil.

How to respond to this? Is an addition of 2 seconds to a dataflow to be expected?
How to make things run faster?

/Roger
