Still a great library, Mike. I think we can get away with writing in increments, but I will definitely keep the suggestion about contributing to the Daffodil community in mind.
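For the record, here is roughly what I have in mind for writing in increments. It is only an untested sketch against the Java API: the schema name, output path, and readNextRecordJson() helper are placeholders, and the exact JsonInfosetInputter constructor should be double-checked against the japi docs for 2.3.0. The idea is to compile the schema once and then call unparse once per record, so only one record's infoset is ever held in memory:

import java.io.File;
import java.io.StringReader;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

import org.apache.daffodil.japi.Compiler;
import org.apache.daffodil.japi.Daffodil;
import org.apache.daffodil.japi.DataProcessor;
import org.apache.daffodil.japi.ProcessorFactory;
import org.apache.daffodil.japi.UnparseResult;
import org.apache.daffodil.japi.infoset.JsonInfosetInputter;

public class IncrementalUnparse {
    public static void main(String[] args) throws Exception {
        // Compile the DFDL schema once; "record.dfdl.xsd" is a placeholder name.
        Compiler compiler = Daffodil.compiler();
        ProcessorFactory pf = compiler.compileFile(new File("record.dfdl.xsd"));
        if (pf.isError()) {
            pf.getDiagnostics().forEach(d -> System.err.println(d.getMessage()));
            return;
        }
        DataProcessor dp = pf.onPath("/");

        // readNextRecordJson() stands in for whatever produces one record's JSON
        // infoset at a time, so the whole 10 GB never sits in memory at once.
        try (FileChannel out = FileChannel.open(Paths.get("output.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            String recordJson;
            while ((recordJson = readNextRecordJson()) != null) {
                // Assuming JsonInfosetInputter accepts a Reader over one record's
                // JSON (check the japi docs for the exact constructor).
                JsonInfosetInputter inputter =
                        new JsonInfosetInputter(new StringReader(recordJson));
                UnparseResult res = dp.unparse(inputter, out);
                if (res.isError()) {
                    res.getDiagnostics().forEach(d -> System.err.println(d.getMessage()));
                    break;
                }
            }
        }
    }

    // Placeholder: returns the next record's JSON infoset, or null when done.
    private static String readNextRecordJson() { /* ... */ return null; }
}

This of course assumes each record stands on its own as a schema root, which I think matches how our data is laid out.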
Thanks,
Claude

On Thu, 23 May 2019 at 13:58, Beckerle, Mike <[email protected]> wrote:

> Claude,
>
> I am afraid you will be disappointed that today daffodil 2.3.0 cannot
> handle data files this large in one shot.
>
> An unlimited size stream of smaller data items can be handled, but not a
> single large file being parsed into a single json root.
>
> Lots of people have this requirement.
>
> We do have ambitions in the roadmap to provide a more incremental parse
> and unparse for large files when the dfdl schema allows it. On the parse
> side this would be more like the XML SAX or StAX APIs (you can still
> create json as the infoset, this is just the API style). The unparse side
> already has a streaming API, but the implementation doesn't provide the
> streaming behavior except in very, very simple schemas.
>
> If you are interested in becoming a Daffodil developer to implement what
> you need, we are always looking for contributors to dig in and would
> provide lots of initial assistance.
>
> -Mike Beckerle
> Tresys
>
>
> From: Claude Mamo
> Sent: Thursday, May 23, 2:21 AM
> Subject: Unparsing a 10 GB JSON infoset
> To: [email protected]
>
>
> Hello all,
>
> I'm testing Daffodil's capability to handle large files. The parsing is
> done in chunks but the unparsing happens in one go. For the latter, the
> following error occurs after about 100 MB has been written out to disk:
> "OutOfMemoryError: GC Overhead Limit Exceeded". Should unparsing happen in
> chunks as well, or could this be a memory leak? The DFDL schema isn't
> particularly complex from my perspective and the validation is very basic
> (mostly maxOccurs=1 for a few elements).
>
> Claude
>
>
