Still a great library, Mike. I think we can get away with writing in increments, but I will definitely keep the suggestion about contributing to the Daffodil community in mind.
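For the record, here is roughly what I have in mind for writing in increments. It is only an untested sketch against the Java API: the schema name, output path, and readNextRecordJson() helper are placeholders, and the exact JsonInfosetInputter constructor should be double-checked against the japi docs for 2.3.0. The idea is to compile the schema once and then call unparse once per record, so only one record's infoset is ever held in memory:

import java.io.File;
import java.io.StringReader;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

import org.apache.daffodil.japi.Compiler;
import org.apache.daffodil.japi.Daffodil;
import org.apache.daffodil.japi.DataProcessor;
import org.apache.daffodil.japi.ProcessorFactory;
import org.apache.daffodil.japi.UnparseResult;
import org.apache.daffodil.japi.infoset.JsonInfosetInputter;

public class IncrementalUnparse {
    public static void main(String[] args) throws Exception {
        // Compile the DFDL schema once; "record.dfdl.xsd" is a placeholder name.
        Compiler compiler = Daffodil.compiler();
        ProcessorFactory pf = compiler.compileFile(new File("record.dfdl.xsd"));
        if (pf.isError()) {
            pf.getDiagnostics().forEach(d -> System.err.println(d.getMessage()));
            return;
        }
        DataProcessor dp = pf.onPath("/");

        // readNextRecordJson() stands in for whatever produces one record's JSON
        // infoset at a time, so the whole 10 GB never sits in memory at once.
        try (FileChannel out = FileChannel.open(Paths.get("output.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            String recordJson;
            while ((recordJson = readNextRecordJson()) != null) {
                // Assuming JsonInfosetInputter accepts a Reader over one record's
                // JSON (check the japi docs for the exact constructor).
                JsonInfosetInputter inputter =
                        new JsonInfosetInputter(new StringReader(recordJson));
                UnparseResult res = dp.unparse(inputter, out);
                if (res.isError()) {
                    res.getDiagnostics().forEach(d -> System.err.println(d.getMessage()));
                    break;
                }
            }
        }
    }

    // Placeholder: returns the next record's JSON infoset, or null when done.
    private static String readNextRecordJson() { /* ... */ return null; }
}

This of course assumes each record stands on its own as a schema root, which I think matches how our data is laid out.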
Thanks,
Claude

On Thu, 23 May 2019 at 13:58, Beckerle, Mike <[email protected]> wrote:

> Claude,
>
> I am afraid you will be disappointed that today daffodil 2.3.0 cannot
> handle data files this large in one shot.
>
> An unlimited size stream of smaller data items can be handled, but not a
> single large file being parsed into a single json root.
>
> Lots of people have this requirement.
>
> We do have ambitions in the roadmap to provide a more incremental parse
> and unparse for large files when the dfdl schema allows it. On the parse
> side this would be more like the XML SAX or StAX APIs (you can still
> create json as the infoset, this is just the API style). The unparse side
> already has a streaming API, but the implementation doesn't provide the
> streaming behavior except in very, very simple schemas.
>
> If you are interested in becoming a Daffodil developer to implement what
> you need, we are always looking for contributors to dig in and would
> provide lots of initial assistance.
>
> -Mike Beckerle
> Tresys
>
>
> From: Claude Mamo
> Sent: Thursday, May 23, 2:21 AM
> Subject: Unparsing a 10 GB JSON infoset
> To: [email protected]
>
>
> Hello all,
>
> I'm testing Daffodil's capability to handle large files. The parsing is
> done in chunks but the unparsing happens in one go. For the latter, the
> following error occurs after about 100 MB has been written out to disk:
> "OutOfMemoryError: GC Overhead Limit Exceeded". Should unparsing happen in
> chunks as well, or could this be a memory leak? The DFDL schema isn't
> particularly complex from my perspective and the validation is very basic
> (mostly maxOccurs=1 for a few elements).
>
> Claude
>
>
