The backing library of the JSON processors does indeed require loading the
entire document into memory. We should make sure this limitation is
documented, if it is not already.

It could be worth not tying SplitJson to this library, given that it
probably does not need the full functionality of JsonPath and would likely
be a good candidate for streaming.
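
To make the streaming idea concrete, here is a rough sketch of splitting a
top-level JSON array without building an object model. It is not how
SplitJson works today and it uses no NiFi or JsonPath API; the class name
StreamingArraySplitter and its split method are made up for illustration.
It assumes the input is a single well-formed top-level array and does no
validation; it only tracks nesting depth and string state so that one
element at a time is held in memory.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

// Hypothetical helper, not part of NiFi: emits each element of a top-level
// JSON array as raw text, holding only the current element in memory.
final class StreamingArraySplitter {

    static void split(Reader in, Consumer<String> sink) throws IOException {
        StringBuilder element = new StringBuilder();
        int depth = 0;            // 0 = outside the array, 1 = directly inside it
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash inside a string
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (inString) {
                // Copy string contents verbatim; brackets and commas here are data.
                element.append(ch);
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
                continue;
            }
            if (depth == 0) {
                if (ch == '[') depth = 1; // skip anything before the opening bracket
                continue;
            }
            if (ch == '"') {
                inString = true;
                element.append(ch);
            } else if (ch == '{' || ch == '[') {
                depth++;
                element.append(ch);
            } else if (ch == '}' || ch == ']') {
                depth--;
                if (depth == 0) {         // closing bracket of the top-level array
                    emit(element, sink);
                    return;
                }
                element.append(ch);
            } else if (ch == ',' && depth == 1) {
                emit(element, sink);      // comma between top-level elements
            } else {
                element.append(ch);
            }
        }
        throw new IOException("No complete top-level array found");
    }

    // Passes the buffered element to the sink (if any) and clears the buffer.
    private static void emit(StringBuilder element, Consumer<String> sink) {
        String text = element.toString().trim();
        if (!text.isEmpty()) sink.accept(text);
        element.setLength(0);
    }

    public static void main(String[] args) throws IOException {
        String json = "[{\"id\":1,\"tags\":[\"a\",\"b\"]}, {\"id\":2}]";
        split(new StringReader(json), System.out::println);
    }
}
```

In a processor, the sink would presumably write each element to its own
flowfile instead of printing it, and the Reader would wrap the incoming
flowfile's content stream (ideally buffered, since this reads one character
at a time). A real implementation would more likely use an existing
streaming JSON parser than hand-rolled scanning like this.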

On Thu, Nov 17, 2016 at 11:23 Mark Payne <marka...@hotmail.com> wrote:

> Hi Mike,
>
> Certainly, I would recommend raising the max heap to, say, 2 GB to
> see if that gives you what you need.
> Looking at the code, it does look like this Processor may not be the most
> efficient in how it is parsing the JSON.
> There are libraries, for example, that provide a "Streaming JSON"
> interface, but this Processor loads the entire JSON
> into heap and then creates an Object Model from it.
>
> Also, what do you have set for the Max Concurrent Tasks? If you have
> multiple threads simultaneously running, you could
> have each one using up quite a lot of heap.
>
> Thanks
> -Mark
>
>
> On Nov 17, 2016, at 10:54 AM, Mike Harding <mikeyhard...@gmail.com> wrote:
>
> Just for info, my heap size in bootstrap.conf is as follows:
>
> java.arg.2=-Xms512m
>
> java.arg.3=-Xmx512m
>
> Would it be a simple case of increasing this? The size of the flowfile
> JSON array is 35 MB.
>
> Mike
>
>
>
> On 17 November 2016 at 15:47, Mike Harding <mikeyhard...@gmail.com> wrote:
>
> Hi All,
>
> I have a flowfile containing a JSON array with 30k objects that I am
> trying to split into separate flowfiles for downstream processing.
>
> The problem is the processor reports a GC Overhead Limit Exceeded warning
> and administratively yields.
>
> Is there any way of setting up a back pressure option, or some change to
> the NiFi config, to best address this?
>
> Thanks,
> Mike
>
>
>
>
