James

You may be able to use alternative JSON components, such as those with
record readers/writers.

You could certainly write a NiFi processor in either Java or Python that
would do this and be super efficient.
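
For the Python route, a streaming parse is the key idea. Here is a minimal
sketch of the counting logic using the third-party ijson library (the file
name and the dotted-path key format are illustrative assumptions; ijson uses
"item" for array elements, so the paths won't match FlattenJson's output
exactly):

    # Stream-count key occurrences without loading the document into memory.
    from collections import Counter
    import ijson

    counts = Counter()
    with open('incoming.json', 'rb') as f:
        # ijson.parse() yields (prefix, event, value) tuples as it streams,
        # so memory use stays flat no matter how big the file is.
        for prefix, event, value in ijson.parse(f):
            if event == 'map_key':
                # prefix is the dotted path of the enclosing object.
                flat_key = f'{prefix}.{value}' if prefix else value
                counts[flat_key] += 1

    for key, n in counts.most_common():
        print(f'{key}\t{n}')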

The processor you've chosen just isn't very flexible with regard to larger
objects and how it uses memory.
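
For what it's worth, the limit being hit here is the JVM's cap on the length
of a single array, not the heap. A quick back-of-envelope check (file size
taken from the thread):

    # The JVM indexes arrays with a signed 32-bit int, and the JDK caps a
    # single array at Integer.MAX_VALUE - 8 elements; that is the
    # 2147483639 in the error message.
    jvm_max_array_len = 2**31 - 1 - 8      # 2147483639
    file_size_bytes = 9 * 2**30            # ~9.66 billion bytes
    # True: the content can never fit in one byte[]/String,
    # no matter how large the heap is.
    print(file_size_bytes > jvm_max_array_len)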

Thanks

On Fri, Jun 14, 2024 at 11:13 AM James McMahon <jsmcmah...@gmail.com> wrote:

> Thanks Eric. So the java.lang.OutOfMemoryError in the message isn't really
> to be taken at face value: FlattenJson tried to build an array longer than
> the maximum value of an integer, and it choked.
>
> A 9 GB file really isn't that large. I'm hoping someone has encountered
> this before and will weigh in with a reply.
>
> On Fri, Jun 14, 2024 at 2:08 PM Eric Secules <esecu...@gmail.com> wrote:
>
>> Hi James,
>>
>> I don't have a solution for you off the top of my head. But I can tell
>> you the failure is because you've got an array longer than the maximum
>> value of an int. So, memory is not the limiting factor.
>>
>> -Eric
>>
>> On Fri, Jun 14, 2024, 10:59 AM James McMahon <jsmcmah...@gmail.com>
>> wrote:
>>
>>> I have a JSON file, incoming.json. It is 9 GB in size.
>>>
>>> I want to flatten the JSON so that I can tabulate the number of times
>>> each key appears. I am using a FlattenJson 2.0.0-M2 processor with this
>>> configuration:
>>>
>>> Separator                     .
>>> Flatten Mode                  normal
>>> Ignore Reserved Characters    false
>>> Return Type                   flatten
>>> Character Set                 UTF-8
>>> Pretty Print JSON             true
>>>
>>> This processor has worked so far on JSON files as large as 2 GB. But
>>> this 9 GB one is causing this issue:
>>>
>>> FlattenJson[id=ea2650e2-8974-1ff7-2da9-a0f2cd303258] Processing halted: 
>>> yielding [1 sec]: java.lang.OutOfMemoryError: Required array length 
>>> 2147483639 + 9 is too large
>>>
>>>
>>> htop confirms I have 92 GB of memory on my EC2 instance, and the NiFi heap 
>>> shows it has 88 GB of that dedicated for its use.
>>>
>>>
>>> How can I handle large JSON files in this processor? Breaking the file up 
>>> doesn't seem to be an option, because it would most likely violate the 
>>> integrity of the JSON structure.
>>>
>>>
>>> What options do I have?
>>>
>>>
