Thanks Eric. So the java.lang.OutOfMemoryError in that message isn't really to be taken at face value: FlattenJson tried to allocate an array longer than the maximum value of a Java int, and it choked.
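For what it's worth, the odd number in the error appears to be the JDK's soft cap on array length. A small sketch of the arithmetic, assuming the usual HotSpot convention of Integer.MAX_VALUE - 8 (the "+ 9" would then be the growth the internal buffer requested on top of a buffer already at the cap):

```java
public class ArrayLimit {
    public static void main(String[] args) {
        // Recent JDKs cap array allocations a few elements below
        // Integer.MAX_VALUE (2_147_483_647) to leave headroom for object
        // headers. Assuming that convention, the "Required array length
        // 2147483639 + 9" in the error is a buffer already at this soft cap
        // plus the 9 extra bytes it tried to grow by.
        int softMax = Integer.MAX_VALUE - 8;
        System.out.println(softMax); // prints 2147483639
    }
}
```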
A 9 GB file really isn't that large. I'm hoping someone has encountered this before and will weigh in with a reply.

On Fri, Jun 14, 2024 at 2:08 PM Eric Secules <esecu...@gmail.com> wrote:

> Hi James,
>
> I don't have a solution for you off the top of my head. But I can tell you
> the failure is because you've got an array longer than the maximum value of
> an int. So memory is not the limiting factor.
>
> -Eric
>
> On Fri, Jun 14, 2024, 10:59 AM James McMahon <jsmcmah...@gmail.com> wrote:
>
>> I have a JSON file, incoming.json. It is 9 GB in size.
>>
>> I want to flatten the JSON so that I can tabulate the number of times
>> each key appears. I am using a FlattenJson 2.0.0-M2 processor with
>> this configuration:
>>
>> Separator                    .
>> Flatten Mode                 normal
>> Ignore Reserved Characters   false
>> Return Type                  flatten
>> Character Set                UTF-8
>> Pretty Print JSON            true
>>
>> This processor has worked so far on JSON files as large as 2 GB, but the
>> 9 GB one is causing this issue:
>>
>> FlattenJson[id=ea2650e2-8974-1ff7-2da9-a0f2cd303258] Processing halted:
>> yielding [1 sec]: java.lang.OutOfMemoryError: Required array length
>> 2147483639 + 9 is too large
>>
>> htop confirms I have 92 GB of memory on my EC2 instance, and the NiFi heap
>> shows it has 88 GB of that dedicated for its use.
>>
>> How can I handle large JSON files in this processor? It would seem that
>> breaking the file up is not an option, because it would most likely
>> violate the integrity of the JSON structure.
>>
>> What options do I have?
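Since the goal is only to tally how often each key appears, one option outside FlattenJson is to stream the file rather than materialize it, so memory use is bounded by the longest key instead of the file size. Below is a rough, hand-rolled sketch of that idea (not a full JSON parser: it assumes well-formed input and handles backslash escapes only minimally; class and method names are mine, not NiFi's):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Streaming key counter: scans JSON character by character and counts
// every quoted string that is immediately followed by ':', i.e. a key.
public class KeyCounter {
    public static Map<String, Long> countKeys(Reader in) throws IOException {
        Map<String, Long> counts = new HashMap<>();
        StringBuilder current = new StringBuilder();
        boolean inString = false, escaped = false;
        String pending = null; // last completed string; a key if ':' follows
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (inString) {
                if (escaped) { escaped = false; current.append(ch); }
                else if (ch == '\\') { escaped = true; }
                else if (ch == '"') {
                    inString = false;
                    pending = current.toString();
                    current.setLength(0);
                } else { current.append(ch); }
            } else if (ch == '"') {
                inString = true;
            } else if (ch == ':') {
                if (pending != null) { counts.merge(pending, 1L, Long::sum); pending = null; }
            } else if (!Character.isWhitespace(ch)) {
                pending = null; // the string was a value, not a key
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String sample = "{\"a\": 1, \"b\": {\"a\": \"x\"}, \"c\": [\"a\", {\"a\": 2}]}";
        Map<String, Long> counts = countKeys(new StringReader(sample));
        System.out.println(counts.get("a")); // prints 3: the "a" in the array is a value
    }
}
```

A standalone tool like this could be run from the command line or invoked from NiFi (e.g. via an ExecuteStreamCommand-style processor) without ever holding the 9 GB document in the heap.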