Thanks Eric. So the java.lang.OutOfMemoryError in that message isn't really to be taken at face value: FlattenJson tried to allocate an array longer than the maximum value of a Java int, and it choked.
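For what it's worth, the odd number in the error appears to be the JDK's soft cap on array length. A small sketch of the arithmetic, assuming the usual HotSpot convention of Integer.MAX_VALUE - 8 (the "+ 9" would then be the growth the internal buffer requested on top of a buffer already at the cap):

```java
public class ArrayLimit {
    public static void main(String[] args) {
        // Recent JDKs cap array allocations a few elements below
        // Integer.MAX_VALUE (2_147_483_647) to leave headroom for object
        // headers. Assuming that convention, the "Required array length
        // 2147483639 + 9" in the error is a buffer already at this soft cap
        // plus the 9 extra bytes it tried to grow by.
        int softMax = Integer.MAX_VALUE - 8;
        System.out.println(softMax); // prints 2147483639
    }
}
```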
A 9 GB file really isn't that large. I'm hoping someone has encountered this before and will weigh in with a reply.

On Fri, Jun 14, 2024 at 2:08 PM Eric Secules <esecu...@gmail.com> wrote:

> Hi James,
>
> I don't have a solution for you off the top of my head. But I can tell you
> the failure is because you've got an array longer than the maximum value of
> an int. So memory is not the limiting factor.
>
> -Eric
>
> On Fri, Jun 14, 2024, 10:59 AM James McMahon <jsmcmah...@gmail.com> wrote:
>
>> I have a JSON file, incoming.json. It is 9 GB in size.
>>
>> I want to flatten the JSON so that I can tabulate the number of times
>> each key appears. I am using a FlattenJson 2.0.0-M2 processor with
>> this configuration:
>>
>> Separator                    .
>> Flatten Mode                 normal
>> Ignore Reserved Characters   false
>> Return Type                  flatten
>> Character Set                UTF-8
>> Pretty Print JSON            true
>>
>> This processor has worked so far on JSON files as large as 2 GB, but the
>> 9 GB one is causing this issue:
>>
>> FlattenJson[id=ea2650e2-8974-1ff7-2da9-a0f2cd303258] Processing halted:
>> yielding [1 sec]: java.lang.OutOfMemoryError: Required array length
>> 2147483639 + 9 is too large
>>
>> htop confirms I have 92 GB of memory on my EC2 instance, and the NiFi heap
>> shows it has 88 GB of that dedicated for its use.
>>
>> How can I handle large JSON files in this processor? It would seem that
>> breaking the file up is not an option, because it would most likely
>> violate the integrity of the JSON structure.
>>
>> What options do I have?
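Since the goal is only to tally how often each key appears, one option outside FlattenJson is to stream the file rather than materialize it, so memory use is bounded by the longest key instead of the file size. Below is a rough, hand-rolled sketch of that idea (not a full JSON parser: it assumes well-formed input and handles backslash escapes only minimally; class and method names are mine, not NiFi's):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Streaming key counter: scans JSON character by character and counts
// every quoted string that is immediately followed by ':', i.e. a key.
public class KeyCounter {
    public static Map<String, Long> countKeys(Reader in) throws IOException {
        Map<String, Long> counts = new HashMap<>();
        StringBuilder current = new StringBuilder();
        boolean inString = false, escaped = false;
        String pending = null; // last completed string; a key if ':' follows
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (inString) {
                if (escaped) { escaped = false; current.append(ch); }
                else if (ch == '\\') { escaped = true; }
                else if (ch == '"') {
                    inString = false;
                    pending = current.toString();
                    current.setLength(0);
                } else { current.append(ch); }
            } else if (ch == '"') {
                inString = true;
            } else if (ch == ':') {
                if (pending != null) { counts.merge(pending, 1L, Long::sum); pending = null; }
            } else if (!Character.isWhitespace(ch)) {
                pending = null; // the string was a value, not a key
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String sample = "{\"a\": 1, \"b\": {\"a\": \"x\"}, \"c\": [\"a\", {\"a\": 2}]}";
        Map<String, Long> counts = countKeys(new StringReader(sample));
        System.out.println(counts.get("a")); // prints 3: the "a" in the array is a value
    }
}
```

A standalone tool like this could be run from the command line or invoked from NiFi (e.g. via an ExecuteStreamCommand-style processor) without ever holding the 9 GB document in the heap.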