OK, thanks for the heads-up! If I could make another suggestion: could the JSONPathReader be made a little more dynamic? Currently you have to specify every single field…
In my case (although I doubt I’m alone), I have several different collections with different schemas. My options are either to have one JSONPathReader with dozens of attributes, or else one Reader per collection type (but then I’d have to somehow dynamically choose which reader to use). It would be easier if there were a way to have a single expression (wildcards? regex?) that could pick up several properties at once.

From: Mike Thomsen <[email protected]>
Sent: Thursday, June 21, 2018 13:06
To: [email protected]
Subject: Re: NiFi and Mongo

Your general assessment about what you'd need is correct. It's a fairly easy component to build, and I'll throw up a Jira ticket for it. It would definitely be doable for NiFi 1.8; expect the Mongo stuff to go through some real cleanup like this in 1.8.

One of the other big changes is that I will be moving the processors to using a controller service as an optional configuration for the Mongo client, with the plan that by probably 1.9 all of the Mongo processors will drop their own client configurations and use the same pool (currently every processor instance maintains its own).

On Thu, Jun 21, 2018 at 3:13 AM Kelsey RIDER <[email protected]> wrote:

Hello,

I’ve been experimenting with NiFi and MongoDB. I have a test collection with 1 million documents in it; each document has the same flat JSON structure with 11 fields. My NiFi flow exposes a web service, which allows the user to fetch all the data in CSV format. However, 1M documents brings NiFi to its knees.
Even after increasing the JVM’s Xms and Xmx to 2G, I still get an OutOfMemoryError:

2018-06-20 11:27:43,428 WARN [Timer-Driven Process Thread-7] o.a.n.controller.tasks.ConnectableTask Administratively Yielding … due to uncaught Exception: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at org.apache.nifi.processors.mongodb.GetMongo.buildBatch(GetMongo.java:222)
    at org.apache.nifi.processors.mongodb.GetMongo.onTrigger(GetMongo.java:341)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1147)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:175)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenScheduling…
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThr…
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPool…
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

I dug into the code and discovered that the GetMongo processor takes all the Documents returned from Mongo, converts them to Strings, and concatenates them in a StringBuilder.

My question is thus: is there a better way I should be doing this? The only idea I’ve had is to use a smaller batch size, but that would mean I’d just need a later processor to concatenate the batches in order to get one big CSV.
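To make the contrast concrete, here is a minimal, self-contained sketch (plain Java, no Mongo driver; `writeCsv` and the sample documents are hypothetical stand-ins, not NiFi or driver APIs) of writing each document straight to the output stream as it comes off the cursor, instead of accumulating every document in one StringBuilder the way GetMongo's buildBatch does:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class StreamingCsvSketch {

    // Writes one CSV row per document directly to the output stream, so only
    // the current row is ever held in memory -- unlike concatenating every
    // document's JSON into a single in-memory StringBuilder.
    static void writeCsv(Iterator<Map<String, Object>> cursor,
                         List<String> columns,
                         OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write(String.join(",", columns));
        w.write('\n');
        while (cursor.hasNext()) {
            Map<String, Object> doc = cursor.next();
            StringBuilder row = new StringBuilder();  // one row only, then discarded
            for (int i = 0; i < columns.size(); i++) {
                if (i > 0) row.append(',');
                row.append(String.valueOf(doc.get(columns.get(i))));
            }
            w.write(row.toString());
            w.write('\n');
        }
        w.flush();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a MongoCursor over the collection.
        List<Map<String, Object>> docs = List.of(
                Map.of("name", "a", "count", 1),
                Map.of("name", "b", "count", 2));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeCsv(docs.iterator(), List.of("name", "count"), out);
        System.out.print(out.toString(StandardCharsets.UTF_8));
    }
}
```

With this shape, heap usage stays roughly constant no matter how many documents the cursor returns, which is essentially what ExecuteSQL's record-oriented streaming achieves.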
Is there some sort of “GetMongoRecord” processor that reads each Mongo Document as a record, the way ExecuteSQL does? (I’ve done the same test with an SQL database, and it handles 1M records just fine.)

Thanks for your help,
Kelsey
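Coming back to the wildcard/regex suggestion at the top of the thread, here is a minimal sketch of the kind of pattern-based field selection being asked for. It is purely illustrative: `selectFields` is a hypothetical helper, not an existing NiFi or JsonPathReader feature, but it shows how one expression could cover several collection schemas instead of enumerating every attribute:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class WildcardFieldsSketch {

    // Selects every field whose name matches the given regex, so a single
    // reader configuration could pick up several properties at once across
    // collections with different schemas.
    static Map<String, Object> selectFields(Map<String, Object> doc, String regex) {
        Pattern p = Pattern.compile(regex);
        Map<String, Object> selected = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (p.matcher(e.getKey()).matches()) {
                selected.put(e.getKey(), e.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("addr_street", "Main St");
        doc.put("addr_city", "Lille");
        doc.put("internalId", 42);
        // A single expression "addr_.*" picks up both address fields.
        System.out.println(selectFields(doc, "addr_.*").keySet());
        // prints [addr_street, addr_city]
    }
}
```

A JSONPath wildcard such as `$.*` could serve the same purpose in a reader that maps matched properties to record fields dynamically, rather than requiring one configured JSONPath per field.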
