OK, thanks for the heads-up!

If I could make another suggestion: could the JSONPathReader be made a little 
more dynamic? Currently you have to specify every single field…

In my case (though I doubt I’m alone), I have several different collections 
with different schemas. My options are either one JSONPathReader with dozens of 
attributes, or one Reader per collection type (but then I’d have to somehow 
choose the right reader dynamically). It would be easier if there were a way to 
have a single expression (wildcards? regexes?) that could pick up several 
properties at once.
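For instance, something like this (a purely illustrative Python sketch — the “$.*” wildcard behavior is what I’m wishing for, not an existing JSONPathReader feature):

```python
import json

doc = json.loads('{"name": "a", "size": 3, "tags": ["x"]}')

# Today's style: one JSONPath entry per field ("$.name", "$.size", ...),
# so every field of every schema has to be listed by hand.
explicit = {field: doc[field] for field in ("name", "size")}

# Wished-for style: a single wildcard like "$.*" would pick up every
# top-level property, whatever the collection's schema happens to be.
wildcard = dict(doc.items())

print(explicit)  # {'name': 'a', 'size': 3}
print(wildcard)  # {'name': 'a', 'size': 3, 'tags': ['x']}
```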


From: Mike Thomsen <[email protected]>
Sent: jeudi 21 juin 2018 13:06
To: [email protected]
Subject: Re: NiFi and Mongo

Your general assessment about what you'd need is correct. It's a fairly easy 
component to build, and I'll throw up a Jira ticket for it. Would definitely be 
doable for NiFi 1.8.

Expect the Mongo stuff to go through some real cleanup like this in 1.8. One 
of the other big changes is that I'll be moving the processors to using a 
controller service as an optional way to configure the Mongo client, with the 
plan that by probably 1.9 all of the Mongo processors will drop their own 
client configurations and share the same pool (currently every processor 
instance maintains its own).
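Roughly this idea, in an illustrative Python sketch (the names here are mine, not the actual NiFi controller-service API):

```python
class MongoClientService:
    """Stand-in for a controller service that holds one shared client."""
    _client = None

    @classmethod
    def get_client(cls):
        # Lazily create a single client; every caller shares it (and
        # therefore shares its underlying connection pool).
        if cls._client is None:
            cls._client = object()  # placeholder for a real MongoClient
        return cls._client

# Two "processors" asking for a client get the very same instance,
# instead of each maintaining its own client and pool.
client_a = MongoClientService.get_client()
client_b = MongoClientService.get_client()
print(client_a is client_b)  # True
```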

On Thu, Jun 21, 2018 at 3:13 AM Kelsey RIDER 
<[email protected]<mailto:[email protected]>> wrote:
Hello,

I’ve been experimenting with NiFi and MongoDB. I have a test collection with 1 
million documents in it. Each document has the same flat JSON structure with 11 
fields.
My NiFi flow exposes a web service that allows the user to fetch all the data 
in CSV format.

However, 1M documents brings NiFi to its knees. Even after increasing the JVM’s 
Xms and Xmx to 2G, I still get an OutOfMemoryError:

2018-06-20 11:27:43,428 WARN [Timer-Driven Process Thread-7] o.a.n.controller.tasks.ConnectableTask Administratively Yielding due to uncaught Exception: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
        at java.lang.StringBuilder.append(StringBuilder.java:136)
        at org.apache.nifi.processors.mongodb.GetMongo.buildBatch(GetMongo.java:222)
        at org.apache.nifi.processors.mongodb.GetMongo.onTrigger(GetMongo.java:341)
        at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
        at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1147)
        at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:175)
        at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenScheduling
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThr
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPool
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

I dug into the code, and discovered that the GetMongo processor takes all the 
Documents returned from Mongo, converts them to Strings, and concatenates them 
in a StringBuilder.
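To illustrate the pattern I mean (an illustrative Python sketch, not NiFi’s actual code — fake_cursor stands in for a MongoDB cursor):

```python
import io
import json

def fake_cursor(n):
    # Stand-in for a MongoDB cursor yielding n small documents.
    for i in range(n):
        yield {"_id": i, "value": "x" * 10}

# What buildBatch effectively does: append every document's JSON to one
# in-memory buffer before anything is written out, so heap usage grows
# linearly with the size of the result set.
batch = io.StringIO()
for doc in fake_cursor(1000):
    batch.write(json.dumps(doc))

# At 1000 toy documents this is harmless; at 1M real documents this
# buffer is the StringBuilder that blows the heap.
print(len(batch.getvalue()))
```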

My question is thus: is there a better way I should be doing this?
The only idea I’ve had is to use a smaller batch size, but then I’d just need a 
later processor to concatenate the batches back into one big CSV.
Is there some sort of “GetMongoRecord” processor that reads each Mongo document 
as a record, the way ExecuteSQL does? (I’ve done the same test with an SQL 
database, and it handles 1M records just fine.)
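What I’m imagining is something like this (an illustrative Python sketch; fetch_documents stands in for a MongoDB cursor, and the StringIO for the FlowFile’s output stream):

```python
import csv
import io

def fetch_documents():
    # Stand-in for a MongoDB cursor; the real one would also be lazy.
    for i in range(5):
        yield {"id": i, "name": f"doc-{i}"}

out = io.StringIO()  # in NiFi this would be the FlowFile's output stream
writer = csv.DictWriter(out, fieldnames=["id", "name"])
writer.writeheader()
for doc in fetch_documents():
    writer.writerow(doc)  # one record at a time; nothing accumulates

print(out.getvalue())
```

Because each record is written as it arrives, peak memory stays constant no matter how many documents the cursor returns.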

Thanks for your help,

Kelsey
Following changes to working-time regulations, if you receive this e-mail 
before 7:00 a.m., in the evening, during the weekend, or while on holiday, 
please do not process or reply to it immediately, except in cases of 
exceptional urgency.
