Prashanth,

"Will it spread out the stop-the-world time across the intervals? In that case, my average would fall to the same figures, right?"
It's hard to say - you'd have to give it a try and see if it improves. There are a lot of different optimizations, both at the JVM and the Operating System level, that come into play here. It may give much better performance, or perhaps worse, but it's certainly worth trying out.

Thanks
-Mark

On Jun 13, 2018, at 1:04 PM, V, Prashanth (Nokia - IN/Bangalore) <prashant...@nokia.com> wrote:

Mark,

Thanks for the reply. Please find the comments inline.

Thanks & Regards,
Prashanth

From: Mark Payne [mailto:marka...@hotmail.com]
Sent: Wednesday, June 13, 2018 6:07 PM
To: users@nifi.apache.org
Subject: Re: NiFi Performance Analysis Clarification

Prashanth,

Whenever the FlowFile Repository performs a checkpoint, it has to ensure that it has flushed all data to disk before continuing, so it performs an fsync() call so that any data buffered by the Operating System is flushed to disk as well. If you're using the same physical drive / physical partition for the FlowFile Repository as you are for content, provenance, logs, etc., then this can be very costly.

It is always a best practice for any production system to isolate the FlowFile Repository on its own physical partition, the Content Repository on its own physical partition (or multiple partitions), and the Provenance Repository on its own physical partition (or multiple partitions). Placing the FlowFile Repo on its own partition is likely to address the issue on its own. (Update the value of the "nifi.flowfile.repository.directory" property in nifi.properties - but be warned, you'll lose any data in your flow if you point to an empty directory, so you'll need to either move the contents of ./flowfile_repository to the new directory or stop your source processors and bleed out all the data from your flow first.)

[Prashanth] I tried once by putting the flowfile repo on one partition and content & provenance on another, but I think I still faced this problem.
But I don't remember well.

Additionally, you may see better results by adjusting the value of the "nifi.flowfile.repository.checkpoint.interval" property from "2 mins" to something smaller like "15 secs".

[Prashanth] Oh, that's nice, I will try this config. But, just curious, will it spread out the stop-the-world time across the intervals? In that case, my average would fall to the same figures, right?

Thanks
-Mark

On Jun 13, 2018, at 8:10 AM, V, Prashanth (Nokia - IN/Bangalore) <prashant...@nokia.com> wrote:

Hi Mike,

Thanks for the reply. Actually, we did all those optimisations with Kafka. I am converting to Avro, and I also configured the Kafka producer properties accordingly. I believe Kafka is not the bottleneck; I am sure because I can see pretty good throughput with my flow, but the average throughput is reduced since the stop-the-world pause lasts a long time. Correct me if I am wrong.

Thanks & Regards,
Prashanth

From: Mike Thomsen [mailto:mikerthom...@gmail.com]
Sent: Wednesday, June 13, 2018 4:23 PM
To: V, Prashanth (Nokia - IN/Bangalore) <prashant...@nokia.com>
Cc: users@nifi.apache.org; pierre.villard...@gmail.com
Subject: Re: NiFi Performance Analysis Clarification

Relevant: http://www.idata.co.il/2016/09/moving-binary-data-with-kafka/

If you're throwing 1MB and bigger files at Kafka, that's probably where your slowdown is occurring, particularly if you're running a single node or just two nodes. Kafka was designed to process extremely high volumes of small messages (at most 10s of KB, not MB and certainly not GB). What you can try is building an Avro schema for your CSV files and using PublishKafkaRecord to break everything down into records that are an appropriate fit for Kafka.
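[Editor's note] Mark's two suggestions, isolating the repositories on separate partitions and shortening the checkpoint interval, map onto a handful of nifi.properties entries. A sketch, with made-up mount paths (point them at whatever partitions you actually have):

```
# nifi.properties - each repository on its own physical partition
# (paths are illustrative)
nifi.flowfile.repository.directory=/mnt/flowfile-repo/flowfile_repository
nifi.content.repository.directory.default=/mnt/content-repo/content_repository
nifi.provenance.repository.directory.default=/mnt/provenance-repo/provenance_repository

# Checkpoint more often so each flush has less data to write
nifi.flowfile.repository.checkpoint.interval=15 secs
```

As Mark warns, move the existing ./flowfile_repository contents to the new location (or drain the flow first) before changing the directory property, or in-flight data will be lost.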
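[Editor's note] The Avro schema Mike suggests could look like the sketch below. The field names are hypothetical, since the actual CSV layout is not shown in the thread; the idea is to register a schema like this (e.g. in an AvroSchemaRegistry), wire it to a record reader for the CSV, and let PublishKafkaRecord turn each CSV row into one small Kafka record instead of one large message per file:

```
{
  "type": "record",
  "name": "CsvRow",
  "namespace": "example.enrichment",
  "fields": [
    { "name": "event_time", "type": "string" },
    { "name": "source_id",  "type": "string" },
    { "name": "value",      "type": ["null", "double"], "default": null }
  ]
}
```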
On Wed, Jun 13, 2018 at 6:38 AM V, Prashanth (Nokia - IN/Bangalore) <prashant...@nokia.com> wrote:

Please find answers inline.

Thanks & Regards,
Prashanth

From: Pierre Villard [mailto:pierre.villard...@gmail.com]
Sent: Wednesday, June 13, 2018 3:56 PM
To: users@nifi.apache.org
Subject: Re: NiFi Performance Analysis Clarification

Hi,

What's the version of NiFi you're using?

[Prashanth] 1.6.0

What are the file systems you're using for the repositories?

[Prashanth] Local RHEL file system (/home dir)

I think that changing the heap won't make any difference in this case. I'd keep it at something like 8GB (unless you're doing very specific stuff that is memory consuming) and leave the rest to the OS and disk caching.

[Prashanth] I think NiFi holds the snapshot map in memory. Since we are dealing with pretty huge ingress data, I allocated 32GB out of 42GB to NiFi; hence I increased it. Does this have anything to do with the flowfile checkpoint delay?

Pierre

2018-06-13 11:58 GMT+02:00 V, Prashanth (Nokia - IN/Bangalore) <prashant...@nokia.com>:

Hi Mike,

I am retrieving many small CSV files, each of size 1MB (total folder size around ~100GB). In the update step, I am doing some enrichment on the ingress CSV. Anyway, my flow itself doesn't have anything to do with the stop-the-world time, right? Can you please tell me about flowfile-checkpointing-related tunings?

Thanks & Regards,
Prashanth

From: Mike Thomsen [mailto:mikerthom...@gmail.com]
Sent: Wednesday, June 13, 2018 2:33 PM
To: users@nifi.apache.org
Subject: Re: NiFi Performance Analysis Clarification

What are you retrieving (particularly size) and what happens in the "update" step?

Thanks,
Mike

On Wed, Jun 13, 2018 at 4:10 AM V, Prashanth (Nokia - IN/Bangalore) <prashant...@nokia.com> wrote:

Hi Team,

I am doing some performance testing in NiFi.
The workflow is GetSFTP -> update -> PutKafka. I want to tune my setup to achieve high throughput without much queuing, but my average throughput drops during the flowfile checkpointing. I believe a stop-the-world pause is happening during that time. I can roughly read ~100MB/s from SFTP and send almost the same to Kafka, but every 2 minutes it stops the complete execution. Check the logs below:

2018-06-13 13:24:21,160 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-06-13 13:24:49,420 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@cf82c58 checkpointed with 23 Records and 0 Swap Files in 39353 milliseconds (Stop-the-world time = 3 milliseconds, Clear Edit Logs time = 3 millis), max Transaction ID 68
2018-06-13 13:25:00,165 INFO [pool-10-thread-1] o.a.n.wali.SequentialAccessWriteAheadLog Checkpointed Write-Ahead Log with 7 Records and 0 Swap Files in 39002 milliseconds (Stop-the-world time = 28275 milliseconds), max Transaction ID 316705
2018-06-13 13:25:00,169 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 7 records in 39008 milliseconds

I think all processors go into an idle state for 39 seconds ☹. Please guide me on how to tune it. I set the heap memory to 32G (I am testing on a 12-core, 48G machine). I disabled content-repository archiving. All other properties remain the same.

Thanks & Regards,
Prashanth
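[Editor's note] Prashanth's question about whether shorter checkpoint intervals would leave the average unchanged can be sanity-checked with back-of-the-envelope arithmetic on the numbers in these logs (~100 MB/s peak, a ~39 s pause every 120 s). A rough model, assuming the flow is fully stalled for the whole checkpoint (pessimistic, since only the stop-the-world portion fully blocks updates); the 3 s figure in the second call is purely illustrative:

```python
def effective_throughput(peak_mb_s, interval_s, stall_s):
    """Average throughput over one checkpoint interval, assuming the
    flow is completely stalled for stall_s seconds of every interval."""
    return peak_mb_s * (interval_s - stall_s) / interval_s

# From the thread: ~100 MB/s peak, ~39 s stall every 120 s.
print(effective_throughput(100, 120, 39))  # 67.5 MB/s average

# A shorter interval only helps if the stall shrinks more than
# proportionally; e.g. a hypothetical 3 s stall every 15 s:
print(effective_throughput(100, 15, 3))  # 80.0 MB/s average
```

The intuition: shorter intervals mean less write-ahead-log data accumulates between flushes, so per-checkpoint stall time often drops more than linearly, which is why the average need not fall back to the same figure.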