Eric,

How much memory is available on the VM/system? I'm curious whether the OOM 
killer or some similar process is killing NiFi. You might want to check system 
logs such as dmesg to see if there’s anything there.
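
As a rough illustration of that check, here is a minimal Python sketch that 
filters kernel log lines for signs of OOM-killer activity. The function names 
and the pattern are assumptions made for this example; kernel message wording 
varies between versions, so the pattern may need adjusting, and reading dmesg 
may require elevated privileges on a locked-down host.

```python
import re
import subprocess

# Phrases the Linux kernel commonly logs when the OOM killer acts.
# Assumption: exact wording differs between kernel versions.
OOM_PATTERN = re.compile(r"out of memory|oom[-_ ]kill|killed process", re.IGNORECASE)


def find_oom_events(log_lines):
    """Return the lines that look like OOM-killer activity."""
    return [line for line in log_lines if OOM_PATTERN.search(line)]


def read_kernel_log():
    """Return dmesg output as a list of lines (hypothetical helper).

    Raises if dmesg is missing or the caller lacks permission to read it.
    """
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Calling find_oom_events(read_kernel_log()) on the NiFi host would list any 
matching lines; an empty list only means nothing matched this pattern.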

Thanks
-Mark

On Apr 19, 2022, at 4:28 PM, Eric Secules <esecu...@gmail.com> wrote:

Hi Joe & Mark,

I'll work on getting those JVM stats, but the instance is locked down pretty 
tightly, so getting GC logs and a thread dump will be difficult.

We renamed the flow.xml.gz file and restarted in the hope of seeing some UI. 
The overall CPU usage is lower, but that is not helping the UI.

Bootstrap log says:
/opt/nifi/nifi-current/logs$ tail nifi-bootstrap.log
2022-04-19 19:43:03,104 INFO [main] o.a.n.b.NotificationServiceManager 
Successfully loaded the following 0 services: []
2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi 
Registered no Notification Services for Notification Type NIFI_STARTED
2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi 
Registered no Notification Services for Notification Type NIFI_STOPPED
2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi 
Registered no Notification Services for Notification Type NIFI_DIED
2022-04-19 19:43:03,143 INFO [main] org.apache.nifi.bootstrap.RunNiFi Runtime 
Java version: 1.8.0_265
2022-04-19 19:43:03,154 INFO [main] org.apache.nifi.bootstrap.Command Starting 
Apache NiFi...
2022-04-19 19:43:03,155 INFO [main] org.apache.nifi.bootstrap.Command Working 
Directory: /opt/nifi/nifi-current
2022-04-19 19:43:03,155 INFO [main] org.apache.nifi.bootstrap.Command Command: 
/usr/local/openjdk-8/bin/java -classpath 
/opt/nifi/nifi-current/./conf:/opt/nifi/nifi-current/./lib/nifi-nar-utils-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-runtime-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-framework-api-1.12.1.jar:/opt/nifi/nifi-current/./lib/logback-classic-1.2.3.jar:/opt/nifi/nifi-current/./lib/javax.servlet-api-3.1.0.jar:/opt/nifi/nifi-current/./lib/slf4j-api-1.7.30.jar:/opt/nifi/nifi-current/./lib/log4j-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/jul-to-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-properties-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-api-1.12.1.jar:/opt/nifi/nifi-current/./lib/jcl-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/logback-core-1.2.3.jar:/opt/nifi/nifi-current/./lib/jetty-schemas-3.1.jar
 -Dorg.apache.jasper.compiler.disablejsr199=true -Xmx14g -Xms12g 
-Dcurator-log-only-first-connection-issue-as-error-level=true 
-Djavax.security.auth.useSubjectCredsOnly=true 
-Djava.security.egd=file:/dev/urandom -Dzookeeper.admin.enableServer=false 
-Dsun.net.http.allowRestrictedHeaders=true -Djava.net.preferIPv4Stack=true 
-Djava.awt.headless=true -Djava.protocol.handler.pkgs=sun.net.www.protocol 
-Dnifi.properties.file.path=/opt/nifi/nifi-current/./conf/nifi.properties 
-Dnifi.bootstrap.listen.port=34211 -Dapp=NiFi 
-Dorg.apache.nifi.bootstrap.config.log.dir=/opt/nifi/nifi-current/logs 
org.apache.nifi.NiFi
2022-04-19 19:43:08,517 INFO [main] org.apache.nifi.bootstrap.Command Launched 
Apache NiFi with Process ID 123
2022-04-19 19:43:13,834 INFO [NiFi Bootstrap Command Listener] 
org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for 
Bootstrap requests on port 44417

Mark, our nifi.properties has no value set for that property, so we are using 
whatever the default is, I guess. Since it's 1.14.0, we will take your 
suggestion and set it to a long time.

NiFi is using 37% CPU (3.5 CPU cores) and 22.2% of memory. The JVM was started 
with -Xmx14g -Xms12g.
Disk IO is happening, but not at a high rate.

The app log contains these messages before the Jetty shutdown message:

{"level":"INFO","thread":"main","message":"o.a.n.r.v.FileBasedVariableRegistry
 Loaded 113 properties from system properties and environment
variables"}

{"level":"INFO","thread":"main","message":"o.a.n.r.v.FileBasedVariableRegistry
 Loaded a total of 113 properties.  Including precedence overrides
effective accessible registry key size is 113"}

{"level":"INFO","thread":"main","message":"o.a.n.p.store.WriteAheadStorePartition
 After recovering ./provenance_repository, next Event ID to be generated
 will be 13089684"}

{"level":"INFO","thread":"main","message":"o.a.n.p.index.lucene.LuceneEventIndex
 Will avoid re-indexing Provenance Events because the newest index is
defunct, so it will be re-indexed in the background"}

{"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.index.lucene.LuceneEventIndex
 Determined that Max Event ID indexed for Partition default is
approximately 13082369 based on index
./provenance_repository/lucene-8-index-1649528440360"}

{"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.store.WriteAheadStorePartition
 The last Provenance Event indexed for partition default is 13072369,
but the last event written to partition has ID 13089683. Re-indexing up
to the last 17314 events for ./provenance_repository to ensure that the
Event Index is accurate and up-to-date"}

{"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.store.WriteAheadStorePartition
 Finished re-indexing 17315 events across 2 files for
./provenance_repository in 9.713 seconds"}

{"level":"INFO","thread":"main","message":"o.a.n.c.repository.FileSystemRepository
 Maximum Threshold for Container default set to 2858730232217 bytes; if
volume exceeds this size, archived data will be deleted until it no
longer exceeds this size"}

{"level":"INFO","thread":"main","message":"o.a.n.c.repository.FileSystemRepository
 Initializing FileSystemRepository with 'Always Sync' set to false"}

{"level":"INFO","thread":"Thread-1","message":"org.apache.nifi.NiFi Initiating 
shutdown of Jetty web server..."}

But there are no system error or warning messages.
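
To pull out this kind of context automatically, a minimal Python sketch 
follows. It assumes the app log holds one JSON object per line with "level", 
"thread" and "message" keys, as in the excerpts above (which are wrapped in 
this mail). The function name and the default marker text are choices made 
for this example.

```python
import json
from collections import deque


def records_before_shutdown(log_lines,
                            marker="Initiating shutdown of Jetty web server",
                            context=5):
    """Return up to `context` parsed records preceding the first marker hit.

    Lines that are not valid JSON objects are skipped. Returns an empty
    list if the marker never appears.
    """
    recent = deque(maxlen=context)  # rolling window of the latest records
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        if marker in str(record.get("message", "")):
            return list(recent)
        recent.append(record)
    return []
```

Filtering the returned records for a "level" of WARN or ERROR is a quick way 
to confirm whether anything other than INFO preceded the shutdown.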

Despite seeing logs saying that the web server is listening, we always get 
"connection refused" when trying to communicate with it.

Thanks,
Eric


On Tue, Apr 19, 2022 at 12:57 PM Mark Payne <marka...@hotmail.com> wrote:
Eric,

I certainly agree with what Joe said. I would also recommend checking in 
nifi.properties if you have a value set for:


nifi.monitor.long.running.task.schedule

I recommend setting that to “9999 hours”

In 1.14.0, we introduced the notion of a Long-Running Task Monitor. It’s 
generally very fast; it typically runs in tens of milliseconds on my MacBook. 
But it relies on JVM-specific code, and we’ve seen that in some environments it 
can very adversely affect UI responsiveness. We disabled the task monitor by 
default in 1.15, I believe, because of this.
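
A minimal Python sketch of applying that setting, assuming nifi.properties 
uses plain key=value lines. The helper name set_property is hypothetical; the 
property name and the "9999 hours" value are the ones given above.

```python
def set_property(properties_text, key, value):
    """Return properties_text with key set to value, appending it if absent.

    Comment lines (starting with '#') and all other keys are left untouched.
    """
    out = []
    found = False
    for line in properties_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

For example, set_property(text, "nifi.monitor.long.running.task.schedule", 
"9999 hours") returns the updated file contents. Writing them back and 
restarting NiFi are left to the operator.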

Thanks
-Mark


On Apr 19, 2022, at 3:44 PM, Joe Witt <joe.w...@gmail.com> wrote:

Eric

When the UI isn't responsive it would be great to have a snapshot of:
- CPU usage at that time
- GC behavior/logging at/around that time.
- IO Utilization around that time
- NiFi Thread dump precisely during it and ideally also one after it responds 
again
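
Once a thread dump is available, a quick first pass is to count threads by 
state. The Python sketch below assumes the dump is in the usual HotSpot text 
format (for example, as produced by jstack), where each thread has a 
"java.lang.Thread.State: ..." line. The function name is made up for this 
example.

```python
import re
from collections import Counter

# Matches lines such as "   java.lang.Thread.State: TIMED_WAITING (parking)".
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+(\w+)")


def count_thread_states(dump_text):
    """Return a Counter mapping thread state name to number of threads."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = STATE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Comparing the counts from a dump taken while the UI is unresponsive with one 
taken after it recovers can show whether many threads are BLOCKED or WAITING 
during the stall.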

NiFi restarting itself is very interesting, of course. There should be more in 
the app log and bootstrap log that will help illuminate the issue.

Thanks


On Tue, Apr 19, 2022 at 12:42 PM Eric Secules <esecu...@gmail.com> wrote:
By the way, I am running NiFi 1.14.0 and it looks like it keeps restarting 
itself. I am seeing this in the logs about once an hour.

{"level":"INFO","thread":"main","message":"org.apache.nifi.NiFi Controller 
initialization took 4737354582168 nanoseconds (4737 seconds)."}
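
For scale, that initialization time is well over an hour, which is in the 
same range as the roughly hourly restart interval. A small Python conversion 
of the figure from the log line (the function name is just for this example):

```python
def nanos_to_seconds_and_minutes(nanos):
    """Convert a nanosecond duration to (seconds, minutes) as floats."""
    seconds = nanos / 1_000_000_000
    minutes = seconds / 60
    return seconds, minutes


seconds, minutes = nanos_to_seconds_and_minutes(4_737_354_582_168)
print(f"{seconds:.0f} s is about {minutes:.0f} minutes")
# prints "4737 s is about 79 minutes"
```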

On Tue, Apr 19, 2022 at 12:34 PM Eric Secules <esecu...@gmail.com> wrote:
Hello,

When my NiFi system comes under high load, the web UI becomes unresponsive 
until the load comes down. Is there a way I can see what's going on (processor 
status summary, queued count, active thread count) when the UI is unresponsive?

The logs are not showing any errors and the various repositories are all 
mounted to separate volumes with elastic capacity so I am sure that none of 
them ran out of space. Our monitoring shows bursts of CPU usage and memory use 
lower than normal.

The logs show that the StandardProcessScheduler stops processors followed by 
starting them, but I never see logs related to the UI being ready to serve. It 
does this about once an hour. I see that the flow is slowly processing based on 
log activity and databases.

How can I see what's going on when the web UI is not responding?

Thanks,
Eric

