Hi Joe & Mark,

I'll work on getting those JVM stats. But the instance is locked down
pretty tight so getting GC, and Thread dump will be difficult.

We renamed the flow.xml.gz file and restarted with the hope of seeing some
UI. The overall CPU usage is lower, but that is not helping the UI.

Bootstrap log says:

> /opt/nifi/nifi-current/logs$ tail nifi-bootstrap.log
> 2022-04-19 19:43:03,104 INFO [main] o.a.n.b.NotificationServiceManager
> Successfully loaded the following 0 services: []
> 2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi
> Registered no Notification Services for Notification Type NIFI_STARTED
> 2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi
> Registered no Notification Services for Notification Type NIFI_STOPPED
> 2022-04-19 19:43:03,107 INFO [main] org.apache.nifi.bootstrap.RunNiFi
> Registered no Notification Services for Notification Type NIFI_DIED
> 2022-04-19 19:43:03,143 INFO [main] org.apache.nifi.bootstrap.RunNiFi
> Runtime Java version: 1.8.0_265
> 2022-04-19 19:43:03,154 INFO [main] org.apache.nifi.bootstrap.Command
> Starting Apache NiFi...
> 2022-04-19 19:43:03,155 INFO [main] org.apache.nifi.bootstrap.Command
> Working Directory: /opt/nifi/nifi-current
> 2022-04-19 19:43:03,155 INFO [main] org.apache.nifi.bootstrap.Command
> Command: /usr/local/openjdk-8/bin/java -classpath
> /opt/nifi/nifi-current/./conf:/opt/nifi/nifi-current/./lib/nifi-nar-utils-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-runtime-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-framework-api-1.12.1.jar:/opt/nifi/nifi-current/./lib/logback-classic-1.2.3.jar:/opt/nifi/nifi-current/./lib/javax.servlet-api-3.1.0.jar:/opt/nifi/nifi-current/./lib/slf4j-api-1.7.30.jar:/opt/nifi/nifi-current/./lib/log4j-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/jul-to-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/nifi-properties-1.12.1.jar:/opt/nifi/nifi-current/./lib/nifi-api-1.12.1.jar:/opt/nifi/nifi-current/./lib/jcl-over-slf4j-1.7.30.jar:/opt/nifi/nifi-current/./lib/logback-core-1.2.3.jar:/opt/nifi/nifi-current/./lib/jetty-schemas-3.1.jar
> -Dorg.apache.jasper.compiler.disablejsr199=true -Xmx14g -Xms12g
> -Dcurator-log-only-first-connection-issue-as-error-level=true
> -Djavax.security.auth.useSubjectCredsOnly=true
> -Djava.security.egd=file:/dev/urandom -Dzookeeper.admin.enableServer=false
> -Dsun.net.http.allowRestrictedHeaders=true -Djava.net.preferIPv4Stack=true
> -Djava.awt.headless=true -Djava.protocol.handler.pkgs=sun.net.www.protocol
> -Dnifi.properties.file.path=/opt/nifi/nifi-current/./conf/nifi.properties
> -Dnifi.bootstrap.listen.port=34211 -Dapp=NiFi
> -Dorg.apache.nifi.bootstrap.config.log.dir=/opt/nifi/nifi-current/logs
> org.apache.nifi.NiFi
> 2022-04-19 19:43:08,517 INFO [main] org.apache.nifi.bootstrap.Command
> Launched Apache NiFi with Process ID 123
> 2022-04-19 19:43:13,834 INFO [NiFi Bootstrap Command Listener]
> org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for
> Bootstrap requests on port 44417


Mark, our nifi.properties has no value set for that property, so we are
using whatever the default is i guess. Since it's 1.14.0 we will take your
suggestion and set it to a long time.

CPU taken by nifi is 37% (3.5 CPU cores) and 22.2% memory the jvm was
started with -Xmx14g -Xms12g
Disk IO is happening, but not at a high rate.

App log contains these messages before the Jetty shutdown message

> {"level":"INFO","thread":"main","message":"o.a.n.r.v.FileBasedVariableRegistry
>  Loaded 113 properties from system properties and environment
> variables"}
>
> {"level":"INFO","thread":"main","message":"o.a.n.r.v.FileBasedVariableRegistry
>  Loaded a total of 113 properties.  Including precedence overrides
> effective accessible registry key size is 113"}
>
> {"level":"INFO","thread":"main","message":"o.a.n.p.store.WriteAheadStorePartition
>  After recovering ./provenance_repository, next Event ID to be generated
>  will be 13089684"}
>
> {"level":"INFO","thread":"main","message":"o.a.n.p.index.lucene.LuceneEventIndex
>  Will avoid re-indexing Provenance Events because the newest index is
> defunct, so it will be re-indexed in the background"}
>
> {"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.index.lucene.LuceneEventIndex
>  Determined that Max Event ID indexed for Partition default is
> approximately 13082369 based on index
> ./provenance_repository/lucene-8-index-1649528440360"}
>
> {"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.store.WriteAheadStorePartition
>  The last Provenance Event indexed for partition default is 13072369,
> but the last event written to partition has ID 13089683. Re-indexing up
> to the last 17314 events for ./provenance_repository to ensure that the
> Event Index is accurate and up-to-date"}
>
> {"level":"INFO","thread":"pool-20-thread-1","message":"o.a.n.p.store.WriteAheadStorePartition
>  Finished re-indexing 17315 events across 2 files for
> ./provenance_repository in 9.713 seconds"}
>
> {"level":"INFO","thread":"main","message":"o.a.n.c.repository.FileSystemRepository
>  Maximum Threshold for Container default set to 2858730232217 bytes; if
> volume exceeds this size, archived data will be deleted until it no
> longer exceeds this size"}
>
> {"level":"INFO","thread":"main","message":"o.a.n.c.repository.FileSystemRepository
>  Initializing FileSystemRepository with 'Always Sync' set to false"}
>
> {"level":"INFO","thread":"Thread-1","message":"org.apache.nifi.NiFi 
> Initiating shutdown of Jetty web server..."}
>
>
But there are no system error or warning messages.

Despite seeing logs that the webserver is listening, we always get
"connection refused" when trying to communicate with it.

Thanks,
Eric


On Tue, Apr 19, 2022 at 12:57 PM Mark Payne <marka...@hotmail.com> wrote:

> Eric,
>
> I certainly agree with what Joe said. I would also recommend checking in
> nifi.properties if you have a value set for:
>
> nifi.monitor.long.running.task.schedule
>
> I recommend setting that to “9999 hours”
>
> In 1.14.0, we introduced the notion of a Long-Running Task Monitor. It’s
> generally very fast. Typically runs in 10s of milliseconds on my macbook.
> But it relies on JVM-specific code, and we’ve seen in some environments
> that can cause the UI responsiveness to be very adversely affected. We
> disabled the task monitor by default in 1.15, I believe, because of this.
>
> Thanks
> -Mark
>
>
> On Apr 19, 2022, at 3:44 PM, Joe Witt <joe.w...@gmail.com> wrote:
>
> Eric
>
> When the UI isn't responsive it would be great to have a snapshot of:
> - CPU usage at that time
> - GC behavior/logging at/around that time.
> - IO Utilization around that time
> - NiFi Thread dump precisely during it and ideally also one after it
> responds again
>
> NiFi Restarting itself is very interesting of course.  There should be
> more in the app log and bootstrap that will help illuminate the issue
> then.
>
> Thanks
>
>
> On Tue, Apr 19, 2022 at 12:42 PM Eric Secules <esecu...@gmail.com> wrote:
>
>> By the way, I am running NiFi 1.14.0 and it looks like it keeps
>> restarting itself. I am seeing this in the logs about once an hour.
>>
>> {"level":"INFO","thread":"main","message":"org.apache.nifi.NiFi
>>> Controller initialization took 4737354582168 nanoseconds (4737 seconds).
>>> "}
>>>
>>
>> On Tue, Apr 19, 2022 at 12:34 PM Eric Secules <esecu...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> When my nifi system goes under high load the web UI becomes unresponsive
>>> until load comes down. Is there a way I can see what's going on (processor
>>> status summary, queued count, active thread count) when the UI is
>>> unresponsive?
>>>
>>> The logs are not showing any errors and the various repositories are all
>>> mounted to separate volumes with elastic capacity so I am sure that none of
>>> them ran out of space. Our monitoring shows bursts of CPU usage and memory
>>> use lower than normal.
>>>
>>> The logs show that the StandardProcessScheduler stops processors
>>> followed by starting them, but I never see logs related to the UI being
>>> ready to serve. It does this about once an hour. I see that the flow is
>>> slowly processing based on log activity and databases.
>>>
>>> How can I see what's going on when the web UI is not responding?
>>>
>>> Thanks,
>>> Eric
>>>
>>
>

Reply via email to