Hi Mark,

On Wed, 13 Nov 2019 at 15:38, Mark Thomas <[email protected]> wrote:

> On 12/11/2019 19:11, M. Manna wrote:
> > Hi Mark,
> >
> > following my previous reply, we have now confirmed that it's indeed 8.5.45
> > with APR 1.2.23 that's causing such high JVM CPU usage.
> > We took 2 out of 50 servers out of the load balancer config,
> > reverted tomcat, and redeployed. With near-identical user traffic, the
> > two servers are responding normally with/without traffic with 8.5.41.
> > The JVM dump looks a lot better with 8.5.41.
> >
> > We do think that the recent changes in APR and some other tomcat jar may
> > have caused a compatibility issue on the Windows Server 2016 (64-bit) platform.
> > But unfortunately, we cannot pinpoint exactly what change may have caused
> > this (i.e. actual OS vs Security Updates). With this in mind, we are also
> > wary of moving to 8.5.47 as we don't know if the same issue will occur
> > again. Since 8.5.41 has been packaged with the previously accepted application
> > installer, we are more comfortable rolling back.
>
> Just to confirm, you see this high CPU usage with a clean install (no
> additional web applications deployed, no configuration changes) on
> Windows 2016 DataCenter (64-bit)?
>
> If this is the case, it should be fairly easy to reproduce.
>
> Mark
>

We do not deploy multiple applications. In fact, under tomcat webapps/ROOT
we only have one application (ours). Each tomcat instance is hosted on a VM
(total 50) and all of them are identically configured (server.xml, web.xml,
logging, CPU/RAM). We have not made any other configuration change between
8.5.41 and 8.5.45. And yes, I agree with you that it's fairly easy to
reproduce.

Thanks,

> >
> > I would appreciate it if this can be looked into.
> >
> > On Tue, 12 Nov 2019 at 11:27, M. Manna <[email protected]> wrote:
> >
> >> Hey Mark (appreciate your response in US holiday time)
> >>
> >> On Tue, 12 Nov 2019 at 07:51, Mark Thomas <[email protected]> wrote:
> >>
> >>> On November 12, 2019 12:54:53 AM UTC, "M. Manna" <[email protected]>
> >>> wrote:
> >>>> Just to give an update again:
> >>>>
> >>>> 1) We reverted the APR to 1.2.21 - but observed no difference.
> >>>> 2) We took 3 thread dumps over 1 min interval (without any user
> >>>> sessions) - all threads are tomcat's internal pool threads.
> >>>>
> >>>> When we checked the thread details (using fasthread.io) - we didn't see
> >>>> any of our application stack. Since there is no user traffic, this is
> >>>> coming from tomcat internally. At this stage, we cannot really figure
> >>>> out what's the root cause.
> >>>>
> >>>> Any help is appreciated.
> >>>
> >>> Migrated from what (full version info please)?
> >>>
> >> from 8.5.41 to 8.5.45 (we migrate 3 times a year, last was in June)
> >>
> >>>
> >>> Operating system exact version?
> >>>
> >> Microsoft Windows Server 2016 DataCentre (64-bit)
> >>
> >>>
> >>> JRE vendor and exact version?
> >>>
> >> C:\jdk1.8.0\bin>java.exe -version
> >> java version "1.8.0_162"
> >> Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
> >> Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
> >>
> >>
> >>> Do you see the same behavior with the latest 8.5.x and latest Tomcat
> >>> Native?
> >>>
> >> We are using APR 1.2.23 which I can also see in latest tomcat. Due to
> >> production due diligence we cannot roll to a different version that easily.
> >> Normally, we lag behind by 2 monthly releases of tomcat. We also reverted
> >> the APR to 1.2.21 (but no difference).
> >>
> >>>
> >>> What triggers this behaviour?
> >>>
> >> That is quite strange. Due to US holidays, we had low traffic on our
> >> servers, and nothing has crept in to suggest that it's application-driven.
> >> We took one tomcat instance out of 50 instances and removed all user
> >> sessions (i.e. no application activities or threads). Upon restart of
> >> tomcat, the CPU spike lingered past the initial servlet startup period. We
> >> monitored that over 1-2 hours but there was no difference.
> >>
> >>>
> >>> How often do you see this behaviour?
> >>>
> >> We took 2 sets of data
> >> 1) 3 jstack dumps based on 10 seconds interval.
> >> 2) 3 jstack dumps based on 1 min interval.
> >>
> >> Both of the above reveal that all background threads (http, pool etc.) were
> >> from tomcat. We didn't have any application threads lingering in those 3
> >> samples. So yes we see this almost all the time if we take samples.
> >> However, when we compared with pre-production instances (with Windows
> >> server R2 x64 bit), we don't see such an abnormal spike. In fact, the
> >> application instance doesn't incur such a big CPU spike. Whilst composing
> >> this email, I am now thinking if the APR is indeed incompatible with
> >> Windows Server R2 (or the presence of any Windows Updates) which blocks the
> >> native poll() call longer than usual.
> >>
> >> An example is that on Windows Server 2012 - APR poll() call takes about
> >> 30% CPU time - but with Windows Server 2016 it's almost always 95%.
> >>
> >>
> >>>
> >>> And anything else you think might be relevant.
> >>>
> >>
> >> We are using end-2-end encryption using APR (with Certificate and
> >> SSLConfig resource setup in server.xml). But it's survived past 3 tomcat
> >> upgrades without any issue.
> >> Except OS we don't have any obvious culprit identified at the moment.
> >>
> >> Thanks,
> >>
> >>>
> >>> Mark
> >>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> On Mon, 11 Nov 2019 at 20:57, M. Manna <[email protected]> wrote:
> >>>>
> >>>>> Hello All,
> >>>>>
> >>>>> Any thoughts regarding this? Slightly clueless at this point, so any
> >>>>> direction will be appreciated.
> >>>>>
> >>>>> We are seeing the poll taking all the CPU time. We are using
> >>>>> OperatingSystemMXBean.getProcessCpuLoad() and
> >>>>> OperatingSystemMXBean.getSystemCpuLoad() to get our metrics (then
> >>>>> x100 to get the pct).
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>>
> >>>>> On Mon, 11 Nov 2019 at 17:46, M. Manna <[email protected]> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> after migrating to 8.5.45, we are seeing a lot of cpu load by the
> >>>>>> following JVM thread dump:
> >>>>>>
> >>>>>> "https-openssl-apr-0.0.0.0-8443-Poller" : 102 : RUNNABLE :
> >>>>>> cpu=172902703125000 : cpuLoad= 74.181015
> >>>>>>
> >>>>>> BlockedCount:8464 BlockedTime:0 LockName:null LockOwnerID:-1
> >>>>>> LockOwnerName:null
> >>>>>>
> >>>>>> WaitedCount:5397 WaitedTime:0 InNative:false IsSuspended:false
> >>>>>>
> >>>>>> at org.apache.tomcat.jni.Poll.poll(Poll.java:-2)
> >>>>>>
> >>>>>> at org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1547)
> >>>>>>
> >>>>>> at java.lang.Thread.run(Thread.java:748)
> >>>>>>
> >>>>>>
> >>>>>> These are coming after 2-3 successful jvm dumps. Is this something
> >>>>>> familiar to anybody?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
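
For anyone trying to reproduce the numbers quoted in this thread: the metrics are described as OperatingSystemMXBean.getProcessCpuLoad() and getSystemCpuLoad() multiplied by 100, alongside per-thread CPU figures from a thread dump. A minimal sketch of that kind of probe might look like the following. The class name CpuLoadProbe and the helper toPercent are hypothetical, the cast assumes a HotSpot/OpenJDK-style JVM, and this is not the monitoring code actually used on the servers discussed above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class, for illustration only.
final class CpuLoadProbe {

    // The load getters return a fraction in [0.0, 1.0], or a negative value
    // when the figure is not available; "unavailable" is reported as -1.
    static double toPercent(double fraction) {
        return fraction < 0 ? -1.0 : fraction * 100.0;
    }

    public static void main(String[] args) {
        // Assumption: the platform bean implements the com.sun.management
        // extension interface, as it does on HotSpot/OpenJDK.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();

        System.out.printf("process CPU: %.2f%%%n", toPercent(os.getProcessCpuLoad()));
        System.out.printf("system CPU:  %.2f%%%n", toPercent(os.getSystemCpuLoad()));

        // Per-thread CPU time, to see whether a single thread (for example a
        // connector poller) accounts for most of the process CPU.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            System.out.println("per-thread CPU time not available on this JVM");
            return;
        }
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread has died
            ThreadInfo info = threads.getThreadInfo(id);  // null if the thread has died
            if (cpuNanos < 0 || info == null) {
                continue;
            }
            System.out.printf("%-50s %s cpu=%d ms%n",
                    info.getThreadName(), info.getThreadState(), cpuNanos / 1_000_000);
        }
    }
}
```

Running this inside the Tomcat JVM (or adapting it to a remote JMX connection) twice, a few seconds apart, and comparing the per-thread CPU times gives a rough view of which thread is consuming CPU between samples. On newer JDKs getSystemCpuLoad() is deprecated in favour of getCpuLoad(); on the 1.8.0_162 runtime mentioned above it is the available call.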
