Hi Mark,

See comments below.


> -----Original Message-----
> From: Mark Thomas <ma...@apache.org>
> Sent: Tuesday, May 28, 2024 9:32 AM
> To: Tomcat Users List <users@tomcat.apache.org>
> Subject: Re: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Hi Eric,
>
> Follow-up observsations and comments in-line.
>
> >> What time does this problem start?
> >
> > It typically starts around 9:15 am EDT and goes until around 10:30 am.
>
> Does that match the time of highest request load from the customer?
> Rather than a spike, I'm wondering if the problem is triggered once load
> exceeds some threshold.
>

My nginx proxy console only shows live activity and does not keep a history, 
but I can probably script something to parse the localhost_access logs and 
graph request counts on a per-minute basis. Will work on that.

> > We finished and implemented the script yesterday, so today will be the first
> day that it produces results. It watches the catalina.out file for stuck 
> thread
> detection warnings. When the number of stuck threads exceeds a threshold,
> then it starts doing thread dumps every 60 seconds until the counts drops back
> down below the threshold. The users typically do not complain of slowness 
> until
> the stuck thread count exceeds 20, and during that time the threads often take
> up to a minute or more to complete. It's too late today to change the timings,
> but if it does not produce any actionable intel, we can adjust them tonight.
>
> Lets see what that produces and go from there.
>

Took a bunch of thread and heap dumps during today's painful debacle. Will send 
a link to those as soon as I can.

> > The vendor claims that the feature uses a different server and does not send
> requests to the slow ones, so it has been re-enabled at the customer's 
> request.
> We may ask them to disable it again until we get this issue resolved.
>
> Noted.
>
> > This customer sends about 1.5 million requests to each load-balanced
> > server during a typical production day. Most other customers send much
> > less, often only a fraction of that. However, at least one customer
> > sends about 2 million to the same server, and they don't see the
> > problem. (I will check if they have the AI feature enabled.)
>
> Hmm. Whether that other customer has the AI feature enabled would be an
> interesting data point.

I will ask them right after I send this message. They are usually a little slow 
to respond.

>
> >> Can we see the full stack trace please.
> >
> > Here's one example.
>
> <snip/>
>
> >          java.lang.Throwable
> >                  at
> org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoa
> derBase.java:1252)
> >                  at
> > org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClass
> > LoaderBase.java:1220)
>
> <snip/>
>
> That is *very* interesting. That is the start of a synch block in the class 
> loader. It
> should complete quickly. The full thread dump should tell us what is holding 
> the
> lock. If we are lucky we'll be able to tell why the lock is being held for so 
> long.
>
> We might need to reduce the time between thread dumps to figure out what
> the thread that is blocking everything is doing. We'll see.
>
> > The app has DB connection details in two places. First, it uses a database
> connection string in a .properties file, as follows. This string handles most
> connections to the DB.
> >
> > mobiledoc.DBUrl=jdbc:mysql://ha52a:5791
> >
> mobiledoc.DBName=mobiledoc_791?useSSL=false&zeroDateTimeBehavior=rou
> nd
> >
> &jdbcCompliantTruncation=false&cacheServerConfiguration=true&dumpQueri
> > esOnException=true&tinyInt1isBit=false&useOldAliasMetadataBehavior=tru
> > e&dontTrackOpenResources=true
> > mobiledoc.DBUser=<redacted>
> > mobiledoc.DBPassword=<redacted>
>
> OK. That seems unlikely to be using connection pooling although the 
> application
> might be pooling internally.
>

Based on lots of previous observation, I don't think they are. The comms 
between the app and DB are choppy, with only about 1-5 queries per TCP 
connection. If they are pooling, they are not doing it aggressively.

> > It also has second DB config specifically for a drug database.
> >
> > <MediSpan>
> >    <!-- Log settings -->
> >    <Logging enabled="false">
> >      <SQLQuery enabled="true" />
> >      <ConstructorCall enabled="false" />
> >      <FunctionCall enabled="false" />
> >      <CustomLogFileLocation>c:\out.log</CustomLogFileLocation>
> >    </Logging>
> >    <!-- Medispan Content definitions -->
> >    <DataSource id="ecw" DataBase="medispan" />
> >    <!-- DataSources -->
> >    <DataBase id="medispan">
> >      <JNDIEnabled enabled="false" />
> >
> <INITIAL_CONTEXT_FACTORY>INSERT_CONTEXT_FACTORY</INITIAL_CONTEXT_
> FACTORY>
> >      <JNDI_URL>INSERT_JNDI_URL</JNDI_URL>
> >      <JNDI_USER_NAME>INSERT_USER_NAME</JNDI_USER_NAME>
> >      <JNDI_PASSWORD>INSERT_PASSWORD</JNDI_PASSWORD>
> >      <LOOKUP_NAME>INSERT_LOOKUP_NAME</LOOKUP_NAME>
> >      <DRIVER>com.mysql.jdbc.Driver</DRIVER>
> >
> <URL>jdbc:mysql://dbclust54:5791/medispan?sessionVariables=wait_timeout=2
> 8800,interactive_timeout=28800</URL>
> >      <USER_NAME>redacted</USER_NAME>
> >      <PASSWORD>redacted</PASSWORD>
> >      <POOLSIZE>10</POOLSIZE>
> >      <TIMEOUT>5000</TIMEOUT>
> >    </DataBase>
> >    <XMLContent id="DIB310" dir="INSERT_CONTENT_DIR" />
> >    <Images id="DIB310" dir="INSERT_IMAGE_DIR" />
> >    <Cache>
> >      <Enabled>true</Enabled>
> >      <SecondsTillStale>0</SecondsTillStale>
> >      <LastUpdateInterval>1800</LastUpdateInterval>
> >    </Cache>
> > </MediSpan>
>
> Hmm. There is a pool size setting there but we can't tell if it is being used.
>
> >> Is that Tomcat 9.0.80 as provided by the ASF?
>
> An explicit answer to this question would be helpful.
>

Didn't mean to seem evasive. Yes, it's from the ASF.


> In terms of the way forward, we need to see to thread dumps when the problem
> is happening to figure out where the blockage is happening and
> (hopefully) why.
>
> Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org

Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.

Reply via email to