Cris,

On 6/1/21 09:17, Berneburg, Cris J. - US wrote:
Hi Chris

[lots of snippage]

cb> One of our web apps is using a "lot" of memory, specifically a big
cb> user query.  We'd like to find out why.
cb> 1. Is there a way to analyze uncollected garbage?
cb> * AWS EC2 instance.
cb> * There are other TC instances on the same server.
cb> * Each TC instance has multiple apps.

cs> What's the goal? Do you just Want To Know, or are you trying
cs> to solve an actual problem?

a. Barely enough memory to distribute among the multiple TC instances
and the apps they support.  A big enough user query (no throttling)
causes OOME's.  Attempting to determine if the code is being wasteful
in some way and therefore could be made more efficient.

What's the database? And the driver?

MySQL Connector/J used to (still does?) read 100% of the results into the heap before Statement.executeQuery() returns unless you specifically tell it not to. So if your query returns 1M rows, you might bust your heap.

It's entirely possible that other drivers do similar things.
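
If it is Connector/J, the documented way to avoid that buffering is a forward-only, read-only statement with a fetch size of Integer.MIN_VALUE. A minimal sketch, assuming the usual java.sql imports, an already-open Connection conn, and a made-up query/handler:

  // Ask Connector/J to stream rows instead of buffering the whole result set.
  Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                        ResultSet.CONCUR_READ_ONLY);
  stmt.setFetchSize(Integer.MIN_VALUE);
  try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM big_table")) {
      while (rs.next()) {
          process(rs.getLong("id"), rs.getString("name")); // hypothetical handler
      }
  }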

b. It's a dev app server (EC2) which hosts diff stages in the dev
process - dev, test, and prototype streams.  Multiple TC instances
(3) because multiple copies of the apps don't play nice with each
other.  That is, we can't just rename the WAR files and expect the
deployed apps to stay inside that context name (I think).

You might want to look into that, eventually. If they aren't playing together nicely, they are not "good" servlet citizens. Solving those issues may improve other things. *shrug*
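
If renaming the WAR itself is the sticking point, one sketch of an alternative (file and path names below are made up) is a per-context deployment descriptor, where the descriptor's file name sets the context path and docBase points at the unmodified WAR:

  <!-- conf/Catalina/localhost/myapp-test.xml  => deployed at /myapp-test -->
  <Context docBase="/opt/wars/myapp.war" />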

c. I don't want to debug the code. I'm relatively new to the project, unfamiliar with some of the code, and anticipate getting
lost in the weeds.  See point #1 below.  ;-)

cs> If you have a bunch of garbage that's not being cleaned up,
cs> usually it's because there is simply no need to do so. The GC
cs> is behaving according to the 3 laws of rob..., er, 3 virtues of
cs> computing[1]:
cs>
cs>    1. Laziness: nothing needs that memory so... meh
cs>    2. Impatience: gotta clean that Eden space quick
cs>    3. Hubris: if I ever need more memory, I know where to find it

cs> [1] http://threevirtues.com/

Ha ha ha!  :-)

cs> How long does the query take to run?

Dunno about the time on the DB query itself.  From the user's point of view, a 
full minute plus.

cs> What kind of query is it? Are we talking about something like SQL

Yup.  Classic RDBMS back-end.

cs> or some in-memory database or something which really does
cs> take a lot of memory for the application to fulfill the request?

Nah, nothing that fancy.  The only fancy part is using node.js for the 
front-end.

I followed Amit's and John's suggestion of using Eclipse Memory
Analyzer Tool's "keep unreachable objects" option when running a query
from the app client.  Digging deeper into the Leak Suspects Report, I saw
a StringBuilder - 264MB for the supporting byte array and 264MB for
the returned String, about 790MB total for that piece of the pie.
Contents were simply the JSON query results returned to the client.
No mystery there.

Yep: runaway string concatenation. This is a devolution of the "Connector/J reads the whole result set into memory before returning" thing I mentioned above. Most JSON endpoints return arbitrarily large JSON responses and most client applications just go "duh, read the JSON, then process it".

If your JSON is big, well, then you need a lot of memory to store it all if that's how you do things.

If you want to deal with JSON at scale, you need to process it in a streaming fashion. The only library I know that can do streaming JSON is Noggit, which was developed for use with Solr (I think, maybe it came from elsewhere before that). Anyway, it's ... not for the faint of heart. But if you can figure it out, you can handle petabytes of JSON with a tiny heap.
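
To illustrate the streaming idea without getting into Noggit specifics, here is a minimal sketch using Jackson's streaming JsonGenerator instead (my substitution, not something from this thread; the OutputStream and column names are made up). The point is that rows are written straight to the response stream as they are read, so the full JSON never exists as one giant String:

  // Write each row directly to the output stream; no giant StringBuilder.
  JsonGenerator gen = new JsonFactory().createGenerator(out); // com.fasterxml.jackson.core
  gen.writeStartArray();
  while (rs.next()) {
      gen.writeStartObject();
      gen.writeNumberField("id", rs.getLong("id"));        // made-up columns
      gen.writeStringField("name", rs.getString("name"));
      gen.writeEndObject();
  }
  gen.writeEndArray();
  gen.close();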

I suspect that repeating the process with multiple queries will
reveal multiple StringBuilder's each containing big honking JSON
results.

Very likely. You might want to throttle/serialize queries you expect to have big responses so that only e.g. 2 of them can be running at a time. Maybe all is well when they come one-at-a-time, but if you try to handle 5 concurrent "big responses" you bust your heap.

The short-term solution is to simply not allow that much concurrency.
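
A minimal sketch of that kind of throttle, using a plain java.util.concurrent.Semaphore (the gate, the permit count, and the executeAndSerialize() method are all made up for illustration):

  // Allow at most 2 of the known-huge queries to run concurrently.
  private static final Semaphore BIG_QUERY_GATE = new Semaphore(2);

  String runBigQuery() throws InterruptedException {
      BIG_QUERY_GATE.acquire();          // blocks while 2 are already in flight
      try {
          return executeAndSerialize();  // hypothetical existing query/serialization code
      } finally {
          BIG_QUERY_GATE.release();
      }
  }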

So the issue may not be a problem with efficiency so much as one of
simple memory hogging.

This can happen with almost any application. We serve ... many clients simultaneously and we run with a maximum of 10 connections in our db connection pool on each app server. It sounds "too small" but it handles the user-traffic without any user complaints.
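
For reference, capping the pool that way is a single attribute on the JNDI <Resource> definition. A sketch assuming Tomcat's default DBCP2-based pool and a MySQL backend (all names, credentials, and the URL are invented):

  <Resource name="jdbc/AppDB" auth="Container" type="javax.sql.DataSource"
            driverClassName="com.mysql.cj.jdbc.Driver"
            url="jdbc:mysql://db.example.com:3306/app"
            username="app" password="secret"
            maxTotal="10" maxWaitMillis="30000"/>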

(At least StringBuilder is being
used instead of plus-sign String concatenation.)

In Java "..." + "..." uses a StringBuilder (unless the compiler can perform the concatenation at compile-time, in which case there is no concatenation at all). In some code, "..." + "..." is just fine and I hate it when someone replaces it with:

  String foo = new StringBuilder("bar").append("baz").toString();

because the compiler does the _exact same thing_ and you've just made the code more difficult to read.

But if that were in a *loop*, then replacing it with a StringBuilder is pretty important for performance, otherwise the compiler will do something stupid like this:

String foo = "";
for(String s : [ "a", "b", "c", "d" ... ] ) {
  StringBuilder temp = new StringBuilder(foo);
  temp.append(s);
  foo = temp.toString();
}
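
For contrast, the hand-written StringBuilder version keeps one builder alive across the whole loop:

  StringBuilder sb = new StringBuilder();
  for (String s : new String[] { "a", "b", "c", "d" /* ... */ }) {
      sb.append(s);   // no per-iteration StringBuilder/String churn
  }
  String foo = sb.toString();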

Anyhow, the use of StringBuilder is essentially a requirement so don't read too much into it being in your heap. You might actually have to start reading some code (shiver!).

-chris
