On 12/20/20 10:26 PM, Hrafn Malmquist wrote:
Hi Gary

Thanks for taking the time to respond.

I hope you can bear with me as I am still learning about database
connection pooling.

Perhaps I did not ask the question correctly. I am not asking about a site
specific setup but rather what defaults should be shipped with the
software. I am part of the minor version release team.

Currently, the default setup is a DBCP2 v. 2.1.1 connection pool with
only maxWaitMillis, maxIdle and maxTotal configurable in the DSpace
configuration settings, with default values of 5000, 10 and 30
respectively. It's unclear why these defaults were chosen to begin with;
git blame shows they were chosen back in 2015. I don't think a lot of
thought went into choosing 1) which parameters should be configurable or
2) what their defaults should be (or why they should differ from the
DBCP2 defaults).

DSpace repositories are run by higher education institutions and all sorts
of institutions and organisations involved in research, for instance the
Smithsonian (https://repository.si.edu/). Therefore, although the vast
majority of instances are run by small institutions that get little
traffic, others are likely to receive relatively heavy traffic, from users
and crawlers.

So the idea is to ask the experts what parameters should be configurable
for the average repository admin, keeping in mind that the aim is for
installation and setup to be simple (in effect, what are the "main"
parameters likely to need tweaking) and what should the out-of-the-box
defaults be (if at all different from the DBCP2 defaults).

I am particularly surprised at the low maxWaitMillis chosen. Is that not
likely to cause problems for high traffic sites?

I would say no.  Having threads blocked waiting for connections for longer than 5 seconds will likely cause problems in heavily loaded applications.  You will end up running out of app server processing threads if they are hanging for that long.  If getConnection is taking that long, there is likely a problem somewhere in the overall system - processing threads holding connections too long, not enough connections, database latency, etc.  It all comes down to queuing theory.  If your app does not hold connections long and queries are optimized, even a relatively small pool can handle decent load.  The key is not to leave connections open or hold on to them too long.
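
As a rough illustration of the thread-exhaustion point, Little's law (average number in a system = arrival rate x average time spent in it) gives a back-of-the-envelope estimate. This is only a sketch: the class name, the request rates and the 200-thread worker pool are hypothetical figures chosen for illustration, not measurements from DSpace or Tomcat.

```java
// Back-of-the-envelope estimate using Little's law: L = lambda * W.
// All figures in main() are hypothetical and only illustrate the arithmetic.
final class LittlesLawSketch {
    private LittlesLawSketch() {}

    /** Average number of items in a system, given arrival rate and time per item. */
    static double averageInSystem(double arrivalsPerSecond, double secondsPerItem) {
        return arrivalsPerSecond * secondsPerItem;
    }

    public static void main(String[] args) {
        // Healthy case: 100 requests/s, each holding a connection for 50 ms,
        // keeps about 5 connections busy on average.
        double busyConnections = averageInSystem(100.0, 0.05);

        // Unhealthy case: 40 requests/s that each block for the full 5 s wait
        // tie up 200 threads, the whole of a hypothetical 200-thread worker pool.
        double blockedThreads = averageInSystem(40.0, 5.0);

        System.out.println("Average busy connections: " + busyConnections);
        System.out.println("Average blocked threads:  " + blockedThreads);
    }
}
```

The same arithmetic shows why a longer wait makes things worse rather than better: doubling the time each thread may block doubles the number of threads that can be stuck at a given request rate.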

The defaults above look OK to me, though if database connections are not in short supply, I would bump maxIdle to 20.  The reason for this is that setting it at 10 means that if the number used regularly goes up to 20+, you will end up with a lot of connection churn.  On the other hand, if the usage pattern is spikes now and then followed by long periods of lighter load, setting it at 20 will "waste" some connections.  How important that "waste" is depends on what else is going on in the DB, how many pools are sharing it, etc.
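
To make the churn argument concrete, here is a deliberately simplified model of how maxIdle interacts with a fluctuating load. It assumes that a returned connection is closed whenever the idle count is already at maxIdle, and it ignores maxTotal, idle eviction and connection validation. The class name, method name and demand pattern are hypothetical.

```java
// Simplified model of connection churn under a maxIdle cap.
// Assumption: connections returned beyond maxIdle are closed immediately;
// maxTotal, eviction and validation are not modelled.
final class IdleChurnSketch {
    private IdleChurnSketch() {}

    /**
     * Counts how many physical connections are opened while the number of
     * connections in use follows the given demand pattern.
     */
    static int connectionsCreated(int[] demand, int maxIdle) {
        int active = 0;   // connections currently borrowed
        int idle = 0;     // connections sitting idle in the pool
        int created = 0;  // physical connections opened so far
        for (int wanted : demand) {
            if (wanted > active) {
                int needed = wanted - active;
                int reused = Math.min(needed, idle); // take idle ones first
                idle -= reused;
                created += needed - reused;          // open the rest
            } else {
                int returned = active - wanted;
                idle = Math.min(maxIdle, idle + returned); // surplus is closed
            }
            active = wanted;
        }
        return created;
    }

    public static void main(String[] args) {
        int[] demand = {22, 2, 22, 2, 22}; // usage swinging between 2 and 22
        System.out.println("maxIdle=10: " + connectionsCreated(demand, 10));
        System.out.println("maxIdle=20: " + connectionsCreated(demand, 20));
    }
}
```

With this pattern the model opens 42 connections at maxIdle=10 but only 22 at maxIdle=20, since every swing back up to 22 has to reopen the 10 connections that were closed on the way down.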

I would recommend upgrading to the latest version compatible with the version of Tomcat you are running, or simply using the version that ships with Tomcat (which is generally the latest compatible). Another reason to upgrade dbcp if you are using it directly is to pick up the fixes in the later version of commons pool that it brings in.

For some general info on how dbcp and pool configs work, see [1]. It is old, but the basic concepts are still correct.  If you are familiar with queuing theory, you can view a pool with n connections as an M/M/n queue.  What drives everything is request arrival rate and service time, which in the case of dbcp is how long an application thread holds a connection.  You can observe actual utilization using the JMX interfaces.
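
For readers who want to play with the M/M/n view, the standard Erlang C formula gives the probability that a request has to wait for a connection and the mean time it waits. The sketch below assumes Poisson arrivals, exponentially distributed hold times and no wait timeout (so maxWaitMillis is not modelled). The class and method names are hypothetical and the numbers in main() are illustrative only.

```java
// Erlang C estimate for a pool of n connections viewed as an M/M/n queue.
// Assumptions: Poisson arrivals, exponential hold times, unlimited waiting.
final class PoolQueueModel {
    private PoolQueueModel() {}

    /** Probability that an arriving request finds all connections busy. */
    static double probabilityOfWaiting(int connections, double arrivalsPerSecond,
                                       double holdSeconds) {
        double offeredLoad = arrivalsPerSecond * holdSeconds; // a = lambda / mu
        if (connections < 1 || holdSeconds <= 0 || offeredLoad < 0
                || offeredLoad >= connections) {
            throw new IllegalArgumentException(
                "need holdSeconds > 0 and 0 <= offered load < connections");
        }
        // Erlang B via the usual recurrence, which avoids large factorials.
        double blocking = 1.0;
        for (int k = 1; k <= connections; k++) {
            blocking = offeredLoad * blocking / (k + offeredLoad * blocking);
        }
        // Convert Erlang B to Erlang C.
        double utilization = offeredLoad / connections;
        return blocking / (1.0 - utilization * (1.0 - blocking));
    }

    /** Mean time a request spends waiting for a connection, in seconds. */
    static double meanWaitSeconds(int connections, double arrivalsPerSecond,
                                  double holdSeconds) {
        double pWait = probabilityOfWaiting(connections, arrivalsPerSecond, holdSeconds);
        double serviceRate = 1.0 / holdSeconds; // mu, per connection
        return pWait / (connections * serviceRate - arrivalsPerSecond);
    }

    public static void main(String[] args) {
        // Illustrative load: 100 requests/s, each holding a connection for 50 ms.
        for (int n : new int[] {10, 30}) {
            System.out.println("n=" + n
                + " P(wait)=" + probabilityOfWaiting(n, 100.0, 0.05)
                + " mean wait (s)=" + meanWaitSeconds(n, 100.0, 0.05));
        }
    }
}
```

The only inputs are the arrival rate and the hold time, which is the point made above: shortening how long application threads hold connections reduces waiting just as effectively as adding connections.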

Phil

[1] https://www.slideshare.net/psteitz/apachecon2014-pooldbcp

Best regards, Hrafn


[1] :
https://github.com/DSpace/DSpace/blob/250c87dc1604c34e2a963b6804163c73278e9ff7/dspace/config/spring/api/core-hibernate.xml#L41-L48

[2] :
https://github.com/DSpace/DSpace/blob/250c87dc1604c34e2a963b6804163c73278e9ff7/dspace/config/dspace.cfg#L77-L86

On Sun, Dec 20, 2020 at 6:40 PM Gary Gregory <[email protected]> wrote:

Hi,

Each new DBCP release brings fixes, additions, and other updates, as you
can read in the release notes.

How to best configure DBCP for any given combination of JDBC driver, its
database, and application will be quite variable, which is somewhat out of
scope here IMO.

Gary

On Fri, Dec 18, 2020, 11:15 Hrafn Malmquist <[email protected]> wrote:

Good day

I'm wondering what the optimal DBCP defaults are for DSpace, open source
digital repository software aimed especially at academic, non-profit, and
commercial organizations (see https://duraspace.org/dspace/).

DSpace supports both Postgres and Oracle and recommends Tomcat, Jetty or
Caucho Resin. I suspect 9/10 installations use Tomcat.

DSpace comes packaged with Apache Commons DBCP 2.1.1. DSpace sets only
three DBCP2 parameters to non-default values (see [1] and [2]).

These are
maxTotal = 30
maxIdle = 10
maxWaitMillis = 5000

I am not sure what reasoning is behind the choice of these configuration
settings. DSpace is used by all sorts of institutions, some receiving very
high traffic. My guess is that using the DBCP2 defaults is recommended. My
question is, is this a good default configuration? Should more parameters
be configurable by DSpace users in the DSpace config? There have been
reports of the database not being reachable because of too many idle
connections. According to one doc [3], maxWaitMillis should be at a
minimum of 10000 ms, if I understand correctly.

Also, I assume there are benefits to upgrading the DBCP2 dependency to the
most recent version, 2.8.0. I'm not sure what the major benefits are
though. I can see v. 2.5.0 only runs on Java 8.

[1] -
https://github.com/DSpace/DSpace/blob/755f0732aeea7dd1449830593caa54d77890e5bd/dspace/config/local.cfg.EXAMPLE#L88-L99
[2] -
https://github.com/DSpace/DSpace/blob/755f0732aeea7dd1449830593caa54d77890e5bd/dspace/config/spring/api/core-hibernate.xml#L46-L48
[3] -
https://tomcat.apache.org/tomcat-8.0-doc/jndi-datasource-examples-howto.html#Intermittent_Database_Connection_Failures

