Random collection of thoughts on this topic:
* Raw events would need more than one connection. There are 3 major interactions (there might be more) with UP_RAW_EVENTS: inserts for new events, reads/updates for aggregation, and purging.
* This pool is per node, so if you are clustering (like our 4 nodes) this can get you into trouble if you have limits on your concurrent database connections.
* It looks like we use the default pool size here on MyUW. I don't have access to the admin interface in production.
* The only time we had database connection issues was when we had a database hiccup (single point of failure) and a bunch of requests queued up.
* If we make modifications, I would also suggest upping the wait-for-connection timeout.

Thanks,
Tim Levett
tim.levettATwisc.edu
MyUW-Infrastructure

________________________________
From: bounce-37985837-70367...@lists.wisc.edu <bounce-37985837-70367...@lists.wisc.edu> on behalf of James Wennmacher <jwennmac...@unicon.net>
Sent: Tuesday, December 2, 2014 1:02 PM
To: uportal-dev@lists.ja-sig.org
Subject: [uportal-dev] uPortal database connection pool size

Answering the question below on the uportal-user list got me thinking (a dangerous thing indeed ... :-) ).

Currently the 3 uPortal DB connection pools defined in datasourceContext.xml all have the same maximum size (75 by default). From a glance at the code, I think the raw events DB pool and the aggregation events DB pool are used by threads that run on a timer and only use 1 DB connection each (see https://github.com/Jasig/uPortal/blob/uportal-4.1.2/uportal-war/src/main/java/org/jasig/portal/events/handlers/QueueingEventHandler.java#L41 and https://github.com/Jasig/uPortal/blob/uportal-4.1.2/uportal-war/src/main/resources/properties/contexts/schedulerContext.xml#L73). Both could have a smaller maxActive value, to limit the exposure if an error somehow consumed large numbers of DB connections and impacted uPortal and the portlets (by using up too many connections on the DB server).
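Because every node carries its own copy of each pool, the limit that matters to the DB server is the per-node total multiplied by the number of nodes. The sketch below is a hypothetical Python helper (not part of uPortal) that makes this arithmetic explicit. The pool labels are illustrative rather than bean ids, the portlet figure is a guess the caller supplies, and the result assumes every pool reaches its maxActive at the same time, which is a worst case rather than typical load.

```python
# Hypothetical helper (not part of uPortal) for estimating the worst-case
# number of DB connections a uPortal cluster could hold open at once.
# Each node has its own copy of every pool, so the DB server sees the
# per-node total multiplied by the number of nodes.


def max_db_connections(nodes: int, pool_max_active: dict[str, int],
                       portlet_connections: int = 0) -> int:
    """Return an upper bound on concurrent connections to the shared DB.

    nodes: number of uPortal servers in the cluster.
    pool_max_active: maxActive of each uPortal pool on a single node; the
        keys are illustrative labels, not bean ids from datasourceContext.xml.
    portlet_connections: caller's guess at per-node portlet connections.
    """
    per_node = sum(pool_max_active.values()) + portlet_connections
    return nodes * per_node


# Default sizing: all three pools at maxActive 75.
DEFAULT_POOLS = {"portal": 75, "raw_events": 75, "aggr_events": 75}
# The smaller event pools proposed in the thread (maxActive 5 each).
SMALLER_EVENT_POOLS = {"portal": 75, "raw_events": 5, "aggr_events": 5}

if __name__ == "__main__":
    # A 4-node cluster like MyUW's, counting only the uPortal pools.
    print(max_db_connections(4, DEFAULT_POOLS))        # 900
    print(max_db_connections(4, SMALLER_EVENT_POOLS))  # 340
    # The 10-server example from the user-list reply, 30 portlet connections each.
    print(max_db_connections(10, DEFAULT_POOLS, 30))   # 2550
```

Under these assumptions, shrinking the two event pools to a maxActive of 5 cuts the 4-node bound from 900 to 340 connections. The portlet term still has to be estimated separately, since the portlets' own pools are not covered by these three.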

I was thinking of setting their maxActive value to 5 to allow simultaneous threads for saving, purging, and querying. Does anyone see a problem with this strategy? Could UW-Madison, or someone else with an active system, glance at the DB MBeans AggrEventsDB and RawEventsDB in uPortal/Datasource and see whether NumActive + NumIdle even come close to 5 on your system? (Unfortunately, without monitoring tools I don't see how you'd find out the maximum number of connections ever made.)

Thanks in advance for your insights and thoughts.

James Wennmacher - Unicon
480.558.2420

-------- Forwarded Message --------
Subject: Re: [uportal-user] Increase database connection pool size
Date: Tue, 02 Dec 2014 11:02:02 -0700
From: James Wennmacher <jwennmac...@unicon.net>
To: uportal-u...@lists.jasig.org

Database connection counts are defined in uportal-war/src/main/resources/properties/contexts/datasourceContext.xml. uPortal holds each DB connection for only a fairly brief period of time. The message 'none available[size:75; busy:0; idle:0; lastwait:5000]', plus your comment about leaving it overnight, makes me wonder whether the connections are somehow being lost and not reclaimed. I suggest:

1. Ensure that the load test is not hitting some servers too heavily, i.e. that the load is distributed evenly. I could see DB connections running out if a server gets hammered (though the connections should be freed up at some point later). Does it happen primarily on one or two servers rather than on all of them?

2. Try adding the following properties to the basePooledDataSource bean in datasourceContext.xml:

    <property name="logAbandoned" value="true" />
    <property name="numTestsPerEvictionRun" value="5" />

This may not resolve the issue, but perhaps the logging will provide a clue to what's going on. However, it is likely the additional logging will not trigger.

The property minEvictableIdleTimeMillis is supposed to release a connection after it has been idle for the specified number of milliseconds, and the already-configured properties abandonWhenPercentageFull, removeAbandoned, and removeAbandonedTimeout are supposed to clean up abandoned connections (connections allocated but not used for removeAbandonedTimeout seconds, which are reclaimed when a new connection is requested but none are available). However, in a load test, especially one where a server is taxed very heavily, the removeAbandonedTimeout value (300 seconds) may be too high: if connections are heavily used, none may be considered abandoned and harvested during the test. I wouldn't change removeAbandonedTimeout just yet, though. After the test completes there may be some useful log messages if some connections were consumed and not released. If nothing else, 5 minutes after the test completes you should be able to log on to the server even if all connections were consumed, since the pool should consider at least some of them abandoned and eligible for harvesting. However, your comment that the issue persists after leaving the system overnight makes me think the abandoned connections are not being harvested. It is still worth trying logAbandoned to see if it provides more information.

3. Are there other DB connection error messages? The ones you mentioned are for event aggregation (which runs periodically to aggregate and purge raw portal activity event data) and for JGroups (used for distributed cache management, so that uPortal nodes can notify other uPortal nodes about cache replication or invalidation). Were there any from regular uPortal activity failing to get a database connection?

4. It would be great to get more information so we can try to fix the issue of the connections not being released.
Even when fully consumed, the connections should be released after a period of time (after being idle for minEvictableIdleTimeMillis milliseconds, or at the latest after 5 minutes per the removeAbandonedTimeout property, when a new connection is requested and none are available). When this situation occurs, are you able to look at the DB server and see whether it sees the 75 connections from the failed uPortal server, whether it thinks those connections are idle, and what the last SQL command was on each of them? It is also possible that network issues between the uPortal server and the DB server are causing socket connections to hang. Finding out whether the DB server is aware of the connections, their state, and their last activity should help determine if that is the case and hopefully point us to where the issue is.

5. Setting aside the investigation above (which I'd really like to see done and the issue addressed), if you decide to increase the DB connections you'll want to discuss it with your DBA. Each uPortal server can open up to 3 times the specified number of connections (75 for uPortal application use, 75 for raw event storage, 75 for event aggregation), plus some portlets (newsreader, announcements, simple content portlet, calendar, bookmarks) have separate DB connection pools or open DB connections on each request to the same DB. If you have 10 servers, each making 3 * 75 connections plus another 30 to 50 for the portlets (a guess at the maximum portlet connections), then in the worst case your DB server would need resources to handle 10 * (3 * 75 + 30) = 2,550 DB connections (2,750 if the portlets use 50 each). That said, I don't think the raw events or the aggregation events DB pools are likely to use 75 DB connections each, since I believe they are used by threads that run periodically on a timer and would use only 1 or 2 DB connections each (barring some software fault), even though their pool sizes are a max of 75.
If I'm right, the realistic figure is more like 10 * (75 + 2 + 30) = 1,070, and the portlets are unlikely to max out their connections unless they are all on the main landing page or otherwise close together in the page flow.

In light of the above, part of what is going on may be that uPortal requests a DB connection but the DB server is maxed out and rejects it. I'm not sure that's what is happening, but it is worth investigating.

I hope this helps; please let us know what you find out.

Thanks,
James Wennmacher - Unicon
480.558.2420

On 12/02/2014 08:46 AM, Ryan Melissari wrote:
I am in the process of load testing uPortal and am running out of database connections. I have looked and don't see where I would increase this. From the log file it is set to 75. Does anyone know what a good number to increase this to would be? Also, it seems that once it uses all the connections, it never releases them. I have left it overnight and it never gives them back, forcing me to restart Tomcat. Is there a way to set a max wait as well? Here is the error I am getting in the portal.log:

INFO [uP-TaskExec-7-aggregateRawEvents] o.h.e.i.DefaultLoadEventListener 2014-12-02 09:27:54,147 - HHH000327: Error performing load command : org.hibernate.exception.GenericJDBCException: Could not open connection
ERROR [Timer-5,uPortal.cacheManager,htst2web1-56682] o.j.p.jgroups.protocols.DAO_PING 2014-12-02 09:27:56,126 - failed sending discovery request
org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is org.apache.tomcat.jdbc.pool.PoolExhaustedException: [Timer-5,uPortal.cacheManager,htst2web1-56682] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:75; busy:0; idle:0; lastwait:5000].
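Items 2 and 4 of James's reply turn on when the pool decides that a borrowed connection has been abandoned. The Python sketch below models that rule under stated assumptions: a busy connection counts as abandoned once it has been checked out longer than removeAbandonedTimeout (300 seconds in this thread), and, if abandonWhenPercentageFull is non-zero, only while at least that percentage of maxActive is in use. That threshold behavior follows the Tomcat JDBC pool documentation as best understood; the thread itself does not spell it out. The class and function names are hypothetical, the abandonWhenPercentageFull value is a placeholder because uPortal's setting is not given here, and the model leaves out when the pool actually runs the check.

```python
# Simplified model of the abandoned-connection rule described in the thread.
# This is an assumption-laden illustration, not the Tomcat JDBC pool's code:
# it ignores when the check runs and only decides which busy connections
# would count as abandoned at a given moment.
from dataclasses import dataclass


@dataclass
class PoolSettings:
    max_active: int = 75                   # maxActive
    remove_abandoned: bool = True          # removeAbandoned
    remove_abandoned_timeout_s: int = 300  # removeAbandonedTimeout (seconds)
    # abandonWhenPercentageFull; 0 means no fullness threshold. Placeholder:
    # the thread does not say what uPortal configures here.
    abandon_when_percentage_full: int = 0


def abandoned_connections(checkout_ages_s: list[float],
                          settings: PoolSettings) -> list[int]:
    """Return indexes of busy connections the model treats as abandoned.

    checkout_ages_s: seconds since each busy connection was borrowed.
    """
    if not settings.remove_abandoned:
        return []
    # Only harvest once enough of the pool is in use.
    percent_in_use = 100.0 * len(checkout_ages_s) / settings.max_active
    if percent_in_use < settings.abandon_when_percentage_full:
        return []
    # A connection is abandoned once it has been out longer than the timeout.
    return [i for i, age in enumerate(checkout_ages_s)
            if age > settings.remove_abandoned_timeout_s]


if __name__ == "__main__":
    settings = PoolSettings()
    # Busy connections that were all borrowed recently: none pass 300 s.
    print(abandoned_connections([2.0, 15.0, 40.0], settings))  # []
    # Five minutes after the test, a never-returned connection qualifies.
    print(abandoned_connections([2.0, 360.0], settings))       # [1]
```

In this model, a pool whose connections keep being returned has nothing to harvest however busy it is, which is James's concern about the load test. The model also only looks at connections counted as busy, so it has nothing to say about connections that show up in neither the busy nor the idle count, as in the busy:0; idle:0 message above.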

--
You are currently subscribed to uportal-u...@lists.ja-sig.org as: jwennmac...@unicon.net
To unsubscribe, change settings or access archives, see http://www.ja-sig.org/wiki/display/JSG/uportal-user

--
You are currently subscribed to uportal-dev@lists.ja-sig.org as: tim.lev...@wisc.edu
To unsubscribe, change settings or access archives, see http://www.ja-sig.org/wiki/display/JSG/uportal-dev