I agree. If we go to production with 3.1.1, we'll definitely need to increase those cache sizes. I didn't increase all of them mainly because I was just looking for any improvement I could find. I would have continued if I had found some.
Unfortunately, for these tests, I don't think those settings are going to help much. I'll still increase them though, if you think they will. These are all brand new users that have never logged into the portal before. Our worst-case scenario is new-user registration, and the testing I'm doing is specifically designed to compare how well uPortal 3.1.1 handles that versus 2.6.1. I'm told we need to be able to support up to 45,000 new logins per hour to accommodate peak load.

Here is a 50-user test with 4 tomcats behind the LB with all users that have logged in before, and the caches are populated (even as small as they are). Clearly, 3.1.1 is very competitive with 2.6.1 in that scenario. Since 2.6.1 has more stuff in static caches, I'd be willing to bet that 3.1.1 would be even faster with additional ehcache settings as you note.

3.1.1
--------
Label       Samples  Average  Median  90%  Min    Max  Error %  Throughput   KB/sec
Login Page     4000      456     436  711   30   1424        0       11.14   139.91
Login          4000      512     505  778   64   1546        0       11.14   236.15
Tab 1          4000      291     248  528   33   1415        0       11.14   263.41
Tab 2          4000      397     381  664   35   1451        0       11.14   290.95
Tab 3          4000      420     411  683   49   1436        0       11.14   290.92
Tab 4          4000      390     383  634   42   1429        0       11.14   277.58
Logout         4000      405     394  634   34   1484        0       11.14   139.91
TOTAL         28000      410     396  675   30   1546        0       77.85  1636.64

2.6.1
--------
Label       Samples  Average  Median  90%  Min    Max  Error %  Throughput   KB/sec
Login Page     4000      231     215  391   10    945        0       19.16    37.71
Login          4000      440     414  690   27  14258        0       19.16     53.5
Tab 1          4000      179     154  327   11   1363        0       19.15   112.94
Tab 2          4000      210     185  376   19   1703        0       19.15   170.56
Tab 3          4000      225     199  395   17    994        0       19.16   170.39
Tab 4          4000      219     197  374   16   2077        0       19.16   141.43
Logout         4000      295     280  491   15   1215        0       19.16     37.7
TOTAL         28000      257     221  468   10  14258        0      133.91   723.21

Due to that bug with 2.6.1 I mentioned in my original post, I'm only able to gather results for users that have previously logged into 2.6.1. It isn't an apples-to-apples comparison.
If I could get that fixed and re-run the tests for 2.6.1, I'm sure I'd be closer to saying, "We can replace 2.6.1 with 3.1.1 without having to add additional hardware." For other reasons, we're not able to add new hardware at this time. Do you have any recommendations or intuitions about what we might tune/analyze for new-user logins to make them equal?

Thanks,
Alex

----- Original Message -----
From: "Eric Dalquist" <eric.dalqu...@doit.wisc.edu>
To: uportal-dev@lists.ja-sig.org
Sent: Tuesday, June 8, 2010 2:16:59 PM GMT -07:00 U.S. Mountain Time (Arizona)
Subject: Re: [uportal-dev] Fwd: uPortal Peformance

So in the 3.2.1 ehcache.xml file:
https://source.jasig.org/uPortal/tags/rel-3-2-1-GA/uportal-impl/src/main/resources/properties/ehcache.xml

There are comments above each cache describing how it is used. I'm curious why you didn't increase cache sizes for all of the caches that pertain to user-specific data. For example, the "org.jasig.portal.portlet.dao.jpa.PortletEntityImpl" cache has a comment above it that states "1 x subscribed portlet x user". We have that cache set to 15000 entries per server here at UW. Now that we're getting new hardware and slowly moving from our old 8-server cluster to a 4-server cluster, I'll probably increase that even more, and we only see peaks of around 100k logins per day.

I'd highly recommend going through every cache entry in ehcache.xml that has an "x user" component and making sure it is sized large enough to actually hold on to all the data your users need. The fact that your PortletEntityImpl has 0 hits, 13000 misses and 1000 entries makes me think it is way too small. Our current stats from one machine, which cover only 8080 logins over the last ~10 hours, show 203784 hits, 109671 misses, and 3547 entries in our PortletEntityImpl cache. If memory isn't an immediate concern, I'd even say set these cache sizes to 100x what you think they need to be, then run the test.
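To make the comparison above concrete, here is a minimal sketch that computes the hit ratio, hits / (hits + misses), for the two sets of PortletEntityImpl figures quoted in this message, and then estimates how many entries the cache would need for the test population. The function and variable names are illustrative and not part of uPortal or Ehcache. The per-user estimate rests on an assumption: that each of the 13000 misses in the 1000-user test corresponds to one distinct portlet entity, which matches the "1 x subscribed portlet x user" comment but has not been verified.

```python
import math


def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups served from the cache; 0.0 when there were no lookups."""
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


# (hits, misses, entries) for PortletEntityImpl, as quoted in this thread.
stats = {
    "load test (1000 new users)": (0, 13000, 1000),
    "UW server (~10 hours)": (203784, 109671, 3547),
}

for label, (hits, misses, entries) in stats.items():
    print(f"{label}: hit ratio {hit_ratio(hits, misses):.1%} with {entries} entries")

# Assumption (not confirmed in the thread): every miss in the load test is one
# distinct entity, so misses / users approximates entities per user.
test_users = 1000
entities_per_user = stats["load test (1000 new users)"][1] / test_users


def entries_needed(users: int, per_user: float) -> int:
    """Cache entries required to hold every entity for `users` users at once."""
    return math.ceil(users * per_user)


print(
    f"about {entities_per_user:.0f} entities per user; "
    f"{entries_needed(test_users, entities_per_user)} entries "
    "to hold the whole test population"
)
```

Under that assumption, the 1000-entry cache holds only a small fraction of the entities the test creates, which is consistent with a 0% hit ratio, while the UW figures work out to about 65%.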
-Eric

On 06/08/2010 03:40 PM, Alex Bragg wrote:
> OK. I did some tweaking on ehcache.xml. First, I ran a baseline with just a
> thousand new users and recorded all the cache numbers. Next, I doubled the
> maximum allowed elements for the caches listed below that are marked with a
> "*". I didn't see a reason to change the others. The first number is the
> hits, the second number is misses, and the last is the number of elements in
> the cache. The first row is the baseline and the second row is after updates.
> Long story short, I saw no real difference. The hit ratios were identical.
> If anything it got slower. At this short interval, I believe the TTLs have
> no real effect.
>
>                 Page       Elapsed
>                 Response   Run
>                 Time (s)   Time
> Baseline        2.014      3m10s
> After Changes   2.063      3m17s
>
>         Hits   Misses  Objects in cache
> PortalStats.org.hibernate.cache.StandardQueryCache
>            0        0        0
>            0        0        0
> PortalStats.org.hibernate.cache.UpdateTimestampsCache
>            0        0        0
>            0        0        0
> PortalStats.org.jasig.portal.events.EventType
>            0        0        0
>            0        0        0
> *org.hibernate.cache.StandardQueryCache
>        13000    13007      250
>        13000    13007      500
> org.hibernate.cache.UpdateTimestampsCache
>            0    13000        3
>            0    13000        3
> org.jasig.portal.ChannelDefinition
>        49037       34       16
>        49037       34       16
> org.jasig.portal.channels.CONTENT_CACHE
>            0        0        0
>            0        0        0
> org.jasig.portal.groups.CompositeEntityIdentifier.NAME_PARSE_CACHE
>       554915       20       10
>       552151       20       10
> org.jasig.portal.groups.IEntity
>        71277     2004        2
>        71002     2004        2
> org.jasig.portal.groups.IEntityGroup
>       278460        8        4
>       277078        8        4
> org.jasig.portal.layout.dlm.Evaluator
>            0        0        0
>            0        0        0
> org.jasig.portal.layout.dlm.LAYOUT_CACHE
>       208092     6049        1
>       208085     6001        1
> org.jasig.portal.portlet.dao.jpa.PortletDefinitionImpl
>         8000        0        5
>         8000        0        5
> org.jasig.portal.portlet.dao.jpa.PortletEntityImpl
>            0    13000     1000
>            0    13000     1000
> org.jasig.portal.portlet.dao.jpa.PortletPreferenceImpl
>        24000        0        9
>        24000        0        9
> org.jasig.portal.portlet.dao.jpa.PortletPreferenceImpl.values
>        24000        0        9
>        24000        0        9
> org.jasig.portal.portlet.dao.jpa.PortletPreferencesImpl
>         8002        5     1100
>         8002        5     1100
> org.jasig.portal.portlet.dao.jpa.PortletPreferencesImpl.portletPreferences
>         8002    13000     1100
>         8002    13000     1100
> *org.jasig.portal.security.IPermissionSet
>       219009     3019     1000
>       218037     3020     1005
> *org.jasig.portal.security.provider.AuthorizationImpl.AUTH_PRINCIPAL_CACHE
>       349622     2010      150
>       348058     2010      300
> *org.jasig.portal.utils.ResourceLoader.RESOURCE_URL_CACHE
>        67869    52032        0
>        67901    52045       22
> *org.jasig.portal.utils.ResourceLoader.RESOURCE_URL_NOT_FOUND_CACHE
>        51964       67        0
>        51970       86       16
> org.jasig.portal.utils.cache.ConfigurablePageCachingFilter.PAGE_CACHE
>            0        0        0
>            0        0        0
> *org.jasig.services.persondir.USER_INFO.merged
>         3003     4003        0
>         3003     4003        1
> *org.jasig.services.persondir.USER_INFO.up_person_dir
>         1001     4003        0
>         1001     4003        1
> *org.jasig.services.persondir.USER_INFO.up_user
>         1001     4003        0
>         1001     4003        1
>
> ----- Original Message -----
> From: "Eric Dalquist"<eric.dalqu...@doit.wisc.edu>
> To: uportal-dev@lists.ja-sig.org
> Sent: Tuesday, June 8, 2010 7:35:23 AM GMT -07:00 U.S. Mountain Time (Arizona)
> Subject: Re: [uportal-dev] Fwd: uPortal Peformance
>
> I had replied with this on the uportal-user list:
>
> For the performance my first pointer would be to
> uportal-impl/src/main/resources/properties/ehcache.xml
>
> In each release of uPortal we've been moving more and more data out of
> static caches and the user session into Ehcache. I'm not sure it's out in
> a released version yet but I recently did some review of the default
> cache config and tuning here at UW and checked in an updated config file
> that at least has comments describing how each cache is used.
>
> Also all of the cache statistics are available via JMX. I'd recommend
> that you monitor those as you're doing your load testing and see which
> caches are filling up and which have poor hit rates. Tuning the size and
> TTLs of the caches should do a lot to reduce database IO and load times.
>
> So I guess I'd be very interested to have you do some basic tuning in
> ehcache then re-run the tests and watch the caches to see if they are
> both large enough and have appropriate TTLs for your usage patterns.
>
> -Eric
>
> On 06/08/2010 12:13 AM, Alex Bragg wrote:
>
>> Hello,
>>
>> I'm doing some performance testing, and I could use some hints on a couple
>> of issues. First, I'm looking for some hints on things I can tweak in
>> 3.1.1/3.2.1 to improve performance under heavy load. Second, I'm hitting a
>> bug in 2.6.1 that is preventing me from gathering solid baseline performance
>> numbers, and perhaps someone else has seen it. Let me explain in further
>> detail.
>>
>> We have been preparing for an upgrade of our production systems from uPortal
>> 2.6.1 to uPortal 3.x. Currently, we're looking at two 3.x versions, 3.1.1
>> and 3.2.1. In my development environment, I have installed 2.6.1, 3.1.1,
>> and 3.2.1. My 2.6.1 install is running out of a 5.5.28 Tomcat, and my 3.x
>> versions are running in a 6.0.24 Tomcat. All versions are running under
>> Java version 1.6.0_12-b04, 64-bit, and I have an Oracle 11gR2 database
>> backing them.
>>
>> The layout in each instance is a simple 5-tab layout, with nothing on the
>> default tab. I have a custom testing portlet that simply executes a SQL
>> query 5, 10, or 15 times and renders a 3-line text output. On the remaining
>> four tabs, I have mixtures of two or more of these testing portlets. I run
>> tests with JMeter, and the click path is get login page, login, click tab 2,
>> click tab 3, click tab 4, click tab 5, and logout. JMeter verifies each
>> page renders properly. The tests I run execute this click path 4000 times
>> spread across 1, 4, 50, and 200 threads, and there are no waits built into
>> the scripts.
>>
>> Here are results from the tests I have run so far. The values are the 90th
>> percentile page-response time in seconds. Please note that the number for
>> 2.6.1 in the 200-thread column isn't valid. At the 200-thread level most of
>> the 200 threads complete their 20 iterations before JMeter starts additional
>> threads during ramp-up. I end up with no more than 4 or 5 threads running
>> concurrently. Another thing that skews these numbers is that I can only get
>> valid results using users that have successfully logged in before. Anything
>> above 2 threads with users that have not previously logged in results in
>> channels failing to render (with the message "You are not authorized to view
>> this channel").
>>
>> version 1 4 50 200 50-lb2 200-lb2 50-lb4 200-lb4
>> 2.6.1 0.07 0.08 0.7 *0.08* 0.69 4.56
>> 3.1.1 0.09 0.09 1.96 7.81 1.18 6.02 1.12 5.49
>> 3.2.1 0.17 0.18 7.04 26.43 6.17 20.22
>>
>> The "lb2" and "lb4" designators signify that I have started multiple Tomcats
>> on the server, 2 for lb2 and 4 for lb4, and I'm balancing load with HAProxy.
>> I see much better utilization on the server, and both page-response times
>> and elapsed test run times (below) improve significantly even though I
>> have not added any additional hardware.
>>
>> This table shows the elapsed time in seconds to complete the above tests.
>>
>> version 1 4 50 200 50-lb2 200-lb2 50-lb4 200-lb4
>> 2.6.1 934 454 216 212 209.09 263.43
>> 3.1.1 1,537 462 495 813 386.92 660.39 421.41 414.32
>> 3.2.1 3,299 862 1,999 3,958 1259.99 2636.8
>>
>> Basically, what I see here is that at low concurrency 2.6.1 and 3.1.1 are
>> fairly comparable, and 3.2.1 is noticeably slower. At 50 threads and above,
>> I see that 2.6.1 is much faster than 3.x. I also see that at very high
>> loads, 3.x seems to have a point where it just falls over the edge of a
>> cliff.
>>
>> Part of that I'm sure is the change in page sizes. Here are the page sizes
>> JMeter reports (this does not include embedded resources).
>>
>>               2.6.1        3.1.1        3.2.1
>>               Avg. Bytes   Avg. Bytes   Avg. Bytes
>> Login Page    2014.93      12865        23963
>> Login         2958.61      21716        21909
>> Tab 1         5950.05      24221.21     24656
>> Tab 2         8840.34      26755.38     27430
>> Tab 3         8835.95      26753.3      27428
>> Tab 4         7380.03      25525.27     26068
>> Logout        2014.94      12865        23963
>> TOTAL         5427.84      21528.74     25059.57
>>
>> So, back to my two questions.
>>
>> 1. What has changed in 3.1.1 that might explain a significant (at least 2x)
>> slowdown under load? To me it feels like 2.6.1 is caching rendered
>> elements to a much greater degree than 3.1.1. What can I tweak to improve
>> this?
>>
>> 2. Is anyone aware of something I can change to fix the behavior with new
>> logins in 2.6.1 to prevent this issue with channels not authorized?
>>
>> Thanks,
>> Alex Bragg
>> Unicon, Inc.

--
You are currently subscribed to uportal-dev@lists.ja-sig.org as: arch...@mail-archive.com
To unsubscribe, change settings or access archives, see http://www.ja-sig.org/wiki/display/JSG/uportal-dev
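As a rough way to read the elapsed-time table in the original message, the sketch below converts each 3.1.1 elapsed time into click paths per hour (4000 click paths divided by elapsed seconds, times 3600) and compares the result with the 45,000 new-logins-per-hour target mentioned elsewhere in this thread. Each click path contains exactly one login, so the rate doubles as logins per hour. The names are illustrative. The comparison is only indicative: these runs used returning users, a development server, and no think time, so they are not a prediction for new-user load.

```python
CLICK_PATHS = 4000        # click-path executions per test run, from the message
TARGET_PER_HOUR = 45_000  # peak new-login target quoted in the thread


def click_paths_per_hour(elapsed_seconds: float, iterations: int = CLICK_PATHS) -> float:
    """Completed click paths per hour for a run that took `elapsed_seconds`."""
    return iterations / elapsed_seconds * 3600


# Elapsed seconds for uPortal 3.1.1, copied from the elapsed-time table.
# Only this row is used because it has a value in every column.
elapsed_311 = {
    "1": 1537,
    "4": 462,
    "50": 495,
    "200": 813,
    "50-lb2": 386.92,
    "200-lb2": 660.39,
    "50-lb4": 421.41,
    "200-lb4": 414.32,
}

for config, seconds in elapsed_311.items():
    rate = click_paths_per_hour(seconds)
    status = "meets" if rate >= TARGET_PER_HOUR else "below"
    print(f"{config:>8}: {rate:8.0f} click paths/hour ({status} target)")
```

With these figures, every 3.1.1 configuration comes out below 45,000 per hour, and the shortest run (50-lb2) is the closest.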