I guess I'll start out with my setup: four Xserves running 10.4.6. All are dual-processor G5s, fully patched with all software updates (including the latest WO and Java updates).
Two of these servers are running as app servers. The other two are running OpenBase and are strictly database servers. This system runs three WebObjects applications.
The first app is a backend management application for the system. It has only one instance running and isn't used very often; it may get 10 requests a day.
The second app is essentially a spider. It searches the web collecting data relevant to our customer's needs. It also has one instance running. It feeds off a queue in the database that tells it what information it needs and does not have. A cron job kicks off a direct action that runs through this queue, looks for the required data, and stores it in the system. It has a 55-minute timer that shuts the run down if it doesn't empty the queue within 55 minutes. The action is fired again 5 minutes later and picks up where the last one left off. This app currently runs the full 55 minutes because we've recently cleared a lot of data out of the system that we need to refresh with new data.
The third app is a service provider. It is basically just a direct action that returns data based on a customer's request. This app does get a substantial amount of traffic: we got 2,941,750 requests to it in the last 24 hours alone.
About 3 months ago we started having performance issues. I pretty much pinned the issues down to the fact that one database server wasn't adequate for the system. So about 3 weeks ago server #4 arrived, was set up as a database server, and was dropped into the system. We redesigned the system to split the data up so we could run approximately half the data on one server and half on the other. That system was deployed and ran fairly well for about 2 weeks. Last week I started getting hanging requests and the system was grinding to a halt. It wasn't crashing, mind you, just taking way too long to service requests and occasionally dropping requests here and there.
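For a sense of scale, the 24-hour request count quoted above for the third app works out to roughly 34 requests per second on average. The small Java sketch below just does that arithmetic. The class and method names are made up for illustration, and the even split across the two app servers is an assumption, since the actual distribution of instances is not stated here.

```java
// Back-of-the-envelope average request rate for app #3.
// RequestRate and perSecond are hypothetical names used only for this sketch.
final class RequestRate {
    /** Average requests per second over a window of the given length. */
    static double perSecond(long requests, long windowSeconds) {
        return (double) requests / windowSeconds;
    }

    public static void main(String[] args) {
        long requestsPerDay = 2941750L;        // figure quoted in the post
        long secondsPerDay = 24L * 60 * 60;    // 86,400 seconds
        int appServers = 2;                    // two app servers, per the post

        double total = perSecond(requestsPerDay, secondsPerDay);
        // Assumption: traffic is spread evenly across both app servers.
        System.out.printf("average: %.1f req/s overall, %.1f req/s per app server%n",
                total, total / appServers);
    }
}
```

This is only a daily average; it says nothing about peaks, which may be considerably higher.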
I made a change to application #2, which is the one that pretty much runs constantly for 55 minutes out of each hour: I added a sleep of 10 milliseconds in its main loop. I had come to the conclusion that this long-running app was monopolizing the CPU and causing the requests for the other apps to stack up and bog things down. Adding the sleep seemed to fix our problems for about 4 days. I must admit that this was a guess. I didn't have any real proof that this app was monopolizing resources. I also didn't know if sleeping the long-running thread for 10 milliseconds would have the desired result. But it seemed to work. Of course, it could have just been a coincidence.
A little over a week ago, the app servers were running at an average of about 15%-25% CPU all day. Today, they are running at a constant 50%-60%. Looking in Server Admin at the number of requests per second for Apache, there doesn't appear to be a large increase in web traffic, although I'm not entirely confident in Server Admin's Apache statistics, since it seems to measure the total number of requests active in the system and not simply the number of requests that have come in at that time. The database servers (running OpenBase) don't show any weird activity. They run at a constant 5%-10%, which leads me to believe they aren't running anywhere near capacity.
Looking at the detail view of app #3 in the web monitor doesn't reveal any statistics that look different from any other day. The total number of requests doesn't seem much larger than normal. The average request time doesn't seem abnormal, and neither does the average idle time. The application logs don't indicate a bunch of exceptions being thrown. Outside of Server Admin, the web monitor, and the logs, I can't think of where else to look for the problem. The latest version of this software has been running steadily for about a month now with about the same amount of traffic I'm getting today.
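To make the change to application #2 concrete, here is a minimal sketch of a time-boxed queue run with a short pause per item. It is not the actual application code: the class name, the `drain` method, and the use of an in-memory `Queue<Runnable>` in place of the database-backed queue are all assumptions made for illustration. Only the 55-minute budget and the 10-millisecond sleep come from the description above.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: process queued work until the queue is empty or the
// time budget is spent, sleeping briefly after each item.
final class TimeBoxedWorker {
    /** Returns how many items were processed before stopping. */
    static int drain(Queue<Runnable> queue, long budgetMillis, long pauseMillis) {
        long deadline = System.currentTimeMillis() + budgetMillis;
        int processed = 0;
        while (!queue.isEmpty() && System.currentTimeMillis() < deadline) {
            queue.poll().run();              // stand-in for "fetch and store one item"
            processed++;
            try {
                Thread.sleep(pauseMillis);   // suspend this thread so others can be scheduled
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt and stop early
                break;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        Queue<Runnable> queue = new ArrayDeque<Runnable>();
        for (int i = 0; i < 5; i++) {
            queue.add(new Runnable() {
                public void run() { /* simulated unit of work */ }
            });
        }
        // Values from the post: 55-minute budget, 10 ms pause per item.
        int done = drain(queue, 55L * 60 * 1000, 10L);
        System.out.println("processed " + done + " items");
    }
}
```

Note that a fixed pause per item also caps throughput: with a 10 ms sleep, a single thread can process at most about 100 items per second, no matter how fast each item is. Whether the pause actually relieves CPU pressure on the other apps depends on where the time is really being spent, which is exactly what remains unproven here.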
What I can't explain is why the system is now running at such high CPU percentages. App #3 is running with 128 MB of RAM, which is the same amount it's been running with for over a year and a half now. The application is set for a minimum of 2 threads and a maximum of 20 threads, with a listening queue size of 1. The adaptor has the following settings:
Load balance scheme: round robin
Retries: 1
Dormant: 1
Send timeout: 10
Receive timeout: 10
Connect timeout: 5
Send buffer size: 32768
Receive buffer size: 98304 (the response pages are between 60K and 100K)
Connection pool size: 1
I guess my question is: where, outside of what I've already stated, should I start to look for my bottleneck? Are there some tools in WO or on the Xserve that I haven't mentioned yet that I should be looking at? I think I (and Robert Walker, who has been assisting me with this project) have done a pretty good job with things so far. But I guess I've hit the edge of my knowledge, because I can't seem to locate the problem I'm having. Any help would be greatly appreciated.
- Eric Stewart
_______________________________________________
Webobjects-deploy mailing list
