To follow up on this issue: I think it might be related to the database and/or the DAL.

I don't think it is related to exhausting RAM or having too many open files. I used lsof to monitor open files during a stress test with 200 concurrent connections (ab). I saw up to 11,000 open files, most of them held by apache2. There were several dozen failed requests, caused by WSGI "premature end of script" errors. Interestingly, I could also trigger these errors with only 20 concurrent connections, with only about 3,000-4,000 open files. This is with PostgreSQL.
I could not trigger the error with SQLite, or on a page (controller) that makes only one DB call.

How did I trigger this error with only 20 concurrent connections? First, I noticed that ab is quite simple: it hits the same page over and over. I wanted a more realistic test with more complex behavior. Without any other tools, I did two things at the same time: (1) ran ab with 20 concurrent connections, and (2) manually searched for items using the site's AJAX search form. Each search performs several DB queries, which the page ab hits does not. The result was several failed requests caused by this error, even at only 20 concurrent connections (which is ridiculously low).

Another anecdote: I have run into this error a number of times while using the app normally, not during a stress test. When the error occurred, Apache of course failed to serve the page. But when I immediately reloaded the page, it loaded again very quickly, as it normally does. This suggests the cause of this WSGI error is probably NOT the exhaustion of some resource (RAM, open files, etc.). If a lack of resources were the cause, it would take some time for those resources to be recovered before the page could be served quickly again.

This issue is annoying. Crashing like this is unpleasant from the users' point of view, and it clearly affects the scalability of web2py. I hope there's an answer to this soon.

Here's a typical ab output with 20 concurrent connections, showing the failed requests:
Finished 254 requests

Server Software:        Apache/2.2.9
Server Port:            80

Document Path:          /
Document Length:        10133 bytes

Concurrency Level:      20
Time taken for tests:   10.045 seconds
Complete requests:      254
Failed requests:        15
   (Connect: 0, Receive: 0, Length: 15, Exceptions: 0)
Write errors:           0
Non-2xx responses:      15
Keep-Alive requests:    239
Total transferred:      2558415 bytes
HTML transferred:       2448532 bytes
Requests per second:    25.29 [#/sec] (mean)
Time per request:       790.914 [ms] (mean)
Time per request:       39.546 [ms] (mean, across all concurrent requests)
Transfer rate:          248.74 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    6  16.1      0      56
Processing:   121  748 592.9    631    4312
Waiting:      120  688 589.9    566    4243
Total:        121  755 603.8    631    4365

Percentage of the requests served within a certain time (ms)
  50%    631
  66%    710
  75%    772
  80%    804
  90%   1670
  95%   1893
  98%   2788
  99%   3929
 100%   4365 (longest request)
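Since ab only ever requests one URL, the "ab plus manual searches" workload is hard to reproduce exactly. Below is a minimal sketch, using only the Python standard library, that mixes a light page with a heavier multi-query request under a fixed concurrency level. The URLs and the names `http_fetch` and `run_mixed_load` are hypothetical placeholders, not part of web2py. Unlike ab, it only counts non-2xx responses and connection errors as failures, not length mismatches.

```python
import concurrent.futures
import urllib.error
import urllib.request
from collections import Counter

# Hypothetical URLs: a light page (like the one ab hammers) and a heavier
# search request that triggers several DB queries. Adjust to your own app.
LIGHT_URL = "http://localhost/"
HEAVY_URL = "http://localhost/myapp/default/search?q=test"


def http_fetch(url, timeout=30):
    """Return the HTTP status code for url, or None on a connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # read the full body, as a browser would
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered with a non-2xx status (e.g. 500)
    except OSError:
        return None  # refused connection, timeout, reset, etc.


def run_mixed_load(urls, concurrency=20, total=200, fetch=http_fetch):
    """Issue `total` requests round-robin over `urls` with `concurrency` threads.

    Returns a Counter mapping (url, outcome) -> count, where outcome is
    "ok" for 2xx responses and "failed" for anything else.
    """
    jobs = [urls[i % len(urls)] for i in range(total)]
    results = Counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        for url, status in zip(jobs, pool.map(fetch, jobs)):
            ok = status is not None and 200 <= status < 300
            results[(url, "ok" if ok else "failed")] += 1
    return results
```

For example, `run_mixed_load([LIGHT_URL] * 4 + [HEAVY_URL], concurrency=20)` sends four light requests for every search request. Comparing the failure counts against a run with only `LIGHT_URL` would show whether the multi-query requests are what triggers the errors. The `fetch` parameter exists so the request function can be swapped out, e.g. for logging or testing.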