To follow up on this issue, I think it might be related to db and/or
DAL.

+ I don't think it relates to exhausting RAM or too many open files.
Using lsof to monitor open files during the stress test with 200
concurrent channels (ab), I saw up to 11,000 open files (most of them
apache2).  There were several dozen failed requests, caused by
"premature end of script wsgi".  What is interesting is that I was
able to cause these errors even with only 20 concurrent channels (and
only about 3000-4000 open files).  This is under postgres.
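
For reference, here is roughly how the open-file counts were sampled.
This is only a sketch (the lsof invocation, the apache2 filter and the
polling interval are illustrative, not the exact commands I ran):

import subprocess
import time

def count_open_files():
    # lsof prints one line per open file descriptor, plus a header line
    out = subprocess.run(['lsof', '-n'], capture_output=True,
                         text=True).stdout
    lines = out.splitlines()
    total = max(len(lines) - 1, 0)   # drop the header line
    apache = sum(1 for l in lines if l.startswith('apache2'))
    return total, apache

if __name__ == '__main__':
    # poll every 2 seconds while ab runs in another terminal
    while True:
        total, apache = count_open_files()
        print('%s  open files: %d  (apache2: %d)'
              % (time.strftime('%H:%M:%S'), total, apache))
        time.sleep(2)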

+ I could not cause the error with sqlite, or on a page (controller)
that makes only 1 db call.
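
To make the comparison concrete, the difference is roughly the one
below.  The table and field names are made up; db is the usual DAL
instance defined in the model.  An action shaped like the first one
never failed for me, while actions shaped like the second one did:

# controllers/default.py (sketch; assumes hypothetical tables 'item'
# and 'category' are defined in the model, as usual in web2py)

def single_query():
    # exactly one DAL query per request -- could not make this fail
    rows = db(db.item.id > 0).select(limitby=(0, 20))
    return dict(rows=rows)

def multi_query():
    # several DAL queries per request -- pages like this do fail
    items = db(db.item.id > 0).select(limitby=(0, 20))
    categories = db(db.category.id > 0).select()
    n = db(db.item.id > 0).count()
    return dict(items=items, categories=categories, n=n)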

+ How was I able to cause this error with only 20 concurrent
channels?  First, I observed that ab is quite simple in that it hits
the same page again and again, so I wanted a more realistic test with
more complex behavior.  Without any other tools, I decided to do 2
things simultaneously: (1) run ab with 20 concurrent channels, and
(2) manually search for items (via ajax) using the search form on the
website; the search performs several db queries that the ab-targeted
page does not.

Well, the result is that there were several failed requests (caused
by this error) even with only 20 concurrent channels, which is a
ridiculously light load.
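
In my case the second part was just me typing into the search form,
but the same mixed workload can be approximated with a small script
like the one below, run while ab hammers the front page.  The URL,
action name and search terms are hypothetical:

# run while ab is hammering the front page, e.g.:
#   ab -k -c 20 -n 1000 http://localhost/
import time
import urllib.error
import urllib.parse
import urllib.request

SEARCH_URL = 'http://localhost/app/default/search'   # hypothetical action

def fire_search(term):
    qs = urllib.parse.urlencode({'q': term})
    try:
        with urllib.request.urlopen(SEARCH_URL + '?' + qs,
                                    timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code        # e.g. 500 when the wsgi error hits

if __name__ == '__main__':
    terms = ['foo', 'bar', 'baz']          # made-up search terms
    for i in range(100):
        print('search #%d: HTTP %d' % (i, fire_search(terms[i % 3])))
        time.sleep(0.5)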

+ Another anecdote.  I have experienced this error a number of times
myself while using the app normally (not as a result of a stress
test).  Once the error occurred, apache failed to serve the page, of
course.  What I observed is that when I immediately reloaded the
page, it loaded again very quickly (as is normally the case).  What
this tells me is that the cause of this wsgi error is probably NOT
the exhaustion of some type of resource (RAM, open files, etc.); if a
lack of resources were the cause, it would take some time for the
resources to be recovered before the page could be served quickly
again.


This issue is annoying.  Crashing like this is not pleasant from the
users' point of view, and it's clearly related to the scalability of
web2py.  I hope there's an answer to this soon.


Here's typical ab output with 20 concurrent connections, showing the
failed requests.

>>>
Finished 254 requests


Server Software:        Apache/2.2.9
Server Port:            80

Document Path:          /
Document Length:        10133 bytes

Concurrency Level:      20
Time taken for tests:   10.045 seconds
Complete requests:      254
Failed requests:        15
   (Connect: 0, Receive: 0, Length: 15, Exceptions: 0)
Write errors:           0
Non-2xx responses:      15
Keep-Alive requests:    239
Total transferred:      2558415 bytes
HTML transferred:       2448532 bytes
Requests per second:    25.29 [#/sec] (mean)
Time per request:       790.914 [ms] (mean)
Time per request:       39.546 [ms] (mean, across all concurrent requests)
Transfer rate:          248.74 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    6  16.1      0      56
Processing:   121  748 592.9    631    4312
Waiting:      120  688 589.9    566    4243
Total:        121  755 603.8    631    4365

Percentage of the requests served within a certain time (ms)
  50%    631
  66%    710
  75%    772
  80%    804
  90%   1670
  95%   1893
  98%   2788
  99%   3929
 100%   4365 (longest request)
>>>


