Hi,

Did you ever resolve this?
It seems that I have the exact same problem on Ubuntu 14.04 / 64 bits.

Regards

> Hi,
>
> Our uWSGI server hangs (stops serving any requests until it's
> restarted) about once a week, generally after a harakiri event. Can
> anyone help troubleshoot this? Also how can I debug harakiri events
> in general? Most of them don't cause the server to hang, but I don't
> understand what's causing them. The requests printed when the worker
> dies are all normal parts of our app that are accessed hundreds of
> times per day without incident.
>
> uWSGI version is 2.0.8.
> OS is Ubuntu 14.04 LTS.
> CPU is x86_64 - Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz on Amazon EC2.
> Webserver is nginx, load balancer is haproxy.
>
> Config is below.
>
> Logs from a harakiri that caused the server to hang:
>
> Thu Nov 20 15:09:29 2014 - *** HARAKIRI ON WORKER 8 (pid: 5046, try: 1) ***
> HARAKIRI: -- syscall> 7 0x7fffafe0e9c0 0x1 0xffffffff 0x8 0x1040bc8 0x1 0x7fffafe0e9a0 0x7f3afea6cfbd
> HARAKIRI: -- wchan> poll_schedule_timeout
> Thu Nov 20 15:09:29 2014 - HARAKIRI !!! worker 8 status !!!
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 0] 127.0.0.1 - GET /acct_quota since 1416495853
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 1] 127.0.0.1 - POST /pullf/ since 1416495854
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 2] 127.0.0.1 - GET / since 1416495861
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 3] 127.0.0.1 - GET / since 1416495853
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 4] 127.0.0.1 - POST /signin/ since 1416495865
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 5] 127.0.0.1 - POST /clientresp since 1416495856
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 6] 127.0.0.1 - POST /pullf/ since 1416495858
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 7] 127.0.0.1 - GET /~Dreamshot/495/percentage-of-bachelors-degrees-conferred-to-women-in-the-usa-by-major-1970-2012/ since 1416495852
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 8] 127.0.0.1 - POST /stylethemes/ since 1416495858
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 9] 127.0.0.1 - POST /clientresp since 1416495854
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 10] 127.0.0.1 - POST /clientresp since 1416495856
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 11] 127.0.0.1 - GET /acct_quota since 1416495860
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 12] 127.0.0.1 - POST /signin/ since 1416495866
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 13] 127.0.0.1 - POST /clientresp since 1416495865
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 14] 127.0.0.1 - POST /pullf/ since 1416495853
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 15] 127.0.0.1 - GET /%7Ehianalytics/189/ since 1416495852
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 16] 127.0.0.1 - POST /pullf/ since 1416495851
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 17] 127.0.0.1 - POST /signin/ since 1416495868
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 18] 127.0.0.1 - POST /clientresp since 1416495866
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 19] 127.0.0.1 - GET /getsources?fid=&extrarefs=Doktorigi%3A8 since 1416495868
> Thu Nov 20 15:09:29 2014 - HARAKIRI !!! end of worker 8 status !!!
> DAMN ! worker 8 (pid: 5046) died, killed by signal 9 :( trying respawn ...
> Respawned uWSGI worker 8 (new pid: 10985)
> monitor (pid=10985): Starting stack trace monitor.
> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0xa3dd80 pid: 10985 (default app)
>
> When the server is able to successfully restart the worker, the
> message looks similar. Here's our latest:
>
> Fri Nov 21 18:36:12 2014 - *** HARAKIRI ON WORKER 5 (pid: 23549, try: 1) ***
> HARAKIRI: -- wchan> futex_wait_queue_me
> Fri Nov 21 18:36:12 2014 - HARAKIRI !!! worker 5 status !!!
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 0] 127.0.0.1 - GET /plot since 1416594367
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 1] 127.0.0.1 - POST /getuser/ since 1416594367
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 2] 127.0.0.1 - POST /user_account_actions since 1416594370
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 3] 127.0.0.1 - GET /plot since 1416594366
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 4] 127.0.0.1 - POST /pullf/ since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 5] 127.0.0.1 - POST /clientresp since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 6] 127.0.0.1 - GET /python/3d-plots-tutorial/ since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 7] 127.0.0.1 - POST /getuser/ since 1416594370
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 8] 127.0.0.1 - POST /getuser/ since 1416594367
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 9] 127.0.0.1 - POST /getuser/ since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 10] 127.0.0.1 - POST /getuser/ since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 11] 127.0.0.1 - POST /svgtopdf/ since 1416594371
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 12] 127.0.0.1 - POST /clientresp since 1416594366
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 13] 127.0.0.1 - GET /quandl?code=WORLDBANK/UZB_SP_RUR_TOTL_ZS since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 14] 127.0.0.1 - GET /~martin.2098/20/-line0-css-penthouse-line0-line0 since 1416594367
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 15] 127.0.0.1 - POST /user_account_actions since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 16] 127.0.0.1 - GET /plot since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 17] 127.0.0.1 - GET /plot since 1416594367
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 18] 127.0.0.1 - POST /clientresp since 1416594371
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 19] 127.0.0.1 - POST /getnotifs/ since 1416594367
> Fri Nov 21 18:36:12 2014 - HARAKIRI !!! end of worker 5 status !!!
> DAMN ! worker 5 (pid: 23549) died, killed by signal 9 :( trying respawn ...
> Respawned uWSGI worker 5 (new pid: 24129)
> monitor (pid=24129): Starting stack trace monitor.
> WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0xae8aa0 pid: 24129 (default app)
>
> Configuration from --show-config:
>
> ;uWSGI instance configuration
> [uwsgi]
> show-config = true
> emperor = /etc/streambed_uwsgi.ini
> ;end of configuration
>
> Contents of /etc/streambed_uwsgi.ini:
>
> [uwsgi]
>
> uid = www-data
> gid = www-data
>
> chdir = /var/www/streambed/shelly
> module = apache.wsgi
> socket = /var/run/streambed.sock
> chown-socket = www-data
> logto = /var/log/uwsgi/streambed
> pidfile = /var/run/streambed.pid
>
> master = true
> # Conventional SIGTERM behaviour - needed for runit:
> die-on-term = true
> # Clean up on exit:
> vacuum = true
>
> # 10 processes, 20 threads each:
> processes = 10
> threads = 20
>
> buffer-size = 32768
>
> # Load the app in each worker process, rather than in the master process:
> lazy = true
> # Maximum time to service a request (seconds):
> harakiri = 300
> harakiri-verbose = true
> # Reload each process after this number of requests:
> max-requests = 10000
> # Save HTTP bodies larger than this to disk (bytes):
> post-buffering = 1000000
>
> # Stats socket
> stats = /var/run/uwsgi/streambed.stats
>
>
> Thanks for any hints or suggestions on either of these issues!
>
> Jody McIntyre
> Plotly Engineering
>
> ....

Ernesto Revilla
Technical Department
TangramBPM.es
Tel: 630 244 136
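One way to make the per-core status lines easier to read is to compute how long each request had been running when harakiri fired. The following is a minimal Python 3 sketch, not part of uWSGI: `request_ages` is a hypothetical helper, the regular expression assumes exactly the line format shown in the logs above, and it assumes the leading log timestamps are in UTC (adjust if the server runs in another time zone).

```python
import re
from datetime import datetime, timezone

# Matches the per-core lines printed with harakiri-verbose, e.g.
# "Thu Nov 20 15:09:29 2014 - HARAKIRI [core 0] 127.0.0.1 - GET /acct_quota since 1416495853"
CORE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) - HARAKIRI "
    r"\[core (?P<core>\d+)\] (?P<addr>\S+) - (?P<method>\S+) (?P<path>\S+) "
    r"since (?P<since>\d+)$"
)


def request_ages(log_lines):
    """Return (core, method, path, seconds_running) for every per-core harakiri line.

    Hypothetical helper. ASSUMPTION: the leading timestamp is UTC; lines that
    do not match the per-core format (status headers, wchan, etc.) are skipped.
    """
    results = []
    for line in log_lines:
        m = CORE_RE.match(line.strip())
        if not m:
            continue
        # Parse the human-readable time at which harakiri printed the status.
        logged = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
        logged = logged.replace(tzinfo=timezone.utc)
        # "since" is the Unix time at which that core started the request.
        age = int(logged.timestamp()) - int(m.group("since"))
        results.append((int(m.group("core")), m.group("method"), m.group("path"), age))
    return results
```

Under that UTC assumption, the Nov 20 status lines give ages between 301 and 318 seconds for all 20 cores, which would suggest the whole worker was blocked (every thread stuck at once) rather than one slow URL.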
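The configuration already enables a stats server (`stats = /var/run/uwsgi/streambed.stats`), and reading it while the server is hung could show which workers and cores are stuck inside a request. A minimal Python 3 sketch follows; `read_uwsgi_stats` and `busy_cores` are hypothetical helpers, and the `workers`, `cores`, `id`, `pid` and `in_request` field names reflect my understanding of the stats JSON, so check them against your own output first.

```python
import json
import socket


def read_uwsgi_stats(path, timeout=5.0):
    """Connect to a stats server on a Unix socket and return the decoded JSON.

    The stats server writes one JSON document and then closes the
    connection, so this simply reads until EOF.
    """
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: document complete
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


def busy_cores(stats):
    """Yield (worker id, worker pid, core id) for each core currently in a request.

    ASSUMED field names ("workers", "cores", "in_request"); verify them
    against real stats output before relying on this.
    """
    for worker in stats.get("workers", []):
        for core in worker.get("cores", []):
            if core.get("in_request"):
                yield worker.get("id"), worker.get("pid"), core.get("id")
```

For example, `list(busy_cores(read_uwsgi_stats("/var/run/uwsgi/streambed.stats")))`, run as a user allowed to open the socket. Capturing this once during a hang and once during normal operation would make the two easy to compare.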
_______________________________________________
uWSGI mailing list
uWSGI@lists.unbit.it
http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi