Re: [uWSGI] Server hangs after harakiri; debugging harakiri events in general

Ernesto Revilla Tue, 03 Oct 2017 02:03:06 -0700

Hi,

Did you ever resolve this?

It seems that I have exactly the same problem on Ubuntu 14.04 (64-bit).

Regards

> Hi,
>
> Our uWSGI server hangs (stops serving any requests until it's
> restarted) about once a week, generally after a harakiri event.  Can
> anyone help troubleshoot this?  Also how can I debug harakiri events
> in general?  Most of them don't cause the server to hang, but I don't
> understand what's causing them.  The requests printed when the worker
> dies are all normal parts of our app that are accessed hundreds of
> times per day without incident.
>
> uWSGI version is 2.0.8.
> OS is Ubuntu 14.04 LTS.
> CPU is x86_64 - Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz on Amazon EC2.
> Webserver is nginx, load balancer is haproxy.
>
> Config is below.
>
> Logs from a harakiri that caused the server to hang:
>
> Thu Nov 20 15:09:29 2014 - *** HARAKIRI ON WORKER 8 (pid: 5046, try: 1) ***
> HARAKIRI: -- syscall> 7 0x7fffafe0e9c0 0x1 0xffffffff 0x8 0x1040bc8 0x1 0x7fffafe0e9a0 0x7f3afea6cfbd
> HARAKIRI: -- wchan> poll_schedule_timeout
> Thu Nov 20 15:09:29 2014 - HARAKIRI !!! worker 8 status !!!
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 0] 127.0.0.1 - GET /acct_quota since 1416495853
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 1] 127.0.0.1 - POST /pullf/ since 1416495854
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 2] 127.0.0.1 - GET / since 1416495861
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 3] 127.0.0.1 - GET / since 1416495853
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 4] 127.0.0.1 - POST /signin/ since 1416495865
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 5] 127.0.0.1 - POST /clientresp since 1416495856
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 6] 127.0.0.1 - POST /pullf/ since 1416495858
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 7] 127.0.0.1 - GET /~Dreamshot/495/percentage-of-bachelors-degrees-conferred-to-women-in-the-usa-by-major-1970-2012/ since 1416495852
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 8] 127.0.0.1 - POST /stylethemes/ since 1416495858
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 9] 127.0.0.1 - POST /clientresp since 1416495854
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 10] 127.0.0.1 - POST /clientresp since 1416495856
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 11] 127.0.0.1 - GET /acct_quota since 1416495860
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 12] 127.0.0.1 - POST /signin/ since 1416495866
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 13] 127.0.0.1 - POST /clientresp since 1416495865
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 14] 127.0.0.1 - POST /pullf/ since 1416495853
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 15] 127.0.0.1 - GET /%7Ehianalytics/189/ since 1416495852
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 16] 127.0.0.1 - POST /pullf/ since 1416495851
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 17] 127.0.0.1 - POST /signin/ since 1416495868
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 18] 127.0.0.1 - POST /clientresp since 1416495866
> Thu Nov 20 15:09:29 2014 - HARAKIRI [core 19] 127.0.0.1 - GET /getsources?fid=&extrarefs=Doktorigi%3A8 since 1416495868
> Thu Nov 20 15:09:29 2014 - HARAKIRI !!! end of worker 8 status !!!
> DAMN ! worker 8 (pid: 5046) died, killed by signal 9 :( trying respawn ...
> Respawned uWSGI worker 8 (new pid: 10985)
> monitor (pid=10985): Starting stack trace monitor.
> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0xa3dd80 pid: 10985 (default app)
>
> When the server is able to successfully restart the worker, the
> message looks similar.  Here's our latest:
>
> Fri Nov 21 18:36:12 2014 - *** HARAKIRI ON WORKER 5 (pid: 23549, try: 1) ***
> HARAKIRI: -- wchan> futex_wait_queue_me
> Fri Nov 21 18:36:12 2014 - HARAKIRI !!! worker 5 status !!!
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 0] 127.0.0.1 - GET /plot since 1416594367
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 1] 127.0.0.1 - POST /getuser/ since 1416594367
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 2] 127.0.0.1 - POST /user_account_actions since 1416594370
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 3] 127.0.0.1 - GET /plot since 1416594366
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 4] 127.0.0.1 - POST /pullf/ since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 5] 127.0.0.1 - POST /clientresp since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 6] 127.0.0.1 - GET /python/3d-plots-tutorial/ since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 7] 127.0.0.1 - POST /getuser/ since 1416594370
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 8] 127.0.0.1 - POST /getuser/ since 1416594367
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 9] 127.0.0.1 - POST /getuser/ since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 10] 127.0.0.1 - POST /getuser/ since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 11] 127.0.0.1 - POST /svgtopdf/ since 1416594371
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 12] 127.0.0.1 - POST /clientresp since 1416594366
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 13] 127.0.0.1 - GET /quandl?code=WORLDBANK/UZB_SP_RUR_TOTL_ZS since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 14] 127.0.0.1 - GET /~martin.2098/20/-line0-css-penthouse-line0-line0 since 1416594367
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 15] 127.0.0.1 - POST /user_account_actions since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 16] 127.0.0.1 - GET /plot since 1416594368
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 17] 127.0.0.1 - GET /plot since 1416594367
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 18] 127.0.0.1 - POST /clientresp since 1416594371
> Fri Nov 21 18:36:12 2014 - HARAKIRI [core 19] 127.0.0.1 - POST /getnotifs/ since 1416594367
> Fri Nov 21 18:36:12 2014 - HARAKIRI !!! end of worker 5 status !!!
> DAMN ! worker 5 (pid: 23549) died, killed by signal 9 :( trying respawn ...
> Respawned uWSGI worker 5 (new pid: 24129)
> monitor (pid=24129): Starting stack trace monitor.
> WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0xae8aa0 pid: 24129 (default app)
>
> Configuration from --show-config:
>
> ;uWSGI instance configuration
> [uwsgi]
> show-config = true
> emperor = /etc/streambed_uwsgi.ini
> ;end of configuration
>
> Contents of /etc/streambed_uwsgi.ini:
>
> [uwsgi]
>
> uid = www-data
> gid = www-data
>
> chdir = /var/www/streambed/shelly
> module = apache.wsgi
> socket = /var/run/streambed.sock
> chown-socket = www-data
> logto = /var/log/uwsgi/streambed
> pidfile = /var/run/streambed.pid
>
> master = true
> # Conventional SIGTERM behaviour - needed for runit:
> die-on-term = true
> # Clean up on exit:
> vacuum = true
>
> # 10 processes, 20 threads each:
> processes = 10
> threads = 20
>
> buffer-size = 32768
>
> # Load the app in each worker process, rather than in the master process:
> lazy = true
> # Maximum time to service a request (seconds):
> harakiri = 300
> harakiri-verbose = true
> # Reload each process after this number of requests:
> max-requests = 10000
> # Save HTTP bodies larger than this to disk (bytes):
> post-buffering = 1000000
>
> # Stats socket
> stats = /var/run/uwsgi/streambed.stats
>
>
> Thanks for any hints or suggestions on either of these issues!
>
> Jody McIntyre
> Plotly Engineering
>
> ....
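
In case it helps anyone comparing harakiri dumps like the ones quoted above: the "since" value on each core line is a Unix timestamp, so it can be subtracted from the time the line was logged to see how long each request had been running when the worker was killed. Below is a rough sketch of that, not anything shipped with uWSGI. The names parse_core_line and slowest_first are made up for illustration, and it assumes that each core entry sits on a single line (mail clients tend to wrap them) and that the log clock is UTC.

```python
import re
from datetime import datetime, timezone

# One "HARAKIRI [core N]" status line as printed with harakiri-verbose.
# The pattern is inferred from the quoted log, not from uWSGI's source.
CORE_RE = re.compile(
    r"(?P<ts>\w{3} \w{3} +\d{1,2} \d\d:\d\d:\d\d \d{4}) - "
    r"HARAKIRI \[core (?P<core>\d+)\] \S+ - "
    r"(?P<method>\S+) (?P<uri>\S+) since (?P<since>\d+)\s*$"
)


def parse_core_line(line):
    """Return a dict describing one core line, or None if it does not match."""
    m = CORE_RE.search(line)  # search() tolerates a leading "> " quote marker
    if m is None:
        return None
    # Assumption: the log timestamp is in UTC. Adjust if your server is not.
    logged = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
    logged = logged.replace(tzinfo=timezone.utc)
    since = int(m.group("since"))
    return {
        "core": int(m.group("core")),
        "method": m.group("method"),
        "uri": m.group("uri"),
        "since": since,
        # Seconds the request had been running when the line was logged.
        "elapsed": int(logged.timestamp()) - since,
    }


def slowest_first(lines):
    """Parse every matching line and sort by elapsed time, longest first."""
    rows = [row for row in map(parse_core_line, lines) if row is not None]
    return sorted(rows, key=lambda row: row["elapsed"], reverse=True)


if __name__ == "__main__":
    import sys

    # Usage: python harakiri_elapsed.py < /path/to/uwsgi.log
    for row in slowest_first(sys.stdin):
        print(row["elapsed"], row["core"], row["method"], row["uri"])
```

If the elapsed values come out far from the configured harakiri timeout, or negative, the UTC assumption is probably wrong for that server.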



Ernesto Revilla
Área Técnica
TangramBPM.es
Tlf: 630 244 136

_______________________________________________
uWSGI mailing list
uWSGI@lists.unbit.it
http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi
