Hi, (I'm back to squid-gzip task now). I come up to this situation:
squid 3.0-PRE3-20031008 with epoll kernel 2.4.21 patched with http://www.xmailserver.org/linux-patches/epoll-lt-2.4.21-0.18.diff When a client requests a very long object (such as a video), squid uses 100% of CPU. It was caused because epoll_wait returned immediately reporting that connection to client can be written without blocking. So squid was continously calling to epoll_wait, which in turn returned immediately. Here's my attempt to solve this (first attach). Also, I provide attachments generated with: debug_options ALL,1 5,9 If it isn't a reasonable/logical way to solve this problem, please let me know (and explain to me how it should be fixed, so I can rewrite the fix). Hope it helps & awaiting for feedback, -- Gonzalo Arana <[EMAIL PROTECTED]> UOL-Sinectis S.A.
Fixes (correctly?) a 100% CPU usage on large/streaming content delivered when using epoll. Kernel 2.4.21 patched with http://www.xmailserver.org/linux-patches/epoll-lt-2.4.21-0.18.diff diff -bur squid-3.0/src/comm_epoll.cc squid-3.0-fixed/src/comm_epoll.cc --- squid-3.0/src/comm_epoll.cc Sun Aug 3 18:38:15 2003 +++ squid-3.0-fixed/src/comm_epoll.cc Fri Oct 10 16:37:35 2003 @@ -239,7 +239,7 @@ getCurrentTime(); if (num == 0) - return COMM_OK; /* No error.. */ + return COMM_TIMEOUT; /* No error.. */ for (i = 0, cevents = pevents; i < num; i++, cevents++) { fd = cevents->data.fd; @@ -247,11 +247,14 @@ debug(5, DEBUG_EPOLL ? 0 : 8) ("comm_select(): got fd=%d events=%d F->read_handler=%p F->write_handler=%p\n", fd,cevents->events,F->read_handler,F->write_handler); + //TODO: add EPOLLPRI?? if(cevents->events & (EPOLLIN|EPOLLHUP|EPOLLERR)) { if((hdl = F->read_handler) != NULL) { debug(5, DEBUG_EPOLL ? 0 : 8) ("comm_select(): Calling read handler on fd=%d\n",fd); F->read_handler = NULL; hdl(fd, F->read_data); + } else { // if POLLIN but no handler, remove interest + commSetSelect(fd, COMM_SELECT_READ, NULL, NULL, 0); } } @@ -260,6 +263,8 @@ debug(5, DEBUG_EPOLL ? 0 : 8) ("comm_select(): Calling write handler on fd=%d\n",fd); F->write_handler = NULL; hdl(fd, F->write_data); + } else { + commSetSelect(fd, COMM_SELECT_WRITE, NULL, NULL, 0); } } }
Situation: fd type 9 to client 14 to web server **** read more data from web server 2003/10/10 16:08:55.588| comm_poll: 1+0 FDs ready 2003/10/10 16:08:55.588| comm_poll: FD 14 ready for reading 2003/10/10 16:08:55.588| comm_read_try: fd 14, size 87379, retval 1620, errno 0 2003/10/10 16:08:55.588| comm_calliocallback: 0x84f7b3c 2003/10/10 16:08:55.588| comm_iocallbackpending: 0x84f7b3c 2003/10/10 16:08:55.588| comm_calliocallback: 0x84f7b3c 2003/10/10 16:08:55.588| commSetTimeout: FD 14 timeout 900 **** want to write to FD 9 2003/10/10 16:08:55.589| commSetSelect: FD 9 type 2 2003/10/10 16:08:55.589| comm_read, queueing read for FD 14 **** and read from fd 14 2003/10/10 16:08:55.589| commSetSelect: FD 14 type 1 2003/10/10 16:08:55.589| comm_iocallbackpending: (nil) 2003/10/10 16:08:55.589| comm_poll: 1+0 FDs ready 2003/10/10 16:08:55.589| comm_poll: FD 9 ready for writing 2003/10/10 16:08:55.589| comm_write_try: fd 9: tried to write 1620 bytes, retval 1620, errno 0 **** ok, data has been written to client 2003/10/10 16:08:55.589| comm_calliocallback: 0x852e9d0 2003/10/10 16:08:55.589| comm_iocallbackpending: 0x852e9d0 2003/10/10 16:08:55.589| comm_calliocallback: 0x852e9d0 2003/10/10 16:08:55.589| comm_iocallbackpending: (nil) 2003/10/10 16:08:55.658| comm_poll: 1+0 FDs ready **** squid can read more data from web server. 2003/10/10 16:08:55.658| comm_poll: FD 14 ready for reading 2003/10/10 16:08:55.658| comm_read_try: fd 14, size 87379, retval 665, errno 0 2003/10/10 16:08:55.658| comm_calliocallback: 0x84f7b3c 2003/10/10 16:08:55.658| comm_iocallbackpending: 0x84f7b3c 2003/10/10 16:08:55.658| comm_calliocallback: 0x84f7b3c 2003/10/10 16:08:55.658| commSetTimeout: FD 14 timeout 900 **** squid want to to write to client (since it has read more data), and read from server 2003/10/10 16:08:55.659| commSetSelect: FD 9 type 2 2003/10/10 16:08:55.659| comm_read, queueing read for FD 14 2003/10/10 16:08:55.659| commSetSelect: FD 14 type 1 2003/10/10 16:08:55.659| comm_iocallbackpending: (nil) 2003/10/10 16:08:55.659| comm_poll: 1+0 FDs ready **** writing to client. 2003/10/10 16:08:55.659| comm_poll: FD 9 ready for writing 2003/10/10 16:08:55.659| comm_write_try: fd 9: tried to write 665 bytes, retval 665, errno 0 2003/10/10 16:08:55.659| comm_calliocallback: 0x852e9d0 2003/10/10 16:08:55.659| comm_iocallbackpending: 0x852e9d0 2003/10/10 16:08:55.659| comm_calliocallback: 0x852e9d0 2003/10/10 16:08:55.659| comm_iocallbackpending: (nil) **** no fd is ready for I/O, so COMM_TIMEOUT is returned from comm_poll 2003/10/10 16:08:55.751| comm_poll: 0+0 FDs ready 2003/10/10 16:08:55.751| comm_poll: time out: 1065812935. 2003/10/10 16:08:55.751| comm_calliocallback: (nil) 2003/10/10 16:08:55.751| comm_iocallbackpending: (nil) 2003/10/10 16:08:55.751| comm_iocallbackpending: (nil) **** and no pending callbacks. 2003/10/10 16:08:55.829| comm_poll: 1+0 FDs ready 2003/10/10 16:08:55.829| comm_poll: FD 14 ready for reading 2003/10/10 16:08:55.830| comm_read_try: fd 14, size 87379, retval 810, errno 0 .... and so on.
Situation: ========= fd type 7 DNS socket 8 http: LISTEN 14 connection to client. 16 connection to web server. **** new connection arrives 2003/10/10 14:18:05.847| comm_select(): got fd=8 events=1 F->read_handler=0x807b044 F->write_handler=(nil) 2003/10/10 14:18:05.847| comm_select(): Calling read handler on fd=8 2003/10/10 14:18:05.847| commSetSelect(fd=8,type=1,handler=0x807b044,client_data=(nil),timeout=0) **** squid wants to read from client connection 2003/10/10 14:18:05.848| comm_add_close_handler: FD 14, handler=0x8068e38, data=0x84d7d50 2003/10/10 14:18:05.848| commSetTimeout: FD 14 timeout 300 2003/10/10 14:18:05.848| comm_read, queueing read for FD 14 **** epoll_ctl fails(EPOLL_CTL_MOD) fails because fd=14 was never added 2003/10/10 14:18:05.848| commSetSelect(fd=14,type=1,handler=0x8077418,client_data=(nil),timeout=0) 2003/10/10 14:18:05.848| commSetSelect: epoll_ctl(,EPOLL_CTL_MOD,,) failed on fd=14: entry does not exist **** but epoll_ctl(EPOLL_CTL_ADD) does not fail. 2003/10/10 14:18:05.848| commSetSelect(fd=8,type=1,handler=0x807b044,client_data=(nil),timeout=0) 2003/10/10 14:18:05.848| comm_calliocallback: (nil) 2003/10/10 14:18:05.848| comm_iocallbackpending: (nil) **** epoll_wait informs squid that client has sent data to squid 2003/10/10 14:18:05.848| comm_select(): got fd=14 events=1 F->read_handler=0x8077418 F->write_handler=(nil) 2003/10/10 14:18:05.848| comm_select(): Calling read handler on fd=14 2003/10/10 14:18:05.848| comm_read_try: fd 14, size 4096, retval 452, errno 0 2003/10/10 14:18:05.848| comm_calliocallback: 0x84dfa60 2003/10/10 14:18:05.848| comm_iocallbackpending: 0x84dfa60 2003/10/10 14:18:05.848| comm_calliocallback: 0x84dfa60 2003/10/10 14:18:05.848| commSetTimeout: FD 14 timeout 86400 2003/10/10 14:18:05.849| comm_open: FD 16 is a new socket 2003/10/10 14:18:05.849| comm_add_close_handler: FD 16, handler=0x8085538, data=0x856dd54 2003/10/10 14:18:05.849| commSetTimeout: FD 16 timeout 120 2003/10/10 14:18:05.849| commConnectStart: FD 16, data 0x856dd54, mediaserver.uolsinectis.com.ar:80 2003/10/10 14:18:05.849| comm_add_close_handler: FD 16, handler=0x8078acc, data=0x85054a8 **** DNS stuff 2003/10/10 14:18:05.849| commSetSelect(fd=7,type=1,handler=0x807f52c,client_data=(nil),timeout=0) 2003/10/10 14:18:05.849| commSetSelect: epoll_ctl(,EPOLL_CTL_MOD,,) failed on fd=7: entry does not exist **** more data is willing to come from client. 2003/10/10 14:18:05.849| comm_read, queueing read for FD 14 2003/10/10 14:18:05.849| commSetSelect(fd=14,type=1,handler=0x8077418,client_data=(nil),timeout=0) **** here comes DNS answer. 2003/10/10 14:18:05.851| comm_select(): got fd=7 events=1 F->read_handler=0x807f52c F->write_handler=(nil) 2003/10/10 14:18:05.851| comm_select(): Calling read handler on fd=7 **** ok, still connecting to web server. 2003/10/10 14:18:05.851| connect FD 16: (115) Operation now in progress 2003/10/10 14:18:05.851| commConnectHandle: FD 16: COMM_INPROGRESS **** squid has data to write when connection to web server is established. 2003/10/10 14:18:05.851| commSetSelect(fd=16,type=2,handler=0x8078df4,client_data=0x85054a8,timeout=0) **** fd=16 has never been epoll_ctl'ed 2003/10/10 14:18:05.851| commSetSelect: epoll_ctl(,EPOLL_CTL_MOD,,) failed on fd=16: entry does not exist **** fd=16 epoll_ctl(EPOLL_CTL_ADD): OK 2003/10/10 14:18:05.851| comm_calliocallback: (nil) 2003/10/10 14:18:05.851| comm_iocallbackpending: (nil) **** connection to server has been established, so request to server can be written. 2003/10/10 14:18:05.852| comm_select(): got fd=16 events=4 F->read_handler=(nil) F->write_handler=0x8078df4 2003/10/10 14:18:05.852| comm_select(): Calling write handler on fd=16 2003/10/10 14:18:05.852| commConnectCallback: fd 16, data 0x856dd54 2003/10/10 14:18:05.852| comm_remove_close_handler: FD 16, handler=0x8078acc, data=0x85054a8 2003/10/10 14:18:05.852| commSetTimeout: FD 16 timeout -1 2003/10/10 14:18:05.852| comm_add_close_handler: FD 16, handler=0x8091440, data=0x407b15b8 2003/10/10 14:18:05.853| comm_write: FD 16: sz 568: hndl 0x80929c8: data 0x407b15b8. **** more data has to be written to web server. 2003/10/10 14:18:05.853| commSetSelect(fd=16,type=2,handler=0x807a7b0,client_data=0x85125b0,timeout=0) 2003/10/10 14:18:05.853| commConnectFree: FD 16 2003/10/10 14:18:05.853| comm_calliocallback: (nil) 2003/10/10 14:18:05.853| comm_iocallbackpending: (nil) **** more data can be written to web server. 2003/10/10 14:18:05.853| comm_select(): got fd=16 events=4 F->read_handler=(nil) F->write_handler=0x807a7b0 2003/10/10 14:18:05.853| comm_select(): Calling write handler on fd=16 2003/10/10 14:18:05.853| commHandleWrite: FD 16: off 0, sz 568. 2003/10/10 14:18:05.853| commHandleWrite: write() returns 568 **** request has been written to server, now squid wants the answer. 2003/10/10 14:18:05.853| comm_read, queueing read for FD 16 2003/10/10 14:18:05.853| commSetSelect(fd=16,type=1,handler=0x8077418,client_data=(nil),timeout=0) 2003/10/10 14:18:05.853| commSetTimeout: FD 16 timeout 900 2003/10/10 14:18:05.853| comm_calliocallback: (nil) 2003/10/10 14:18:05.853| comm_iocallbackpending: (nil) **** data can be read from web server. 2003/10/10 14:18:06.218| comm_select(): got fd=16 events=1 F->read_handler=0x8077418 F->write_handler=(nil) 2003/10/10 14:18:06.219| comm_select(): Calling read handler on fd=16 2003/10/10 14:18:06.219| comm_read_try: fd 16, size 87379, retval 194, errno 0 2003/10/10 14:18:06.219| comm_calliocallback: 0x84dfa60 2003/10/10 14:18:06.219| comm_iocallbackpending: 0x84dfa60 2003/10/10 14:18:06.219| comm_calliocallback: 0x84dfa60 2003/10/10 14:18:06.219| commSetTimeout: FD 16 timeout 900 2003/10/10 14:18:06.219| comm_read, queueing read for FD 16 2003/10/10 14:18:06.219| commSetSelect(fd=16,type=1,handler=0x8077418,client_data=(nil),timeout=0) **** more data can be read from web server. 2003/10/10 14:18:06.220| comm_select(): got fd=16 events=1 F->read_handler=0x8077418 F->write_handler=(nil) 2003/10/10 14:18:06.220| comm_select(): Calling read handler on fd=16 2003/10/10 14:18:06.220| comm_read_try: fd 16, size 87379, retval 812, errno 0 2003/10/10 14:18:06.220| comm_calliocallback: 0x84dfa60 2003/10/10 14:18:06.220| comm_iocallbackpending: 0x84dfa60 2003/10/10 14:18:06.220| comm_calliocallback: 0x84dfa60 2003/10/10 14:18:06.220| commSetTimeout: FD 16 timeout 900 **** writing reply back to client. 2003/10/10 14:18:06.220| comm_write: FD 14: sz 1145: hndl 0x806b294: data 0x406cc0f0. 2003/10/10 14:18:06.220| commSetSelect(fd=14,type=2,handler=0x807a7b0,client_data=0x85125b0,timeout=0) 2003/10/10 14:18:06.220| comm_read, queueing read for FD 16 2003/10/10 14:18:06.220| commSetSelect(fd=16,type=1,handler=0x8077418,client_data=(nil),timeout=0) **** data can be read to client 2003/10/10 14:18:06.220| comm_select(): got fd=14 events=4 F->read_handler=0x8077418 F->write_handler=0x807a7b0 2003/10/10 14:18:06.221| comm_select(): Calling write handler on fd=14 2003/10/10 14:18:06.221| commHandleWrite: FD 14: off 0, sz 1145. 2003/10/10 14:18:06.221| commHandleWrite: write() returns 1145 **** **** **** INTEREST ON writing to fd=14 (client): shoulnd't be removed? **** Why epoll_wait(kpfd, pevents, MAX_SQUIFD, msec) returns fd=14 can be written continously? **** **** **** 2003/10/10 14:18:06.221| comm_calliocallback: (nil) 2003/10/10 14:18:06.221| comm_iocallbackpending: (nil) **** and here comes the endless loop. 2003/10/10 14:18:06.221| comm_select(): got fd=14 events=4 F->read_handler=0x8077418 F->write_handler=(nil) 2003/10/10 14:18:06.221| comm_calliocallback: (nil) 2003/10/10 14:18:06.221| comm_iocallbackpending: (nil) 2003/10/10 14:18:06.221| comm_select(): got fd=14 events=4 F->read_handler=0x8077418 F->write_handler=(nil) 2003/10/10 14:18:06.221| comm_calliocallback: (nil) 2003/10/10 14:18:06.221| comm_iocallbackpending: (nil) 2003/10/10 14:18:06.221| comm_select(): got fd=14 events=4 F->read_handler=0x8077418 F->write_handler=(nil) 2003/10/10 14:18:06.221| comm_calliocallback: (nil) 2003/10/10 14:18:06.221| comm_iocallbackpending: (nil) 2003/10/10 14:18:06.221| comm_select(): got fd=14 events=4 F->read_handler=0x8077418 F->write_handler=(nil) 2003/10/10 14:18:06.221| comm_calliocallback: (nil) 2003/10/10 14:18:06.221| comm_iocallbackpending: (nil) 2003/10/10 14:18:06.221| comm_select(): got fd=14 events=4 F->read_handler=0x8077418 F->write_handler=(nil) 2003/10/10 14:18:06.221| comm_calliocallback: (nil) 2003/10/10 14:18:06.221| comm_iocallbackpending: (nil) 2003/10/10 14:18:06.221| comm_select(): got fd=14 events=4 F->read_handler=0x8077418 F->write_handler=(nil)