On Mon, Sep 14, 2009 at 6:14 PM, Alex Rousskov <rouss...@measurement-factory.com> wrote:
> On Mon, 14 Sep 2009, Amos Jeffries wrote:
>
>> I think we should take this on-list so the others with more detailed
>> knowledge can give advice in case I have my facts wrong about AsyncCalls...
>
> I am afraid this discussion focuses on a small part of a much bigger
> problem, so finalizing the design decisions here may be counterproductive
> until we have an agreement on how to split Squid into threads in general.
>
> There are a few different ways to partition Squid and most of them have
> been discussed in the past. I am not sure whether the discussions have
> ever reached a consensus point. I am also not sure there is consensus on
> whether we should design for 2 cores, 8 cores, 16 cores, 32 cores, or
> more, or all of the above.
>

That's correct, but it is manageable: depending upon the hardware we can
change the behavior of Squid. I know it is not easy, but it is not
impossible. Thanks for pointing out this issue.

> There is also a question of how configurable everything should be. For
> example, if the box has only two cores, will the user be able to specify
> which threads are created and which created threads run on which core?
> Also, will the user be able to specify whether the memory or disk cache
> is shared?
>
> I also agree that the major difficulty here is not implementing the
> threading code itself, but making sure that no shared data is accessed
> unlocked. This is easy when you start from scratch, but we have a lot of
> globals and static objects accessed all around the code. Protecting each
> of them individually by hand would require a lot of coding and risk. IMO,
> we need to add a few classes that would make sharing global data simple
> and safe. This is where C++ would help a lot.
>

Yes, we have to think about this. There is risk, and new structures could
be added, but I don't want to comment further on this issue until after an
in-depth analysis.
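Still, to make the discussion concrete: a wrapper along the lines Alex
describes might look something like the rough sketch below. This is only
an illustration assuming pthreads; the class and all of its names are
hypothetical, nothing like it exists in Squid yet.

#include <pthread.h>

// Sketch: a shared global that can only be reached through a
// scoped, lock-holding accessor.
template <class T>
class SharedGlobal
{
public:
    SharedGlobal() { pthread_mutex_init(&mutex_, NULL); }
    ~SharedGlobal() { pthread_mutex_destroy(&mutex_); }

    // Accessor that holds the lock for its own lifetime;
    // release is automatic when it goes out of scope.
    class Locked
    {
    public:
        Locked(SharedGlobal &g) : g_(g) { pthread_mutex_lock(&g_.mutex_); }
        ~Locked() { pthread_mutex_unlock(&g_.mutex_); }
        T *operator->() { return &g_.data_; }
        T &operator*() { return g_.data_; }
    private:
        SharedGlobal &g_;
    };

private:
    friend class Locked;   // nested class needs explicit access in C++03
    T data_;
    pthread_mutex_t mutex_;
};

// Usage sketch: threads never touch the global object directly.
//     SharedGlobal<SquidConfig> theConfig;
//     ...
//     SharedGlobal<SquidConfig>::Locked cfg(theConfig);
//     cfg->someField = newValue;   // lock held until cfg goes out of scope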
> And even the protection of globals requires a high-level design plan: do
> we protect global objects like Config or individual data types like
> SquidString?
>
> Finally, I agree that making accepting code into a thread may lead to a
> "too aggressive" incoming stream of requests that would overwhelm the
> other parts of the system. I have recently observed that kind of behavior
> in another threaded performance-focused application. This does not mean
> that accept must not be a thread, but it means that we should think about
> how the overall design would prevent one thread from overloading others
> with work.
>
> Cheers,
>
> Alex.
>
>> Sachin Malave wrote:
>>>
>>> On Sun, Sep 13, 2009 at 7:52 PM, Amos Jeffries <squ...@treenet.co.nz>
>>> wrote:
>>>>
>>>> On Sun, 13 Sep 2009 07:12:56 -0400, Sachin Malave
>>>> <sachinmal...@gmail.com> wrote:
>>>>>
>>>>> Hello Amos,
>>>>>
>>>>> As discussed before, I am analyzing the code for the changes described
>>>>> on http://wiki.squid-cache.org/Features/SmpScale, and have concentrated
>>>>> on the epoll (select) implementation in Squid. I found that epoll
>>>>> polls all the descriptors and processes them one by one. The FD used
>>>>> by the http port is important and always busy, but it has to wait for
>>>>> the other descriptors in the queue to be processed.
>>>>>
>>>>> Then I also found that it is possible to separate the working of
>>>>> individual FD handlers, e.g. the FD used by the http port (tried).
>>>>> This can be done by making some changes in the code. I have been
>>>>> coding and testing these changes for the last few days; of course this
>>>>> may not be correct, or may need improvements to meet our requirements.
>>>>> Please give me feedback and tell me about dependencies I might not
>>>>> have considered.
>>>>>
>>>>> One more important issue: I know that the changes mentioned below will
>>>>> create and kill a thread after each timeout, but we can extend this
>>>>> further and make a separate thread that never exits. We can discuss
>>>>> that later; before everything else, please check the proposed changes
>>>>> so that we can move further.
>>>>
>>>> You mean the main http_port listener (port 3128 etc)?
>>>> This is currently not handled specially. Due to there being more than
>>>> one listener FD in many Squid setups (multiple http_port and https_port,
>>>> plus other protocols like ICP, HTCP, DNS), any threading solution needs
>>>> to handle the listeners agnostic of what they do. Though splitting
>>>> listener FD accepts into a separate loop from other FDs does seem sound.
>>>>
>>>> Special pseudo-thread handling is already hacked up in a pseudo-thread
>>>> poller for DNS replies, which is complicating the FD handling there.
>>>> What I'd like to see is resource-locking added to the Async queue when
>>>> adding new queue entries.
>>>>
>>>> That allows making the whole select loop(s) happen in parallel to the
>>>> rest of Squid: it simply accepts and spawns AsyncJob/AsyncCall entries
>>>> into the main squid processing queue.
>>>>
>>>> Workable?
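On the "separate thread that never exits" point mentioned above: the http
listener's epoll instance could be polled from its own dedicated thread
instead of being created and killed around each timeout. A very rough
sketch follows (pthreads assumed; handleHttpEvent and httpPollThread are
made-up names, and a real version would schedule an AsyncCall rather than
run handlers directly):

#include <pthread.h>
#include <sys/epoll.h>

extern int kdpfdHttp;                  // the http listener's epoll instance
void handleHttpEvent(struct epoll_event &ev);  // hypothetical dispatcher

static volatile int httpThreadRunning = 1;

static void *httpPollThread(void *)
{
    struct epoll_event ev;

    while (httpThreadRunning) {
        // Block on our own epoll fd; the main loop never waits for us.
        const int n = epoll_wait(kdpfdHttp, &ev, 1, 1000 /* msec */);

        if (n > 0)
            handleHttpEvent(ev);
        // n == 0 is just a timeout; n < 0 (e.g. EINTR) is ignorable here.
    }

    return NULL;
}

// Started once, e.g. from comm_select_init(), and stopped only at shutdown:
//     pthread_t tid;
//     pthread_create(&tid, NULL, httpPollThread, NULL);

The changes I have actually tried so far are quoted below.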
>>>>> *Changes are tagged with "NEW"
>>>>>
>>>>> 1.> inside client_side.cc
>>>>>
>>>>> void clientHttpConnectionsOpen(void)
>>>>> {
>>>>>     .
>>>>>     httpfd = fd;   // httpfd now holding the http file descriptor (NEW)
>>>>>     .
>>>>>     .
>>>>>     .
>>>>>     comm_accept(fd, httpAccept, s);
>>>>> }
>>>>>
>>>>> 2.> inside comm_epoll.cc
>>>>>
>>>>> int kdpfdHttp;
>>>>> int useHttpThread = 1;
>>>>>
>>>>> void comm_select_init(void)
>>>>> {
>>>>>     peventsHttp = (struct epoll_event *)
>>>>>         xmalloc(1 * sizeof(struct epoll_event));    // NEW
>>>>>     kdpfdHttp = epoll_create(1);                    // NEW
>>>>> }
>>>>>
>>>>> void commSetSelect(int fd, unsigned int type, PF * handler,
>>>>>                    void *client_data, time_t timeout)
>>>>> {
>>>>>     if (!F->flags.open) {
>>>>>         if (useHttpThread)                                 // NEW
>>>>>             epoll_ctl(kdpfdHttp, EPOLL_CTL_DEL, fd, &ev);  // NEW
>>>>>         else
>>>>>             epoll_ctl(kdpfd, EPOLL_CTL_DEL, fd, &ev);
>>>>>         return;
>>>>>     }
>>>>>
>>>>>     if (fd == getHttpfd()) {                               // NEW
>>>>>         printf("********Setting epoll_ctl for httpfd=%d\n", getHttpfd());
>>>>>         if (epoll_ctl(kdpfdHttp, epoll_ctl_type, fd, &ev) < 0) {
>>>>>             debugs(5, DEBUG_EPOLL ? 0 : 8, "commSetSelect: epoll_ctl(," <<
>>>>>                    epolltype_atoi(epoll_ctl_type) << ",,): failed on FD " <<
>>>>>                    fd << ": " << xstrerror());
>>>>>         }
>>>>>     }
>>>>>
>>>>> comm_err_t
>>>>> comm_select(int msec)
>>>>> {
>>>>>     int num, i, fd = -1;
>>>>>     fde *F;
>>>>>     PF *hdl;
>>>>>
>>>>>     // SHM: num2
>>>>>     int num2 = -1;   // NEW
>>>>>     // SHM: End
>>>>>
>>>>>     struct epoll_event *cevents;
>>>>>     struct epoll_event *ceventsHttp;
>>>>>     printf("Inside comm_select:comm_epoll.cc\n");
>>>>>     // PROF_start(comm_check_incoming);
>>>>>
>>>>>     if (msec > max_poll_time)
>>>>>         msec = max_poll_time;
>>>>>
>>>>>     for (;;) {
>>>>>         printf("(for(;;):Inside comm_select:comm_epoll.cc\n");
>>>>>         num = epoll_wait(kdpfd, pevents, 1, msec);
>>>>>
>>>>>         // SHM: epoll_wait for kdpfdHttp
>>>>>         if (useHttpThread) {   // NEW
>>>>>             printf("(for(;;):USEHTTP:Inside comm_select:comm_epoll.cc\n");
>>>>>             num2 = epoll_wait(kdpfdHttp, peventsHttp, 1, msec);   // NEW
>>>>>             printf("\n\n\n num2=%d\n\n\n", num2);
>>>>>         }
>>>>>         // SHM: End
>>>>>
>>>>>         ++statCounter.select_loops;
>>>>>
>>>>>         if (num >= 0 || num2 >= 0)   // NEW
>>>>>             break;
>>>>>
>>>>>         if (ignoreErrno(errno))
>>>>>             break;
>>>>>
>>>>>         getCurrentTime();
>>>>>
>>>>>         // PROF_stop(comm_check_incoming);
>>>>>
>>>>>         return COMM_ERROR;
>>>>>     }
>>>>>
>>>>>     // PROF_stop(comm_check_incoming);   // PLEASE DISCUSS THIS...
>>>>
>>>> The PROF_* bits are rarely used. Removing them from here is acceptable
>>>> as a temporary measure when it's initially threaded. Long-term they
>>>> need looking at to be made thread-safe. IMO the entire time spent in
>>>> one of the poll threads counts as comm_check_incoming, so maybe a new
>>>> API to PROF_* needs to be developed to account for thread times. This
>>>> is another SMP job though, slightly out of scope of your comm loop
>>>> upgrade.
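On the PROF_* point: a thread-aware API could perhaps keep per-thread
accumulators so the hot path needs no locking. A rough sketch, assuming
pthread thread-specific data (every name below is made up; nothing like
this exists in Squid today):

#include <pthread.h>
#include <stdlib.h>
#include <sys/time.h>

// Per-thread accumulator: no other thread ever touches it.
struct ThreadProf {
    struct timeval started;
    long long totalUsec;   // time accumulated by this thread only
};

static pthread_key_t profKey;

// Run once at startup, e.g. via pthread_once().
static void profKeyInit() { pthread_key_create(&profKey, free); }

static ThreadProf *profSelf()
{
    ThreadProf *p = (ThreadProf *) pthread_getspecific(profKey);
    if (!p) {   // first use in this thread: allocate its private slot
        p = (ThreadProf *) calloc(1, sizeof(ThreadProf));
        pthread_setspecific(profKey, p);
    }
    return p;
}

static void PROF_THREAD_START() { gettimeofday(&profSelf()->started, NULL); }

static void PROF_THREAD_STOP()
{
    ThreadProf *p = profSelf();
    struct timeval now;
    gettimeofday(&now, NULL);
    p->totalUsec += (now.tv_sec - p->started.tv_sec) * 1000000LL
                    + (now.tv_usec - p->started.tv_usec);
}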
>>>>>     getCurrentTime();
>>>>>
>>>>>     statHistCount(&statCounter.select_fds_hist, num);
>>>>>
>>>>>     if (num == 0 && num2 == 0)   // NEW (this took a lot of my time to fix)
>>>>>         return COMM_TIMEOUT;     /* No error.. */
>>>>>
>>>>>     // PROF_start(comm_handle_ready_fd);
>>>>>
>>>>>     printf("\nChoosing thread..............................\n");   // NEW
>>>>>
>>>>>     // HERE THE HTTP FD IS HANDLED SEPARATELY.
>>>>>     // This could be our new thread.   // NEW START
>>>>>     if (num2 > 0) {
>>>>>         printf("\nINSIDE Thread now\n\n");
>>>>>         ceventsHttp = peventsHttp;
>>>>>         fd = ceventsHttp->data.fd;
>>>>>         F = &fd_table[fd];
>>>>>         debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): got FD " << fd <<
>>>>>                " events=" << std::hex << ceventsHttp->events <<
>>>>>                " monitoring=" << F->epoll_state <<
>>>>>                " F->read_handler=" << F->read_handler <<
>>>>>                " F->write_handler=" << F->write_handler);
>>>>>
>>>>>         if (ceventsHttp->events & (EPOLLIN|EPOLLHUP|EPOLLERR) ||
>>>>>                 F->flags.read_pending) {
>>>>>             if ((hdl = F->read_handler) != NULL) {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): Calling read handler on FD " << fd);
>>>>>                 PROF_start(comm_write_handler);
>>>>>                 F->flags.read_pending = 0;
>>>>>                 F->read_handler = NULL;
>>>>>                 hdl(fd, F->read_data);
>>>>>                 PROF_stop(comm_write_handler);
>>>>>                 statCounter.select_fds++;
>>>>>             } else {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): no read handler for FD " << fd);
>>>>>                 // remove interest since no handler exists for this event.
>>>>>                 commSetSelect(fd, COMM_SELECT_READ, NULL, NULL, 0);
>>>>>             }
>>>>>         }
>>>>>
>>>>>         if (ceventsHttp->events & (EPOLLOUT|EPOLLHUP|EPOLLERR)) {
>>>>>             if ((hdl = F->write_handler) != NULL) {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): Calling write handler on FD " << fd);
>>>>>                 PROF_start(comm_read_handler);
>>>>>                 F->write_handler = NULL;
>>>>>                 hdl(fd, F->write_data);
>>>>>                 PROF_stop(comm_read_handler);
>>>>>                 statCounter.select_fds++;
>>>>>             } else {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): no write handler for FD " << fd);
>>>>>                 // remove interest since no handler exists for this event.
>>>>>                 commSetSelect(fd, COMM_SELECT_WRITE, NULL, NULL, 0);
>>>>>             }
>>>>>         }
>>>>>     }
>>>>>     // NEW ENDS HERE
>>>>>
>>>>>     for (i = 0, cevents = pevents; i < num; i++, cevents++) {
>>>>>         fd = cevents->data.fd;
>>>>>         if (fd == getHttpfd())
>>>>>             continue;
>>>>>         F = &fd_table[fd];
>>>>>         debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): got FD " << fd <<
>>>>>                " events=" << std::hex << cevents->events <<
>>>>>                " monitoring=" << F->epoll_state <<
>>>>>                " F->read_handler=" << F->read_handler <<
>>>>>                " F->write_handler=" << F->write_handler);
>>>>>
>>>>>         // TODO: add EPOLLPRI??
>>>>>
>>>>>         if (cevents->events & (EPOLLIN|EPOLLHUP|EPOLLERR) ||
>>>>>                 F->flags.read_pending) {
>>>>>             if ((hdl = F->read_handler) != NULL) {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): Calling read handler on FD " << fd);
>>>>>                 PROF_start(comm_write_handler);
>>>>>                 F->flags.read_pending = 0;
>>>>>                 F->read_handler = NULL;
>>>>>                 hdl(fd, F->read_data);
>>>>>                 PROF_stop(comm_write_handler);
>>>>>                 statCounter.select_fds++;
>>>>>             } else {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): no read handler for FD " << fd);
>>>>>                 // remove interest since no handler exists for this event.
>>>>>                 commSetSelect(fd, COMM_SELECT_READ, NULL, NULL, 0);
>>>>>             }
>>>>>         }
>>>>>
>>>>>         if (cevents->events & (EPOLLOUT|EPOLLHUP|EPOLLERR)) {
>>>>>             if ((hdl = F->write_handler) != NULL) {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): Calling write handler on FD " << fd);
>>>>>                 PROF_start(comm_read_handler);
>>>>>                 F->write_handler = NULL;
>>>>>                 hdl(fd, F->write_data);
>>>>>                 PROF_stop(comm_read_handler);
>>>>>                 statCounter.select_fds++;
>>>>>             } else {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): no write handler for FD " << fd);
>>>>>                 // remove interest since no handler exists for this event.
>>>>>                 commSetSelect(fd, COMM_SELECT_WRITE, NULL, NULL, 0);
>>>>>             }
>>>>>         }
>>>>>     }
>>>>>
>>>>>     // PROF_stop(comm_handle_ready_fd);
>>>>>
>>>>>     return COMM_OK;
>>>>> }
>>>>
>>>> Amos
>>>
>>> --------------------------------------------
>>> --------------------------------------------
>>> Yes, you are right: when there are multiple ports things will be
>>> different, but I guess they can be embedded easily and all descriptors
>>> handled in the same way. This change will not be very effective yet, as
>>> the code is not optimized for all descriptors and is not a generalized
>>> solution for them.
>>>
>>> I found that there is a delay between two select loops which can be
>>> minimized to the delay of each loop's own descriptor, so I concentrated
>>> on that.
>>
>> I'm not sure exactly what you are saying there. Two select loops being
>> two runs of the FD checking loop? Or a delay when two polling loops are
>> run?
>>
>> I read your earlier explanation to mean there was a delay doing accept()
>> on FD 1 (for example) when FD 1,2,3,4,5,6,7 had reads pending.
>>
>> And your solution to be:
>>   thread #1 - looping on read() FD 2,3,4,5,6,7 + other stuff
>>   thread #2 - looping on accept() FD 1
>>>
>>> I am also trying to find the locking mechanism for the async queues;
>>> that seems to be taking time.
>>
>> I mean some mutex/lock on AsyncCallQueue so the multiple threads can do
>> AsyncCallQueue::schedule(call) without pointer collisions with theTail
>> and theHead, when they set up the first read of an accept()'ed FD.
>>
>> For example something like this...
>>
>> thread #1:
>> while ( <events to dial> ) {
>>     queuedEventX.dial()
>> }
>>
>> thread #2:
>> while ( <listening for new connections> ) {
>>     newFD = accept()
>>     readFromNewFd = new AsyncCallPointer...
>>     AsyncCallQueue::schedule(readFromNewFd);
>> }
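To make the theHead/theTail collision point concrete, here is a rough,
self-contained sketch of what such a locked queue could look like
(pthread mutex; the AsyncCall stand-in below is heavily simplified, and
Squid's real AsyncCallQueue/AsyncCall differ in detail, e.g. refcounted
pointers):

#include <pthread.h>
#include <stddef.h>

struct AsyncCall {               // simplified stand-in for Squid's AsyncCall
    AsyncCall *theNext;
    void (*dial)(void *);
    void *arg;
};

class AsyncCallQueue
{
public:
    AsyncCallQueue() : theHead(NULL), theTail(NULL) {
        pthread_mutex_init(&mutex_, NULL);
    }

    // Any thread may schedule; theHead/theTail are only touched under
    // the lock, so concurrent schedulers cannot collide.
    void schedule(AsyncCall *call) {
        call->theNext = NULL;
        pthread_mutex_lock(&mutex_);
        if (theTail)
            theTail->theNext = call;
        else
            theHead = call;
        theTail = call;
        pthread_mutex_unlock(&mutex_);
    }

    // Main thread: detach the whole list under the lock, then dial
    // outside it so handlers never run with the queue locked.
    void fireAll() {
        pthread_mutex_lock(&mutex_);
        AsyncCall *c = theHead;
        theHead = theTail = NULL;
        pthread_mutex_unlock(&mutex_);
        for (; c != NULL; c = c->theNext)
            c->dial(c->arg);
    }

private:
    AsyncCall *theHead;
    AsyncCall *theTail;
    pthread_mutex_t mutex_;
};

This way thread #2's schedule() and thread #1's dialing loop would only
contend on the short list-manipulation critical sections.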
>>>
>>> I need some time for further analysis.
>>>
>>> Thank you so much.
>>
>> Amos
>> --
>> Please be using
>>   Current Stable Squid 2.7.STABLE6 or 3.0.STABLE19
>>   Current Beta Squid 3.1.0.13
>

--
Mr. S. H. Malave
MTech, Computer Science & Engineering Dept.,
Walchand College of Engineering, Sangli.
Mob. 9860470739
sachinmal...@wce.org.in