On 11/28/2012 04:07 PM, Henrik Nordström wrote: > - each worker restarting at startup due to rock store DB open failures > (timeout). This seems to settle once the rock store have completed > rebuilding, but not 100% sure that's the cause as logs are a bit > inclusive. Might also be related to the delay due to the kid rebuilding > it's own aufs cache_dir.
IIRC, Rock store diskers should process queries while rebuilding so db open should not fail due to rebuild itself. I bet this is actually related to the problem discussed below. > - kid registration failures at startup (timeout). This seem to be due to > the local cache rebuild in each kid taking more than 6 seconds to > complete, confusing the registration timeout management. The registration code needs to be redesigned. It is not robust enough, especially when we have legacy code that blocks Squid for more than 6 seconds at startup. In general, longer registration delays help in some situations and hurt in others. Consider a case where IPC registration cannot succeed due to a configuration bug or somesuch. It is best for Squid to fail ASAP in that case rather than keep waiting for 10 minutes. > --- squid-3.2.3/src/DiskIO/IpcIo/IpcIoFile.cc 2012-10-20 15:39:49.000000000 > +0300 > +++ squid-3.2.3/src/DiskIO/IpcIo/IpcIoFile.cc 2012-11-29 00:47:11.451848433 > +0200 > @@ -30,7 +30,7 @@ > // TODO: make configurable or compute from squid.conf settings if possible > static const int QueueCapacity = 1024; > > -const double IpcIoFile::Timeout = 7; // seconds; XXX: ALL,9 may require more > +const double IpcIoFile::Timeout = 300; // seconds; XXX: ALL,9 may require > more Do not do that. The timeout is used for both file opening timeout and regular/ongoing I/O timeout checks. The latter should happen more frequently than once in 5 minutes, especially on busy Squids. Split this into OpenTimeout and IoTimeout if you want to increase the former. > IpcIoFile::IpcIoFileList IpcIoFile::WaitingForOpen; > IpcIoFile::IpcIoFilesMap IpcIoFile::IpcIoFiles; > std::auto_ptr<IpcIoFile::Queue> IpcIoFile::queue; > @@ -218,7 +218,7 @@ > { > bool ioError = false; > if (!response) { > - debugs(79, 3, HERE << "error: timeout"); > + debugs(79, 1, HERE << "error: timeout"); Do not do that. It is a common/benign case under overload. > ioError = true; // I/O timeout does not warrant setting error_? > } else { > if (response->xerrno) { > --- squid-3.2.3/src/HttpHeader.cc 2012-10-20 15:39:49.000000000 +0300 > +++ squid-3.2.3/src/HttpHeader.cc 2012-11-21 19:51:23.826807754 +0200 > @@ -548,7 +548,7 @@ > > char *nulpos; > if ((nulpos = (char*)memchr(header_start, '\0', header_end - > header_start))) { > - debugs(55, 1, "WARNING: HTTP header contains NULL characters {" << > + debugs(55, 2, "WARNING: HTTP header contains NULL characters {" << > getStringPrefix(header_start, nulpos) << "}\nNULL\n{" << > getStringPrefix(nulpos+1, header_end)); > goto reset; > } While I agree that Squid should not warn about malformed headers by default, others disagree, and this is unrelated to the rest of the changes. > --- squid-3.2.3/src/ipc/Strand.cc 2012-10-20 15:39:49.000000000 +0300 > +++ squid-3.2.3/src/ipc/Strand.cc 2012-11-29 00:12:12.385439783 +0200 > @@ -53,7 +53,7 @@ > TypedMsgHdr message; > ann.pack(message); > SendMessage(coordinatorAddr, message); > - setTimeout(6, "Ipc::Strand::timeoutHandler"); // TODO: make 6 > configurable? > + setTimeout(600, "Ipc::Strand::timeoutHandler"); // TODO: make 6 > configurable? > } Please keep the TODO comment in sync. This change is discussed at the beginning of this email. In summary, 10 minutes is sometimes worse than 6 seconds (and sometimes is not enough anyway). The registration algorithm should be adjusted, and the code that blocks Squid process for 9 minutes should be fixed. Until then, I am not objecting to this specific change, but perhaps you can add a comment discussing why this is so large. Thank you, Alex.