On 05.07.2012 10:00, Alex Rousskov wrote:
On 06/27/2012 03:12 AM, Amos Jeffries wrote:

A quick review of the other major bugs shows that each will take some
large design and code changes to implement a proper fix or even a
workaround.


Are there any objections to ignoring these bugs when considering a 3.2
stable release:

Our definition of a "stable release" has two criteria:

1. "Meant for production caches."

2. "begin when all known major bugs have been fixed [for 14 days]."

Criterion #1 should probably be interpreted as "Squid Project considers
the version suitable for production deployment". If you think we are
there, I have no objections -- I do not have enough information to say
whether enough users will be satisfied with current v3.2 code in
production today. Perhaps this is something we should ask on squid-users
after we close all bugs that we think should be closed?

As for Criterion #2, your question means that either we stop considering
those bugs as major OR we change criterion #2. IMHO, we should adjust
that criterion so that we do not have to play these games where we mark
something as a major bug but then decide that in the interest of a
speedier "stable" designation we are going to "ignore" it.

An adjusted criterion could be phrased as

2'.  "begin when #1 is satisfied for at least 14 days"


This gives us enough flexibility to release [what we consider
suitable-for-production] code that might have major bugs in some
environments. I added "at least" because otherwise we may have to
release v3.3 as stable 14 days after v3.2 is marked stable :-). In
practice, the version should have "enough improvements" to warrant its
numbering and its release, but I do not want to digress into that
discussion.



3124 - Cache manager stops responding when multiple workers used
** requires implementing non-blocking IPC packets between workers and
coordinator.

Has this been discussed somewhere? IPC communication is already
non-blocking so I suspect some other issue is at play here. The specific
examples of mgr commands in the bug report (userhash, sourcehash,
client_list, and netdb) seem non-essential in most environments and,
hence, do not justify the "major" designation, but perhaps they indicate
some major implementation problem that must be fixed.


UNIX sockets apparently guarantee that write() blocks until the
recipient process has read() the packet. That means each IPC packet is
stuck behind whatever long AsyncCall or delay the recipient has in
progress. Last I looked, the coordinator handling function also called
the component handler functions synchronously so they could create the
response IPC packet.

AFAIK this is waiting on the Subscription and generic (immediate-ACK)
IPC packets, which will free up the coordinator and workers for other
async operations even while a large operation is underway.



3389 - Auto-reconnect for tcp access_log
** requires asynchronous handling of log opening, which currently
blocks Squid operation

Since we have stable file-based logging, this bug does not have to block
a "stable" designation if TCP logging is declared "experimental". You
already have a patch that addresses 90% of the core problem for those
who care.

If you do not want to mark TCP logging as experimental and highlight
this shortcoming, then the bug ought to be fixed IMHO because there is
consensus that accurate logging is critical for many deployments.


3478 - Host verify catching dynamic CDN hosted sites
  ** requires designing a CONNECT and bump handling mechanism

I am not an expert on this, but it feels like we are trying to enforce a [good] rule ignored by the [bad] real world, especially in interception
environments. As a result, Squid lies and scares admins for no good
reason (in most cases). We will not win this battle.

I suggest that the "host_verify_strict off" behavior is adjusted to
cause no harm, even if some malicious requests will get through.


It does that now. "No harm" means we cannot rewrite the request headers
to something we are not sure about; doing so would actively cause
problems if we got it wrong. The current state is that Squid goes
DIRECT instead of through peers, which breaks interception+cluster
setups.


I can open that up again, but it will mean updating the CVE to indicate 2nd-stage proxies are still vulnerable.


If you do not want to do that, please add a [fast] ACL so that admins
are not stuck without a solution and can whitelist bad (or all) sites.
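For illustration only, such a knob might look something like the
following in squid.conf. The host_verify_exempt directive is
hypothetical and does not exist; only host_verify_strict is real here.

```
# Illustrative sketch only: host_verify_exempt is a hypothetical
# directive, shown to convey the shape of the proposed [fast] ACL.
acl brokenCdn dstdomain .example-cdn.invalid
host_verify_exempt allow brokenCdn
host_verify_strict off
```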


That said, the bug report itself does not explicitly say that something
is _seriously_ broken, does it? I bet the cache.log messages are
excessive on any busy site with a diverse user population, but we can
rate-limit these messages and downgrade the severity of the bug while
waiting for a real use case where these new checks break things (despite
host_verify_strict being off).


cache_peer relay is almost completely "disabled" for some major sites. Everything else works well.



3517 - Workers ldap digest
  ** requires SMP atomic access support for all user credentials

This is not a blocker IMO. SMP has several known limitations, complex
authentication schemes being one of them. This does not affect stability
of supported SMP configurations.


Okay, thank you.


Which would leave us with only these to locate (any takers?) :

3551 - store_rebuild.cc:116: "store_errors == 0" assertion

It would be nice to figure this one out, at least for ufs, because many folks will try ufs with SMP and there is clearly some kind of corruption
problem there. I assigned the bug to myself for now.

However, if I cannot reproduce it, I will not be able to make much
progress. Please note that the original reporter moved on to rock store
and no longer considers this bug to affect him (per comment #10).


3556 - assertion failed: comm.cc:1093: "isOpen(fd)"

I recommend adding a guard around the comm_close() call in the
Connection destructor to avoid calling it for !isOpen(fd) orphan
connections, and printing the value of isOpen() in the BUG message.


Aha.


3562 - StoreEntry::kickProducer Segmentation fault

I suspect Squid is corrupting its own memory somewhere, so this
specific core dump cannot be trusted. This might even be the same
problem as bug 3551 above. This could be considered a blocker at least
until we know more, I guess.


Thank you.

Amos
