On 01/21/15 09:51, Konstantin Belousov wrote:
On Wed, Jan 21, 2015 at 09:32:11AM +0100, Hans Petter Selasky wrote:
On 01/21/15 00:53, Sean Bruno wrote:
Unknown to me.  Nor am I aware of anyone else who ever hit our panics
either.  Our environment, and the failure, was only seen in the Intel
10GE space (ixgbe).  This is an artifact of our use cases, and hasn't
been expanded nor tested in our environment with other vendor interfaces.

sean


Hi,


I've seen this with Mellanox hardware when running some special tests,
but not during regular use yet. That was the reason for going into the
callout subsystem in the first place. 40GE.

Also I would like to mention during the heat of this discussion, that
during X-mas this year, I had a very heavy discussion with Attilio and a
few other FreeBSD developers, whose names were on a patch (r220456) that
changed how the return value of "callout_active()" works.
"callout_active()" is heavily used inside the TCP stack and what was
found is there is a potential race related to migrating the callout from
one CPU to the other, which in turn might give other symptoms than a
spinlock hang.

FYI:

https://svnweb.freebsd.org/base?view=revision&revision=225057

Cite: "If the newly scheduled thread wants to acquire the old queue it
will just spin forever."

This description reminds me very much of what "Jason Wolfe", others and
myself have seen.

Konstantin, you're responsible for r220456 (Approved by: kib). I would
I definitely do not see anything related to my freefall login in the
log message for r220456, nor did I participate in any way in the work
which led to that revision.

If you mean r225057, note that approval by re != review.

Yes, I meant r225057.

like to ask what investigation you did to ensure that you solved the
problem as described in the commit message and didn't introduce a new one?

In r220456 the "callout_reset_on()" function was changed in a way that
directly conflicts with how the TCP stack works, by not always ensuring
that "callout_active()" returns non-zero after a callout is restarted!
See return at line 821:

https://svnweb.freebsd.org/base/head/sys/kern/kern_timeout.c?revision=225057&view=markup&pathrev=225057#l821

Kib: Any comments?

With the re hat on, the explanation for the proposed commit looked
reasonable, and the committer provided enough evidence that the change
got adequate testing. Since the change fixed a bug, and this is exactly
what re wants to see during a release cycle, I see no reason why the
commit should have been denied.

The problem is that Attilio is no longer an active committer and he has
not been very willing to do more work in this area. When the people who
wrote the code in an area no longer respond, what should we do?

--HPS
_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
