Re: freebsd-5.4-stable panics

2005-10-15 Thread Rob Watt
On Thu, 13 Oct 2005, Rob Watt wrote: > The test machine did panic. Unfortunately I was not running with > BREAK_TO_DEBUGGER. I will re-run the tests with all of the debugging > options we were using before, and then send you the trace info. Unfortunately I was not able to reproduce the panics wit

Re: freebsd-5.4-stable panics

2005-10-12 Thread Don Lewis
On 12 Oct, Rob Watt wrote: >> >> On Fri, 7 Oct 2005, Don Lewis wrote: >> I MFC'ed the fix to RELENG_6 last week, but the patch didn't apply >> cleanly to RELENG_5. I tweaked the patch for RELENG_5 and tested it on >> a UP box. I'd like to get some testing on SMP hardware before I commit >> it to

Re: freebsd-5.4-stable panics

2005-10-12 Thread Rob Watt
> >> On Fri, 7 Oct 2005, Don Lewis wrote: > I MFC'ed the fix to RELENG_6 last week, but the patch didn't apply > cleanly to RELENG_5. I tweaked the patch for RELENG_5 and tested it on > a UP box. I'd like to get some testing on SMP hardware before I commit > it to RELENG_5, just to make sure that

Re: freebsd-5.4-stable panics

2005-10-11 Thread Don Lewis
On 11 Oct, Rob Watt wrote: > On Mon, 10 Oct 2005, Rob Watt wrote: > >> Don, >> >> On Fri, 7 Oct 2005, Don Lewis wrote: >> >> > Both HEAD and RELENG_6 have been patched. I've tested the following >> > patch for RELENG_5 on a uniprocessor sparc64 box. I'd appreciate it if >> > anyone who was runni

Re: freebsd-5.4-stable panics

2005-10-11 Thread Rob Watt
On Mon, 10 Oct 2005, Rob Watt wrote: > Don, > > On Fri, 7 Oct 2005, Don Lewis wrote: > > > Both HEAD and RELENG_6 have been patched. I've tested the following > > patch for RELENG_5 on a uniprocessor sparc64 box. I'd appreciate it if > > anyone who was running into this problem on RELENG_5 with

Re: freebsd-5.4-stable panics

2005-10-10 Thread Rob Watt
Don, On Fri, 7 Oct 2005, Don Lewis wrote: > Both HEAD and RELENG_6 have been patched. I've tested the following > patch for RELENG_5 on a uniprocessor sparc64 box. I'd appreciate it if > anyone who was running into this problem on RELENG_5 with SMP hardware > could test it before I do the MFC.

Re: freebsd-5.4-stable panics

2005-10-07 Thread Don Lewis
On 3 Oct, Rob Watt wrote: > We noticed the patches from Don Lewis, but have not tested them yet. We > weren't sure if we could just apply those patches against 6.0-BETA5, or > whether we should wait for them to be MFC'd. Both HEAD and RELENG_6 have been patched. I've tested the following patch

Re: freebsd-5.4-stable panics

2005-10-04 Thread Don Lewis
On 3 Oct, Rob Watt wrote: >> It turns out that the sysctl buffer is already wired in one of the two >> cases >> that this function is called, so I moved the wiring up to the upper > layer >> in >> the other case and cut out a bunch of the locking gymnastics as a > result. >> Can you try this patch

re: freebsd-5.4-stable panics

2005-10-04 Thread Rob Watt
> It turns out that the sysctl buffer is already wired in one of the two > cases > that this function is called, so I moved the wiring up to the upper layer > in > the other case and cut out a bunch of the locking gymnastics as a result. > Can you try this patch? > > Index: kern_proc.c > ==

Re: freebsd-5.4-stable panics

2005-10-02 Thread Don Lewis
On 2 Oct, Don Lewis wrote: > It turns out that fill_kinfo_thread() grabs a bunch of locks to grab > things out of struct proc, which breaks badly if sched_lock is grabbed > before calling fill_kinfo_thread(). > > I refactored fill_kinfo_thread() into two functions, one of which > doesn't need an

Re: freebsd-5.4-stable panics

2005-10-02 Thread Don Lewis
On 1 Oct, Don Lewis wrote: > On 30 Sep, John Baldwin wrote: >> It turns out that the sysctl buffer is already wired in one of the two cases >> that this function is called, so I moved the wiring up to the upper layer in >> the other case and cut out a bunch of the locking gymnastics as a result

Re: freebsd-5.4-stable panics

2005-10-01 Thread Don Lewis
On 30 Sep, John Baldwin wrote: > On Friday 30 September 2005 11:25 am, Antoine Pelisse wrote: >> On 9/30/05, John Baldwin <[EMAIL PROTECTED]> wrote: >> > On Friday 30 September 2005 05:24 am, Antoine Pelisse wrote: >> > > Hi Robert, >> > > I don't think your patch is correct, the total linked list

Re: freebsd-5.4-stable panics

2005-10-01 Thread Don Lewis
On 30 Sep, Antoine Pelisse wrote: > Hi Robert, > I don't think your patch is correct, the total linked list can be broken > while the lock is released, thus just passing the link may not be enough > I have submitted a PR[1] for this a month ago but nobody took care of it yet There are two problem

Re: freebsd-5.4-stable panics

2005-10-01 Thread Antoine Pelisse
On 9/30/05, John Baldwin <[EMAIL PROTECTED]> wrote: > > On Friday 30 September 2005 11:25 am, Antoine Pelisse wrote: > > On 9/30/05, John Baldwin <[EMAIL PROTECTED]> wrote: > > > On Friday 30 September 2005 05:24 am, Antoine Pelisse wrote: > > > > Hi Robert, > > > > I don't think your patch is corr

Re: freebsd-5.4-stable panics

2005-09-30 Thread John Baldwin
On Friday 30 September 2005 11:25 am, Antoine Pelisse wrote: > On 9/30/05, John Baldwin <[EMAIL PROTECTED]> wrote: > > On Friday 30 September 2005 05:24 am, Antoine Pelisse wrote: > > > Hi Robert, > > > I don't think your patch is correct, the total linked list can be > > > broken > > > > > > while

Re: freebsd-5.4-stable panics

2005-09-30 Thread John Baldwin
On Friday 30 September 2005 05:24 am, Antoine Pelisse wrote: > Hi Robert, > I don't think your patch is correct, the total linked list can be broken > while the lock is released, thus just passing the link may not be enough > I have submitted a PR[1] for this a month ago but nobody took care of it

Re: freebsd-5.4-stable panics

2005-09-30 Thread Rob Watt
On Thu, 29 Sep 2005, Robert Watson wrote: > Could you dump the contents of *td and *td->td_proc for me? I'm quite > interested to know what the value in td->td_proc->p_state is, among other > things. If I could also have you generate a dump of the KSE group > structures in td->td_proc->p_ksegrps

Re: freebsd-5.4-stable panics

2005-09-30 Thread Rob Watt
Robert, We have gotten some more information from our type1 crash: >sh lockedvnods Locked vnodes >sh alllocks Process 2204 (dataplay) thread 0xff00b1726a000 (100214) exclusive sleep mutex inp (udpinp) f = 0 (0xff00cc90fcc8) locked @ /usr/src/sys/netinet/udp_usrreq.c:762 Process 62 (paged

Re: freebsd-5.4-stable panics

2005-09-29 Thread Robert Watson
On Thu, 29 Sep 2005, Rob Watt wrote: On Thu, 29 Sep 2005, Robert Watson wrote: Could you dump the contents of *td and *td->td_proc for me? I'm quite interested to know what the value in td->td_proc->p_state is, among other things. If I could also have you generate a dump of the KSE group str

Re: freebsd-5.4-stable panics

2005-09-29 Thread Robert Watson
On Wed, 28 Sep 2005, Rob Watt wrote: We re-compiled the kernel with 'options KDB_STOP_NMI', and were able to get a much more full analysis of what was happening on the 6-BETA5 crash. Great. We crashed in top again, and it does look like we may have hit a kern_proc bug. This sounds good,

Re: freebsd-5.4-stable panics

2005-09-29 Thread Rob Watt
Robert, On Tue, 27 Sep 2005, Robert Watson wrote: > Great. As mentioned I'll be offline for about the next 48 hours, but back > after then. If we can get a nice clean crash out of this, would really be > best. If it's top panicking, it could well be due to a bug in the process > monitoring cod

Re: freebsd-5.4-stable panics

2005-09-28 Thread Rob Watt
On Sun, 25 Sep 2005, Robert Watson wrote: > > On Fri, 23 Sep 2005, Jason Carroll wrote: > > > There seem to be 2 types of crashes we see with pretty different stack > > traces. What I'll call a type 1 crash, I believe, is often caused by > > one of the triggers I mention above. A type 2 crash

Re: freebsd-5.4-stable panics

2005-09-28 Thread Rob Watt
On 9/27/05, Robert Watson <[EMAIL PROTECTED]> wrote: > > On Tue, 27 Sep 2005, Rob Watt wrote: > > > Is this an SMP box? If so, could you try compiling options KDB_STOP_NMI > into your kernel -- you'll also need to set debug.kdb.stop_cpus_with_nmi=1 > in either loader.conf or at runtime with sysctl

Re: freebsd-5.4-stable panics

2005-09-27 Thread Robert Watson
On Tue, 27 Sep 2005, Rob Watt wrote: this is the piece of code that was referenced by the ip: (gdb) l *0x803b88ca 0x803b88ca is in nfsrv_lookup (/usr/src/sys/nfsserver/nfs_serv.c:670). 665 NFSD_UNLOCK(); 666 mtx_lock(&Giant); /* VFS */ 667

Re: freebsd-5.4-stable panics

2005-09-27 Thread Robert Watson
On Tue, 27 Sep 2005, Rob Watt wrote: Thanks for your quick response and suggestions. We have now experienced an additional type of crash. Type 3 is from 6.0-BETA5, it did not enter the debugger at all and we could not generate a core. Is this an SMP box? If so, could you try compiling optio

Re: freebsd-5.4-stable panics

2005-09-25 Thread Robert Watson
On Fri, 23 Sep 2005, Jason Carroll wrote: There seem to be 2 types of crashes we see with pretty different stack traces. What I'll call a type 1 crash, I believe, is often caused by one of the triggers I mention above. A type 2 crash appears to happen spontaneously after the machine has b