On Thu, 13 Oct 2005, Rob Watt wrote:
> The test machine did panic. Unfortunately I was not running with
> BREAK_TO_DEBUGGER. I will re-run the tests with all of the debugging
> options we were using before, and then send you the trace info.
Unfortunately I was not able to reproduce the panics wit
On 12 Oct, Rob Watt wrote:
>> >> On Fri, 7 Oct 2005, Don Lewis wrote:
>> I MFC'ed the fix to RELENG_6 last week, but the patch didn't apply
>> cleanly to RELENG_5. I tweaked the patch for RELENG_5 and tested it on
>> a UP box. I'd like to get some testing on SMP hardware before I commit
>> it to
> >> On Fri, 7 Oct 2005, Don Lewis wrote:
> I MFC'ed the fix to RELENG_6 last week, but the patch didn't apply
> cleanly to RELENG_5. I tweaked the patch for RELENG_5 and tested it on
> a UP box. I'd like to get some testing on SMP hardware before I commit
> it to RELENG_5, just to make sure that
On 11 Oct, Rob Watt wrote:
> On Mon, 10 Oct 2005, Rob Watt wrote:
>
>> Don,
>>
>> On Fri, 7 Oct 2005, Don Lewis wrote:
>>
>> > Both HEAD and RELENG_6 have been patched. I've tested the following
>> > patch for RELENG_5 on a uniprocessor sparc64 box. I'd appreciate it if
>> > anyone who was runni
On Mon, 10 Oct 2005, Rob Watt wrote:
> Don,
>
> On Fri, 7 Oct 2005, Don Lewis wrote:
>
> > Both HEAD and RELENG_6 have been patched. I've tested the following
> > patch for RELENG_5 on a uniprocessor sparc64 box. I'd appreciate it if
> > anyone who was running into this problem on RELENG_5 with
Don,
On Fri, 7 Oct 2005, Don Lewis wrote:
> Both HEAD and RELENG_6 have been patched. I've tested the following
> patch for RELENG_5 on a uniprocessor sparc64 box. I'd appreciate it if
> anyone who was running into this problem on RELENG_5 with SMP hardware
> could test it before I do the MFC.
On 3 Oct, Rob Watt wrote:
> We noticed the patches from Don Lewis, but have not tested them yet. We
> weren't sure if we could just apply those patches against 6.0-BETA5, or
> whether we should wait for them to be MFC'd.
Both HEAD and RELENG_6 have been patched. I've tested the following
patch
On 3 Oct, Rob Watt wrote:
>> It turns out that the sysctl buffer is already wired in one of the two cases
>> that this function is called, so I moved the wiring up to the upper layer in
>> the other case and cut out a bunch of the locking gymnastics as a result.
>> Can you try this patch
> It turns out that the sysctl buffer is already wired in one of the two cases
> that this function is called, so I moved the wiring up to the upper layer in
> the other case and cut out a bunch of the locking gymnastics as a result.
> Can you try this patch?
>
> Index: kern_proc.c
> ==
On 2 Oct, Don Lewis wrote:
> It turns out that fill_kinfo_thread() grabs a bunch of locks to grab
> things out of struct proc, which breaks badly if sched_lock is grabbed
> before calling fill_kinfo_thread().
>
> I refactored fill_kinfo_thread() into two functions, one of which
> doesn't need an
On 1 Oct, Don Lewis wrote:
> On 30 Sep, John Baldwin wrote:
>> It turns out that the sysctl buffer is already wired in one of the two cases
>> that this function is called, so I moved the wiring up to the upper layer in
>> the other case and cut out a bunch of the locking gymnastics as a result
On 30 Sep, John Baldwin wrote:
> On Friday 30 September 2005 11:25 am, Antoine Pelisse wrote:
>> On 9/30/05, John Baldwin <[EMAIL PROTECTED]> wrote:
>> > On Friday 30 September 2005 05:24 am, Antoine Pelisse wrote:
>> > > Hi Robert,
>> > > I don't think your patch is correct, the total linked list
On 30 Sep, Antoine Pelisse wrote:
> Hi Robert,
> I don't think your patch is correct, the total linked list can be broken
> while the lock is released, thus just passing the link may not be enough
> I have submitted a PR[1] for this a month ago but nobody took care of it yet
There are two problem
On 9/30/05, John Baldwin <[EMAIL PROTECTED]> wrote:
>
> On Friday 30 September 2005 11:25 am, Antoine Pelisse wrote:
> > On 9/30/05, John Baldwin <[EMAIL PROTECTED]> wrote:
> > > On Friday 30 September 2005 05:24 am, Antoine Pelisse wrote:
> > > > Hi Robert,
> > > > I don't think your patch is corr
On Friday 30 September 2005 11:25 am, Antoine Pelisse wrote:
> On 9/30/05, John Baldwin <[EMAIL PROTECTED]> wrote:
> > On Friday 30 September 2005 05:24 am, Antoine Pelisse wrote:
> > > Hi Robert,
> > > I don't think your patch is correct, the total linked list can be broken
> > > while
On Friday 30 September 2005 05:24 am, Antoine Pelisse wrote:
> Hi Robert,
> I don't think your patch is correct, the total linked list can be broken
> while the lock is released, thus just passing the link may not be enough
> I have submitted a PR[1] for this a month ago but nobody took care of it
On Thu, 29 Sep 2005, Robert Watson wrote:
> Could you dump the contents of *td and *td->td_proc for me? I'm quite
> interested to know what the value in td->td_proc->p_state is, among other
> things. If I could also have you generate a dump of the KSE group
> structures in td->td_proc->p_ksegrps
Robert,
We have gotten some more information from our type1 crash:
>sh lockedvnods
Locked vnodes
>sh alllocks
Process 2204 (dataplay) thread 0xff00b1726a000 (100214)
exclusive sleep mutex inp (udpinp) f = 0 (0xff00cc90fcc8) locked @
/usr/src/sys/netinet/udp_usrreq.c:762
Process 62 (paged
On Thu, 29 Sep 2005, Rob Watt wrote:
On Thu, 29 Sep 2005, Robert Watson wrote:
Could you dump the contents of *td and *td->td_proc for me? I'm quite
interested to know what the value in td->td_proc->p_state is, among other
things. If I could also have you generate a dump of the KSE group
str
On Wed, 28 Sep 2005, Rob Watt wrote:
We re-compiled the kernel with 'options KDB_STOP_NMI', and were able to
get a much more full analysis of what was happening on the 6-BETA5
crash.
Great.
We crashed in top again, and it does look like we may have hit a
kern_proc bug.
This sounds good,
Robert,
On Tue, 27 Sep 2005, Robert Watson wrote:
> Great. As mentioned I'll be offline for about the next 48 hours, but back
> after then. If we can get a nice clean crash out of this, would really be
> best. If it's top panicking, it could well be due to a bug in the process
> monitoring cod
On Sun, 25 Sep 2005, Robert Watson wrote:
>
> On Fri, 23 Sep 2005, Jason Carroll wrote:
> > There seem to be 2 types of crashes we see with pretty different stack
> > traces. What I'll call a type 1 crash, I believe, is often caused by
> > one of the triggers I mention above. A type 2 crash
On 9/27/05, Robert Watson <[EMAIL PROTECTED]> wrote:
>
> On Tue, 27 Sep 2005, Rob Watt wrote:
>
>
> Is this an SMP box? If so, could you try compiling options KDB_STOP_NMI
> into your kernel -- you'll also need to set debug.kdb.stop_cpus_with_nmi=1
> in either loader.conf or at runtime with sysctl
On Tue, 27 Sep 2005, Rob Watt wrote:
this is the piece of code that was referenced by the ip:
(gdb) l *0x803b88ca
0x803b88ca is in nfsrv_lookup (/usr/src/sys/nfsserver/nfs_serv.c:670).
665 NFSD_UNLOCK();
666 mtx_lock(&Giant); /* VFS */
667
On Tue, 27 Sep 2005, Rob Watt wrote:
Thanks for your quick response and suggestions. We have now experienced
an additional type of crash. Type 3 is from 6.0-BETA5, it did not enter
the debugger at all and we could not generate a core.
Is this an SMP box? If so, could you try compiling optio
On Fri, 23 Sep 2005, Jason Carroll wrote:
There seem to be 2 types of crashes we see with pretty different stack
traces. What I'll call a type 1 crash, I believe, is often caused by
one of the triggers I mention above. A type 2 crash appears to happen
spontaneously after the machine has b