Re: [PATCH] posix-timer: fix deletion race

2007-07-24 Thread Jeremy Katz
On Wed, 25 Jul 2007, Oleg Nesterov wrote: On 07/24, Jeremy Katz wrote: Sorry. That should have been "without apparent effect". Sorry. I confused completely. So. You mean that even with that patch you _still_ see the BUG_ON(!SIGQUEUE_PREALLOC) in sigqueue_free() ? Yes. I did not notice

Re: [PATCH] posix-timer: fix deletion race

2007-07-24 Thread Oleg Nesterov
On 07/24, Jeremy Katz wrote: > > On Tue, 24 Jul 2007, Oleg Nesterov wrote: > > >Interesting. Could you show the patch? Where does sys_timer_create() set > >counter == 1? > > --- kernel/posix-timers.c.old 2007-07-24 11:21:29.0 -0700 > +++ kernel/posix-timers.c 2007-07-20

Re: [PATCH] posix-timer: fix deletion race

2007-07-24 Thread Jeremy Katz
On Tue, 24 Jul 2007, Oleg Nesterov wrote: On 07/23, Jeremy Katz wrote: On Fri, 20 Jul 2007, Oleg Nesterov wrote: I still can't believe we have a double-free problem, this looks impossible. Do you see the "idr_remove called for id=%d which is not allocated.\n" in syslog? No. I

Re: [PATCH] posix-timer: fix deletion race

2007-07-24 Thread Oleg Nesterov
On 07/23, Jeremy Katz wrote: On Fri, 20 Jul 2007, Oleg Nesterov wrote: I still can't believe we have a double-free problem, this looks impossible. Do you see the "idr_remove called for id=%d which is not allocated.\n" in syslog? No. I also added some accounting with atomic

Re: [PATCH] posix-timer: fix deletion race

2007-07-23 Thread Jeremy Katz
On Fri, 20 Jul 2007, Oleg Nesterov wrote: On 07/18, Jeremy Katz wrote: On Wed, 18 Jul 2007, Oleg Nesterov wrote: Jeremy, I agree with Thomas that your patch should not be right, but it does make a difference. Perhaps this is just the timing, but who knows. Could you add some printk's to be

Re: [PATCH] posix-timer: fix deletion race

2007-07-20 Thread Oleg Nesterov
On 07/18, Jeremy Katz wrote: On Wed, 18 Jul 2007, Oleg Nesterov wrote: Jeremy, I agree with Thomas that your patch should not be right, but it does make a difference. Perhaps this is just the timing, but who knows. Could you add some printk's to be sure that lock_timer() actually fails

Re: [PATCH] posix-timer: fix deletion race

2007-07-19 Thread Jeremy Katz
On Thu, 19 Jul 2007, Thomas Gleixner wrote: On Wed, 2007-07-18 at 16:43 -0700, Jeremy Katz wrote: On Wed, 18 Jul 2007, Jeremy Katz wrote: On Wed, 18 Jul 2007, Thomas Gleixner wrote: Also can you please enable CONFIG_PROVE_LOCKING, which might catch any locking problem, which might be

Re: [PATCH] posix-timer: fix deletion race

2007-07-19 Thread Thomas Gleixner
On Wed, 2007-07-18 at 16:43 -0700, Jeremy Katz wrote: On Wed, 18 Jul 2007, Jeremy Katz wrote: On Wed, 18 Jul 2007, Thomas Gleixner wrote: Also can you please enable CONFIG_PROVE_LOCKING, which might catch any locking problem, which might be related to this. Another test: Can you

Re: [PATCH] posix-timer: fix deletion race

2007-07-18 Thread Jeremy Katz
On Wed, 18 Jul 2007, Jeremy Katz wrote: On Wed, 18 Jul 2007, Thomas Gleixner wrote: Also can you please enable CONFIG_PROVE_LOCKING, which might catch any locking problem, which might be related to this. Another test: Can you please disable CONFIG_SCHED_SMT to narrow it down further ?

Re: [PATCH] posix-timer: fix deletion race

2007-07-18 Thread Jeremy Katz
On Wed, 18 Jul 2007, Oleg Nesterov wrote: Jeremy, I agree with Thomas that your patch should not be right, but it does make a difference. Perhaps this is just the timing, but who knows. Could you add some printk's to be sure that lock_timer() actually fails while it never should? Agreed.

Re: [PATCH] posix-timer: fix deletion race

2007-07-18 Thread Jeremy Katz
On Wed, 18 Jul 2007, Thomas Gleixner wrote: On Wed, 2007-07-18 at 08:05 +0200, Thomas Gleixner wrote: On Tue, 2007-07-17 at 16:58 -0700, Jeremy Katz wrote: EFLAGS: 00010246 (2.6.22.1-WR1.4aq_cgl #2) Hmm. Are there any other patches on that kernel ? Just hrt6 and your proposed fix. The

Re: [PATCH] posix-timer: fix deletion race

2007-07-18 Thread Thomas Gleixner
On Tue, 2007-07-17 at 16:58 -0700, Jeremy Katz wrote: Scratch that. I had infrastructure problems, and ended up using the wrong build. EFLAGS: 00010246 (2.6.22.1-WR1.4aq_cgl #2) Hmm. Are there any other patches on that kernel ? Is there a chance that you can whip up a test program which

Re: [PATCH] posix-timer: fix deletion race

2007-07-18 Thread Thomas Gleixner
On Wed, 2007-07-18 at 08:05 +0200, Thomas Gleixner wrote: On Tue, 2007-07-17 at 16:58 -0700, Jeremy Katz wrote: Scratch that. I had infrastructure problems, and ended up using the wrong build. EFLAGS: 00010246 (2.6.22.1-WR1.4aq_cgl #2) Hmm. Are there any other patches on that

Re: [PATCH] posix-timer: fix deletion race

2007-07-18 Thread Oleg Nesterov
On 07/17, Jeremy Katz wrote: This is with the patch (and 2.6.22.1 and hrt6): [ cut here ] Kernel BUG at c0125adb [verbose debug info unavailable] invalid opcode: [#1] SMP Modules linked in: CPU:3 EIP:0060:[c0125adb]Not tainted VLI EFLAGS:

Re: [PATCH] posix-timer: fix deletion race

2007-07-17 Thread Jeremy Katz
On Tue, 17 Jul 2007, Jeremy Katz wrote: On Tue, 17 Jul 2007, Thomas Gleixner wrote: With 2.6.14 or with current mainline ? I haven't been keeping notes quite as studiously as I should have been, but this just occurred with 2.6.22.1 + the hrt6 patch + your proposed fix: Scratch that. I

Re: [PATCH] posix-timer: fix deletion race

2007-07-17 Thread Jeremy Katz
On Tue, 17 Jul 2007, Thomas Gleixner wrote: On Tue, 2007-07-17 at 11:39 -0700, Jeremy Katz wrote: I tried the patch with my test case, but still see the issue. Here's my explanation of the double free race: CPU 0 CPU 1 sys_timer_delete(): lock_timer();

Re: [PATCH] posix-timer: fix deletion race

2007-07-17 Thread Thomas Gleixner
On Tue, 2007-07-17 at 11:39 -0700, Jeremy Katz wrote: > I tried the patch with my test case, but still see the issue. > Here's my explanation of the double free race: > CPU 0 CPU 1 > sys_timer_delete(): > lock_timer(); > ... > unlock_timer();

Re: [PATCH] posix-timer: fix deletion race

2007-07-17 Thread Jeremy Katz
On Tue, 17 Jul 2007, Thomas Gleixner wrote: Jeremy Katz experienced a posix-timer related bug on 2.6.14. This is caused by a subtle race, which is there since the original posix timer commit and persists until today. timer_delete does: lock_timer(); timer->it_process = NULL; unlock_timer();

Re: [PATCH] posix-timer: fix deletion race

2007-07-17 Thread Jeremy Katz
On Tue, 17 Jul 2007, Ingo Molnar wrote: nice one! The race looks pretty narrow - Jeremy, does your Xens have hyperthreading? (or are there any heavy SMI sources perhaps that could open up this race.) If not then there might be some other bug lurking in there as well. Affirmative. 2 cores, 2

Re: [PATCH] posix-timer: fix deletion race

2007-07-17 Thread Thomas Gleixner
On Tue, 2007-07-17 at 17:07 +0400, Oleg Nesterov wrote: > I think we can make a simpler patch, > > --- posix-timers.c~ 2007-06-29 14:45:04.0 +0400 > +++ posix-timers.c 2007-07-17 16:59:45.0 +0400 > @@ -449,6 +449,9 @@ static void release_posix_timer(struct k >

Re: [PATCH] posix-timer: fix deletion race

2007-07-17 Thread Ingo Molnar
* Thomas Gleixner <[EMAIL PROTECTED]> wrote: > Jeremy Katz experienced a posix-timer related bug on 2.6.14. This is > caused by a subtle race, which is there since the original posix timer > commit and persists until today. > > timer_delete does: > lock_timer(); > timer->it_process = NULL; >

[PATCH] posix-timer: fix deletion race

2007-07-17 Thread Thomas Gleixner
Jeremy Katz experienced a posix-timer related bug on 2.6.14. This is caused by a subtle race, which is there since the original posix timer commit and persists until today. timer_delete does: lock_timer(); timer->it_process = NULL; unlock_timer(); release_posix_timer(); timer->it_process is
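
The deletion sequence quoted above can be sketched in userspace as follows. This is only an illustration under stated assumptions: a pthread mutex stands in for the kernel's lock_timer()/unlock_timer(), and struct demo_timer and the demo_* functions are hypothetical names, not kernel code. The excerpt is cut off before it explains the race, so the sketch shows only the ordering of the steps, not the race or its fix.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's timer object. */
struct demo_timer {
    pthread_mutex_t lock;  /* plays the role of the per-timer lock */
    void *it_process;      /* owner; NULL once deletion has started */
    int released;          /* counts calls to demo_release() */
};

/* Set up a timer owned by `owner` (hypothetical helper). */
static void demo_timer_init(struct demo_timer *t, void *owner)
{
    pthread_mutex_init(&t->lock, NULL);
    t->it_process = owner;
    t->released = 0;
}

/* Stand-in for release_posix_timer(): only counts the release. */
static void demo_release(struct demo_timer *t)
{
    assert(t->it_process == NULL);  /* the owner was cleared first */
    t->released++;
}

/*
 * Same order as the quoted sequence: clear it_process under the
 * lock, drop the lock, then release the timer outside the lock.
 */
static void demo_timer_delete(struct demo_timer *t)
{
    pthread_mutex_lock(&t->lock);
    t->it_process = NULL;
    pthread_mutex_unlock(&t->lock);
    demo_release(t);
}
```

Note that the release step runs after the lock has been dropped, which is the part of the sequence the thread goes on to discuss.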

Re: [PATCH] posix-timer: fix deletion race

2007-07-17 Thread Oleg Nesterov
On 07/17, Thomas Gleixner wrote: Jeremy Katz experienced a posix-timer related bug on 2.6.14. This is caused by a subtle race, which is there since the original posix timer commit and persists until today. timer_delete does: lock_timer(); timer->it_process = NULL; unlock_timer();
