Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-02 Thread Peter LaDow
On a separate note, I want to thank everyone who helped with this issue, especially Eric and Thomas, and also Steven and Thomas for schooling me on the changelog extraction. This problem was a big one for us that we were struggling to understand. All the help is greatly appreciated. Thanks, Pete

Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-02 Thread Peter LaDow
On Thu, Nov 1, 2012 at 5:36 PM, Peter LaDow wrote: > I have a setup running 3.0.48-rt72. It's been running about 8 hours > so far, and tomorrow I'll know if there have been any problems. I'm > confident things will be fine tomorrow, and at that time I'll be glad > to attach a Tested

Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-02 Thread Peter LaDow
On Thu, Nov 1, 2012 at 5:36 PM, Peter LaDow pet...@gocougs.wsu.edu wrote: I have a setup running 3.0.48-rt72. It's been running about 8 hours so far, and tomorrow I'll know if there have been any problems. I'm confident things will be fine tomorrow, and at that time I'll be glad to attach

Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-01 Thread Peter LaDow
> git log v3.0.36-rt58..3.0.48-rt72 > > That's what a source version control system is designed for AFAICT Thanks for the tip. I (naively) presumed there were published changelogs and was looking for them. Nor did I know the git logs were limited to releases, and didn't look there because I

Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-01 Thread Peter LaDow
On Tue, Oct 30, 2012 at 5:33 PM, Steven Rostedt wrote: > From: Thomas Gleixner > > The netfilter code relies only on the implicit semantics of > local_bh_disable() for serializing xt_write_recseq sections. RT breaks > that and needs explicit serialization here. > > Rep

Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-01 Thread Peter LaDow
On Thu, Nov 1, 2012 at 2:26 PM, Thomas Gleixner wrote: > Cough. You are missing a boat load of crucial fixes. There is a damned > good reason why 3.0.stable got 12 updates and the -rt version 14. I don't doubt there are. But we've only experienced one problem between 3.0.36-rt58 and

Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-01 Thread Peter LaDow
On Thu, Nov 1, 2012 at 2:26 PM, Thomas Gleixner t...@linutronix.de wrote: Cough. You are missing a boat load of crucial fixes. There is a damned good reason why 3.0.stable got 12 updates and the -rt version 14. I don't doubt there are. But we've only experienced one problem between 3.0.36-rt58

Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-01 Thread Peter LaDow
. Reported-by: Peter LaDow pet...@gocougs.wsu.edu Signed-off-by: Thomas Gleixner t...@linutronix.de diff --git a/include/linux/locallock.h b/include/linux/locallock.h index f1804a3..a5eea5d 100644 diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h index

Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-01 Thread Peter LaDow
git log v3.0.36-rt58..3.0.48-rt72 That's what a source version control system is designed for AFAICT Thanks for the tip. I (naively) presumed there were published changelogs and was looking for them. Nor did I know the git logs were limited to releases, and didn't look there because I feared

Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-10-31 Thread Peter LaDow
On Tue, Oct 30, 2012 at 5:33 PM, Steven Rostedt wrote: > From: Thomas Gleixner > > The netfilter code relies only on the implicit semantics of > local_bh_disable() for serializing xt_write_recseq sections. RT breaks > that and needs explicit serialization here. > > Rep

Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-10-31 Thread Peter LaDow
. Reported-by: Peter LaDow pet...@gocougs.wsu.edu Signed-off-by: Thomas Gleixner t...@linutronix.de diff --git a/include/linux/locallock.h b/include/linux/locallock.h index f1804a3..a5eea5d 100644 diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h index

Re: Process Hang in __read_seqcount_begin

2012-10-30 Thread Peter LaDow
Ok. More of an update. We've managed to create a scenario that exhibits the problem much earlier. We can now cause the lockup to occur within a few hours (rather than the 12 to 24 hours in our other scenario). Our setup is to have a lot of traffic constantly being processed by the

Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Peter LaDow
On Fri, Oct 26, 2012 at 2:05 PM, Eric Dumazet wrote: > Do you know what is per cpu data in linux kernel ? I sorta did. But since your response, I did more reading, and now I see what you mean. But I don't think this is a per cpu issue. More below. > Because its not needed. Really I dont know

Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Peter LaDow
(I've added netfilter and linux-rt-users to try to pull in more help). On Fri, Oct 26, 2012 at 9:48 AM, Eric Dumazet wrote: > Upstream kernel is fine, there is no race, as long as : > > local_bh_disable() disables BH and preemption. Looking at the unpatched code in

Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Peter LaDow
On Tue, Oct 23, 2012 at 9:32 PM, Eric Dumazet wrote: > Could you try following patch ? So, I applied your patch. And so far, it seems to have fixed the issue. I've had my systems running for 48 hours, and no lockup in iptables. Usually, I could get a lockup to occur within 12 to 24 hours, and

Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Peter LaDow
On Tue, Oct 23, 2012 at 9:32 PM, Eric Dumazet eric.duma...@gmail.com wrote: Could you try following patch ? So, I applied your patch. And so far, it seems to have fixed the issue. I've had my systems running for 48 hours, and no lockup in iptables. Usually, I could get a lockup to occur

Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Peter LaDow
(I've added netfilter and linux-rt-users to try to pull in more help). On Fri, Oct 26, 2012 at 9:48 AM, Eric Dumazet eric.duma...@gmail.com wrote: Upstream kernel is fine, there is no race, as long as : local_bh_disable() disables BH and preemption. Looking at the unpatched code in

Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Peter LaDow
On Fri, Oct 26, 2012 at 2:05 PM, Eric Dumazet eric.duma...@gmail.com wrote: Do you know what is per cpu data in linux kernel ? I sorta did. But since your response, I did more reading, and now I see what you mean. But I don't think this is a per cpu issue. More below. Because its not

Re: Process Hang in __read_seqcount_begin

2012-10-24 Thread Peter LaDow
On Tue, Oct 23, 2012 at 9:32 PM, Eric Dumazet wrote: > Could you try following patch ? Thanks for the suggestion. But I have a question about the patch below. > + /* Note : cmpxchg() is a memory barrier, we dont need smp_wmb() */ > + if (old != new && cmpxchg(&s->sequence, old, new) ==

Re: Process Hang in __read_seqcount_begin

2012-10-24 Thread Peter LaDow
On Tue, Oct 23, 2012 at 9:32 PM, Eric Dumazet eric.duma...@gmail.com wrote: Could you try following patch ? Thanks for the suggestion. But I have a question about the patch below. + /* Note : cmpxchg() is a memory barrier, we dont need smp_wmb() */ + if (old != new

Re: Process Hang in __read_seqcount_begin

2012-10-23 Thread Peter LaDow
(Sorry for the subject change, but I wanted to try and pull in those who work on RT issues, and the subject didn't make that obvious. Please search for the same subject without the RT Linux trailing text.) Well, more information. Even with SMP enabled (and presumably the migrate_enable having

Re: Process Hang in __read_seqcount_begin

2012-10-22 Thread Peter LaDow
> Now, is preemption required to be disabled in non-SMP systems? I did more digging, and I found this. In linux/netfilter/x_tables.h, there is the definition of xt_write_recseq_begin. This function updates the sequence number for the sequence locks. This is called in the iptables kernel code.

Re: Process Hang in __read_seqcount_begin

2012-10-22 Thread Peter LaDow
On Mon, Oct 22, 2012 at 10:01 AM, Eric Dumazet wrote: > This looks like a corruption of s->sequence, and its value is odd, even > if no writer is alive. > > Does local_bh_disable() disable preemption on RT ? Hmmm With PREEMPT_RT_FULL defined (as we have): void local_bh_disable(void) {

Process Hang in __read_seqcount_begin

2012-10-22 Thread Peter LaDow
I posted this problem some time back on the linux-rt-users and netfilter lists. Since then, we thought we had a workaround to avoid this problem, so we dropped the issue. But now 5 months later, the problem has reappeared. And this time it is much more serious and much more difficult to

Re: Process Hang in __read_seqcount_begin

2012-10-22 Thread Peter LaDow
On Mon, Oct 22, 2012 at 10:01 AM, Eric Dumazet eric.duma...@gmail.com wrote: This looks like a corruption of s->sequence, and its value is odd, even if no writer is alive. Does local_bh_disable() disable preemption on RT ? Hmmm With PREEMPT_RT_FULL defined (as we have): void

Re: Process Hang in __read_seqcount_begin

2012-10-22 Thread Peter LaDow
Now, is preemption required to be disabled in non-SMP systems? I did more digging, and I found this. In linux/netfilter/x_tables.h, there is the definition of xt_write_recseq_begin. This function updates the sequence number for the sequence locks. This is called in the iptables kernel code.
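
For readers new to the sequence counters this thread keeps coming back to, here is a minimal userspace sketch of the idea, written with C11 atomics. It is not the kernel code: the type `demo_seqcount` and every `demo_*` function are hypothetical names made up for illustration, and the interleaving in `demo_lost_update()` is only one assumed way that two unserialized writers could leave the counter odd. It is not necessarily the exact race hit on RT. The reader is also bounded by `max_spins` so the sketch terminates; as the subject line suggests, the kernel's `__read_seqcount_begin` just keeps waiting while the value is odd.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a sequence counter.
 * Convention: the value is odd while a write section is in progress. */
typedef struct {
    atomic_uint sequence;
} demo_seqcount;

/* Properly serialized writer: each step is one atomic increment. */
static void demo_write_begin(demo_seqcount *s) { atomic_fetch_add(&s->sequence, 1); }
static void demo_write_end(demo_seqcount *s)   { atomic_fetch_add(&s->sequence, 1); }

/* Reader entry: wait for an even value, but give up after max_spins
 * (an assumption of this sketch so that it cannot hang). */
static bool demo_read_begin(demo_seqcount *s, unsigned max_spins, unsigned *snap)
{
    for (unsigned i = 0; i < max_spins; i++) {
        unsigned v = atomic_load(&s->sequence);
        if ((v & 1u) == 0) {
            *snap = v;
            return true;
        }
    }
    return false; /* counter stayed odd: the reader would be stuck */
}

/* Reader exit: true if a writer ran since the snapshot was taken. */
static bool demo_read_retry(demo_seqcount *s, unsigned snap)
{
    return atomic_load(&s->sequence) != snap;
}

/* Assumed interleaving of two writers whose increments are plain
 * load-then-store pairs with nothing serializing them. Executed
 * sequentially here so the outcome is deterministic. */
static unsigned demo_lost_update(demo_seqcount *s)
{
    atomic_init(&s->sequence, 0);

    /* Writer A begins: 0 -> 1. */
    unsigned a = atomic_load(&s->sequence);
    atomic_store(&s->sequence, a + 1);

    /* Writer B starts its begin step, loads 1, and is interrupted
     * before it can store the incremented value. */
    unsigned b_stale = atomic_load(&s->sequence);
    assert(b_stale == 1);

    /* Writer A ends: 1 -> 2. */
    a = atomic_load(&s->sequence);
    atomic_store(&s->sequence, a + 1);

    /* Writer B resumes and stores its stale result, 1 + 1 = 2,
     * so one of the four increments is lost. */
    atomic_store(&s->sequence, b_stale + 1);

    /* Writer B ends: 2 -> 3. */
    unsigned b = atomic_load(&s->sequence);
    atomic_store(&s->sequence, b + 1);

    /* Both writers are done, yet the counter is odd. */
    return atomic_load(&s->sequence);
}
```

In this sketch, calling `demo_lost_update()` leaves the counter odd with no writer active, after which `demo_read_begin()` never sees an even value. That is the kind of state described above as a corruption of `s->sequence`, and it is why the write sections need explicit serialization once `local_bh_disable()` no longer provides it implicitly.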