On a separate note, I want to thank everyone who helped with this
issue, especially Eric and Thomas, and also Steven and Thomas for
schooling me on the changelog extraction. This problem was a big one
for us that we were struggling to understand. All the help is greatly
appreciated.
Thanks,
Pete
On Thu, Nov 1, 2012 at 5:36 PM, Peter LaDow wrote:
> I have a setup running 3.0.48-rt72. It's been running about 8 hours
> so far, and tomorrow I'll know if there have been any problems. I'm
> confident things will be fine tomorrow, and at that time I'll be glad
> to attach a Tested
> git log v3.0.36-rt58..3.0.48-rt72
>
> That's what a source version control system is designed for AFAICT
Thanks for the tip. I (naively) presumed there were published
changelogs and was looking for them. Nor did I know the git logs were
limited to releases, and didn't look there because I
On Tue, Oct 30, 2012 at 5:33 PM, Steven Rostedt wrote:
> From: Thomas Gleixner
>
> The netfilter code relies only on the implicit semantics of
> local_bh_disable() for serializing xt_write_recseq sections. RT breaks
> that and needs explicit serialization here.
>
> Rep
On Thu, Nov 1, 2012 at 2:26 PM, Thomas Gleixner wrote:
> Cough. You are missing a boat load of crucial fixes. There is a damned
> good reason why 3.0.stable got 12 updates and the -rt version 14.
I don't doubt there are. But we've only experienced one problem
between 3.0.36-rt58 and
.
Reported-by: Peter LaDow pet...@gocougs.wsu.edu
Signed-off-by: Thomas Gleixner t...@linutronix.de
diff --git a/include/linux/locallock.h b/include/linux/locallock.h
index f1804a3..a5eea5d 100644
diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index
Ok. More of an update. We've managed to create a scenario that
exhibits the problem much earlier. We can now cause the lockup to
occur within a few hours (rather than the 12 to 24 hours in our other
scenario).
Our setup is to have a lot of traffic constantly being processed
by the
On Fri, Oct 26, 2012 at 2:05 PM, Eric Dumazet wrote:
> Do you know what per-CPU data is in the Linux kernel?
I sorta did. But since your response, I did more reading, and now I
see what you mean. But I don't think this is a per cpu issue. More
below.
> Because it's not needed. Really I don't know
(I've added netfilter and linux-rt-users to try to pull in more help).
On Fri, Oct 26, 2012 at 9:48 AM, Eric Dumazet wrote:
> Upstream kernel is fine, there is no race, as long as :
>
> local_bh_disable() disables BH and preemption.
Looking at the unpatched code in
On Tue, Oct 23, 2012 at 9:32 PM, Eric Dumazet wrote:
> Could you try the following patch?
So, I applied your patch. And so far, it seems to have fixed the
issue. I've had my systems running for 48 hours, and no lockup in
iptables. Usually, I could get a lockup to occur within 12 to 24
hours, and
On Tue, Oct 23, 2012 at 9:32 PM, Eric Dumazet wrote:
> Could you try the following patch?
Thanks for the suggestion. But I have a question about the patch below.
> + /* Note : cmpxchg() is a memory barrier, we dont need smp_wmb() */
> + if (old != new && cmpxchg(>sequence, old, new) ==
(Sorry for the subject change, but I wanted to try and pull in those
who work on RT issues, and the subject didn't make that obvious.
Please search for the same subject without the RT Linux trailing
text.)
Well, more information. Even with SMP enabled (and presumably the
migrate_enable having
> Now, is preemption required to be disabled in non-SMP systems?
I did more digging, and I found this.
In linux/netfilter/x_tables.h, there is the definition of
xt_write_recseq_begin. This function updates the sequence number for
the sequence locks. This is called in the iptables kernel code.
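To make the sequence-number handling concrete, here is a rough userspace
sketch of how a begin/end pair of this kind can track nested write
sections. It is only a simplified model and not the kernel's code: the
demo_* names are made up for illustration, the real counter is per-CPU,
and the memory barriers are left out. The last function replays one
hypothetical interleaving in which a preempted read-modify-write loses
an update and leaves the counter odd with no writer left. It is meant
to show why the scheme depends on write sections being serialized, not
to claim that this is the confirmed cause of the lockup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sequence counter (per-CPU in the kernel). */
struct demo_recseq {
    unsigned int sequence;
};

/* Enter a write section: returns 1 for the outermost writer, 0 if nested. */
static unsigned int demo_write_recseq_begin(struct demo_recseq *s)
{
    /* Low bit already set means a write section is open on this counter. */
    unsigned int addend = (s->sequence + 1) & 1;

    s->sequence += addend; /* counter is odd while a write section is open */
    return addend;
}

/* Leave a write section: only the outermost writer makes the counter even. */
static void demo_write_recseq_end(struct demo_recseq *s, unsigned int addend)
{
    s->sequence += addend;
}

/* Readers treat an odd counter as "a writer is in progress". */
static bool demo_writer_active(const struct demo_recseq *s)
{
    return (s->sequence & 1) != 0;
}

/*
 * Assumed interleaving for illustration: writer A is preempted between
 * loading the counter and storing the incremented value, and writer B
 * opens its own write section in between.
 */
static unsigned int demo_lost_update(void)
{
    struct demo_recseq s = { 0 };

    /* A: load the counter and compute the addend, then get preempted. */
    unsigned int a_loaded = s.sequence;
    unsigned int a_addend = (a_loaded + 1) & 1;
    assert(a_addend == 1);

    /* B: complete begin on the same counter. */
    unsigned int b_addend = demo_write_recseq_begin(&s);

    /* A: resume and store the stale result, overwriting B's increment. */
    s.sequence = a_loaded + a_addend;

    /* Both writers leave their sections. */
    demo_write_recseq_end(&s, a_addend);
    demo_write_recseq_end(&s, b_addend);

    return s.sequence; /* odd, although no writer is left */
}
```

With properly nested calls the counter ends up even again. In the
replayed interleaving it ends up odd, so a reader that waits for an even
value would never make progress.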
On Mon, Oct 22, 2012 at 10:01 AM, Eric Dumazet wrote:
> This looks like a corruption of s->sequence, and its value is odd, even
> if no writer is alive.
>
> Does local_bh_disable() disable preemption on RT?
Hmmm
With PREEMPT_RT_FULL defined (as we have):
void local_bh_disable(void)
{
I posted this problem some time back on the linux-rt-users and
netfilter lists. Since then, we thought we had a workaround to avoid
this problem, so we dropped the issue. But now 5 months later, the
problem has reappeared. And this time it is much more serious and
much more difficult to