On Tue, Apr 24, 2001 at 06:56:32PM +0200, Christian Ehrhardt wrote:
> On Tue, Apr 24, 2001 at 09:10:07AM -0700, Linus Torvalds wrote:
> > ptrace only operates on processes that are stopped. So there are no
> > locking issues - we've synchronized on a much higher level than a
> > spinlock or
> Alan Cox <[EMAIL PROTECTED]> writes:
> > The preferable one for performance is certainly to backport the 2.4 changes
>
> Is it any more substantial than changing all uses of the ptrace flags
> to the new variable?
It affects asm blocks and offsets on some ports. It's not too bad, though.
Alan Cox <[EMAIL PROTECTED]> writes:
> The preferable one for performance is certainly to backport the 2.4 changes
Is it any more substantial than changing all uses of the ptrace flags
to the new variable?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of
> > child->flags |= PF_PTRACED;
> >
> > without waiting for the child to have stopped.
>
> I can see how this could cause PF_USEDFPU to be cleared inadvertently,
> but I do not have any ideas for testing this. Is it clear that this
> is the source of the problem?
There is no guarantee that |=
Linus Torvalds writes:
> Ahh.. This actually _does_ look like a race on "current->flags":
> PTRACE_ATTACH will do a
>
> child->flags |= PF_PTRACED;
>
> without waiting for the child to have stopped.
I can see how this could cause PF_USEDFPU to be cleared inadvertently,
but I do not
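The race described above can be replayed by hand. What follows is a minimal sketch in C, not kernel code: the flag values and the function names lost_update and serialized_update are assumptions made for illustration, and the interleaving is written out sequentially, so it shows one possible ordering rather than proving that this ordering is what happens on real hardware.

```c
/* Illustrative flag bits; the exact values are assumptions for this sketch. */
#define PF_PTRACED 0x00000010UL
#define PF_USEDFPU 0x00100000UL

/*
 * Replay one unlucky interleaving of two unlocked read-modify-write
 * updates to the same flags word.  "flags |= bit" is really a load,
 * an OR and a store, and another CPU can run between load and store.
 */
static unsigned long lost_update(unsigned long flags)
{
    unsigned long tracer_copy = flags;   /* tracer: loads child->flags      */
    flags |= PF_USEDFPU;                 /* child: marks the FPU as in use  */
    flags = tracer_copy | PF_PTRACED;    /* tracer: stores its stale copy   */
    return flags;                        /* PF_USEDFPU has been lost        */
}

/* The same two updates when each one completes before the other starts. */
static unsigned long serialized_update(unsigned long flags)
{
    flags |= PF_USEDFPU;                 /* child's update, complete        */
    flags |= PF_PTRACED;                 /* tracer's update, complete       */
    return flags;                        /* both bits survive               */
}
```

Calling lost_update(0) returns a word with PF_PTRACED set and PF_USEDFPU clear, while serialized_update(0) keeps both bits. Whether this particular ordering is the source of the reported corruption is exactly the open question in the thread.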
"Christian Ehrhardt" <[EMAIL PROTECTED]> writes:
> Victor: Could you try to reproduce the system wide corruption if you
> add an explicit call to stts(); at the very end of __switch_to?
> This should prevent the FPU corruption from spreading.
After adding this call, I cannot reproduce the global
On Tue, Apr 24, 2001 at 09:10:07AM -0700, Linus Torvalds wrote:
> ptrace only operates on processes that are stopped. So there are no
> locking issues - we've synchronized on a much higher level than a
> spinlock or semaphore.
This is only true for requests other than PTRACE_ATTACH and
On Tue, Apr 24, 2001 at 08:05:15AM -0500, Victor Zandy wrote:
>
> He found that PF_USEDFPU was always set before the machine was broken.
> Afterwards, he found that it was set about 70% of the time.
If I'm not mistaken this actually can cause GLOBAL FPU corruption.
Here's why:
Assume for a moment that
[ Alan, I'm lazy and only have 2.2.14 sources on-line. Maybe this has
been fixed already and there's something else going on. Worth a look ]
In article <[EMAIL PROTECTED]>,
Victor Zandy <[EMAIL PROTECTED]> wrote:
>
>Someone else here traced the process flags of a FP-intensive program
>on a
> >1.) If I'm not mistaken switch_to changes current->flags without
> >atomic operations and without any locks and sys_ptrace changes
> >child->flags only protected by the big kernel lock.
>
> ptrace only operates on processes that are stopped. So there are no
> locking issues - we've
In article <[EMAIL PROTECTED]>,
Christian Ehrhardt <[EMAIL PROTECTED]> wrote:
>
>1.) If I'm not mistaken switch_to changes current->flags without
>atomic operations and without any locks and sys_ptrace changes
>child->flags only protected by the big kernel lock.
ptrace only operates on processes
Someone else here traced the process flags of a FP-intensive program
on a machine before and after it is put in the faulty FPU state. He
periodically sampled /proc/pid/stat while the program was running.
He found that PF_USEDFPU was always set before the machine was broken.
Afterwards, he found that
ate with fnsave i.e. current->tss.i387 is
'invalid' after
fnsave current->tss.i387
fwait;
Thanks
Amol
David Konerding <[EMAIL PROTECTED]> on 04/23/2001 01:09:27 AM
To: Ulrich Drepper <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED], [EMAIL PROTECTED] (bcc: Amol Lad/HSS)
Subject: Re: BUG: Global FPU corruption in 2.2
Erik Paulson <[EMAIL PROTECTED]> on 04/24/2001 01:14:27 AM
To: Christian Ehrhardt <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED], [EMAIL PROTECTED] (bcc: Amol Lad/HSS)
Subject: Re: BUG: Global FPU corruption in 2.2
On 23 Apr 2001 18:11:48 +0200, Christian Ehrhardt wro
On 23 Apr 2001 18:11:48 +0200, Christian Ehrhardt wrote:
> On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote:
> >
> > We have found that one of our programs can cause system-wide
> > corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we
> > run this program, the FPU
On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote:
>
> We have found that one of our programs can cause system-wide
> corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we
> run this program, the FPU gives bad results to all subsequent
> processes.
A few comments, not
Hello,
Linux 2.2.19 SMP: I can confirm the report. Even games are going weird after
running this test (my wife is complaining :-))
Have to reboot.
Kees
> OK, regardless of how the linux kernel actually manages the FPU for user-space
> programs, does anybody have any comments on the original bugreport?
Complete mystification.
> > of pi begins to look wrong. Then kill everything and run pi by itself
> > again. It will no longer produce good results.
Ulrich Drepper wrote:
> "Richard B. Johnson" <[EMAIL PROTECTED]> writes:
>
> > The kernel doesn't know if a process is going to use the FPU when
> > a new process is created. Only the user's code, i.e., the 'C' runtime
> > library knows.
>
> Maybe you should try to understand the kernel code and
"Richard B. Johnson" <[EMAIL PROTECTED]> writes:
> The kernel doesn't know if a process is going to use the FPU when
> a new process is created. Only the user's code, i.e., the 'C' runtime
> library knows.
Maybe you should try to understand the kernel code and the features of
the processor first.
It looks to me like the kernel sets a trap for FP operations when a
process is switched in. Then when the process executes an FP op, the
kernel clears the trap and either loads the FP context or initializes
it, depending on whether it is the process' first FP operation. So no
help is needed from
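The lazy scheme described in this message can be modelled in a few lines. The following is a toy sketch in C under stated assumptions: struct task, struct cpu and the function names are hypothetical stand-ins (the real kernel uses hardware state and its own data structures), a single double stands in for the whole FPU context, and the "trap" is just a flag checked in software.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are not the kernel's structures. */
struct task {
    bool used_math;    /* has this task ever executed an FP op?          */
    bool used_fpu;     /* does the FPU currently hold this task's state? */
    double fpu_image;  /* stand-in for the saved FPU context             */
};

struct cpu {
    bool ts;               /* models the "trap on next FP op" bit        */
    double fpu_reg;        /* stand-in for the live FPU registers        */
    struct task *current;
};

/* Context switch: save the outgoing state only if it is live, arm the trap. */
static void switch_to(struct cpu *c, struct task *next)
{
    struct task *prev = c->current;
    if (prev != NULL && prev->used_fpu) {
        prev->fpu_image = c->fpu_reg;
        prev->used_fpu = false;
    }
    c->ts = true;
    c->current = next;
}

/* Trap handler: runs on the first FP op after a switch. */
static void device_not_available(struct cpu *c)
{
    struct task *t = c->current;
    c->ts = false;                     /* clear the trap                  */
    if (t->used_math) {
        c->fpu_reg = t->fpu_image;     /* restore the saved context       */
    } else {
        c->fpu_reg = 0.0;              /* first FP op: initialize         */
        t->used_math = true;
    }
    t->used_fpu = true;
}

/* One FP operation by the current task: trap first if the bit is set. */
static double fp_add(struct cpu *c, double x)
{
    if (c->ts)
        device_not_available(c);
    c->fpu_reg += x;
    return c->fpu_reg;
}
```

In this model, two tasks that alternate on one CPU each see only their own running sum, because the save in switch_to depends on used_fpu being accurate. That dependence is why a wrongly set or cleared "FPU in use" flag is a plausible way for one task's state to leak into another, although the sketch does not claim to reproduce the reported bug.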
On 20 Apr 2001, Victor Zandy wrote:
>
> No dice. Your program does not fix the problem.
>
> If it were a hardware problem, I would expect the problem to occur
> under 2.4.2 as well as 2.2.*, and I would be surprised that we can
> consistently produce the behavior across our 64 node cluster.
On 20 Apr 2001, Ulrich Drepper wrote:
> "Richard B. Johnson" <[EMAIL PROTECTED]> writes:
>
> > If it "fixes" it, there is no problem with the FPU, but with the
> > 'C' runtime library which doesn't initialize the FPU to a known
> > state before it uses it.
>
> It's the kernel which initializes
"Richard B. Johnson" <[EMAIL PROTECTED]> writes:
> If it "fixes" it, there is no problem with the FPU, but with the
> 'C' runtime library which doesn't initialize the FPU to a known
> state before it uses it.
It's the kernel which initializes the FPU. This was always the case
and necessary to
No dice. Your program does not fix the problem.
If it were a hardware problem, I would expect the problem to occur
under 2.4.2 as well as 2.2.*, and I would be surprised that we can
consistently produce the behavior across our 64 node cluster. But we
are keeping the possibility in mind.
On 20 Apr 2001, Victor Zandy wrote:
>
> Victor Zandy <[EMAIL PROTECTED]> writes:
> > We have found that one of our programs can cause system-wide
> > corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we
> > run this program, the FPU gives bad results to all subsequent
> >
Victor Zandy <[EMAIL PROTECTED]> writes:
> We have found that one of our programs can cause system-wide
> corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we
> run this program, the FPU gives bad results to all subsequent
> processes.
We have now tested 2.4.2 and 2.2.19.
On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote:
>
> We have found that one of our programs can cause system-wide
> corruption of the x86 FPU under 2.2.16 and 2.2.17.
>
> We see this problem on dual 550MHz Xeons with 1GB RAM.
Hm, I started to wonder if this is not somewhat
We have found that one of our programs can cause system-wide
corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we
run this program, the FPU gives bad results to all subsequent
processes.
We see this problem on dual 550MHz Xeons with 1GB RAM. We have 64 of
these things, and we