Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Michal Jaegermann
On Tue, Apr 24, 2001 at 06:56:32PM +0200, Christian Ehrhardt wrote: > On Tue, Apr 24, 2001 at 09:10:07AM -0700, Linus Torvalds wrote: > > ptrace only operates on processes that are stopped. So there are no > > locking issues - we've synchronized on a much higher level than a > > spinlock or semaphore.

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Alan Cox
> Alan Cox <[EMAIL PROTECTED]> writes: > > The preferable one for performance is certainly to backport the 2.4 changes > > Is it any more substantial than changing all uses of the ptrace flags > to the new variable? It affects asm blocks and offsets on some ports. It's not too bad, though.

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Victor Zandy
Alan Cox <[EMAIL PROTECTED]> writes: > The preferable one for performance is certainly to backport the 2.4 changes Is it any more substantial than changing all uses of the ptrace flags to the new variable?

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Alan Cox
> > child->flags |= PF_PTRACED; > > > > without waiting for the child to have stopped. > > I can see how this could cause PF_USEDFPU to be cleared inadvertently, > but I do not have any ideas for testing this. Is it clear that this > is the source of the problem? There is no guarantee that |=
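The lost update under discussion can be replayed deterministically. In this sketch a plain child->flags |= PF_PTRACED is treated as a separate load and store, and the still-running child takes an FPU trap between them. The bit values and the struct are hypothetical stand-ins, not kernel definitions, and the separate ptrace word only loosely models the 2.4-style change of moving the ptrace bits into their own variable:

```c
#include <stdbool.h>

/* Hypothetical bit values for illustration; not taken from a kernel tree. */
#define DEMO_PF_USEDFPU 0x1UL  /* set by the FPU trap, cleared on switch */
#define DEMO_PF_PTRACED 0x2UL  /* 2.2 style: ptrace bit shares the word */
#define DEMO_PT_PTRACED 0x1UL  /* 2.4 style: ptrace bit in its own word */

struct demo_task {
    unsigned long flags;   /* written by the FPU/switch path */
    unsigned long ptrace;  /* hypothetical word written only by ptrace */
};

/*
 * Replay, in one thread, the interleaving where PTRACE_ATTACH's
 * read-modify-write of flags straddles the child's FPU trap.
 * Returns true if PF_USEDFPU survives.
 */
bool replay_shared_word(struct demo_task *t)
{
    unsigned long seen = t->flags;        /* ptrace: load flags */
    t->flags |= DEMO_PF_USEDFPU;          /* child: FPU trap sets bit */
    t->flags = seen | DEMO_PF_PTRACED;    /* ptrace: store stale value */
    return (t->flags & DEMO_PF_USEDFPU) != 0;
}

/* Same interleaving, but ptrace only ever writes its own word. */
bool replay_separate_word(struct demo_task *t)
{
    unsigned long seen = t->ptrace;       /* ptrace: load its own word */
    t->flags |= DEMO_PF_USEDFPU;          /* child: FPU trap sets bit */
    t->ptrace = seen | DEMO_PT_PTRACED;   /* ptrace: store */
    return (t->flags & DEMO_PF_USEDFPU) != 0;
}
```

Both functions run in a single thread; they show which ordering loses PF_USEDFPU, not how likely that ordering is on real SMP hardware.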

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Victor Zandy
Linus Torvalds writes: > Ahh.. This actually _does_ look like a race on "current->flags": > PTRACE_ATTACH will do a > > child->flags |= PF_PTRACED; > > without waiting for the child to have stopped. I can see how this could cause PF_USEDFPU to be cleared inadvertently, but I do not

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Victor Zandy
"Christian Ehrhardt" <[EMAIL PROTECTED]> writes: > Victor: Could you try to reproduce the system wide corruption if you > add an explicit call to stts(); at the very end of __switch_to? > This should prevent the FPU corruption from spreading. After adding this call, I cannot reproduce the global

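A toy, single-CPU model of why an unconditional stts() at the end of __switch_to would stop the corruption from spreading. This is an assumption-laden simplification, not the 2.2 code: all names are hypothetical, and one long stands in for the whole x87 register file. If PF_USEDFPU is lost while a task owns the FPU, the save path is skipped, TS stays clear, and the next task computes on the previous task's registers without ever trapping:

```c
#include <stdbool.h>

/* Toy single-CPU model of lazy FPU switching (illustrative names only). */
struct toy_cpu  { bool ts; long fpu_reg; };   /* TS bit + one "FP register" */
struct toy_task { bool usedfpu; bool used_math; long saved; };

#define TOY_FPU_INIT 0L   /* stands in for a freshly initialized FPU */

/* Device-not-available trap: clear TS, then restore or initialize. */
static void toy_fpu_trap(struct toy_cpu *c, struct toy_task *t)
{
    c->ts = false;                                   /* clts */
    c->fpu_reg = t->used_math ? t->saved : TOY_FPU_INIT;
    t->used_math = true;
    t->usedfpu = true;                               /* PF_USEDFPU */
}

/* Task t performs one FP operation (adds x) and returns the result. */
long toy_fp_add(struct toy_cpu *c, struct toy_task *t, long x)
{
    if (c->ts)                      /* trap only when TS is set */
        toy_fpu_trap(c, t);
    c->fpu_reg += x;
    return c->fpu_reg;
}

/* Switch away from prev; force_stts models the extra stts() at the end. */
void toy_switch_from(struct toy_cpu *c, struct toy_task *prev, bool force_stts)
{
    if (prev->usedfpu) {            /* save only if the flag says so */
        prev->saved = c->fpu_reg;   /* fnsave */
        prev->usedfpu = false;
        c->ts = true;               /* stts */
    }
    if (force_stts)
        c->ts = true;               /* proposed unconditional stts() */
}
```

Without force_stts, the second task also never gets PF_USEDFPU set, so its own switch-out leaves TS clear as well, which is how the damage keeps propagating. With force_stts, the first task's state is still lost in this model; the extra stts() only keeps the damage from reaching other tasks.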
Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Christian Ehrhardt
On Tue, Apr 24, 2001 at 09:10:07AM -0700, Linus Torvalds wrote: > ptrace only operates on processes that are stopped. So there are no > locking issues - we've synchronized on a much higher level than a > spinlock or semaphore. This is only true for requests other than PTRACE_ATTACH and

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Christian Ehrhardt
On Tue, Apr 24, 2001 at 08:05:15AM -0500, Victor Zandy wrote: > > He found that PF_USEDFPU was always set before the machine was broken. > After he found that it was set about 70% of the time. If I'm not mistaken this actually can cause GLOBAL FPU corruption. Here's why: Assume for a moment that

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Linus Torvalds
[ Alan, I'm lazy and only have 2.2.14 sources on-line. Maybe this has been fixed already and there's something else going on. Worth a look ] In article <[EMAIL PROTECTED]>, Victor Zandy <[EMAIL PROTECTED]> wrote: > >Someone else here traced the process flags of a FP-intensive program >on a

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Alan Cox
> >1.) If I'm not mistaken switch_to changes current->flags without > >atomic operations and without any locks and sys_ptrace changes > >child->flags only protected by the big kernel lock. > > ptrace only operates on processes that are stopped. So there are no > locking issues - we've

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Linus Torvalds
In article <[EMAIL PROTECTED]>, Christian Ehrhardt <[EMAIL PROTECTED]> wrote: > >1.) If I'm not mistaken switch_to changes current->flags without >atomic operations and without any locks and sys_ptrace changes >child->flags only protected by the big kernel lock. ptrace only operates on processes

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Victor Zandy
Someone else here traced the process flags of a FP-intensive program on a machine before and after it is put in the faulty FPU state. He periodically sampled /proc/pid/stat while the program was running. He found that PF_USEDFPU was always set before the machine was broken. After he found that
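A sketch of the sampling described above, assuming the standard /proc/<pid>/stat layout in which field 9 is the task flags printed in decimal. The PF_USEDFPU value below is believed to be the 2.2-era definition; treat it as an assumption and check it against your own tree, since later kernels dropped that flag and may reuse the bit:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed 2.2-era value of PF_USEDFPU; verify against your kernel tree. */
#define PF_USEDFPU_2_2 0x00100000UL

/*
 * Extract field 9 ("flags") from one /proc/<pid>/stat line.  Field 2
 * (comm) is parenthesized and may contain spaces, so parsing starts
 * after the last ')'.  Returns false on a malformed line.
 */
bool stat_flags(const char *line, unsigned long *flags)
{
    const char *p = strrchr(line, ')');
    if (p == NULL)
        return false;
    p++;
    /* Skip fields 3..8: state, ppid, pgrp, session, tty_nr, tpgid. */
    for (int field = 3; field <= 8; field++) {
        p += strspn(p, " ");
        if (*p == '\0')
            return false;
        p += strcspn(p, " ");
    }
    char *end;
    *flags = strtoul(p, &end, 10);   /* strtoul skips leading blanks */
    return end != p;
}
```

A sampler would read one line from /proc/<pid>/stat with fgets() at each interval, pass it to stat_flags(), and test the result against PF_USEDFPU_2_2.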

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread alad
ate with fnsave i.e. current->tss.i387 is 'invalid' after fnsave current->tss.i387 fwait; Thanks Amol David Konerding <[EMAIL PROTECTED]> on 04/23/2001 01:09:27 AM To: Ulrich Drepper <[EMAIL PROTECTED]> cc: [EMAIL PROTECTED], [EMAIL PROTECTED] (bcc: Amol La

Re: BUG: Global FPU corruption in 2.2

2001-04-23 Thread alad
Erik Paulson <[EMAIL PROTECTED]> on 04/24/2001 01:14:27 AM To: Christian Ehrhardt <[EMAIL PROTECTED]> cc: [EMAIL PROTECTED], [EMAIL PROTECTED] (bcc: Amol Lad/HSS) Subject: Re: BUG: Global FPU corruption in 2.2 On 23 Apr 2001 18:11:48 +0200, Christian Ehrhardt wro

Re: BUG: Global FPU corruption in 2.2

2001-04-23 Thread Erik Paulson
On 23 Apr 2001 18:11:48 +0200, Christian Ehrhardt wrote: > On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote: > > > > We have found that one of our programs can cause system-wide > > corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we > > run this program, the FPU

Re: BUG: Global FPU corruption in 2.2

2001-04-23 Thread Christian Ehrhardt
On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote: > > We have found that one of our programs can cause system-wide > corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we > run this program, the FPU gives bad results to all subsequent > processes. A few comments, not

Re: BUG: Global FPU corruption in 2.2

2001-04-22 Thread kees
Hello, Linux 2.2.19 SMP: I can confirm the report. Even games are going weird after running this test (my wife is complaining :-)). Have to reboot. Kees

Re: BUG: Global FPU corruption in 2.2

2001-04-22 Thread Alan Cox
> OK, regardless of how the linux kernel actually manages the FPU for user-space > > programs, does anybody have any comments on the original bugreport? Complete mystification. > >of pi begins to look wrong. Then kill everything and run pi by itself > >again. It will no longer produce good results.
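The reporters' pi program is not shown in these excerpts. A hypothetical self-check in the same spirit computes pi with the midpoint rule for the integral of 4/(1+x^2) over [0,1] and compares it with a hard-coded reference. long double is used because GCC and Clang on x86 evaluate it on the x87 unit, whereas plain double on x86-64 normally goes through SSE2 and would not touch the x87 state at all:

```c
#include <stdbool.h>

/* Reference value of pi, hard-coded so that no libm call is needed. */
#define PI_REF 3.14159265358979323846L

/* Midpoint-rule estimate of the integral of 4/(1+x^2) over [0,1], i.e. pi. */
long double pi_estimate(long n)
{
    long double h = 1.0L / (long double)n;
    long double sum = 0.0L;
    for (long i = 0; i < n; i++) {
        long double x = ((long double)i + 0.5L) * h;
        sum += 4.0L / (1.0L + x * x);
    }
    return sum * h;
}

/* Hypothetical check: true if the arithmetic still yields pi to 1e-9. */
bool fp_pi_check(void)
{
    long double err = pi_estimate(1000000L) - PI_REF;
    return err < 1e-9L && err > -1e-9L;
}
```

With a million intervals the discretization error is far below the 1e-9 tolerance, so on a healthy machine fp_pi_check() should return true; a false result after running the triggering program would match the symptom described in the report.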

Re: BUG: Global FPU corruption in 2.2

2001-04-22 Thread David Konerding
Ulrich Drepper wrote: > "Richard B. Johnson" <[EMAIL PROTECTED]> writes: > > > The kernel doesn't know if a process is going to use the FPU when > > a new process is created. Only the user's code, i.e., the 'C' runtime > > library knows. > > Maybe you should try to understand the kernel code and

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Ulrich Drepper
"Richard B. Johnson" <[EMAIL PROTECTED]> writes: > The kernel doesn't know if a process is going to use the FPU when > a new process is created. Only the user's code, i.e., the 'C' runtime > library knows. Maybe you should try to understand the kernel code and the features of the processor first.

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Victor Zandy
It looks to me like the kernel sets a trap for FP operations when a process is switched in. Then when the process executes an FP op, the kernel clears the trap and either loads the FP context or initializes it, depending on whether it is the process' first FP operation. So no help is needed from
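To make "initializes it" concrete: assuming the first-use path initializes the x87 with fninit (to my knowledge what i386 kernels of this era do, but treat that as an assumption), the control word afterwards is 0x037F, meaning all six exception masks set, 64-bit precision, and round to nearest. A small decoder following the field layout in Intel's manuals; the struct and function names are illustrative:

```c
#include <stdbool.h>

/* Fields of the x87 FPU control word. */
struct x87_cw {
    unsigned exc_masks;  /* bits 0-5: IM DM ZM OM UM PM, 1 = masked */
    unsigned precision;  /* bits 8-9: 0 = 24-bit, 2 = 53-bit, 3 = 64-bit */
    unsigned rounding;   /* bits 10-11: 0 nearest, 1 down, 2 up, 3 to zero */
};

struct x87_cw x87_decode(unsigned short cw)
{
    struct x87_cw d;
    d.exc_masks = cw & 0x3Fu;
    d.precision = (cw >> 8) & 0x3u;
    d.rounding  = (cw >> 10) & 0x3u;
    return d;
}

/* Control word left behind by FNINIT. */
#define X87_CW_AFTER_FNINIT 0x037Fu

bool x87_is_default(unsigned short cw)
{
    return cw == X87_CW_AFTER_FNINIT;
}
```

Bit 6 is reserved and reads as 1 after fninit, which is why the default value is 0x037F rather than 0x033F; the decoder ignores it.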

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Richard B. Johnson
On 20 Apr 2001, Victor Zandy wrote: > > No dice. Your program does not fix the problem. > > If it were a hardware problem, I would expect the problem to occur > under 2.4.2 as well as 2.2.*, and I would be surprised that we can > consistently produce the behavior across our 64 node cluster.

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Richard B. Johnson
On 20 Apr 2001, Ulrich Drepper wrote: > "Richard B. Johnson" <[EMAIL PROTECTED]> writes: > > > If it "fixes" it, there is no problem with the FPU, but with the > > 'C' runtime library which doesn't initialize the FPU to a known > > state before it uses it. > > It's the kernel which initializes the FPU.

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Ulrich Drepper
"Richard B. Johnson" <[EMAIL PROTECTED]> writes: > If it "fixes" it, there is no problem with the FPU, but with the > 'C' runtime library which doesn't initialize the FPU to a known > state before it uses it. It's the kernel which initializes the FPU. This was always the case and necessary to

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Victor Zandy
No dice. Your program does not fix the problem. If it were a hardware problem, I would expect the problem to occur under 2.4.2 as well as 2.2.*, and I would be surprised that we can consistently produce the behavior across our 64 node cluster. But we are keeping the possibility in mind.

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Richard B. Johnson
On 20 Apr 2001, Victor Zandy wrote: > > Victor Zandy <[EMAIL PROTECTED]> writes: > > We have found that one of our programs can cause system-wide > > corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we > > run this program, the FPU gives bad results to all subsequent > >

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Victor Zandy
Victor Zandy <[EMAIL PROTECTED]> writes: > We have found that one of our programs can cause system-wide > corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we > run this program, the FPU gives bad results to all subsequent > processes. We have now tested 2.4.2 and 2.2.19.

Re: BUG: Global FPU corruption in 2.2

2001-04-19 Thread Michal Jaegermann
On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote: > > We have found that one of our programs can cause system-wide > corruption of the x86 FPU under 2.2.16 and 2.2.17. > > We see this problem on dual 550MHz Xeons with 1GB RAM. Hm, I started to wonder if this is not somewhat

BUG: Global FPU corruption in 2.2

2001-04-19 Thread Victor Zandy
We have found that one of our programs can cause system-wide corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we run this program, the FPU gives bad results to all subsequent processes. We see this problem on dual 550MHz Xeons with 1GB RAM. We have 64 of these things, and we
