Re: what causes Machine Check exception? revisited (2.2.18)

2001-05-08 Thread Mike Fedyk
On Mon, May 07, 2001 at 11:57:17AM +0100, Alan Cox wrote: > Generally it indicates a CPU problem but I've seen it caused by overclocking > and poorly fitted heatsinks I've been able to trigger a Machine check error on PPC when trying to boot directly from OF with a COFF kernel. The system has work

RE: what causes Machine Check exception? revisited (2.2.18)

2001-05-07 Thread Simon Richter
On Mon, 7 May 2001, Dan Hollis wrote: > > Erm, it was bad RAM every time it happened to me. On standard PCs, you > > don't see those because you don't have ECC and the error is simply not > > detected. > So a 440BX motherboard with ECC RAM is a non-standard PC? I bet the board doesn't force you

RE: what causes Machine Check exception? revisited (2.2.18)

2001-05-07 Thread nick
Yep, totally. I've worked on hundreds of systems and fewer than 20 of the workstations or PCs have been using ECC. Most servers do, but not even all of them. Nick On Mon, 7 May 2001, Dan Hollis wrote: > On Mon, 7 May 2001, Simon Richter wrote: > > On Mon, 7 May 2001, Bene, Martin wrote

RE: what causes Machine Check exception? revisited (2.2.18)

2001-05-07 Thread Dan Hollis
On Mon, 7 May 2001, Simon Richter wrote: > On Mon, 7 May 2001, Bene, Martin wrote: > > Definitely not caused by: > > Bad RAM, mb-chipset. > Erm, it was bad RAM every time it happened to me. On standard PCs, you > don't see those because you don't have ECC and the error is simply not > detected

RE: what causes Machine Check exception? revisited (2.2.18)

2001-05-07 Thread Simon Richter
On Mon, 7 May 2001, Bene, Martin wrote: [MCE caused by bad RAM] > I don't think there is a way a machine check exception can be triggered by > software - which it would have to be in order to be caused by bad RAM. An MCE is triggered by an ECC error - no software involved. A good trap handler w

RE: what causes Machine Check exception? revisited (2.2.18)

2001-05-07 Thread Ricardo Galli
>> Definitely not caused by: >> Bad RAM, mb-chipset. > > Erm, it was bad RAM every time it happened to me. On standard PCs, you > don't see those because you don't have ECC and the error is simply not > detected. I did have the same problem with an SMP Intel 440LX which ran without any problem si

Re: what causes Machine Check exception? revisited (2.2.18)

2001-05-07 Thread Alan Cox
> You get SIG11 errors when running programs (kernel compile seems to be a good > example), you get crashing processes, you get all sorts of weird funnies but > you really shouldn't get machine check exceptions. > > I don't think there is a way a machine check exception can be triggered by > softwa

RE: what causes Machine Check exception? revisited (2.2.18)

2001-05-07 Thread Bene, Martin
Hi Simon, > On Mon, 7 May 2001, Bene, Martin wrote: > > > Definitely not caused by: > > Bad RAM, mb-chipset. > > Erm, it was bad RAM every time it happened to me. On standard PCs, you > don't see those because you don't have ECC and the error is simply not > detected. Strange - definitely,

RE: what causes Machine Check exception? revisited (2.2.18)

2001-05-07 Thread Simon Richter
On Mon, 7 May 2001, Bene, Martin wrote: > Definitely not caused by: > Bad RAM, mb-chipset. Erm, it was bad RAM every time it happened to me. On standard PCs, you don't see those because you don't have ECC and the error is simply not detected. Simon -- GPG public key available from ht

Re: what causes Machine Check exception? revisited (2.2.18)

2001-05-07 Thread Alan Cox
> After searching the archives of the list I found some similar reports > from September and December 2000 but as far as I understood the cause of > the error was blamed on the CPU. Is this the most probable case? A machine check (trap 18) is signalled by the processor when it thinks it is in an

RE: what causes Machine Check exception? revisited (2.2.18)

2001-05-07 Thread Bene, Martin
Hi Juhan, > After searching the archives of the list I found some similar reports > from September and December 2000 but as far as I understood > the cause of > the error was blamed on the CPU. Is this the most probable case? > > Best regards, > > Juhan Ernits > > -- /var/log/kern.log

what causes Machine Check exception? revisited (2.2.18)

2001-05-07 Thread Juhan-Peep Ernits
Hello! After searching the archives of the list I found some similar reports from September and December 2000 but as far as I understood the cause of the error was blamed on the CPU. Is this the most probable case? Best regards, Juhan Ernits -- /var/log/kern.log May 6 06:47:25 mark

Re: what causes Machine Check exception?

2000-09-27 Thread Maciej W. Rozycki
On Wed, 27 Sep 2000, Arjan van de Ven wrote: > "This means that your CPU indicates that it is defective" An MCE might also mean an external error. For example, the Mercury/Neptune chipsets used to report MB memory and PCI parity errors via the BUSCHK# CPU input, which in turn triggers an MCE i

Re: what causes Machine Check exception?

2000-09-26 Thread Arjan van de Ven
In article <[EMAIL PROTECTED]> you wrote: > Alan, > I think adding a document about MCE in the kernel would be very useful. > Or at least a pointer to Intel's documentation about it. and something like this: "This means that your CPU indicates that it is defective" as a printk Greeti

Re: what causes Machine Check exception?

2000-09-26 Thread Alan Cox
> I think adding a document about MCE in the kernel would be very useful. > Or at least a pointer to Intel's documentation about it. Agreed - and maybe an MCE decoder app I've been peering over the docs to retrieve POST log entries as well. It means running some bios calls in vm86 but might b

Re: what causes Machine Check exception?

2000-09-26 Thread Marcelo Tosatti
Alan, I think adding a document about MCE in the kernel would be very useful. Or at least a pointer to Intel's documentation about it. On 26 Sep 2000, H. Peter Anvin wrote: > Followup to: <[EMAIL PROTECTED]> > By author:"Martin Bene" <[EMAIL PROTECTED]> > In newsgroup: linux.dev.kerne

Re: AW: what causes Machine Check exception?

2000-09-26 Thread Alan Cox
> So memory problems seem to be out. (kernel did not panic on machine check > exception, so machine is still up after reported exception) MCE is a CPU-level check. It generally indicates a processor fault. The actual code you get can be decoded with the Intel PII manual
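As a rough illustration of the decoding mentioned above, here is a minimal Python sketch that splits a 64-bit machine-check bank status value into its architectural flag bits and error-code fields, following the IA32_MCi_STATUS layout in Intel's manuals. The function name `decode_mci_status` and the sample value are made up for illustration and are not taken from the thread. The sketch does not interpret the model-specific bits, which differ between CPU families.

```python
# Architectural flag bits of the 64-bit IA32_MCi_STATUS register.
# Assumption: the value passed in is the full 64-bit bank status word.
FLAGS = {
    63: "VAL",    # register holds valid error information
    62: "OVER",   # a previous error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # the MISC register holds extra information
    58: "ADDRV",  # the ADDR register holds an address
    57: "PCC",    # processor context may be corrupt
}


def decode_mci_status(status):
    """Return the set flags and the two 16-bit error-code fields."""
    flags = [name
             for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    # Hypothetical status value, chosen only to exercise the decoder.
    sample = 0xB200000000010005
    print(decode_mci_status(sample))
```

The flag bits only say whether the bank is valid, whether the error was corrected, and whether the CPU state can still be trusted. What the error code means still has to be looked up in the processor manual.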

AW: what causes Machine Check exception?

2000-09-26 Thread Martin Bene
Hi Albert, > See the Signal 11 FAQ... you can start with: > > 1. make sure the memory in bank 4 is properly seated > 2. make sure your case is cool enough inside > 3. make sure your power supply is good > 4. try new RAM > 5. try a new motherboard Memory: ECC Mem in all banks, status: [root@delp

Re: what causes Machine Check exception?

2000-09-26 Thread H. Peter Anvin
Followup to: <[EMAIL PROTECTED]> By author:"Martin Bene" <[EMAIL PROTECTED]> In newsgroup: linux.dev.kernel > > Hi, > > just found this in my logs, doesn't look like something I'd want to > see - anything I should do about it? > What causes machine check exceptions? Hardware error.

what causes Machine Check exception?

2000-09-26 Thread Martin Bene
Hi, just found this in my logs, doesn't look like something I'd want to see - anything I should do about it? System: Linux 2.2.17, UP-Kernel, Asus P3V133 Motherboard, PIII/800 coppermine (133 FSB), 512MB ECC SDRAM CPU 0: Machine Check Exception: 0004<0>Bank 4: b2040151genera