Hello,
On 14.11.2010 01:55, Philip Guenther wrote:
On Sat, Nov 13, 2010 at 2:23 PM, V.T. Mueller
<v...@vorsprung-durch-denken.de> wrote:
After some searching I found a note in
http://www.openbsd.org/plus39.html that code for MCE/MCA was added on i386.
There, it also reads: "amd64 should get this next".
Has there been any work on this already? If so, could someone point me to the
relevant code/documentation, please?
amd64 appears to have had an interrupt gate for the machine check
exception from the start. There's no documentation that I know of;
whether i386 or amd64, you just get a panic "fatal machine check" when
it happens, with a dump of some of the bits from the trap frame.
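For illustration, here is a minimal sketch of a decision a #MC handler could make instead of always panicking. It assumes the architectural MCA registers as described in Intel's SDM (IA32_MCG_STATUS with its RIPV bit, IA32_MCi_STATUS with its VAL and UC bits) and that the caller has already read them. The function name mce_must_panic and the policy itself are hypothetical and deliberately conservative; this is not what the OpenBSD handler does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IA32_MCG_STATUS bit, per the Intel SDM (assumed layout). */
#define MCG_STATUS_RIPV (1ULL << 0)  /* saved IP can be used to restart */

/* IA32_MCi_STATUS bits, per the Intel SDM (assumed layout). */
#define MCI_STATUS_VAL  (1ULL << 63) /* bank holds a valid error */
#define MCI_STATUS_UC   (1ULL << 61) /* error was not corrected */

/*
 * Hypothetical conservative policy: panic unless the interrupted context
 * can be restarted (RIPV set) and no valid bank reports an uncorrected
 * error. Banks without VAL set are ignored.
 */
static bool mce_must_panic(uint64_t mcg_status, const uint64_t *bank_status,
    size_t nbanks)
{
    if (!(mcg_status & MCG_STATUS_RIPV))
        return true; /* cannot resume reliably */

    for (size_t i = 0; i < nbanks; i++) {
        uint64_t st = bank_status[i];
        if ((st & MCI_STATUS_VAL) && (st & MCI_STATUS_UC))
            return true; /* an uncorrected error is logged */
    }
    return false;
}
```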
Does that mean that non-critical events, such as a correctable bit flip in
main memory detected by the BIOS, don't get logged (read out) at all?
The code can be found in the sys/arch/{i386/i386,amd64/amd64}/
directories. machdep.c has the bits that set up the gate, the asm for
which is actually defined in either locore.s or vector.S, and then
trap.c has the high-level handler, which is just the default case in
trap().
Thank you Philip, I will go through the code again. I briefly went
through amd64/machdep.c yesterday and couldn't find what I'm looking for.
My understanding so far is that the machine's BIOS distinguishes three
types of MCA events:
0 non-critical
1 fatal
2 correctable
My interest is in type 2 events and in logging them.
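As a hedged aside: the BIOS categories above do not necessarily map one-to-one onto the hardware, but each MCA bank's IA32_MCi_STATUS register carries bits that allow a similar split. The sketch below assumes the bit layout from Intel's SDM (VAL = bit 63, UC = bit 61, PCC = bit 57, architectural MCA error code in bits 15:0); enum mce_class and the helper names are made up for illustration.

```c
#include <stdint.h>

/* IA32_MCi_STATUS bits, per the Intel SDM (assumed layout). */
#define MCI_STATUS_VAL (1ULL << 63) /* register contents are valid */
#define MCI_STATUS_UC  (1ULL << 61) /* error not corrected by hardware */
#define MCI_STATUS_PCC (1ULL << 57) /* processor context corrupt */

/* Hypothetical classification, loosely mirroring the three BIOS types. */
enum mce_class {
    MCE_NONE,        /* bank holds no valid error */
    MCE_CORRECTED,   /* corrected by hardware, e.g. a single-bit ECC error */
    MCE_UNCORRECTED, /* not corrected, but context not flagged corrupt */
    MCE_FATAL        /* processor context corrupt: cannot continue safely */
};

static enum mce_class mce_classify(uint64_t status)
{
    if (!(status & MCI_STATUS_VAL))
        return MCE_NONE;
    if (status & MCI_STATUS_PCC)
        return MCE_FATAL;
    if (status & MCI_STATUS_UC)
        return MCE_UNCORRECTED;
    return MCE_CORRECTED;
}

/* Bits 15:0 hold the architectural MCA error code. */
static uint16_t mce_error_code(uint64_t status)
{
    return (uint16_t)(status & 0xffff);
}
```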
In more than 80% of cases, a memory module reports ECC events days before
it causes the system to halt. On hardware (Supermicro et al.) that does
not provide this functionality itself (e.g. ProLiant does it in firmware),
logging these events is crucial for system reliability.
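Corrected errors do not raise a machine check exception; the hardware only records them in the bank status registers, so logging them means reading those banks periodically (or, on CPUs that support it, handling a corrected machine check interrupt). A minimal polling sketch follows, assuming the MSR numbers from Intel's SDM (IA32_MCG_CAP = 0x179 with the bank count in bits 7:0, IA32_MC0_STATUS = 0x401 with four MSRs per bank). The rdmsr/wrmsr function pointers and the simulated MSR array are hypothetical stand-ins so the sketch runs in user space; in a kernel they would wrap the real instructions and the function would be called from a timer.

```c
#include <stdint.h>
#include <stdio.h>

#define MSR_MCG_CAP      0x179                    /* assumed, per Intel SDM */
#define MSR_MC0_STATUS   0x401                    /* assumed, per Intel SDM */
#define MC_STATUS_MSR(i) (MSR_MC0_STATUS + 4 * (i))

#define MCI_STATUS_VAL (1ULL << 63) /* bank holds a valid error */
#define MCI_STATUS_UC  (1ULL << 61) /* error was not corrected */

/* Hypothetical MSR accessors; a kernel would wrap rdmsr/wrmsr here. */
typedef uint64_t (*rdmsr_fn)(uint32_t msr);
typedef void (*wrmsr_fn)(uint32_t msr, uint64_t val);

/* Simulated MSR space so the sketch runs outside the kernel. */
#define SIM_MSR_MAX 0x800
static uint64_t sim_msr[SIM_MSR_MAX];

static uint64_t sim_rdmsr(uint32_t msr)
{
    return msr < SIM_MSR_MAX ? sim_msr[msr] : 0;
}

static void sim_wrmsr(uint32_t msr, uint64_t val)
{
    if (msr < SIM_MSR_MAX)
        sim_msr[msr] = val;
}

/*
 * One polling pass: log every valid corrected error, then clear its bank.
 * Uncorrected entries are skipped and left for the #MC path.
 * Returns the number of corrected errors found in this pass.
 */
static unsigned mce_poll_corrected(rdmsr_fn rd, wrmsr_fn wr)
{
    unsigned nbanks = (unsigned)(rd(MSR_MCG_CAP) & 0xff);
    unsigned found = 0;

    for (unsigned i = 0; i < nbanks; i++) {
        uint64_t st = rd(MC_STATUS_MSR(i));

        if (!(st & MCI_STATUS_VAL) || (st & MCI_STATUS_UC))
            continue;
        printf("mce: bank %u corrected error, status 0x%016llx\n",
            i, (unsigned long long)st);
        wr(MC_STATUS_MSR(i), 0); /* clear so only new events show up */
        found++;
    }
    return found;
}
```

Clearing a bank after logging it is what lets the next pass tell new events apart from ones already reported; a real implementation would probably also rate-limit the messages.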
Kind regards
vt
P.S.: I liked the "we" in "goto we_re_toast;" in trap.c :-)