hajhouse wrote:
> Linux wotan 2.6.17-10-generic #2 SMP Tue Dec 5 22:28:26 UTC 2006 i686
> GNU/Linux
>
> Try 'modprobe ecc'.

My research found:

* Bluesmoke is now EDAC
* The ecc.ko module is part of the EDAC project
* EDAC has been somewhat Intel-centric in the past
* Mainline kernels have EDAC and support Intel chipsets
* 2.6.17-10-generic does not support Opteron
* The devel tree on SourceForge has Opteron support
* Mcelog is the more AMD-centric way to do it
* Mcelog seems reasonably popular (Red Hat and Ubuntu, anyway)
* Mcelog seems to support numerous events, not just DIMM-related ECC
  errors

So while getting the ecc module to build would require a new kernel
(2.6.18 or newer) and custom patches from SourceForge, mcelog just
requires a small binary to read /dev/mcelog.

I ran it on 180 machines or so and found one very unhappy node:

CPU 0 1 instruction cache TSC e6a7a079a8a84
ADDR 117b00
  Instruction cache ECC error
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
             instruction fetch mem transaction
             memory access, level generic'
STATUS d400400000000853 MCGSTATUS 0
MCE 5
CPU 0 2 bus unit TSC e6a7a079a8ccd
ADDR c500
  L2 cache ECC error
  Bus or cache array error
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
             generic read mem transaction
             memory access, level generic'
STATUS d400400000000813 MCGSTATUS 0
MCE 6
CPU 0 4 northbridge TSC e6a7a079a906a
ADDR 3ce5e0
  Northbridge ECC error
  ECC syndrome = 64
       bit32 = err cpu0
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
             generic read mem transaction
             memory access, level generic'
STATUS d432400100000813 MCGSTATUS 0

_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech
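
The "bit46" and "bit62" annotations in the mcelog output can be checked by hand against the raw STATUS values. Below is a minimal sketch, assuming Python is available. The function and constant names are made up for illustration and are not part of mcelog. The bit meanings are taken only from the labels mcelog printed above.

```python
# Bit positions exactly as labelled in the mcelog output quoted above.
# The names here are hypothetical; mcelog does not expose them.
CORRECTED_ECC_BIT = 46   # "bit46 = corrected ecc error"
OVERFLOW_BIT = 62        # "bit62 = error overflow (multiple errors)"


def decode_status(status_hex):
    """Return the two flags mcelog reported for a hex STATUS value."""
    status = int(status_hex, 16)
    return {
        "corrected_ecc": bool((status >> CORRECTED_ECC_BIT) & 1),
        "overflow": bool((status >> OVERFLOW_BIT) & 1),
    }


# The three STATUS values from the unhappy node.
STATUS_VALUES = [
    "d400400000000853",
    "d400400000000813",
    "d432400100000813",
]

for value in STATUS_VALUES:
    print(value, decode_status(value))
```

All three values have both bits set, which agrees with what mcelog printed: corrected ECC errors, with more than one error logged per bank.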