Re: [CentOS] Machine check events

2013-11-28 Thread Cliff Pratt
He's not running the Poisson distro, he's using CentOS! 8-)
On Fri, Nov 29, 2013 at 11:57 AM, Devin Reade wrote:
> Quoting Glenn Eychaner:
>
> > This is brand-new Kingston 1600MHz ECC memory on a workstation/server
> > running at high altitude [snip]
>
> Cosmic rays? Do you have a Poisson dis

Re: [CentOS] Machine check events

2013-11-28 Thread Devin Reade
Quoting Glenn Eychaner:
> This is brand-new Kingston 1600MHz ECC memory on a workstation/server
> running at high altitude [snip]
Cosmic rays? Do you have a Poisson distribution for those machine check events? :)
Devin

Re: [CentOS] Machine check events

2013-11-28 Thread Glenn Eychaner
m.roth writes:
> Is the system still under warranty? How 'bout the memory, if you've
> replaced it? You *should* replace it. It's not going to get better
This is brand-new Kingston 1600MHz ECC memory on a workstation/server running at high altitude in a relatively open environment; I am loath

Re: [CentOS] Machine check events

2013-11-27 Thread m.roth
Glenn Eychaner wrote:
> And all that work was done to get this, output of a corrected memory
> parity error. I get about one of these per workstation per 3 days, more or less;
> is this a surprising number? (The workstation under the heaviest load gets
> more, while the idle spare gets none at all;

Re: [CentOS] Machine check events

2013-11-27 Thread Glenn Eychaner
And all that work was done to get this, output of a corrected memory parity error. I get about one of these per workstation per 3 days, more or less; is this a surprising number? (The workstation under the heaviest load gets more, while the idle spare gets none at all; no surprise there!) MCE 6 CP

Re: [CentOS] Machine check events

2013-11-27 Thread Glenn Eychaner
On further, further, further toying, I now have mcelog running on my 32-bit CentOS 6 systems! I admit to doing it the "dumb" way: I grabbed the source from the git repository, compiled and installed it, and THEN discovered that the init.d file supplied with the source was not CentOS compatible, so

Re: [CentOS] Machine check events

2013-11-26 Thread Patrick Lists
On 11/26/2013 03:11 PM, Glenn Eychaner wrote:
[snip]
> The current kernel I am running is 2.6.32-358.23.2, but I can't tell whether it
> has CONFIG_X86_MCE enabled. How can I find this out?
$ grep CONFIG_X86_MCE /boot/config-2.6.32-358.23.2.el6.x86_64
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CO
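The same check can be scripted for whatever kernel is currently running. What follows is a minimal sketch, not something posted in the thread: the helper names `parse_kernel_config` and `running_kernel_config_path` are made up for illustration, and it assumes the distribution ships its kernel config as `/boot/config-<release>`, as the grep above suggests.

```python
import platform
from pathlib import Path


def parse_kernel_config(text, prefix="CONFIG_X86_MCE"):
    """Return {option: value} for set options whose name starts with prefix."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.startswith(prefix):
            options[name] = value
    return options


def running_kernel_config_path():
    """Assumed location of the config for the running kernel."""
    return Path("/boot") / f"config-{platform.release()}"


if __name__ == "__main__":
    path = running_kernel_config_path()
    if path.exists():
        found = parse_kernel_config(path.read_text())
        for name, value in sorted(found.items()):
            print(f"{name}={value}")
    else:
        print(f"No kernel config found at {path}")
```

An empty result would mean no option with that prefix is set in the config file; it says nothing about whether mcelog itself is installed.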

Re: [CentOS] Machine check events

2013-11-26 Thread Glenn Eychaner
On further, further investigation, it looks like, according to the mcelog install guide at http://www.mcelog.org/installation.html, I could "roll my own" for 32-bit CentOS 6: "For bad page offlining you will need a 2.6.33+ kernel or a 2.6.32 kernel with the soft offlining capability backported (li

Re: [CentOS] Machine check events

2013-11-26 Thread John Doe
From: Glenn Eychaner
> Further investigation seems to indicate that these events should be handled
> by "mcelog" or "mced". However, there is no /var/log/mcelog, nor do I have a
> "mcelog" or "mced" binary, nor does yum seem to contain anything related
> (based on "yum whatprovides '*/mcelo

Re: [CentOS] Machine check events

2013-11-26 Thread Stephen Harris
On Tue, Nov 26, 2013 at 09:25:55AM -0300, Glenn Eychaner wrote:
> Further investigation seems to indicate that these events should be handled
> by "mcelog" or "mced". However, there is no /var/log/mcelog, nor do I have a
> "mcelog" or "mced" binary, nor does yum seem to contain anything related
> (

Re: [CentOS] Machine check events

2013-11-26 Thread Glenn Eychaner
Further investigation seems to indicate that these events should be handled by "mcelog" or "mced". However, there is no /var/log/mcelog, nor do I have a "mcelog" or "mced" binary, nor does yum seem to contain anything related (based on "yum whatprovides '*/mcelog'" and similar queries). Thus, I st

[CentOS] Machine check events

2013-11-25 Thread Glenn Eychaner
On my new Haswell-based machines, I am occasionally seeing entries like the following in /var/log/messages:
kernel: [Hardware Error]: Machine check events logged
(I would not have even noticed them, except that they get flagged by logwatch.) These messages always occur alone, and don't seem
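To see how often these entries show up without waiting for logwatch, the log can be tallied by day. This is a rough sketch under stated assumptions: it expects the traditional syslog line layout (`Mon DD HH:MM:SS host ...`), the function name `count_mce_by_day` is hypothetical, and the hostname `ws1` in the sample lines is invented.

```python
import re
from collections import Counter

# Message text as quoted in the post above.
MCE_PATTERN = re.compile(r"\[Hardware Error\]: Machine check events logged")


def count_mce_by_day(lines):
    """Count matching log lines per syslog date prefix (e.g. 'Nov 25')."""
    counts = Counter()
    for line in lines:
        if MCE_PATTERN.search(line):
            # Assumes the line starts with 'Mon DD HH:MM:SS host ...'.
            parts = line.split()
            day = " ".join(parts[:2]) if len(parts) >= 2 else "unknown"
            counts[day] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines; real input would come from /var/log/messages.
    sample = [
        "Nov 25 03:12:01 ws1 kernel: [Hardware Error]: Machine check events logged",
        "Nov 25 04:00:00 ws1 kernel: unrelated message",
        "Nov 28 10:00:00 ws1 kernel: [Hardware Error]: Machine check events logged",
    ]
    for day, n in count_mce_by_day(sample).items():
        print(day, n)
```

On a real system the lines would be read from /var/log/messages, which usually requires root.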