Re: kernel MCA messages

2010-08-25 Thread Andriy Gapon
on 25/08/2010 02:38 Jeremy Chadwick said the following: On Tue, Aug 24, 2010 at 07:13:23PM -0400, Dan Langille wrote: On 8/22/2010 9:18 PM, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status 0x940c4001fe080813 kernel: MCA: Global Cap 0x0105, Status

Re: kernel MCA messages

2010-08-25 Thread Tom Evans
On Tue, Aug 24, 2010 at 4:06 PM, John Baldwin j...@freebsd.org wrote: On Monday, August 23, 2010 5:35:40 pm Matthew D. Fuller wrote: On Mon, Aug 23, 2010 at 08:20:35AM -0400 I heard the voice of John Baldwin, and lo! it spake thus: It is not private, it is in //depot/projects/mcelog/... in

Re: kernel MCA messages

2010-08-25 Thread Dan Langille
On 8/25/2010 3:11 AM, Andriy Gapon wrote: Have you read the decoded message? Please re-read it. I still recommend reading at least the summary of the RAM ECC research article to make your own judgment about the need to replace DRAM. Andriy: What is your interpretation of the decoded message?

Re: kernel MCA messages

2010-08-25 Thread Andriy Gapon
on 25/08/2010 13:41 Dan Langille said the following: On 8/25/2010 3:11 AM, Andriy Gapon wrote: Have you read the decoded message? Please re-read it. I still recommend reading at least the summary of the RAM ECC research article to make your own judgment about the need to replace DRAM.

Re: kernel MCA messages

2010-08-25 Thread John Baldwin
On Wednesday, August 25, 2010 12:05:09 am Matthew D. Fuller wrote: On Tue, Aug 24, 2010 at 11:06:43AM -0400 I heard the voice of John Baldwin, and lo! it spake thus: It is actually public at perforce.freebsd.org. :) However, it is tedious to download the files. Oh, I'd apparently

Re: kernel MCA messages

2010-08-25 Thread John Baldwin
On Tuesday, August 24, 2010 7:13:23 pm Dan Langille wrote: On 8/22/2010 9:18 PM, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status 0x940c4001fe080813 kernel: MCA: Global Cap 0x0105, Status 0x kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC

Re: kernel MCA messages

2010-08-25 Thread John Baldwin
On Wednesday, August 25, 2010 7:01:19 am Andriy Gapon wrote: on 25/08/2010 13:41 Dan Langille said the following: On 8/25/2010 3:11 AM, Andriy Gapon wrote: Have you read the decoded message? Please re-read it. I still recommend reading at least the summary of the RAM ECC research

Re: kernel MCA messages

2010-08-25 Thread Andriy Gapon
on 25/08/2010 15:23 John Baldwin said the following: That is because machine checks for corrected errors have to be polled and the kernel polls once an hour. On newer Intel CPUs (such as Nehalem) there is a separate interrupt (CMCI) that can fire for corrected errors. I think that on AMD

Re: kernel MCA messages

2010-08-25 Thread Andriy Gapon
on 25/08/2010 18:02 Andriy Gapon said the following: on 25/08/2010 15:23 John Baldwin said the following: That is because machine checks for corrected errors have to be polled and the kernel polls once an hour. On newer Intel CPUs (such as Nehalem) there is a separate interrupt (CMCI)

Re: kernel MCA messages

2010-08-24 Thread Ronald Klop
On Mon, 23 Aug 2010 14:20:35 +0200, John Baldwin j...@freebsd.org wrote: On Monday, August 23, 2010 2:44:38 am Andriy Gapon wrote: on 23/08/2010 05:05 Dan Langille said the following: On 8/22/2010 9:18 PM, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status

Re: kernel MCA messages

2010-08-24 Thread Andriy Gapon
on 24/08/2010 09:14 Ronald Klop said the following: A little off topic, but what is 'a low rate of corrected ECC errors'? At work one machine has them like once per day, but runs ok. Is once per day much? That's up to your judgment. It's like after how many remapped sectors do you replace

Re: kernel MCA messages

2010-08-24 Thread John Baldwin
On Monday, August 23, 2010 5:35:40 pm Matthew D. Fuller wrote: On Mon, Aug 23, 2010 at 08:20:35AM -0400 I heard the voice of John Baldwin, and lo! it spake thus: It is not private, it is in //depot/projects/mcelog/... in p4. Which may as well be Siberia for us lowly non-developers. Any

Re: kernel MCA messages

2010-08-24 Thread Artem Belevich
IMHO the key here is whether hardware is broken or not. The only case where correctable ECC errors are OK is when a bit gets flipped by a high-energy particle. That's a normal but fairly rare event. If you get bit flips often enough that you can recall details of more than one of them on the same

Re: kernel MCA messages

2010-08-24 Thread Andriy Gapon
on 24/08/2010 22:51 Artem Belevich said the following: IMHO the key here is whether hardware is broken or not. The only case where correctable ECC errors are OK is when a bit gets flipped by a high-energy particle. That's a normal but fairly rare event. If you get bit flips often enough that

Re: kernel MCA messages

2010-08-24 Thread Dan Langille
On 8/22/2010 9:18 PM, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status 0x940c4001fe080813 kernel: MCA: Global Cap 0x0105, Status 0x kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0 kernel: MCA: CPU 0 COR BUSLG Source RD Memory kernel: MCA:

Re: kernel MCA messages

2010-08-24 Thread Jeremy Chadwick
On Tue, Aug 24, 2010 at 07:13:23PM -0400, Dan Langille wrote: On 8/22/2010 9:18 PM, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status 0x940c4001fe080813 kernel: MCA: Global Cap 0x0105, Status 0x kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC

Re: kernel MCA messages

2010-08-24 Thread Dan Langille
On 8/24/2010 7:38 PM, Jeremy Chadwick wrote: On Tue, Aug 24, 2010 at 07:13:23PM -0400, Dan Langille wrote: On 8/22/2010 9:18 PM, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status 0x940c4001fe080813 kernel: MCA: Global Cap 0x0105, Status 0x

Re: kernel MCA messages

2010-08-24 Thread Matthew D. Fuller
On Tue, Aug 24, 2010 at 11:06:43AM -0400 I heard the voice of John Baldwin, and lo! it spake thus: It is actually public at perforce.freebsd.org. :) However, it is tedious to download the files. Oh, I'd apparently blocked out of my mind that you could clicky-clicky files one at a time from

Re: kernel MCA messages

2010-08-23 Thread Andriy Gapon
on 23/08/2010 05:05 Dan Langille said the following: On 8/22/2010 9:18 PM, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status 0x940c4001fe080813 kernel: MCA: Global Cap 0x0105, Status 0x kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0

Re: kernel MCA messages

2010-08-23 Thread John Baldwin
On Monday, August 23, 2010 2:44:38 am Andriy Gapon wrote: on 23/08/2010 05:05 Dan Langille said the following: On 8/22/2010 9:18 PM, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status 0x940c4001fe080813 kernel: MCA: Global Cap 0x0105, Status

Re: kernel MCA messages

2010-08-23 Thread Matthew D. Fuller
On Mon, Aug 23, 2010 at 08:20:35AM -0400 I heard the voice of John Baldwin, and lo! it spake thus: It is not private, it is in //depot/projects/mcelog/... in p4. Which may as well be Siberia for us lowly non-developers. Any chance you could stick a tarball or a patch against upstream mcelog

Re: kernel MCA messages

2010-08-23 Thread Dan Langille
On 8/22/2010 10:05 PM, Dan Langille wrote: On 8/22/2010 9:18 PM, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status 0x940c4001fe080813 kernel: MCA: Global Cap 0x0105, Status 0x kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0 kernel: MCA: CPU

Re: kernel MCA messages

2010-08-23 Thread Andriy Gapon
on 24/08/2010 02:43 Dan Langille said the following: On 8/22/2010 10:05 PM, Dan Langille wrote: On 8/22/2010 9:18 PM, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status 0x940c4001fe080813 kernel: MCA: Global Cap 0x0105, Status 0x kernel: MCA:

Re: kernel MCA messages

2010-08-23 Thread Dan Langille
On 8/23/2010 7:47 PM, Andriy Gapon wrote: on 24/08/2010 02:43 Dan Langille said the following: On 8/22/2010 10:05 PM, Dan Langille wrote: On 8/22/2010 9:18 PM, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status 0x940c4001fe080813 kernel: MCA: Global Cap 0x0105,

Re: kernel MCA messages

2010-08-22 Thread Daniel O'Connor
On 23/08/2010, at 10:48, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status 0x940c4001fe080813 kernel: MCA: Global Cap 0x0105, Status 0x kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0 kernel: MCA: CPU 0 COR BUSLG Source RD Memory

Re: kernel MCA messages

2010-08-22 Thread Dan Langille
On 8/22/2010 9:18 PM, Dan Langille wrote: What does this mean? kernel: MCA: Bank 4, Status 0x940c4001fe080813 kernel: MCA: Global Cap 0x0105, Status 0x kernel: MCA: Vendor AuthenticAMD, ID 0xf5a, APIC ID 0 kernel: MCA: CPU 0 COR BUSLG Source RD Memory kernel: MCA:
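
For readers wondering how the "COR BUSLG Source RD Memory" line follows from the Status value quoted throughout this thread, here is a rough Python sketch that decodes the architectural bits of a machine-check bank status word. It is not the kernel's code and not mcelog: the function name, table names and label strings are made up for illustration, and it only handles the generic flag bits (63 down to 57) and the compound "bus/interconnect" error-code format (0000 1PPT RRRR IILL). Vendor-specific fields, such as the model-specific code in bits 31:16, are extracted but not interpreted.

```python
# Illustrative sketch only; names and labels are assumptions, not kernel code.

# Architectural flag bits in the upper part of a bank status word.
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Sub-fields of the bus/interconnect error code 0000 1PPT RRRR IILL.
PARTICIPATION = ["Source", "Responder", "Observer", "Generic"]        # PP
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}           # RRRR
MEMORY_IO = ["Memory", "Reserved", "I/O", "Generic"]                  # II
LEVEL = ["L0", "L1", "L2", "LG"]                                      # LL


def decode_mci_status(status):
    """Decode the generic parts of a 64-bit machine-check status value."""
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if status >> bit & 1]
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "corrected": "UC" not in flags,          # UC clear means corrected
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": code,
        "summary": None,                         # filled in for bus errors only
    }
    if code & 0xF800 == 0x0800:                  # bus/interconnect format
        pp = (code >> 9) & 0x3
        timeout = (code >> 8) & 0x1
        rrrr = (code >> 4) & 0xF
        ii = (code >> 2) & 0x3
        ll = code & 0x3
        prefix = "COR" if result["corrected"] else "UNCOR"
        summary = "{} BUS{} {} {} {}".format(
            prefix, LEVEL[ll], PARTICIPATION[pp],
            REQUEST.get(rrrr, "???"), MEMORY_IO[ii])
        if timeout:
            summary += " timed out"
        result["summary"] = summary
    return result


if __name__ == "__main__":
    info = decode_mci_status(0x940c4001fe080813)
    print(info["flags"])     # ['VAL', 'EN', 'ADDRV']
    print(info["summary"])   # COR BUSLG Source RD Memory
```

With the value from the original post, UC (bit 61) is clear, so the error was corrected, and the low 16 bits (0x0813) match the bus-error pattern with a generic cache level, the reporting unit as source, a read request, and a memory transaction. Whether that points at a failing DIMM is the judgment call discussed in the replies; the sketch only shows where the words in the kernel message come from.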