Re: ECC support

2015-11-11 Thread John Baldwin
On Friday, October 23, 2015 03:22:54 PM Pokala, Ravi wrote: > -Original Message- > > > >Date: Thu, 22 Oct 2015 11:09:50 -0700 > >From: John Baldwin > >To: freebsd-hardware@freebsd.org > >Cc: Dieter BSD , freebsd-hack...@freebsd.org > >

Re: ECC support

2015-10-23 Thread Pokala, Ravi
-Original Message- >Date: Thu, 22 Oct 2015 11:09:50 -0700 >From: John Baldwin >To: freebsd-hardware@freebsd.org >Cc: Dieter BSD , freebsd-hack...@freebsd.org >Subject: Re: ECC support >Message-ID: <1492434.22kxskh...@ralph.baldwin.cx> >Content-Type: text/plain

Re: ECC support

2015-10-23 Thread Bob Bishop
Hi, > On 22 Oct 2015, at 22:17, John Baldwin wrote: > > On Thursday, October 22, 2015 07:49:13 PM Bob Bishop wrote: >> Hi, >> >>> On 22 Oct 2015, at 19:09, John Baldwin wrote: >>> >>> On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD wrote: Chris: > MCA: Bank 1, Status 0x94

Re: ECC support

2015-10-22 Thread John Baldwin
On Thursday, October 22, 2015 07:49:13 PM Bob Bishop wrote: > Hi, > > > On 22 Oct 2015, at 19:09, John Baldwin wrote: > > > > On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD wrote: > >> Chris: > >>> MCA: Bank 1, Status 0x94000151 > >>> MCA: Global Cap 0x0106, Status 0

Re: ECC support

2015-10-22 Thread Bob Bishop
Hi, > On 22 Oct 2015, at 19:09, John Baldwin wrote: > > On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD wrote: >> Chris: >>> MCA: Bank 1, Status 0x94000151 >>> MCA: Global Cap 0x0106, Status 0x >>> MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2 >>>

Re: ECC support

2015-10-22 Thread John Baldwin
On Wednesday, September 16, 2015 10:56:52 AM Dieter BSD wrote: > Chris: > > MCA: Bank 1, Status 0x94000151 > > MCA: Global Cap 0x0106, Status 0x > > MCA: Vendor "AuthenticAMD", ID 0x100f52, APIC ID 2 > > > > MCA: Address 0x81cc0e9f0 > > > > Kind of freaky. I've n
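
The "MCA:" lines quoted in this message are machine-check records. As a rough illustration of how such a record is read, the sketch below splits a 64-bit bank status word into its architectural flag bits and the low 16-bit error code. It assumes the usual x86 MCi_STATUS layout (bit 63 valid, 62 overflow, 61 uncorrected, 60 enabled, 59 misc valid, 58 address valid, 57 processor context corrupt). The function name and the sample value are hypothetical; the status values shown in this archive are cut short and are not decoded here.

```python
# Hypothetical helper, not FreeBSD code: decode the architectural flag bits
# of a 64-bit x86 machine-check bank status (MCi_STATUS) value.
MCI_STATUS_FLAGS = {
    "VAL": 63,    # record is valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MISC register holds extra information
    "ADDRV": 58,  # ADDR register holds the failing address
    "PCC": 57,    # processor context may be corrupt
}


def decode_mci_status(status):
    """Return the set flags and the two 16-bit code fields of a status word."""
    flags = [name for name, bit in MCI_STATUS_FLAGS.items()
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,             # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


# Made-up sample value (valid record, address valid, error code 0x0001).
sample = (1 << 63) | (1 << 58) | 0x0001
decoded = decode_mci_status(sample)
print(decoded["flags"], hex(decoded["mca_error_code"]))
```

A corrected ECC error would normally show VAL set and UC clear; a record with ADDRV set is the kind that comes with an "Address" line like the one quoted above.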

Re: ECC support

2015-09-18 Thread Peter Jeremy
On 2015-Sep-18 00:05:35 +0100, Bob Bishop wrote: >The answer is quite interesting. A few process shrinks ago, alpha particle >effects were becoming worryingly intrusive and everybody was concerned how >much smaller features on ICs could actually be pushed. I recall when the 64kb DRAMs first app

Re: ECC support

2015-09-18 Thread Tom Evans via freebsd-hardware
On Fri, Sep 18, 2015 at 3:49 AM, Dieter BSD wrote: > Current machine is dying, > so need a replacement asap. Same or similar machine with a good > framebuffer (>= 4K, Freesync) and UVD could be X terminal / HTPC. > Minimal GPU, if any, needed. But can't find a video card with a good > framebuffe

Re: ECC support

2015-09-17 Thread Dieter BSD
Don: > Supermicro has some Atom motherboards with ECC support. Thanks, but the company that designed the atom has a rather long history of design problems. The whole point of ECC is to avoid corrupting the right answer, not to avoid corrupting the wrong answer. They also steal technolo

Re: ECC support

2015-09-17 Thread Bob Bishop
Hi, > On 16 Sep 2015, at 13:04, Bob Bishop wrote: > > >> On 16 Sep 2015, at 12:52, Igor Mozolevsky wrote: >> >> […]The only thing I could think of is that the fab process was(/is?) large >> enough to not worry about "nonsense" like cosmic rays &c (but then I've not >> had much exposure to sem

Re: ECC support

2015-09-16 Thread Don Lewis
On 16 Sep, Dieter BSD wrote: > Andriy: >>> Assuming that a board does have the necessary connections but >>> the firmware does not have ECC support, is there some reason that >>> ECC support could not be added to the OS instead of the firmware? >> >>

Re: ECC support

2015-09-16 Thread Dieter BSD
Andriy: >> Assuming that a board does have the necessary connections but >> the firmware does not have ECC support, is there some reason that >> ECC support could not be added to the OS instead of the firmware? > > Yes, there is. The memory controller is programmed by t

Re: ECC support

2015-09-16 Thread Bob Bishop
> On 16 Sep 2015, at 12:52, Igor Mozolevsky wrote: > > On 16 September 2015 at 12:34, Bob Bishop wrote: > > > > >> "The best we can conclude therefore is that any chip size effect is >> unlikely to dominate error rates given that the trends are not consistent >> across various other confoun

Re: ECC support

2015-09-16 Thread Igor Mozolevsky
On 16 September 2015 at 12:34, Bob Bishop wrote: > "The best we can conclude therefore is that any chip size effect is > unlikely to dominate error rates given that the trends are not consistent > across various other confounders such as age and manufacturer.” > > I’ll admit to talking that po

Re: ECC support

2015-09-16 Thread Bob Bishop
Hi, > On 16 Sep 2015, at 11:48, Igor Mozolevsky wrote: > > On 16 September 2015 at 08:51, Bob Bishop wrote: > > > >> - You might think that as memory density increases (ie bit cell size > shrinks), error rates would increase. Apparently this wasn’t so up to 2009 > at least, see: >> >> http:

Re: ECC support

2015-09-16 Thread Igor Mozolevsky
On 16 September 2015 at 08:51, Bob Bishop wrote: > - You might think that as memory density increases (ie bit cell size shrinks), error rates would increase. Apparently this wasn’t so up to 2009 at least, see: > > http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf subsection 5.1: "… F

Re: ECC support

2015-09-16 Thread Bob Bishop
Hi, Arriving late to this thread, a few observations: - Obviously the more RAM you have, the more errors you are going to see. In other words, ECC makes increasing sense as RAM sizes get larger. All server-class hardware should have it. - DRAM has to be refreshed. In sensible designs, ECC scru

Re: ECC support

2015-09-15 Thread Konstantin Belousov
On Wed, Sep 16, 2015 at 12:14:00AM +0300, Andriy Gapon wrote: > On 15/09/2015 23:53, Dieter BSD wrote: > > Assuming that a board does have the necessary connections but > > the firmware does not have ECC support, is there some reason that > > ECC support could not be added to

Re: ECC support

2015-09-15 Thread Don Lewis
On 15 Sep, Jim Thompson wrote: > >> On Sep 15, 2015, at 5:19 PM, Igor Mozolevsky >> wrote: >> >> On 15 September 2015 at 22:52, Jim Thompson > > wrote: >> >> >> >> Errors are corrected "on-the-fly," corrected data is almost never >> placed back in memory. If the same

Re: ECC support

2015-09-15 Thread Don Lewis
supports a single channel > of ECC ram. Interesting ... it's been a while since I looked. I think the primary sockets at the time were FM1, FM2, and FM2+, and the mobile sockets, and they didn't support ECC. AM1 motherboard ECC support seems to be pretty lacking, though.

Re: ECC support

2015-09-15 Thread alex.burlyga.ietf
On Tue, Sep 15, 2015 at 3:52 PM, Igor Mozolevsky wrote: > On 15 September 2015 at 23:34, Jim Thompson wrote: > > > > >> I think you’ll find that the default for ‘scrub’ is off on most (perhaps >> all) boards. There are reasons, and these relate directly to >> “significantly diminish system perf

Re: ECC support

2015-09-15 Thread Igor Mozolevsky
On 15 September 2015 at 23:34, Jim Thompson wrote: > I think you’ll find that the default for ‘scrub’ is off on most (perhaps > all) boards. There are reasons, and these relate directly to > “significantly diminish system performance”, (above), as well as the > greatly increased RAM sizes in

Re: ECC support

2015-09-15 Thread Jim Thompson
> On Sep 15, 2015, at 5:10 PM, Don Lewis wrote: > > On 15 Sep, Dieter BSD wrote: >> Many of AMD's CPU/APU parts support ECC memory. Not just the top of the >> line parts, but also many of the less expensive, less power hungry parts. >> However, many (most?) of the boards for these chips do not

Re: ECC support

2015-09-15 Thread Jim Thompson
> On Sep 15, 2015, at 5:19 PM, Igor Mozolevsky wrote: > > On 15 September 2015 at 22:52, Jim Thompson > wrote: > > > > Errors are corrected "on-the-fly," corrected data is almost never placed back > in memory. If the same corrupt data is read again, the correction p

Re: ECC support

2015-09-15 Thread Igor Mozolevsky
On 15 September 2015 at 22:52, Jim Thompson wrote: Errors are corrected "on-the-fly," corrected data is almost never placed > back in memory. If the same corrupt data is read again, the correction > process is repeated. Replacing the data in memory would require processing > overhead that could
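
The difference between correcting "on-the-fly" and writing the corrected data back (scrubbing) can be shown with a toy model. The sketch below uses a Hamming(7,4) code, which is an assumption made for brevity: real ECC memory typically protects 64-bit words with a wider SECDED code. All names here (`memory`, `read`, `scrub`, and the helper functions) are hypothetical.

```python
# Toy model: Hamming(7,4) single-error correction over a list-based "memory".
# Bit positions are 1-based: p1 p2 d0 p4 d1 d2 d3.

def hamming74_encode(nibble):
    """Encode a 4-bit value into a 7-bit codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_correct(bits):
    """Return (corrected copy, syndrome); syndrome is the flipped position or 0."""
    syndrome = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= pos
    fixed = list(bits)
    if syndrome:
        fixed[syndrome - 1] ^= 1
    return fixed, syndrome


def hamming74_data(bits):
    """Extract the 4 data bits from a codeword."""
    return bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)


memory = [hamming74_encode(0xA)]


def read(addr):
    """Correct on the fly: the caller gets good data, memory is left as is."""
    fixed, _ = hamming74_correct(memory[addr])
    return hamming74_data(fixed)


def scrub(addr):
    """Correct and write back; returns True if an error was repaired."""
    fixed, syndrome = hamming74_correct(memory[addr])
    memory[addr] = fixed
    return syndrome != 0


memory[0][4] ^= 1   # simulate a single-bit upset in the stored word
value = read(0)     # still returns 0xA, but the stored word stays corrupt
repaired = scrub(0)  # rewrites the stored word with the corrected bits
```

Without the write-back, every read repeats the correction, and a second upset in the same word would exceed what a single-error-correcting code can repair; that is the case scrubbing is meant to prevent.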

Re: ECC support

2015-09-15 Thread Don Lewis
added, but most boards do not > have firmware sources available. > > Assuming that a board does have the necessary connections but > the firmware does not have ECC support, is there some reason that > ECC support could not be added to the OS instead of the firmware? > I grepped thro

Re: ECC support

2015-09-15 Thread Jim Thompson
e, so this code could be added, but most boards do not > have firmware sources available. > > Assuming that a board does have the necessary connections but > the firmware does not have ECC support, is there some reason that > ECC support could not be added to the OS instead of the firmwa

Re: ECC support

2015-09-15 Thread Andriy Gapon
On 15/09/2015 23:53, Dieter BSD wrote: > Assuming that a board does have the necessary connections but > the firmware does not have ECC support, is there some reason that > ECC support could not be added to the OS instead of the firmware? Yes, there is. The memory controller is programm

Re: ECC support

2015-09-15 Thread Xin Li
On 09/15/15 13:53, Dieter BSD wrote: > I've been running machines with ECC for 15-20 years and have never seen > a report of an ECC error from either NetBSD or FreeBSD. I have seen > reports of ECC errors from Digital Unix. And remember getting panics > due to parity errors on machines before ECC

ECC support

2015-09-15 Thread Dieter BSD
necessary connections but the firmware does not have ECC support, is there some reason that ECC support could not be added to the OS instead of the firmware? I grepped through FreeBSD 8.2 and 10.1 sources but couldn't find anything that looked relevant. Also did not find any code that reported ECC er