Re: [PATCH] Raise maximum number of memory controllers

2018-10-01 Thread Borislav Petkov
On Thu, Sep 27, 2018 at 10:10:54PM -0300, Mauro Carvalho Chehab wrote: > I don't remember about any rationale behind /sys/bus/edac. It was > there already before I start working on EDAC about 10 years ago. > I guess it was used in the past by edac-utils (or maybe it is just a > side effect of the

Re: [PATCH] Raise maximum number of memory controllers

2018-09-27 Thread Mauro Carvalho Chehab
On Fri, 28 Sep 2018 00:03:55 +0200, Borislav Petkov wrote: > On Thu, Sep 27, 2018 at 02:44:01PM -0700, Luck, Tony wrote: > > The problem with your patch that gets rid of EDAC_MAX_MCS is making > > device links under /sys/bus/edac. Which is hinted at in some of the > > code your patch deleted:

Re: [PATCH] Raise maximum number of memory controllers

2018-09-27 Thread Borislav Petkov
On Thu, Sep 27, 2018 at 02:44:01PM -0700, Luck, Tony wrote: > The problem with your patch that gets rid of EDAC_MAX_MCS is making > device links under /sys/bus/edac. Which is hinted at in some of the > code your patch deleted: > > - /* > -* The memory controller needs its own bus,

Re: [PATCH] Raise maximum number of memory controllers

2018-09-27 Thread Luck, Tony
On Thu, Sep 27, 2018 at 06:52:44AM +0200, Borislav Petkov wrote: > On Wed, Sep 26, 2018 at 04:02:57PM -0700, Luck, Tony wrote: > > But ... we are at -rc5. Not sure that we'll figure out, write, test & debug > > the proper solution in the next 3-4 weeks. So perhaps we should apply > > > > -#define

Re: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Borislav Petkov
On Tue, Sep 25, 2018 at 09:34:49AM -0500, Justin Ernst wrote: > We observe an oops in the skx_edac module during boot. > Examining /var/log/messages: > [ 3401.985757] EDAC MC0: Giving out device to module skx_edac controller > Skylake Socket#0 IMC#0 > [ 3401.985887] EDAC MC1: Giving out device to

Re: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Borislav Petkov
On Wed, Sep 26, 2018 at 04:02:57PM -0700, Luck, Tony wrote: > We don't have stats, nor control of power on a per memory controller > or per dimm basis. So all these files are just noise. Yeah, and also, looking at your previous mail, stuff like: /sys/bus/mc6/devices/dimm0

Re: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Luck, Tony
This issue has made me look a bit more at what EDAC puts in sysfs. It seems like the current code inherits some useless baggage from the device calls it makes. E.g. all the "power" subdirectories: $ find /sys/devices/system/edac -name power /sys/devices/system/edac/power

Re: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Russ Anderson
On Wed, Sep 26, 2018 at 11:10:35AM -0700, Luck, Tony wrote: > On Wed, Sep 26, 2018 at 06:17:49PM +0200, Borislav Petkov wrote: > > On Wed, Sep 26, 2018 at 01:03:40PM -0300, Mauro Carvalho Chehab wrote: > > > I guess this is/was needed to create things like this: > > > > > > lrwxrwxrwx 1 root

Re: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Luck, Tony
On Wed, Sep 26, 2018 at 06:17:49PM +0200, Borislav Petkov wrote: > On Wed, Sep 26, 2018 at 01:03:40PM -0300, Mauro Carvalho Chehab wrote: > > I guess this is/was needed to create things like this: > > > > lrwxrwxrwx 1 root root 0 set 26 05:24 /sys/bus/edac/devices/mc -> > >

Re: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Mauro Carvalho Chehab
On Wed, 26 Sep 2018 18:17:49 +0200, Borislav Petkov wrote: > On Wed, Sep 26, 2018 at 01:03:40PM -0300, Mauro Carvalho Chehab wrote: > > I guess this is/was needed to create things like this: > > > > lrwxrwxrwx 1 root root 0 set 26 05:24 /sys/bus/edac/devices/mc -> > >

Re: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Borislav Petkov
On Wed, Sep 26, 2018 at 01:03:40PM -0300, Mauro Carvalho Chehab wrote: > I guess this is/was needed to create things like this: > > lrwxrwxrwx 1 root root 0 set 26 05:24 /sys/bus/edac/devices/mc -> > ../../../devices/system/edac/mc They're still there: $ ls -l /sys/bus/edac/devices/

Re: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Aristeu Rozanski
On Tue, Sep 25, 2018 at 09:34:49AM -0500, Justin Ernst wrote: > We observe an oops in the skx_edac module during boot. That's also happening on sb_edac, and the oops comes from memory corruption after trying to load the module several times during boot. -- Aristeu

Re: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Mauro Carvalho Chehab
On Wed, 26 Sep 2018 17:27:52 +0200, Borislav Petkov wrote: > On Wed, Sep 26, 2018 at 11:35:11AM +0200, Borislav Petkov wrote: > > * or Greg coming and saying, you're using bus_type all wrong and you > > shouldn't and you should remove it completely! :-) > > Yap, and so he did! :-) > > It

Re: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Borislav Petkov
On Wed, Sep 26, 2018 at 11:35:11AM +0200, Borislav Petkov wrote: > * or Greg coming and saying, you're using bus_type all wrong and you > shouldn't and you should remove it completely! :-) Yap, and so he did! :-) It looks like we can remove the whole per-MC bus thing, see below. Patch seems to

Re: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Russ Anderson
On Wed, Sep 26, 2018 at 07:55:39AM +, Zhuo, Qiuxu wrote: > Hi Justin, > > > [ 3401.987556] EDAC MC15: Giving out device to module skx_edac controller > > Skylake Socket#1 IMC#1 > Just curious, has the system(two memory controllers per socket) got more > than 8 sockets? > Normally,

Re: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Borislav Petkov
On Tue, Sep 25, 2018 at 08:07:33PM +0200, Borislav Petkov wrote: > Now I remember. I did that for lockdep because it wants statically > allocated memory. I'll try to think of something tomorrow. Some more info after some staring: We could've made the lock_class_key only static storage so that

RE: [PATCH] Raise maximum number of memory controllers

2018-09-26 Thread Zhuo, Qiuxu
Hi Justin, > [ 3401.987556] EDAC MC15: Giving out device to module skx_edac controller > Skylake Socket#1 IMC#1 Just curious, has the system (two memory controllers per socket) got more than 8 sockets? Normally, the number "1" in the above string "Skylake Socket#1 IMC#1" should be 7

Re: [PATCH] Raise maximum number of memory controllers

2018-09-25 Thread Borislav Petkov
On Tue, Sep 25, 2018 at 10:50:23AM -0700, Luck, Tony wrote: > There are way too many places where we use the identifier "bus" > in the edac core and drivers. But I'm not sure that we need a > static array mc_bus[EDAC_MAX_MCS]. That, of course, is another way of looking at it which I didn't think

Re: [PATCH] Raise maximum number of memory controllers

2018-09-25 Thread Luck, Tony
On Tue, Sep 25, 2018 at 05:26:59PM +0200, Borislav Petkov wrote: > On Tue, Sep 25, 2018 at 09:34:49AM -0500, Justin Ernst wrote: > > We observe an oops in the skx_edac module during boot. > > Examining /var/log/messages: > > [ 3401.985757] EDAC MC0: Giving out device to module skx_edac controller

Re: [PATCH] Raise maximum number of memory controllers

2018-09-25 Thread Borislav Petkov
On Tue, Sep 25, 2018 at 09:34:49AM -0500, Justin Ernst wrote: > We observe an oops in the skx_edac module during boot. > Examining /var/log/messages: > [ 3401.985757] EDAC MC0: Giving out device to module skx_edac controller > Skylake Socket#0 IMC#0 > [ 3401.985887] EDAC MC1: Giving out device to

[PATCH] Raise maximum number of memory controllers

2018-09-25 Thread Justin Ernst
We observe an oops in the skx_edac module during boot. Examining /var/log/messages: [ 3401.985757] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0 [ 3401.985887] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1 [ 3401.986014] EDAC
