RE: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-07-24 Thread Yasunori Gotou (Fujitsu)
Hello, everyone! > >>> On 2024/6/22 1:51, Dan Williams wrote: > Shiyang Ruan wrote: > > Background: > > Since CXL device is a memory device, while CPU consumes a poison > page of > > CXL device, it always triggers a MCE by interrupt (INT18), no matter > > which-First path is confi

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-07-22 Thread Shiyang Ruan via
On 2024/7/20 0:04, Dave Jiang wrote: On 7/1/24 7:12 PM, Shiyang Ruan wrote: On 2024/6/25 21:56, Shiyang Ruan wrote: On 2024/6/22 1:51, Dan Williams wrote: Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-07-19 Thread Dave Jiang
On 7/1/24 7:12 PM, Shiyang Ruan wrote: > > > On 2024/6/25 21:56, Shiyang Ruan wrote: >> >> >> On 2024/6/22 1:51, Dan Williams wrote: >>> Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by inter

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-07-19 Thread Shiyang Ruan via
On 2024/6/19 0:53, Shiyang Ruan wrote: This patch adds a new notifier_block and MCE_PRIO_CXL, for CXL memdev to check whether the current poison page has been reported (if yes, stop the notifier chain, won't call the following memory_failure() to report), into `x86_mce_decoder_chain`. In this way

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-07-01 Thread Shiyang Ruan via
On 2024/6/25 21:56, Shiyang Ruan wrote: On 2024/6/22 1:51, Dan Williams wrote: Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by interrupt (INT18), no matter which-First path is configured.  This is the

RE: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-26 Thread Luck, Tony
>> So recovery has some risk, but very little upside benefit. > > Since the hardware provides the instruction(CPU)/command(CXL) to clear > the poison, we could make the function work, at least as an optional > feature. Then users could decide to use it or not after evaluating the > risk and ben

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-25 Thread Shiyang Ruan via
On 2024/6/22 4:44, Luck, Tony wrote: So who actually cares about recovering poisoned volatile memory? I'd like to understand more on how significant a use case this is. Whilst I can conjecture that its an extreme case of wanting to avoid loosing the ability to create 1GiB or larger pages due to po

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-25 Thread Shiyang Ruan via
On 2024/6/22 1:51, Dan Williams wrote: Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by interrupt (INT18), no matter which-First path is configured. This is the first report. Then currently, in FW-Fi

RE: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-21 Thread Luck, Tony
> So who actually cares about recovering poisoned volatile memory? > I'd like to understand more on how significant a use case this is. > Whilst I can conjecture that its an extreme case of wanting to avoid > loosing the ability to create 1GiB or larger pages due to poison > is that a real problem

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-21 Thread Jonathan Cameron via
On Fri, 21 Jun 2024 10:59:46 -0700 Dan Williams wrote: > Jonathan Cameron wrote: > > On Wed, 19 Jun 2024 00:53:10 +0800 > > Shiyang Ruan wrote: > > > > > Background: > > > Since CXL device is a memory device, while CPU consumes a poison page of > > > CXL device, it always triggers a MCE by i

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-21 Thread Dan Williams
Shiyang Ruan wrote: > Background: > Since CXL device is a memory device, while CPU consumes a poison page of > CXL device, it always triggers a MCE by interrupt (INT18), no matter > which-First path is configured. This is the first report. Then > currently, in FW-First path, the poison event i

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-21 Thread Jonathan Cameron via
On Fri, 21 Jun 2024 18:16:33 +0800 Shiyang Ruan wrote: > On 2024/6/21 1:02, Jonathan Cameron wrote: > > On Wed, 19 Jun 2024 00:53:10 +0800 > > Shiyang Ruan wrote: > > > >> Background: > >> Since CXL device is a memory device, while CPU consumes a poison page of > >> CXL device, it always triggers

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-21 Thread Shiyang Ruan via
On 2024/6/20 23:51, Dave Jiang wrote: On 6/19/24 2:24 AM, Shiyang Ruan wrote: On 2024/6/19 7:35, Dave Jiang wrote: On 6/18/24 9:53 AM, Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by interrupt (IN

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-21 Thread Shiyang Ruan via
On 2024/6/21 1:02, Jonathan Cameron wrote: On Wed, 19 Jun 2024 00:53:10 +0800 Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by interrupt (INT18), no matter which-First path is configured. This is th

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-20 Thread Jonathan Cameron via
On Wed, 19 Jun 2024 00:53:10 +0800 Shiyang Ruan wrote: > Background: > Since CXL device is a memory device, while CPU consumes a poison page of > CXL device, it always triggers a MCE by interrupt (INT18), no matter > which-First path is configured. This is the first report. Then > currently,

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-20 Thread Dave Jiang
On 6/19/24 2:24 AM, Shiyang Ruan wrote: > > > On 2024/6/19 7:35, Dave Jiang wrote: >> >> >> On 6/18/24 9:53 AM, Shiyang Ruan wrote: >>> Background: >>> Since CXL device is a memory device, while CPU consumes a poison page of >>> CXL device, it always triggers a MCE by interrupt (INT18), no matter

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-19 Thread Shiyang Ruan via
On 2024/6/19 7:35, Dave Jiang wrote: On 6/18/24 9:53 AM, Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by interrupt (INT18), no matter which-First path is configured. This is the first report. Then

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-18 Thread Dave Jiang
On 6/18/24 9:53 AM, Shiyang Ruan wrote: > Background: > Since CXL device is a memory device, while CPU consumes a poison page of > CXL device, it always triggers a MCE by interrupt (INT18), no matter > which-First path is configured. This is the first report. Then > currently, in FW-First p