Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-07-22 Thread Shiyang Ruan via
On 2024/7/20 0:04, Dave Jiang wrote: On 7/1/24 7:12 PM, Shiyang Ruan wrote: On 2024/6/25 21:56, Shiyang Ruan wrote: On 2024/6/22 1:51, Dan Williams wrote: Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-07-19 Thread Dave Jiang
On 7/1/24 7:12 PM, Shiyang Ruan wrote:
> On 2024/6/25 21:56, Shiyang Ruan wrote:
>> On 2024/6/22 1:51, Dan Williams wrote:
>>> Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-07-19 Thread Shiyang Ruan via
On 2024/6/19 0:53, Shiyang Ruan wrote: This patch adds a new notifier_block and MCE_PRIO_CXL, for CXL memdev to check whether the current poison page has been reported (if yes, stop the notifier chain, won't call the following memory_failure() to report), into `x86_mce_decoder_chain`. In this

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-07-01 Thread Shiyang Ruan via
On 2024/6/25 21:56, Shiyang Ruan wrote: On 2024/6/22 1:51, Dan Williams wrote: Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by interrupt (INT18), no matter which-First path is configured. This is the

RE: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-26 Thread Luck, Tony
>> So recovery has some risk, but very little upside benefit.
>
> Since the hardware provides the instruction(CPU)/command(CXL) to clear
> the poison, we could make the function work, at least as an optional
> feature. Then users could decide to use it or not after evaluating the
> risk and

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-26 Thread Shiyang Ruan via
On 2024/6/22 4:44, Luck, Tony wrote: So who actually cares about recovering poisoned volatile memory? I'd like to understand more on how significant a use case this is. Whilst I can conjecture that it's an extreme case of wanting to avoid losing the ability to create 1GiB or larger pages due to

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-25 Thread Shiyang Ruan via
On 2024/6/22 1:51, Dan Williams wrote: Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by interrupt (INT18), no matter which-First path is configured. This is the first report. Then currently, in

RE: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-21 Thread Luck, Tony
> So who actually cares about recovering poisoned volatile memory?
> I'd like to understand more on how significant a use case this is.
> Whilst I can conjecture that it's an extreme case of wanting to avoid
> losing the ability to create 1GiB or larger pages due to poison
> is that a real problem

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-21 Thread Jonathan Cameron via
On Fri, 21 Jun 2024 10:59:46 -0700 Dan Williams wrote:
> Jonathan Cameron wrote:
> > On Wed, 19 Jun 2024 00:53:10 +0800
> > Shiyang Ruan wrote:
> >
> > > Background:
> > > Since CXL device is a memory device, while CPU consumes a poison page of
> > > CXL device, it always triggers a MCE by

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-21 Thread Dan Williams
Shiyang Ruan wrote:
> Background:
> Since CXL device is a memory device, while CPU consumes a poison page of
> CXL device, it always triggers a MCE by interrupt (INT18), no matter
> which-First path is configured. This is the first report. Then
> currently, in FW-First path, the poison event

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-21 Thread Jonathan Cameron via
On Fri, 21 Jun 2024 18:16:33 +0800 Shiyang Ruan wrote:
> On 2024/6/21 1:02, Jonathan Cameron wrote:
> > On Wed, 19 Jun 2024 00:53:10 +0800
> > Shiyang Ruan wrote:
> >
> >> Background:
> >> Since CXL device is a memory device, while CPU consumes a poison page of
> >> CXL device, it always

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-21 Thread Shiyang Ruan via
On 2024/6/20 23:51, Dave Jiang wrote: On 6/19/24 2:24 AM, Shiyang Ruan wrote: On 2024/6/19 7:35, Dave Jiang wrote: On 6/18/24 9:53 AM, Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by interrupt

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-21 Thread Shiyang Ruan via
On 2024/6/21 1:02, Jonathan Cameron wrote: On Wed, 19 Jun 2024 00:53:10 +0800 Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by interrupt (INT18), no matter which-First path is configured. This is

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-20 Thread Jonathan Cameron via
On Wed, 19 Jun 2024 00:53:10 +0800 Shiyang Ruan wrote:
> Background:
> Since CXL device is a memory device, while CPU consumes a poison page of
> CXL device, it always triggers a MCE by interrupt (INT18), no matter
> which-First path is configured. This is the first report. Then

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-20 Thread Dave Jiang
On 6/19/24 2:24 AM, Shiyang Ruan wrote:
> On 2024/6/19 7:35, Dave Jiang wrote:
>> On 6/18/24 9:53 AM, Shiyang Ruan wrote:
>>> Background:
>>> Since CXL device is a memory device, while CPU consumes a poison page of
>>> CXL device, it always triggers a MCE by interrupt (INT18), no matter

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-19 Thread Shiyang Ruan via
On 2024/6/19 7:35, Dave Jiang wrote: On 6/18/24 9:53 AM, Shiyang Ruan wrote: Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by interrupt (INT18), no matter which-First path is configured. This is the first report.

Re: [RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-18 Thread Dave Jiang
On 6/18/24 9:53 AM, Shiyang Ruan wrote:
> Background:
> Since CXL device is a memory device, while CPU consumes a poison page of
> CXL device, it always triggers a MCE by interrupt (INT18), no matter
> which-First path is configured. This is the first report. Then
> currently, in FW-First

[RFC PATCH] cxl: avoid duplicating report from MCE & device

2024-06-18 Thread Shiyang Ruan via
Background: Since CXL device is a memory device, while CPU consumes a poison page of CXL device, it always triggers a MCE by interrupt (INT18), no matter which-First path is configured. This is the first report. Then currently, in FW-First path, the poison event is transferred according to