Hi Shiju,
On 06/10/2020 17:13, Shiju Jose wrote:
[...]
> Please find following pseudo code we added for the kernel side to make sure
> we correctly understand your suggestions.
>
> 1. Create edac device and edac device sysfs entries for the online CPU caches.
> /drivers/edac/edac_device.c
>
el@vger.kernel.org; tony.l...@intel.com;
>r...@rjwysocki.net; l...@kernel.org; Linuxarm
>Subject: Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on
>short time period
>
>Hi Shiju,
>
>On 02/10/2020 16:38, Shiju Jose wrote:
>>> -Original Message-
On Fri, Oct 02, 2020 at 06:33:17PM +0100, James Morse wrote:
> > I think adding the CPU error collection to the kernel
> > has the following advantages,
> > 1. The CPU error collection and isolation would not be active if the
> > rasdaemon stopped running or not running on a machine.
r.kernel.org; tony.l...@intel.com; r...@rjwysocki.net;
>> james.mo...@arm.com; l...@kernel.org; Linuxarm
>>
>> Subject: Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on
>> short time period
>>
>> On Fri, Oct 02, 2020 at 01:22:28PM +0100, Shiju
> Because from my x86 CPUs limited experience, the cache arrays are mostly
> fine and errors reported there are not something that happens very
> frequently so we don't even need to collect and count those.
On Intel X86 we leave the counting and threshold decisions about cache
health to the
t;james.mo...@arm.com; l...@kernel.org; Linuxarm
>
>Subject: Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on
>short time period
>
>On Fri, Oct 02, 2020 at 01:22:28PM +0100, Shiju Jose wrote:
>> Open Questions based on the feedback from Boris, 1. ARM process
On Fri, Oct 02, 2020 at 01:22:28PM +0100, Shiju Jose wrote:
> Open Questions based on the feedback from Boris,
> 1. ARM processor error types are cache/TLB/bus errors.
>[Reference N2.4.4.1 ARM Processor Error Information UEFI Spec v2.8]
> Any of the above error types should not be consider for
In ARM64 hardware platforms, for example our Kunpeng platforms, CPU L1/L2
cache corrected errors are reported in the ARM processor error section.
The situations the CPU CE errors are reported too often is not unlikely
and may need to isolate that CPU core to prevent leading to more
serious
8 matches
Mail list logo