On 2022/10/22 AM12:30, Luck, Tony wrote:
>>> But maybe it is some RMW instruction ... then, if all the above options
>>> didn't happen ... we
>>> could get another machine check from the same address. But then we just
>>> follow the usual
>>> recovery path.
>
>
>> Let's assume the instruction that
>> But maybe it is some RMW instruction ... then, if all the above options
>> didn't happen ... we
>> could get another machine check from the same address. But then we just
>> follow the usual
>> recovery path.
> Let's assume the instruction that causes the COW is in the 63/64 case, aka,
> it is
On 2022/10/21 PM12:41, Luck, Tony wrote:
>>> When we do return to user mode the task is going to be busy servicing
>>> a SIGBUS ... so shouldn't try to touch the poison page before the
>>> memory_failure() called by the worker thread cleans things up.
>>
>> What about an RT process on a busy system?
On 2022/10/21 PM12:08, Tony Luck wrote:
> On Fri, Oct 21, 2022 at 09:52:01AM +0800, Shuai Xue wrote:
>>
>>
>> On 2022/10/21 AM4:05, Tony Luck wrote:
>>> On Thu, Oct 20, 2022 at 09:57:04AM +0800, Shuai Xue wrote:
On 2022/10/20 AM1:08, Tony Luck wrote:
>
>>> I'm experimenting with using sched_
>> When we do return to user mode the task is going to be busy servicing
>> a SIGBUS ... so shouldn't try to touch the poison page before the
>> memory_failure() called by the worker thread cleans things up.
>
> What about an RT process on a busy system?
> The worker threads are pretty low priority
From: Tony Luck
> Sent: 21 October 2022 05:08
> When we do return to user mode the task is going to be busy servicing
> a SIGBUS ... so shouldn't try to touch the poison page before the
> memory_failure() called by the worker thread cleans things up.
What about an RT process on a busy system?
On Fri, Oct 21, 2022 at 09:52:01AM +0800, Shuai Xue wrote:
>
>
> On 2022/10/21 AM4:05, Tony Luck wrote:
> > On Thu, Oct 20, 2022 at 09:57:04AM +0800, Shuai Xue wrote:
> >>
> >>
> >> On 2022/10/20 AM1:08, Tony Luck wrote:
> > I'm experimenting with using schedule_work() to handle the call to
> > memory_fail
>> +INIT_WORK(&p->work, do_sched_memory_failure);
>> +p->pfn = pfn;
>> +schedule_work(&p->work);
>
> There is already memory_failure_queue() that can do this. Can we use it
> directly?
Miaohe Lin,
Yes, can use that. A thousand thanks for pointing it out. I just tried it, and
it work
On 2022/10/21 AM4:05, Tony Luck wrote:
> On Thu, Oct 20, 2022 at 09:57:04AM +0800, Shuai Xue wrote:
>>
>>
>> On 2022/10/20 AM1:08, Tony Luck wrote:
>>> If the kernel is copying a page as the result of a copy-on-write
>>> fault and runs into an uncorrectable error, Linux will crash because
>>> it does no
On 2022/10/21 4:05, Tony Luck wrote:
> On Thu, Oct 20, 2022 at 09:57:04AM +0800, Shuai Xue wrote:
>>
>>
>> On 2022/10/20 AM1:08, Tony Luck wrote:
>>> If the kernel is copying a page as the result of a copy-on-write
>>> fault and runs into an uncorrectable error, Linux will crash because
>>> it does not
On Thu, Oct 20, 2022 at 09:57:04AM +0800, Shuai Xue wrote:
>
>
> On 2022/10/20 AM1:08, Tony Luck wrote:
> > If the kernel is copying a page as the result of a copy-on-write
> > fault and runs into an uncorrectable error, Linux will crash because
> > it does not have recovery code for this case where
On 2022/10/20 AM1:08, Tony Luck wrote:
> If the kernel is copying a page as the result of a copy-on-write
> fault and runs into an uncorrectable error, Linux will crash because
> it does not have recovery code for this case where poison is consumed
> by the kernel.
>
> It is easy to set up a test c
> Given there is no use case for the residue value returned by
> copy_mc_to_kernel() perhaps just return EHWPOISON directly from
> copyuser_highpage_mc() in the short-copy case?
I don't think it hurts to keep the return value as residue count. It isn't
making that code any more complex and could b
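
For readers following the return-value question, here is a minimal userspace
sketch of the two options being discussed: passing the residue count through,
or collapsing any short copy into -EHWPOISON. It is not the kernel code.
model_copy_mc(), copy_page_errno() and the poison_at parameter are hypothetical
names made up for illustration; the only behaviour taken from the thread is
that copy_mc_to_kernel() reports a residue (bytes not copied) on a short copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* asm-generic Linux value; fallback only if libc lacks it */
#endif

/*
 * Hypothetical stand-in for a machine-check-safe copy: copies bytes until
 * the simulated poisoned offset and returns the number of bytes NOT copied
 * (the residue). Pass (size_t)-1 as poison_at for "no poison".
 */
static size_t model_copy_mc(void *dst, const void *src, size_t len,
                            size_t poison_at)
{
    assert(dst != NULL && src != NULL);

    /* Only the bytes before the poisoned offset can be copied safely. */
    size_t good = (poison_at < len) ? poison_at : len;

    memcpy(dst, src, good);
    return len - good;
}

/*
 * The alternative suggested in the review: the caller has no use for the
 * residue, so any short copy becomes -EHWPOISON and a full copy becomes 0.
 */
static int copy_page_errno(void *dst, const void *src, size_t len,
                           size_t poison_at)
{
    return model_copy_mc(dst, src, len, poison_at) ? -EHWPOISON : 0;
}
```

Either form lets the caller detect the failure with a simple non-zero check;
the residue variant just preserves how far the copy got, in case a later
caller wants it.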
Tony Luck wrote:
> If the kernel is copying a page as the result of a copy-on-write
> fault and runs into an uncorrectable error, Linux will crash because
> it does not have recovery code for this case where poison is consumed
> by the kernel.
>
> It is easy to set up a test case. Just inject an e