>>> Hi Simon,
>>>
>>> If we use "/sys/devices/system/memory/soft_offline_page" to offline a
>>> free page, the value of mce_bad_pages will be added. Then the page is marked
>>> HWPoison, but it is still managed by page buddy allocator.
>>>
>>> So if we offline it again, the value of mce_bad_pages w
On 2012/12/11 11:48, Simon Jeons wrote:
> On Tue, 2012-12-11 at 04:19 +0100, Andi Kleen wrote:
>> On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote:
>>> On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote:
> Oh, it will be putback to lru list during migration. So does your "some
On Tue, 2012-12-11 at 04:19 +0100, Andi Kleen wrote:
> On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote:
> > On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote:
> > > > Oh, it will be putback to lru list during migration. So does your "some
> > > > time" mean before call check_new_page?
> "There are not so many free pages in a typical server system", sorry I don't
> quite understand it.
Linux tries to keep most memory in caches. As Linus says "free memory is
bad memory"
>
> buffered_rmqueue()
>   prep_new_page()
>     check_new_page()
>       bad_pa
On 2012/12/11 10:58, Andi Kleen wrote:
>> That sounds like overkill. There are not so many free pages in a
>> typical server system.
>
> As Fengguang said -- memory error handling is tricky. Lots of things
> could be done in theory, but they all have a cost in testing and
> maintenance.
>
> In
On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote:
> On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote:
> > > Oh, it will be putback to lru list during migration. So does your "some
> > > time" mean before call check_new_page?
> >
> > Yes until the next check_new_page() whenever that i
On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote:
> > Oh, it will be putback to lru list during migration. So does your "some
> > time" mean before call check_new_page?
>
> Yes until the next check_new_page() whenever that is. If the migration
> works it will be earlier, otherwise later.
But I
> Oh, it will be putback to lru list during migration. So does your "some
> time" mean before call check_new_page?
Yes until the next check_new_page() whenever that is. If the migration
works it will be earlier, otherwise later.
-andi
> That sounds like overkill. There are not so many free pages in a
> typical server system.
As Fengguang said -- memory error handling is tricky. Lots of things
could be done in theory, but they all have a cost in testing and
maintenance.
In general they are only worth doing if the situation is
On Tue, Dec 11, 2012 at 10:25:00AM +0800, Xishi Qiu wrote:
> On 2012/12/10 23:38, Andi Kleen wrote:
>
> >> It is another topic, I mean since the page is poisoned, so why not isolate it
> >> from page buddy allocator in soft_offline_page() rather than in
> >> check_new_page().
> >> I find sof
On 2012/12/10 23:38, Andi Kleen wrote:
>> It is another topic, I mean since the page is poisoned, so why not isolate it
>> from page buddy allocator in soft_offline_page() rather than in
>> check_new_page().
>> I find soft_offline_page() only migrate the page and mark HWPoison, the
>> poisoned
>>
On Tue, 2012-12-11 at 03:03 +0100, Andi Kleen wrote:
> > IIUC, soft offlining will isolate and migrate hwpoisoned page, and this
> > page will not be accessed by memory management subsystem until unpoison,
> > correct?
>
> No, soft offlining can still allow accesses for some time. It'll never kill
> IIUC, soft offlining will isolate and migrate hwpoisoned page, and this
> page will not be accessed by memory management subsystem until unpoison,
> correct?
No, soft offlining can still allow accesses for some time. It'll never kill
anything.
Hard tries much harder and will kill.
In some case
On Mon, 2012-12-10 at 16:38 +0100, Andi Kleen wrote:
> > It is another topic, I mean since the page is poisoned, so why not isolate it
> > from page buddy allocator in soft_offline_page() rather than in
> > check_new_page().
> > I find soft_offline_page() only migrate the page and mark HWPoiso
> HWPoison delays any action on buddy allocator pages, handling can be safely
> postponed until a later time when the page might be referenced. By delaying,
> some transient errors may not reoccur or may be irrelevant.
That's not true for soft offlining, only for hard.
-Andi
--
a...@linu
> It is another topic, I mean since the page is poisoned, so why not isolate it
> from page buddy allocator in soft_offline_page() rather than in
> check_new_page().
> I find soft_offline_page() only migrate the page and mark HWPoison, the
> poisoned page is still managed by page buddy allocator.
Cc other guys.
On Mon, 2012-12-10 at 20:40 +0800, Xishi Qiu wrote:
> On 2012/12/10 19:56, Simon Jeons wrote:
>
> > On Mon, 2012-12-10 at 19:16 +0800, Xishi Qiu wrote:
> >> On 2012/12/10 18:47, Simon Jeons wrote:
> >>
> >>> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
> On 2012/12/10 1
On Mon, Dec 10, 2012 at 07:54:53PM +0800, Xishi Qiu wrote:
> One more question, can we add a list_head to manage the poisoned pages?
What would you need that list for? Also, a list is not the most optimal
data structure for when you need to traverse it often.
Thanks.
--
Regards/Gruss,
Bori
On Mon, 2012-12-10 at 19:16 +0800, Xishi Qiu wrote:
> On 2012/12/10 18:47, Simon Jeons wrote:
>
> > On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
> >> On 2012/12/10 16:33, Wanpeng Li wrote:
> >>
> >>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
> On Fri, 7 Dec 2012 16
On 2012/12/10 19:39, Wanpeng Li wrote:
> On Mon, Dec 10, 2012 at 07:16:50PM +0800, Xishi Qiu wrote:
>> On 2012/12/10 18:47, Simon Jeons wrote:
>>
>>> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
On 2012/12/10 16:33, Wanpeng Li wrote:
> On Fri, Dec 07, 2012 at 02:11:02PM -0800,
On 2012/12/10 18:47, Simon Jeons wrote:
> On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
>> On 2012/12/10 16:33, Wanpeng Li wrote:
>>
>>> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
On Fri, 7 Dec 2012 16:48:45 +0800
Xishi Qiu wrote:
> On x86 platform, if
On Mon, 2012-12-10 at 17:06 +0800, Xishi Qiu wrote:
> On 2012/12/10 16:33, Wanpeng Li wrote:
>
> > On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
> >> On Fri, 7 Dec 2012 16:48:45 +0800
> >> Xishi Qiu wrote:
> >>
> >>> On x86 platform, if we use "/sys/devices/system/memory/soft_off
On 2012/12/10 16:33, Wanpeng Li wrote:
> On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
>> On Fri, 7 Dec 2012 16:48:45 +0800
>> Xishi Qiu wrote:
>>
>>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page"
>>> to offline a free page twice, the value of mce
On 2012/12/8 6:11, Andrew Morton wrote:
> On Fri, 7 Dec 2012 16:48:45 +0800
> Xishi Qiu wrote:
>
>> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to
>> offline a free page twice, the value of mce_bad_pages will be added twice. So
>> this is an error, since the
On Fri, Dec 07, 2012 at 02:11:02PM -0800, Andrew Morton wrote:
> A few things:
>
> - soft_offline_page() already checks for this case:
>
> 	if (PageHWPoison(page)) {
> 		unlock_page(page);
> 		put_page(page);
> 		pr_info("soft offline: %#lx page already po
On Fri, 7 Dec 2012 16:48:45 +0800
Xishi Qiu wrote:
> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to
> offline a free page twice, the value of mce_bad_pages will be added twice. So
> this is an error, since the page was already marked HWPoison, we should skip th
On Fri, Dec 07, 2012 at 04:48:45PM +0800, Xishi Qiu wrote:
> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to
> offline a free page twice, the value of mce_bad_pages will be added twice. So
> this is an error, since the page was already marked HWPoison, we should s
On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to
offline a free page twice, the value of mce_bad_pages will be added twice. So
this is an error, since the page was already marked HWPoison, we should skip
the page and don't add the value of mce_bad_pages.
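
The behaviour asked for above can be sketched in plain userspace C. This is
only an illustration of the "skip a page that is already poisoned" idea and is
not the kernel patch itself: struct demo_page, demo_bad_pages and
demo_soft_offline() are hypothetical stand-ins for struct page, mce_bad_pages
and soft_offline_page(), and the real code also has to deal with locking,
reference counts and migration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the poison flag matters here. */
struct demo_page {
	bool hwpoison;
};

/* Hypothetical stand-in for the mce_bad_pages counter. */
static long demo_bad_pages;

/*
 * Mark a page as poisoned and count it, unless it was already poisoned.
 * Returns 0 if the page was newly poisoned, -1 if it was skipped.
 */
static int demo_soft_offline(struct demo_page *page)
{
	assert(page != NULL);

	if (page->hwpoison)
		return -1;	/* already counted once, leave the counter alone */

	page->hwpoison = true;
	demo_bad_pages++;
	return 0;
}
```

With this check in place, calling demo_soft_offline() twice on the same page
bumps demo_bad_pages only on the first call, and the second call reports that
the page was skipped.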
$ cat /proc/