>>> On 27.01.13 at 16:50, Ingo Molnar <mi...@kernel.org> wrote:
> * Linus Torvalds <torva...@linux-foundation.org> wrote:
>
>> On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing <udkni...@gmail.com> wrote:
>> > I get below warning every day with 3.7,
>> > one or two times per day.
>> >
>> > [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
>> > [ 2235.186030] Hardware name: Aspire 4741
>> > [ 2235.186032] empty IPI mask
>> > [ 2235.186079] [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
>> > [ 2235.186087] [<c1053453>] smp_call_function_many+0x191/0x1a9
>> > [ 2235.186097] [<c101e074>] native_flush_tlb_others+0x21/0x24
>> > [ 2235.186101] [<c101e0da>] flush_tlb_page+0x63/0x89
>> > [ 2235.186105] [<c101d360>] ptep_set_access_flags+0x20/0x26
>> > [ 2235.186111] [<c108fadd>] do_wp_page+0x234/0x502
>> > [ 2235.186121] [<c1090825>] handle_pte_fault+0x50d/0x54c
>> > [ 2235.186148] [<c1090934>] handle_mm_fault+0xd0/0xe2
>> > [ 2235.186153] [<c12dd143>] __do_page_fault+0x411/0x42d
>> > [ 2235.186166] [<c12dd167>] do_page_fault+0x8/0xa
>> > [ 2235.186170] [<c12db31a>] error_code+0x5a/0x60
>> >
>> > This patch fix it.
>> >
>> > This patch also fix some system hang problem:
>> > If the data->cpumask been cleared after pass
>> >
>> > if (WARN_ONCE(!mask, "empty IPI mask"))
>> > return;
>> > then the problem 83d349f3 fix will happen again.
>>
>> Hmm. We have very consciously tried to avoid the extra copy, although
>> I'm not entirely sure why (it might possibly hurt on the MAXSMP
>> configuration).
>>
>> See for example commit 723aae25d5cd ("smp_call_function_many: handle
>> concurrent clearing of mask") which fixed another version of this
>> problem.
>>
>> But I do agree that it looks like the copy is required, simply because
>> - as you say - once we've done the "list_add_rcu()" to add it to the
>> queue, we can have (another) IPI to the target CPU that can now see it
>> and clear the mask.
>>
>> So by the time we get to actually send the IPI, the mask might have
>> been cleared by another IPI. So I do agree that your patch seems
>> correct, but I really really want to run it by other people.
>>
>> Guys? Original patch on lkml. The other possible fix might be
>> to take the &call_function.lock earlier in
>> generic_smp_call_function_interrupt(), so that we can never
>> clear the bit while somebody is adding entries to the list...
>> But I think it very much tries to avoid that on purpose right
>> now, with only the last CPU responding to that IPI taking the
>> lock.
>>
>> So copying the IPI mask seems to be the reasonable approach.
>> Comments?
>
> Agreed, looks correct to me as well - I've queued the fix up in
> tip:x86/urgent.

But the patch is obviously incomplete for the CPUMASK_OFFSTACK case,
as the newly added cpumask_ipi member never gets its bit array
allocated.

Jan
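
For illustration only, here is a small user-space model of the two points
discussed above: the sender takes a private snapshot of the mask before
the shared copy can be cleared by a target CPU, and with an off-stack
mask both bit arrays have to be allocated before use. This is a sketch
under assumptions, not the kernel patch: every model_* name and
MODEL_NR_CPUS are made up for the example, and the real code would use
the kernel's own helpers (such as zalloc_cpumask_var() and
cpumask_copy()) rather than calloc() and memcpy().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NR_CPUS    64                   /* hypothetical CPU count */
#define MODEL_MASK_BYTES (MODEL_NR_CPUS / 8)

/* Hypothetical stand-in for an off-stack cpumask: just a pointer, so the
 * bit array behind it must be allocated separately before any use. */
typedef unsigned char *model_cpumask_var_t;

struct model_call_data {
	model_cpumask_var_t cpumask;     /* shared: targets clear their own bit */
	model_cpumask_var_t cpumask_ipi; /* private snapshot used to send the IPI */
};

/* Allocate BOTH bit arrays; skipping the second one would leave
 * cpumask_ipi as a NULL pointer, which is the gap pointed out above. */
bool model_call_data_init(struct model_call_data *d)
{
	d->cpumask = calloc(1, MODEL_MASK_BYTES);
	d->cpumask_ipi = calloc(1, MODEL_MASK_BYTES);
	if (!d->cpumask || !d->cpumask_ipi) {
		free(d->cpumask);
		free(d->cpumask_ipi);
		d->cpumask = d->cpumask_ipi = NULL;
		return false;
	}
	return true;
}

void model_call_data_free(struct model_call_data *d)
{
	free(d->cpumask);
	free(d->cpumask_ipi);
	d->cpumask = d->cpumask_ipi = NULL;
}

void model_set_cpu(model_cpumask_var_t m, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	m[cpu / 8] |= (unsigned char)(1u << (cpu % 8));
}

void model_clear_cpu(model_cpumask_var_t m, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	m[cpu / 8] &= (unsigned char)~(1u << (cpu % 8));
}

bool model_mask_empty(const unsigned char *m)
{
	for (size_t i = 0; i < MODEL_MASK_BYTES; i++)
		if (m[i])
			return false;
	return true;
}

/* Take the snapshot before the entry becomes visible to other CPUs, so a
 * later clear of the shared mask cannot empty the mask used for sending. */
void model_snapshot_ipi_mask(struct model_call_data *d)
{
	memcpy(d->cpumask_ipi, d->cpumask, MODEL_MASK_BYTES);
}
```

In this model, setting a bit in cpumask, calling
model_snapshot_ipi_mask(), and then clearing the bit in cpumask (as a
target CPU would) leaves cpumask empty while cpumask_ipi still names the
target. If model_call_data_init() did not allocate cpumask_ipi, the
snapshot would write through a NULL pointer.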