This was CC'ed to old kvm-devel address so I took the liberty of forwarding it.
----- Forwarded message from Linus Torvalds <[EMAIL PROTECTED]> ----- Date: Tue, 3 Jun 2008 09:26:05 -0700 (PDT) From: Linus Torvalds <[EMAIL PROTECTED]> To: Andrea Arcangeli <[EMAIL PROTECTED]> cc: Andrew Morton <[EMAIL PROTECTED]>, Christoph Lameter <[EMAIL PROTECTED]>, Jack Steiner <[EMAIL PROTECTED]>, Robin Holt <[EMAIL PROTECTED]>, Nick Piggin <[EMAIL PROTECTED]>, Peter Zijlstra <[EMAIL PROTECTED]>, [EMAIL PROTECTED], Kanoj Sarcar <[EMAIL PROTECTED]>, Roland Dreier <[EMAIL PROTECTED]>, Steve Wise <[EMAIL PROTECTED]>, [EMAIL PROTECTED], Avi Kivity <[EMAIL PROTECTED]>, [EMAIL PROTECTED], [EMAIL PROTECTED], Hugh Dickins <[EMAIL PROTECTED]>, Rusty Russell <[EMAIL PROTECTED]>, Anthony Liguori <[EMAIL PROTECTED]>, Chris Wright <[EMAIL PROTECTED]>, Marcelo Tosatti <[EMAIL PROTECTED]>, Eric Dumazet <[EMAIL PROTECTED]>, "Paul E. McKenney" <[EMAIL PROTECTED]> Subject: Re: [PATCH 001/001] mmu-notifier-core v17 User-Agent: Alpine 1.10 (LFD 962 2008-03-14) On Fri, 9 May 2008, Andrea Arcangeli wrote: > > At least for KVM without this patch it's impossible to swap guests > reliably. And having this feature and removing the page pin allows > several other optimizations that simplify life considerably. Ok, this looks ok as far as I'm concerned. I did not look at any details, so obviously other VM people need to ack the parts they care about, but at least I think this one is fine from a "big picture". I do have some small nits that are just about trivial stuff. > 1) Introduces list_del_init_rcu and documents it (fixes a comment for > list_del_rcu too) I think this should go in separately, and be split up into a patch of its own, just because it's really an independent area. So make it [1/3]. > 2) mm_take_all_locks() to register the mmu notifier when the whole VM > isn't doing anything with "mm". This allows mmu notifier users to > keep track if the VM is in the middle of the > invalidate_range_begin/end critical section with an atomic counter > incraese in range_begin and decreased in range_end. Similarly, even without any users, I think this can be posted as an independent patch, just for setting things up, and to make the whole thing easier to look through and review. So make this [2/3]. But before doing that, can you split up the low-level single-vma anon/file locking/unlocking, please? In other words, your 'mm_take_all_locks()' rigth now looks like it _works_ correctly, but it nests too deeply considering the complexity of it. There's really subtle things going on inside that for-loop, and I think it would be much better to split those low-level locking rules out. IOW, instead of: > +int mm_take_all_locks(struct mm_struct *mm) > +{ > + struct vm_area_struct *vma; > + int ret = -EINTR; > + > + BUG_ON(down_read_trylock(&mm->mmap_sem)); > + > + mutex_lock(&mm_all_locks_mutex); > + > + for (vma = mm->mmap; vma; vma = vma->vm_next) { > + struct file *filp; > + if (signal_pending(current)) > + goto out_unlock; > + if (vma->anon_vma && !test_bit(0, (unsigned long *) > + &vma->anon_vma->head.next)) { > + /* > + * The LSB of head.next can't change from > + * under us because we hold the > + * global_mm_spinlock. > + */ > + spin_lock(&vma->anon_vma->lock); ... ie, can you please make it be for (vma = mm->mmap; vma; vma = vma->vm_next) { if (signal_pending(current)) goto out_unlock; if (vma->anon_vma) vm_lock_anon_vma(vma->anon_vma); if (vma->vm_file && vma->vm_file->f_mapping) vm_lock_mapping(vma->vm_file->f_mapping); } and the same thing for unlocking.. Doesn't that look more obvious and easier to understand from a high-level standpoing (and then the individual locking rules for mappings/anon_vma's will also be more obvious, just because they are separated from the higher-level code). The comments are fine, but even with the comments I'd prefer you to write the code so that you don't need to break up the conditionals over multiple lines etc. Anyway - I didn't look very much at the actual _notifier_ stuff (ie the thing that I think should be [patch 3/3]), so I don't have any real comments about that part - but I don't really care either. Becasue as long as it doesn't mess up the core VM logic, I no longer have any real objections. I'd obviously want to see ack's by people like Andrew, Hugh and Nick, but as far as I am concerned, if you just do the trivial cleanup/split, you can add an "Acked-by: Linus Torvalds <[EMAIL PROTECTED]>" to at least the two first patches of the split-up series. Linus ----- End forwarded message ----- -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html