>>   Under normal circumstances, when do_exit() runs, mm->owner is
>>   updated in exit_mm(). But when a kernel process calls
>>   unuse_mm() and then exits, mm->owner cannot be updated, and it
>>   will point to a task that has been released.
>>
>>   Below is my issue on vhost_net:
>>      A and B are two kernel processes (such as vhost_worker),
>>      C is a user space process (such as qemu), and all
>>      three use the mm of the user process C.
>>      Now, because user process C exits abnormally, the owner of this
>>      mm becomes A. When A calls unuse_mm() and exits, mm->owner
>>      still points to A, which has been released.
>>      When B then accesses mm->owner, A is already gone.


Thank you for taking a look, and I apologize for the disturbance.

>Could you describe how you reproduce this issue?
Sorry, this issue is hard for me to reproduce, but the critical
situation described above does exist.

>I believe vhost process should exit before process C?
Yes, A and B will normally exit before C, because C usually closes the open
fd and then exits.
However, if C exits abnormally, for example when it is killed by a fatal
signal, A may exit before C does.

The current issue flow is as follows:

Process C                 Process A                 Process B
qemu-system-x86_64:       kernel: vhost_net         kernel: vhost_net
open /dev/vhost-net
  VHOST_SET_OWNER         create kthread vhost-%d   create kthread vhost-%d
  network init            use_mm()                  use_mm()
   ...                    ...
   Abnormal exit
   ...
  do_exit()
  exit_mm()
  update mm->owner to A
  exit_files()
   close_files()
   kthread_should_stop()  unuse_mm()
    Stop Process A        tsk->mm = NULL
                          do_exit()
                          can't update owner
                          A exit completed          vhost-%d receives first packet
                                                    vhost-%d builds rcv buffer for vq
                                                    page fault
                                                    access mm & mm->owner
                                                    NOW, mm->owner still points to A
                                                    kernel NULL pointer dereference in
                                                    mem_cgroup_from_task()
    Stop Process B
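
To make the ordering above easier to follow, here is a small user-space model of the flow. It is only a sketch, not kernel code: struct task, struct mm, the model_* helpers and the "released" flag are names I made up for illustration, and the owner search is much simpler than the real mm_update_next_owner(). It only shows that ownership is never handed over once tsk->mm has been cleared before exit, and that calling the update right after unuse_mm() moves the owner to B.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm;

struct task {
    struct mm *mm;   /* models tsk->mm */
    bool released;   /* hypothetical flag: task would have been freed */
};

struct mm {
    struct task *owner;  /* models mm->owner */
};

/* If cur owns mm, hand ownership to another live task still using mm. */
static void model_update_next_owner(struct mm *mm, struct task *cur,
                                    struct task **tasks, size_t n)
{
    if (mm->owner != cur)
        return;
    mm->owner = NULL;
    for (size_t i = 0; i < n; i++) {
        struct task *t = tasks[i];
        if (t != cur && !t->released && t->mm == mm) {
            mm->owner = t;
            return;
        }
    }
}

/* Models exit_mm(): ownership moves only if tsk->mm is still set. */
static void model_exit_mm(struct task *tsk, struct task **tasks, size_t n)
{
    struct mm *mm = tsk->mm;

    if (!mm)
        return;
    tsk->mm = NULL;
    model_update_next_owner(mm, tsk, tasks, n);
}

/* Models the end of vhost_worker(): unuse_mm(), optional fix, then exit. */
static void model_worker_stop(struct task *tsk, bool with_fix,
                              struct task **tasks, size_t n)
{
    struct mm *mm = tsk->mm;

    tsk->mm = NULL;                       /* unuse_mm() */
    if (with_fix)
        model_update_next_owner(mm, tsk, tasks, n);
    model_exit_mm(tsk, tasks, n);         /* no-op: tsk->mm is already NULL */
    tsk->released = true;
}

/* Returns true if B would see mm->owner pointing to a released task. */
static bool owner_dangles_after_flow(bool with_fix)
{
    struct mm mm = { 0 };
    struct task c = { &mm, false }, a = { &mm, false }, b = { &mm, false };
    struct task *tasks[] = { &c, &a, &b };

    mm.owner = &c;
    model_exit_mm(&c, tasks, 3);               /* C dies: owner becomes A */
    assert(mm.owner == &a);
    model_worker_stop(&a, with_fix, tasks, 3); /* C's close_files() stops A */
    return mm.owner != NULL && mm.owner->released;  /* B's page fault */
}
```

In this model owner_dangles_after_flow(false) returns true (the owner is still the released A), while owner_dangles_after_flow(true) returns false because the owner has moved to B.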

>>
>> Cc: "Michael S. Tsirkin" <m...@redhat.com>
>> Cc: Jason Wang <jasow...@redhat.com>
>> Cc: k...@vger.kernel.org
>> Cc: virtualizat...@lists.linux-foundation.org
>> Cc: net...@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: "Eric W. Biederman" <ebied...@xmission.com>
>> Cc: Andrew Morton <a...@linux-foundation.org>
>> Cc: Sudip Mukherjee <sudipm.mukher...@gmail.com>
>> Cc: "Luis R. Rodriguez" <mcg...@kernel.org>
>> Cc: Dominik Brodowski <li...@dominikbrodowski.net>
>> Signed-off-by: guomin chen <gchen.guo...@gmail.com>
>> ---
>>   drivers/vhost/vhost.c | 1 +
>>   kernel/exit.c         | 1 +
>>   2 files changed, 2 insertions(+)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 6b98d8e..7c09087 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -368,6 +368,7 @@ static int vhost_worker(void *data)
>>              }
>>      }
>>      unuse_mm(dev->mm);
>> +    mm_update_next_owner(dev->mm);


>If your analysis is correct, this is still racy, isn't it? (E.g. a page fault
>happens between unuse_mm() and mm_update_next_owner()).

No, I think this is not racy.
When a page fault happens between unuse_mm() and mm_update_next_owner(),
tsk->mm is already NULL, but tsk has not exited yet, so mm->owner (which is
still tsk) can be accessed safely.

Thanks and regards
