On 06/02, Linus Torvalds wrote:
>
> On Fri, Jun 2, 2023 at 1:59 PM Oleg Nesterov wrote:
> >
> > As I said from the very beginning, this code is fine on x86 because
> > atomic ops are fully serialised on x86.
>
> Yes. Other architectures require smp_mb__{before,after}_atomic for
> the bit
On Fri, Jun 2, 2023 at 1:59 PM Oleg Nesterov wrote:
>
> As I said from the very beginning, this code is fine on x86 because
> atomic ops are fully serialised on x86.
Yes. Other architectures require smp_mb__{before,after}_atomic for
the bit setting ops to actually be memory barriers.
We
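A minimal sketch of the pairing being referred to, using the flag and bit
names that appear later in this thread; the barrier placement illustrates
the generic rule and is not quoted from any posted patch:

        #include <linux/atomic.h>
        #include <linux/bitops.h>
        #include <linux/llist.h>

        #define VHOST_WORK_QUEUED	0

        struct vhost_work {
                struct llist_node	node;
                unsigned long		flags;
        };

        /*
         * clear_bit() is atomic but not a full memory barrier on every
         * architecture (it is fully serialising on x86). If stores done
         * before the clear must be visible before the bit reads as clear,
         * the ordering has to be requested explicitly.
         */
        static void mark_dequeued(struct vhost_work *work)
        {
                smp_mb__before_atomic();  /* order earlier stores before the RMW */
                clear_bit(VHOST_WORK_QUEUED, &work->flags);
        }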
On 06/02, Jason Wang wrote:
>
> On Thu, Jun 1, 2023 at 3:43 PM Oleg Nesterov wrote:
> >
> > and the final rewrite:
> >
> > if (work->node) {
> > 	work_next = work->node->next;
> > 	if (true)
> > 		clear_bit(&work->flags);
> > }
> >
> >
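The shape of that rewrite comes from llist_for_each_entry_safe(), which
reads the ->next pointer before the loop body runs. A rough expansion of
the macro (with the two-argument clear_bit() form restored here as an
assumption):

        for (work = llist_entry(node, typeof(*work), node);
             member_address_is_nonnull(work, node) &&
                (work_next = llist_entry(work->node.next, typeof(*work), node), true);
             work = work_next) {
                /* work->node.next was already loaded above, before this clear */
                clear_bit(VHOST_WORK_QUEUED, &work->flags);
        }

which is why the ->next load can happen before the clear_bit().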
On Thu, Jun 1, 2023 at 3:43 PM Oleg Nesterov wrote:
>
> On 06/01, Jason Wang wrote:
> >
> > On Wed, May 31, 2023 at 5:14 PM Oleg Nesterov wrote:
> > >
> > > > > I don't understand you. OK, to simplify, suppose we have 2 global vars
> > > > >
> > > > > void *PTR = something_non_null;
> >
On 06/01, Jason Wang wrote:
>
> On Wed, May 31, 2023 at 5:14 PM Oleg Nesterov wrote:
> >
> > > > I don't understand you. OK, to simplify, suppose we have 2 global vars
> > > >
> > > > void *PTR = something_non_null;
> > > > unsigned long FLAGS = -1ul;
> > > >
> > > > Now I think
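The preview cuts off before the example itself; purely to illustrate the
kind of two-CPU ordering argument being set up with those two globals
(this construction is mine, not Oleg's):

        /* CPU 0 */
        WRITE_ONCE(PTR, NULL);
        smp_mb__before_atomic();        /* full barrier: PTR store before the clear */
        clear_bit(0, &FLAGS);

        /* CPU 1 */
        if (!test_bit(0, &FLAGS)) {
                smp_rmb();                      /* pairs with the barrier on CPU 0 */
                BUG_ON(READ_ONCE(PTR) != NULL); /* must observe the NULL store */
        }

If CPU 1 observes bit 0 clear, the pairing guarantees it also observes
PTR == NULL; without the barriers, on architectures where bit ops are not
fully ordered, the BUG_ON could fire.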
On Wed, May 31, 2023 at 5:14 PM Oleg Nesterov wrote:
>
> On 05/31, Jason Wang wrote:
> >
> > On Wed, May 31, 2023 at 3:25 PM Oleg Nesterov wrote:
> > >
> > > On 05/31, Jason Wang wrote:
> > > >
> > > > On 2023/5/23 20:15, Oleg Nesterov wrote:
> > > > >
> > > > > /* make sure flag is seen
On 05/31, Jason Wang wrote:
>
> On Wed, May 31, 2023 at 3:25 PM Oleg Nesterov wrote:
> >
> > On 05/31, Jason Wang wrote:
> > >
> > > On 2023/5/23 20:15, Oleg Nesterov wrote:
> > > >
> > > > /* make sure flag is seen after deletion */
> > > > smp_wmb();
> > > >
On Wed, May 31, 2023 at 3:25 PM Oleg Nesterov wrote:
>
> On 05/31, Jason Wang wrote:
> >
> > On 2023/5/23 20:15, Oleg Nesterov wrote:
> > >
> > > /* make sure flag is seen after deletion */
> > > smp_wmb();
> > > llist_for_each_entry_safe(work, work_next, node,
On 05/31, Jason Wang wrote:
>
> On 2023/5/23 20:15, Oleg Nesterov wrote:
> >
> > /* make sure flag is seen after deletion */
> > smp_wmb();
> > llist_for_each_entry_safe(work, work_next, node, node) {
> > clear_bit(VHOST_WORK_QUEUED, &work->flags);
> >
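Pieced together from the fragments quoted above, the loop under discussion
looks roughly like this (the worker fields and the ->fn callback are
reconstructed, and the smp_wmb() shown is exactly the barrier whose
usefulness the thread is questioning):

        struct vhost_work *work, *work_next;
        struct llist_node *node;

        /* atomically detach everything queued so far */
        node = llist_del_all(&worker->work_list);

        /* make sure flag is seen after deletion */
        smp_wmb();
        llist_for_each_entry_safe(work, work_next, node, node) {
                clear_bit(VHOST_WORK_QUEUED, &work->flags);
                work->fn(work);
        }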
On 2023/5/23 20:15, Oleg Nesterov wrote:
On 05/22, Oleg Nesterov wrote:
Right now I think that "int dead" should die,
No, probably we shouldn't call get_signal() if we have already dequeued SIGKILL.
but let me think tomorrow.
Maybe something like this... I don't like it but I can't suggest
On 2023/5/23 23:57, Eric W. Biederman wrote:
Oleg Nesterov writes:
On 05/22, Oleg Nesterov wrote:
Right now I think that "int dead" should die,
No, probably we shouldn't call get_signal() if we have already
dequeued SIGKILL.
Very much agreed. It is one thing to add a patch to move do_exit
On 5/29/23 9:38 PM, Eric W. Biederman wrote:
> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> index b7cbd66f889e..f3ce0fa6edd7 100644
> --- a/kernel/vhost_task.c
> +++ b/kernel/vhost_task.c
...
> static int vhost_task_fn(void *data)
> {
> struct vhost_task *vtsk = data;
> -
On 05/30, Christian Brauner wrote:
>
> On Mon, May 29, 2023 at 01:19:39PM +0200, Oleg Nesterov wrote:
> >
> > If we want CLONE_THREAD, I think vhost_worker() should exit after
> > get_signal()
> > returns SIGKILL. Perhaps it should "disable" vhost_work_queue() somehow and
>
> Yes, and that's what
On 5/29/23 9:38 PM, Eric W. Biederman wrote:
> Mike is there any chance you can test the change below?
Yeah, I'm firing up tests now.
"Eric W. Biederman" writes:
> Linus Torvalds writes:
>
>> So I'd really like to finish this. Even if we end up with a hack or
>> two in signal handling that we can hopefully fix up later by having
>> vhost fix up some of its current assumptions.
>
>
> The real sticky widget for me is how to
michael.chris...@oracle.com writes:
> On 5/29/23 2:35 PM, Mike Christie wrote:
>>> Hmm... If we use CLONE_THREAD the exiting vhost_worker() will auto-reap
>>> itself,
>> Oh wait, are you saying that when we get auto-reaped then we would do the
>> last
>> fput and call the
michael.chris...@oracle.com writes:
> On 5/29/23 6:19 AM, Oleg Nesterov wrote:
>> On 05/27, Eric W. Biederman wrote:
>>>
>>> Looking forward I don't see not asking the worker threads to stop
>>> for the coredump right now causing any problems in the future.
>>> So I think we can use this to
On 5/29/23 2:35 PM, Mike Christie wrote:
>> Hmm... If we use CLONE_THREAD the exiting vhost_worker() will auto-reap
>> itself,
> Oh wait, are you saying that when we get auto-reaped then we would do the last
> fput and call the file_operations->release function right? We actually set
>
On 5/29/23 12:46 PM, Oleg Nesterov wrote:
> Mike, sorry, I don't understand your email.
>
> Just in case, let me remind I know nothing about drivers/vhost/
>
No problem. I get it. I don't know the signal/thread code so it's
one of those things where I'm also learning as I go.
> On 05/29,
On 5/29/23 12:54 PM, Oleg Nesterov wrote:
> On 05/29, Oleg Nesterov wrote:
>>
>> Mike, sorry, I don't understand your email.
>>
>> Just in case, let me remind I know nothing about drivers/vhost/
>
> And... this is slightly off-topic but you didn't answer my previous
> question and I am just
On 05/29, Oleg Nesterov wrote:
>
> Mike, sorry, I don't understand your email.
>
> Just in case, let me remind I know nothing about drivers/vhost/
And... this is slightly off-topic but you didn't answer my previous
question and I am just curious. Do you agree that, even if we forget
about
Mike, sorry, I don't understand your email.
Just in case, let me remind I know nothing about drivers/vhost/
On 05/29, michael.chris...@oracle.com wrote:
>
> On 5/29/23 6:19 AM, Oleg Nesterov wrote:
> > On 05/27, Eric W. Biederman wrote:
> >>
> >> Looking forward I don't see not asking the worker
On 5/29/23 6:19 AM, Oleg Nesterov wrote:
> If we want CLONE_THREAD, I think vhost_worker() should exit after get_signal()
> returns SIGKILL. Perhaps it should "disable" vhost_work_queue() somehow and
> flush the pending works on ->work_list before exit, I dunno. But imo it should
> not wait for
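One possible shape of that suggestion, as a sketch; the ->dead flag and
the flush helper are invented names, not code from any posted patch:

        static int vhost_worker(void *data)
        {
                struct vhost_worker *worker = data;
                struct ksignal ksig;

                for (;;) {
                        if (get_signal(&ksig)) {
                                /*
                                 * SIGKILL (also reached via exit(2) or a
                                 * coredump): refuse new work, drain what is
                                 * already queued, then exit.
                                 */
                                WRITE_ONCE(worker->dead, true);
                                vhost_worker_flush(worker);
                                break;
                        }
                        /* ... run queued works as usual ... */
                }
                return 0;
        }

vhost_work_queue() would then check READ_ONCE(worker->dead) before doing
llist_add(), so nothing new can land on ->work_list after the flush.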
On 5/29/23 6:19 AM, Oleg Nesterov wrote:
> On 05/27, Eric W. Biederman wrote:
>>
>> Looking forward I don't see not asking the worker threads to stop
>> for the coredump right now causing any problems in the future.
>> So I think we can use this to resolve the coredump issue I spotted.
>
> But we
On 05/27, Eric W. Biederman wrote:
>
> Looking forward I don't see not asking the worker threads to stop
> for the coredump right now causing any problems in the future.
> So I think we can use this to resolve the coredump issue I spotted.
But we have almost the same problem with exec.
Execing
On 5/27/23 8:41 PM, Eric W. Biederman wrote:
> Mike Christie writes:
>
>> On 5/23/23 7:15 AM, Oleg Nesterov wrote:
>>>
>>> Now the main question. Whatever we do, SIGKILL/SIGSTOP/etc can come right
>>> before we call work->fn(). Is it "safe" to run this callback with
>>> signal_pending() or
Mike Christie writes:
> On 5/23/23 7:15 AM, Oleg Nesterov wrote:
>>
>> Now the main question. Whatever we do, SIGKILL/SIGSTOP/etc can come right
>> before we call work->fn(). Is it "safe" to run this callback with
>> signal_pending() or fatal_signal_pending() ?
>
> The questions before this one
On Sat, May 27, 2023 at 6:17 PM Eric W. Biederman wrote:
>
> It seems like a good approach for including in the -rc series.
> I think the change should look more like my change below.
I have no objections. My patch was a fairly "hack and slash" thing to
just disassociate the IO workers entirely
Linus Torvalds writes:
> On Sat, May 27, 2023 at 2:49 AM Eric W. Biederman wrote:
>>
>> The real sticky widget for me is how to handle one of these processes
>> coredumping. It really looks like it will result in a reliable hang.
>
> Well, if *that* is the main worry, I think that's trivial
On Sat, May 27, 2023 at 2:49 AM Eric W. Biederman wrote:
>
> The real sticky widget for me is how to handle one of these processes
> coredumping. It really looks like it will result in a reliable hang.
Well, if *that* is the main worry, I think that's trivial enough to deal with.
In
Linus Torvalds writes:
> So I'd really like to finish this. Even if we end up with a hack or
> two in signal handling that we can hopefully fix up later by having
> vhost fix up some of its current assumptions.
The real sticky widget for me is how to handle one of these processes
coredumping.
On Thu, May 25, 2023 at 8:30 AM Eric W. Biederman wrote:
>
> Basically with no patches that is where Linus's kernel is.
>
> User complained about the new thread showing up in ps.
Well, not only that, but it actively broke existing workflows for
managing things. Showing up in 'ps' wasn't just
On 5/23/23 7:15 AM, Oleg Nesterov wrote:
>
> Now the main question. Whatever we do, SIGKILL/SIGSTOP/etc can come right
> before we call work->fn(). Is it "safe" to run this callback with
> signal_pending() or fatal_signal_pending() ?
The questions before this one I'll leave for the core vhost
Oleg Nesterov writes:
> On 05/24, Eric W. Biederman wrote:
>>
>> Oleg Nesterov writes:
>>
>> > Yes, but probably SIGABRT/exit doesn't really differ from SIGKILL wrt
>> > vhost_worker().
>>
>> Actually I think it reveals that exiting with SIGABRT will cause
>> a deadlock.
>>
>> coredump_wait
On 05/24, Eric W. Biederman wrote:
>
> Oleg Nesterov writes:
>
> > Yes, but probably SIGABRT/exit doesn't really differ from SIGKILL wrt
> > vhost_worker().
>
> Actually I think it reveals that exiting with SIGABRT will cause
> a deadlock.
>
> coredump_wait will wait for all of the threads to
Oleg Nesterov writes:
> On 05/23, Eric W. Biederman wrote:
>>
>> I want to point out that we need to consider not just SIGKILL, but
>> SIGABRT that causes a coredump, as well as the process performing
>> an ordinary exit(2). All of which will cause get_signal to return
>> SIGKILL in this
On 05/23, Eric W. Biederman wrote:
>
> I want to point out that we need to consider not just SIGKILL, but
> SIGABRT that causes a coredump, as well as the process performing
> an ordinary exit(2). All of which will cause get_signal to return
> SIGKILL in this context.
Yes, but probably
Hey Oleg,
For all these questions below let me get back to you by tomorrow.
I need to confirm if something would be considered a regression by
the core vhost developers.
On 5/23/23 7:15 AM, Oleg Nesterov wrote:
> On 05/22, Oleg Nesterov wrote:
>>
>> Right now I think that "int dead" should die,
Oleg Nesterov writes:
> On 05/22, Oleg Nesterov wrote:
>>
>> Right now I think that "int dead" should die,
>
> No, probably we shouldn't call get_signal() if we have already
> dequeued SIGKILL.
Very much agreed. It is one thing to add a patch to move do_exit
out of get_signal. It is another
On 5/22/23 2:40 PM, Michael S. Tsirkin wrote:
>> return copy_process(NULL, 0, node, &args);
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 8050fe23c732..a0f00a078cbb 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2891,6 +2891,7 @@ bool get_signal(struct ksignal *ksig)
"Michael S. Tsirkin" writes:
> On Sun, May 21, 2023 at 09:51:24PM -0500, Mike Christie wrote:
>> When switching from kthreads to vhost_tasks two bugs were added:
>> 1. The vhost worker tasks now show up as processes, so scripts doing ps
>> or ps a would now incorrectly detect the vhost task as
On 05/22, Oleg Nesterov wrote:
>
> Right now I think that "int dead" should die,
No, probably we shouldn't call get_signal() if we have already dequeued SIGKILL.
> but let me think tomorrow.
Maybe something like this... I don't like it but I can't suggest anything
better
right now.
On Sun, May 21, 2023 at 09:51:24PM -0500, Mike Christie wrote:
> When switching from kthreads to vhost_tasks two bugs were added:
> 1. The vhost worker tasks now show up as processes, so scripts doing ps
> or ps a would now incorrectly detect the vhost task as another process.
> 2. kthreads
On 05/22, Mike Christie wrote:
>
> On 5/22/23 7:30 AM, Oleg Nesterov wrote:
> >> + /*
> >> + * When we get a SIGKILL our release function will
> >> + * be called. That will stop new IOs from being queued
> >> + * and check for
On 5/22/23 7:30 AM, Oleg Nesterov wrote:
> Confused, please help...
>
> On 05/21, Mike Christie wrote:
>>
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -338,6 +338,7 @@ static int vhost_worker(void *data)
>> struct vhost_worker *worker = data;
>> struct vhost_work
Confused, please help...
On 05/21, Mike Christie wrote:
>
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -338,6 +338,7 @@ static int vhost_worker(void *data)
> struct vhost_worker *worker = data;
> struct vhost_work *work, *work_next;
> struct llist_node *node;
When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks now show up as processes, so scripts doing ps
or ps a would now incorrectly detect the vhost task as another process.
2. kthreads disabled freeze by setting PF_NOFREEZE, but vhost tasks
didn't disable or
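For reference, the kthread behaviour point 2 refers to is the default
freezer opt-out, with set_freezable() as the explicit opt-in; a sketch of
the flags involved (the function name is illustrative, not the actual
vhost or kthread code):

        #include <linux/freezer.h>
        #include <linux/sched.h>

        static void freezer_example(void)
        {
                /* kthreads historically start with the freezer disabled: */
                current->flags |= PF_NOFREEZE;

                /* a kthread that wants to be freezable opts back in: */
                set_freezable();
        }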