Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-14 Thread Oleg Nesterov
On 06/11, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > Could you spell out why this can't work (again, in this simple case)? > > > > My current (and I know, very poor) understanding is that .release() should > > roughly

Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-06 Thread Oleg Nesterov
On 06/06, Mike Christie wrote: > > On 6/6/23 7:16 AM, Oleg Nesterov wrote: > > On 06/05, Mike Christie wrote: > > > >> So it works as if we were still using a kthread: > >> > >> 1. Userspace thread0 opens /dev/vhost-$something. > >

Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-06 Thread Oleg Nesterov
On 06/05, Mike Christie wrote: > > On 6/5/23 10:10 AM, Oleg Nesterov wrote: > > On 06/03, michael.chris...@oracle.com wrote: > >> > >> On 6/2/23 11:15 PM, Eric W. Biederman wrote: > >> The problem is that as part of the flush the drivers/vhost/scsi.c code

Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-05 Thread Oleg Nesterov
On 06/03, michael.chris...@oracle.com wrote: > > On 6/2/23 11:15 PM, Eric W. Biederman wrote: > The problem is that as part of the flush the drivers/vhost/scsi.c code > will wait for outstanding commands, because we can't free the device and > its resources before the commands complete or we will

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-05 Thread Oleg Nesterov
On 06/02, Linus Torvalds wrote: > > On Fri, Jun 2, 2023 at 1:59 PM Oleg Nesterov wrote: > > > > As I said from the very beginning, this code is fine on x86 because > > atomic ops are fully serialised on x86. > > Yes. Other architectures require __smp_mb__{befo
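
A quick userspace analogue of the x86-vs-weakly-ordered point above, using C11 atomics (names are mine, this is not the vhost code): the explicit fences stand in for smp_mb__before_atomic()/smp_mb__after_atomic(), which cost nothing on x86 because the locked RMW instruction is already a full barrier, but emit real barriers on other architectures.

    /* a relaxed RMW bracketed by full fences, the way the kernel pairs
     * smp_mb__before_atomic()/smp_mb__after_atomic() with a void atomic op */
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int ready;
    static atomic_int counter;

    static void publish(void)
    {
        atomic_store_explicit(&ready, 1, memory_order_relaxed);

        atomic_thread_fence(memory_order_seq_cst);    /* smp_mb__before_atomic() */
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);    /* smp_mb__after_atomic()  */
    }

    int main(void)
    {
        publish();
        printf("counter=%d ready=%d\n", atomic_load(&counter), atomic_load(&ready));
        return 0;
    }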

Re: [PATCH 1/1] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-05 Thread Oleg Nesterov
On 06/01, Mike Christie wrote: > > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -1368,7 +1368,9 @@ int zap_other_threads(struct task_struct *p) > > while_each_thread(p, t) { > task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK); > - count++; > + /*

Re: [PATCH 1/1] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-05 Thread Oleg Nesterov
On 06/02, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > Hi Mike, > > > > sorry, but somehow I can't understand this patch... > > > > I'll try to read it with a fresh head over the weekend, but for example, > > > > On 06/01, Mike C

Re: [CFT][PATCH v3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-05 Thread Oleg Nesterov
On 06/02, Eric W. Biederman wrote: > > static int vhost_task_fn(void *data) > { > struct vhost_task *vtsk = data; > - int ret; > + bool dead = false; > + > + for (;;) { > + bool did_work; > + > + if (!dead && signal_pending(current)) { > +

Re: [PATCH 1/1] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-02 Thread Oleg Nesterov
Hi Mike, sorry, but somehow I can't understand this patch... I'll try to read it with a fresh head over the weekend, but for example, On 06/01, Mike Christie wrote: > > static int vhost_task_fn(void *data) > { > struct vhost_task *vtsk = data; > - int ret; > + bool dead = false; > + >
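
For context, the loop shape under discussion looks roughly like the sketch below. This is a reconstruction from the fragment quoted above, not the final merged code; the vtsk->fn/->data/->flags/->exited fields, the VHOST_TASK_FLAGS_STOP bit and the stop handshake are taken from the patch series as I understand it and may differ in detail.

    static int vhost_task_fn(void *data)
    {
            struct vhost_task *vtsk = data;
            bool dead = false;

            for (;;) {
                    bool did_work;

                    if (!dead && signal_pending(current)) {
                            struct ksignal ksig;
                            /* only fatal signals can be pending here; note that
                             * SIGKILL was seen but keep draining queued work so
                             * the vhost drivers can flush and stop cleanly */
                            dead = get_signal(&ksig);
                    }

                    /* paired with the wakeup in the stop path (vhost_task_stop) */
                    set_current_state(TASK_INTERRUPTIBLE);

                    if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
                            __set_current_state(TASK_RUNNING);
                            break;
                    }

                    did_work = vtsk->fn(vtsk->data);
                    if (!did_work)
                            schedule();
            }

            complete(&vtsk->exited);
            do_exit(0);
    }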

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-02 Thread Oleg Nesterov
On 06/02, Jason Wang wrote: > > On Thu, Jun 1, 2023 at 3:43 PM Oleg Nesterov wrote: > > > > and the final rewrite: > > > > if (work->node) { > > work_next = work->node->next; > > if (true)

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-06-01 Thread Oleg Nesterov
On 06/01, Jason Wang wrote: > > On Wed, May 31, 2023 at 5:14 PM Oleg Nesterov wrote: > > > > > > I don't understand you. OK, to simplify, suppose we have 2 global vars > > > > > > > > void *PTR = something_non_null; > > > >

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-31 Thread Oleg Nesterov
On 05/31, Jason Wang wrote: > > On Wed, May 31, 2023 at 3:25 PM Oleg Nesterov wrote: > > > > On 05/31, Jason Wang wrote: > > > > > > On 2023/5/23 20:15, Oleg Nesterov wrote: > > > > > > > > /* make sur

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-31 Thread Oleg Nesterov
On 05/31, Jason Wang wrote: > > On 2023/5/23 20:15, Oleg Nesterov wrote: > > > > /* make sure flag is seen after deletion */ > > smp_wmb(); > > llist_for_each_entry_safe(work, work_next, node, node) { > > clear
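
The barrier pairing being debated here can be shown in userspace. A minimal sketch with C11 fences (my own names; it mirrors the "plain update, then write barrier, then flag update" pattern, not the vhost code itself): the release fence plays the role of smp_wmb() and only helps if the reader of the flag pairs it with a read barrier before touching the data.

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    static int payload;          /* plain data, published via the flag */
    static atomic_int flag;

    static void *producer(void *arg)
    {
        payload = 42;                                  /* the "deletion"/data update */
        atomic_thread_fence(memory_order_release);     /* smp_wmb() analogue */
        atomic_store_explicit(&flag, 1, memory_order_relaxed);
        return NULL;
    }

    static void *consumer(void *arg)
    {
        while (!atomic_load_explicit(&flag, memory_order_relaxed))
            ;                                          /* spin until the flag is set */
        atomic_thread_fence(memory_order_acquire);     /* smp_rmb() analogue */
        printf("payload=%d\n", payload);               /* guaranteed to print 42 */
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }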

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-30 Thread Oleg Nesterov
On 05/30, Christian Brauner wrote: > > On Mon, May 29, 2023 at 01:19:39PM +0200, Oleg Nesterov wrote: > > > > If we want CLONE_THREAD, I think vhost_worker() should exit after > > get_signal() > > returns SIGKILL. Perhaps it should "disable" vhost_work_queu

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-29 Thread Oleg Nesterov
On 05/29, Oleg Nesterov wrote: > > Mike, sorry, I don't understand your email. > > Just in case, let me remind I know nothing about drivers/vhost/ And... this is slightly off-topic but you didn't answer my previous question and I am just curious. Do you agree that, e

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-29 Thread Oleg Nesterov
Mike, sorry, I don't understand your email. Just in case, let me remind I know nothing about drivers/vhost/ On 05/29, michael.chris...@oracle.com wrote: > > On 5/29/23 6:19 AM, Oleg Nesterov wrote: > > On 05/27, Eric W. Biederman wrote: > >> > >> Looking forwar

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-29 Thread Oleg Nesterov
On 05/27, Eric W. Biederman wrote: > > Looking forward I don't see not asking the worker threads to stop > for the coredump right now causing any problems in the future. > So I think we can use this to resolve the coredump issue I spotted. But we have almost the same problem with exec. Execing th

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-25 Thread Oleg Nesterov
On 05/24, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > Yes, but probably SIGABRT/exit doesn't really differ from SIGKILL wrt > > vhost_worker(). > > Actually I think it reveals that exiting with SIGABRT will cause > a deadlock. > > coredump

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-24 Thread Oleg Nesterov
On 05/23, Eric W. Biederman wrote: > > I want to point out that we need to consider not just SIGKILL, but > SIGABRT that causes a coredump, as well as the process performing > an ordinary exit(2). All of which will cause get_signal to return > SIGKILL in this context. Yes, but probably SIGABRT/exi

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-23 Thread Oleg Nesterov
On 05/22, Oleg Nesterov wrote: > > Right now I think that "int dead" should die, No, probably we shouldn't call get_signal() if we have already dequeued SIGKILL. > but let me think tomorrow. Maybe something like this... I don't like it but I can't

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-22 Thread Oleg Nesterov
On 05/22, Mike Christie wrote: > > On 5/22/23 7:30 AM, Oleg Nesterov wrote: > >> + /* > >> + * When we get a SIGKILL our release function will > >> + * be called. That will stop new IOs from being queued

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-22 Thread Oleg Nesterov
On 05/18, Eric W. Biederman wrote: > > void recalc_sigpending(void) > { > - if (!recalc_sigpending_tsk(current) && !freezing(current)) > + if ((!recalc_sigpending_tsk(current) && !freezing(current)) || > + ((current->signal->flags & SIGNAL_GROUP_EXIT) && > +

Re: [PATCH 3/3] fork, vhost: Use CLONE_THREAD to fix freezer/ps regression

2023-05-22 Thread Oleg Nesterov
Confused, please help... On 05/21, Mike Christie wrote: > > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -338,6 +338,7 @@ static int vhost_worker(void *data) > struct vhost_worker *worker = data; > struct vhost_work *work, *work_next; > struct llist_node *node;

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-18 Thread Oleg Nesterov
On 05/18, Christian Brauner wrote: > > On Thu, May 18, 2023 at 08:08:10PM +0200, Oleg Nesterov wrote: > > On 05/18, Christian Brauner wrote: > > > > > > Yeah, but these are issues that exist with PF_IO_WORKER then too > > > > This was my thought

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-18 Thread Oleg Nesterov
On 05/18, Christian Brauner wrote: > > Yeah, but these are issues that exist with PF_IO_WORKER then too This was my thought too but I am starting to think I was wrong. Of course I don't understand the code in io_uring/ but it seems that it always breaks the IO loops if get_signal() returns SIGKIL

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-18 Thread Oleg Nesterov
On 05/18, Mike Christie wrote: > > On 5/18/23 11:25 AM, Oleg Nesterov wrote: > > I too do not understand the 1st change in this patch ... > > > > On 05/18, Mike Christie wrote: > >> > >> In the other patches we do: > >> > >> if (ge

Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

2023-05-18 Thread Oleg Nesterov
I too do not understand the 1st change in this patch ... On 05/18, Mike Christie wrote: > > In the other patches we do: > > if (get_signal(ksig)) > start_exit_cleanup_by_stopping_newIO() > flush running IO() > exit() > > But to do the flush running IO() part of this I need to wai

Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-05-17 Thread Oleg Nesterov
On 05/16, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > >> There is this bit in complete_signal when SIGKILL is delivered to any > >> thread in the process. > >> > >> t = p; > >> do { >

Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-05-16 Thread Oleg Nesterov
On 05/16, Eric W. Biederman wrote: > > A kernel thread can block SIGKILL and that is supported. > > For a thread that is part of a process you can't block SIGKILL when the > task is part of a user mode process. Or SIGSTOP. Another thread can call do_signal_stop()->signal_wake_up/etc. > There is t
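
The user-mode half of that statement is easy to verify with a small standalone program (nothing vhost-specific): the kernel silently drops SIGKILL and SIGSTOP from any mask a user thread tries to block.

    #include <signal.h>
    #include <stdio.h>

    int main(void)
    {
        sigset_t set, old;

        sigemptyset(&set);
        sigaddset(&set, SIGKILL);
        sigaddset(&set, SIGUSR1);

        sigprocmask(SIG_BLOCK, &set, NULL);   /* "succeeds", but SIGKILL is ignored */
        sigprocmask(SIG_BLOCK, NULL, &old);   /* read back the effective mask */

        printf("SIGUSR1 blocked: %d\n", sigismember(&old, SIGUSR1));  /* 1 */
        printf("SIGKILL blocked: %d\n", sigismember(&old, SIGKILL));  /* 0 */
        return 0;
    }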

Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-05-16 Thread Oleg Nesterov
On 05/15, Mike Christie wrote: > > --- a/kernel/vhost_task.c > +++ b/kernel/vhost_task.c > @@ -75,13 +75,13 @@ struct vhost_task *vhost_task_create(int (*fn)(void *), > void *arg, >const char *name) > { > struct kernel_clone_args args = { > -

Re: [PATCH v11 8/8] vhost: use vhost_tasks for worker threads

2023-05-16 Thread Oleg Nesterov
On 05/15, Mike Christie wrote: > > Oleg and Christian, > > > Below is an updated patch that doesn't check for PF_USER_WORKER in the > signal.c code and instead will check whether we have blocked the signal. Looks like I need to read the whole series... will try tomorrow. > --- a/kernel/fork.c > ++

Re: [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-15 Thread Oleg Nesterov
On 02/15, Raghavendra K T wrote: > > * Raghavendra K T [2015-02-15 11:25:44]: > > Resending the V5 with smp_mb__after_atomic() change without bumping up > revision Reviewed-by: Oleg Nesterov Of course, this needs the acks from maintainers. And I agree that SLOWPATH in .head

Re: [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-15 Thread Oleg Nesterov
Well, I regret I mentioned the lack of a barrier after enter_slowpath ;) On 02/15, Raghavendra K T wrote: > > @@ -46,7 +46,8 @@ static __always_inline bool static_key_false(struct > static_key *key); > > static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) > { > - set_bit(0, (vol

Re: [PATCH V4] x86 spinlock: Fix memory corruption on completing completions

2015-02-15 Thread Oleg Nesterov
On 02/15, Raghavendra K T wrote: > > On 02/13/2015 09:02 PM, Oleg Nesterov wrote: > >>> @@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock >>> *lock, __ticket_t want) >>> * check again make sure it didn't become

Re: [PATCH V4] x86 spinlock: Fix memory corruption on completing completions

2015-02-13 Thread Oleg Nesterov
On 02/13, Oleg Nesterov wrote: > > On 02/13, Raghavendra K T wrote: > > > > @@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock > > *lock, __ticket_t want) > > * check again make sure it didn't become free while > > * w

Re: [PATCH V4] x86 spinlock: Fix memory corruption on completing completions

2015-02-13 Thread Oleg Nesterov
On 02/13, Raghavendra K T wrote: > > @@ -164,7 +161,7 @@ static inline int arch_spin_is_locked(arch_spinlock_t > *lock) > { > struct __raw_tickets tmp = READ_ONCE(lock->tickets); > > - return tmp.tail != tmp.head; > + return tmp.tail != (tmp.head & ~TICKET_SLOWPATH_FLAG); > } Well
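
The check quoted above is a bit-masking detail: once the slowpath flag is stored in the head word, every head/tail comparison has to mask it out first. A standalone sketch (the constants follow the paravirt ticket-lock values as I understand them; the layout is simplified):

    #include <stdint.h>
    #include <stdio.h>

    #define TICKET_SLOWPATH_FLAG  ((uint16_t)1)
    #define TICKET_LOCK_INC       ((uint16_t)2)   /* tickets step by 2, bit 0 is the flag */

    struct raw_tickets { uint16_t head, tail; };

    static int spin_is_locked(struct raw_tickets t)
    {
        /* the flag may be set in head, so mask it out before comparing */
        return t.tail != (t.head & ~TICKET_SLOWPATH_FLAG);
    }

    int main(void)
    {
        struct raw_tickets unlocked = { .head = 4 | TICKET_SLOWPATH_FLAG, .tail = 4 };
        struct raw_tickets locked   = { .head = 4, .tail = 4 + TICKET_LOCK_INC };

        printf("unlocked, flag set: %d\n", spin_is_locked(unlocked));  /* 0 */
        printf("locked:             %d\n", spin_is_locked(locked));    /* 1 */
        return 0;
    }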

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Oleg Nesterov
On 02/11, Jeremy Fitzhardinge wrote: > > On 02/11/2015 09:24 AM, Oleg Nesterov wrote: > > I agree, and I have to admit I am not sure I fully understand why > > unlock uses the locked add. Except we need a barrier to avoid the race > > with the enter_slowpath() users, of cou

Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Oleg Nesterov
Damn, sorry for noise, forgot to mention... On 02/12, Raghavendra K T wrote: > > +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock, > + __ticket_t head) > +{ > + if (head & TICKET_SLOWPATH_FLAG) { > + arc

Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Oleg Nesterov
On 02/12, Raghavendra K T wrote: > > @@ -191,8 +189,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t > *lock) >* We need to check "unlocked" in a loop, tmp.head == head >* can be false positive because of overflow. >*/ > - if

Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Oleg Nesterov
On 02/12, Raghavendra K T wrote: > > @@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock > *lock, __ticket_t want) >* check again make sure it didn't become free while >* we weren't looking. >*/ > - if (ACCESS_ONCE(lock->tickets.head) == want) { >

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Oleg Nesterov
On 02/11, Raghavendra K T wrote: > > On 02/10/2015 06:56 PM, Oleg Nesterov wrote: > >> In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg >> the whole .head_tail. Plus obviously more boring changes. This needs a >> separate patch even _if_ this

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Oleg Nesterov
On 02/10, Jeremy Fitzhardinge wrote: > > On 02/10/2015 05:26 AM, Oleg Nesterov wrote: > > On 02/10, Raghavendra K T wrote: > >> Unfortunately xadd could result in head overflow as tail is high. > >> > >> The other option was repeated cmpxchg which is bad I be

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-10 Thread Oleg Nesterov
On 02/10, Denys Vlasenko wrote: > > # define HEAD_MASK (TICKET_SLOWPATH_FLAG-1) > > ... > unlock_again: > > val = xadd(&lock->ticket.head_tail, TICKET_LOCK_INC); > if (unlikely(!(val & HEAD_MASK))) { > /* overflow. we inadvertently incremented the tail word. > * tail's lsb is TICKET_SLOWP

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-10 Thread Oleg Nesterov
On 02/10, Raghavendra K T wrote: > > On 02/10/2015 06:23 AM, Linus Torvalds wrote: > >> add_smp(&lock->tickets.head, TICKET_LOCK_INC); >> if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) .. >> >> into something like >> >> val = xadd(&lock->ticket.head_tail, TICK
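
The suggestion quoted here is to fold the head increment and the tail read into a single xadd on the combined head_tail word, so the unlocker never touches the lock again after releasing it (by then the lock memory may already be freed and reused). A userspace sketch with C11 atomics; the layout and constants are illustrative, and the head-into-tail overflow case discussed elsewhere in the thread is ignored.

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    #define TICKET_LOCK_INC       2u
    #define TICKET_SLOWPATH_FLAG  1u

    /* head in the low 16 bits, tail in the high 16 bits (x86 little endian) */
    static atomic_uint head_tail;

    static void ticket_unlock(void)
    {
        /* one xadd: bump head and capture the old head+tail in the same
         * atomic operation, instead of an add followed by a separate read */
        unsigned int old = atomic_fetch_add_explicit(&head_tail, TICKET_LOCK_INC,
                                                     memory_order_release);
        uint16_t old_tail = old >> 16;

        if (old_tail & TICKET_SLOWPATH_FLAG)
            printf("slowpath: kick the waiter holding ticket %u\n",
                   (unsigned)(old_tail & ~TICKET_SLOWPATH_FLAG));
    }

    int main(void)
    {
        /* locked, one waiter parked in the slowpath: tail = 2|FLAG, head = 0 */
        atomic_init(&head_tail, ((2u | TICKET_SLOWPATH_FLAG) << 16) | 0u);
        ticket_unlock();
        return 0;
    }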

Re: [PATCH V2] x86 spinlock: Fix memory corruption on completing completions

2015-02-09 Thread Oleg Nesterov
On 02/09, Raghavendra K T wrote: > > +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock) > +{ > + arch_spinlock_t old, new; > + __ticket_t diff; > + > + old.tickets = READ_ONCE(lock->tickets); > + diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.ticke
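
The idea in the snippet above is that the slowpath flag may only be cleared as a compare-and-exchange on the whole head/tail word: if another CPU queued itself in the meantime, the cmpxchg fails and the flag is left alone. A standalone userspace sketch of that pattern (layout and constants are illustrative, this is not the V2 patch itself):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    #define TICKET_SLOWPATH_FLAG  1u

    /* head in the low 16 bits, tail in the high 16 bits (illustrative layout) */
    static atomic_uint head_tail;

    static void check_and_clear_slowpath(void)
    {
        unsigned int old = atomic_load_explicit(&head_tail, memory_order_relaxed);
        uint16_t head = old & 0xffff;
        uint16_t tail = old >> 16;

        /* only the lock holder, with nobody else waiting, may clear the flag */
        if ((uint16_t)(tail & ~TICKET_SLOWPATH_FLAG) != head)
            return;

        unsigned int new = ((unsigned int)(tail & ~TICKET_SLOWPATH_FLAG) << 16) | head;

        /* cmpxchg the whole word: if a new waiter bumped the tail in the
         * meantime this fails and the flag stays set */
        atomic_compare_exchange_strong_explicit(&head_tail, &old, new,
                                                memory_order_relaxed,
                                                memory_order_relaxed);
    }

    int main(void)
    {
        atomic_init(&head_tail, ((4u | TICKET_SLOWPATH_FLAG) << 16) | 4u);
        check_and_clear_slowpath();
        printf("head_tail=%#x\n", atomic_load(&head_tail));   /* 0x40004: flag cleared */
        return 0;
    }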

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Oleg Nesterov
On 02/06, Sasha Levin wrote: > > Can we modify it slightly to avoid potentially accessing invalid memory: > > diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h > index 5315887..cd22d73 100644 > --- a/arch/x86/include/asm/spinlock.h > +++ b/arch/x86/include/asm/spinlock

Re: [PATCH v5 2/8] qspinlock, x86: Enable x86-64 to use queue spinlock

2014-03-02 Thread Oleg Nesterov
On 02/26, Waiman Long wrote: > > +#define _ARCH_SUPPORTS_ATOMIC_8_16_BITS_OPS > + > +/* > + * x86-64 specific queue spinlock union structure > + */ > +union arch_qspinlock { > + struct qspinlock slock; > + u8 lock; /* Lock bit */ > +}; And this enables the optimized vers
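
The union in the quoted hunk is what lets x86-64 touch the lock bit with a plain byte-wide access while the rest of the 32-bit word carries the queue code. A simplified, non-atomic userspace illustration of that layout (little endian assumed; the names are mine):

    #include <stdint.h>
    #include <stdio.h>

    union arch_qspinlock_sketch {
        uint32_t qlcode;   /* whole lock word: lock byte plus queue code */
        uint8_t  lock;     /* the lock bit lives in the low byte on little endian */
    };

    int main(void)
    {
        union arch_qspinlock_sketch l = { .qlcode = 0xabcd0000u };   /* queued, unlocked */

        l.lock = 1;                                   /* byte store takes the lock bit */
        printf("qlcode=%#x\n", (unsigned)l.qlcode);   /* 0xabcd0001, queue code intact */

        l.lock = 0;                                   /* byte store releases it */
        printf("qlcode=%#x\n", (unsigned)l.qlcode);   /* 0xabcd0000 */
        return 0;
    }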

Re: [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks

2014-03-02 Thread Oleg Nesterov
On 02/26, Waiman Long wrote: > > @@ -144,7 +317,7 @@ static __always_inline int queue_spin_setlock(struct > qspinlock *lock) > int qlcode = atomic_read(lock->qlcode); > > if (!(qlcode & _QSPINLOCK_LOCKED) && (atomic_cmpxchg(&lock->qlcode, > - qlcode, qlcode|_QSPINLOCK_LOC

Re: [PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-03-02 Thread Oleg Nesterov
Forgot to ask... On 02/26, Waiman Long wrote: > > +notify_next: > + /* > + * Wait, if needed, until the next one in queue set up the next field > + */ > + while (!(next = ACCESS_ONCE(node->next))) > + arch_mutex_cpu_relax(); > + /* > + * The next one in queue
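
The loop being asked about is the usual MCS-style handoff: the unlocker may know a successor exists before that successor has published its ->next pointer, so it spins until the pointer appears and only then hands the lock over. A userspace sketch of just that handoff (names are mine; it runs single-threaded only to stay self-contained, and the kernel loop uses arch_mutex_cpu_relax() while spinning):

    #include <stdatomic.h>
    #include <stdio.h>

    struct qnode {
        _Atomic(struct qnode *) next;
        atomic_int              wait;
    };

    static void notify_next(struct qnode *node)
    {
        struct qnode *next;

        /* wait, if needed, until the successor has set node->next */
        while (!(next = atomic_load_explicit(&node->next, memory_order_acquire)))
            ;   /* the kernel spins with arch_mutex_cpu_relax() here */

        /* hand the lock over to the successor */
        atomic_store_explicit(&next->wait, 0, memory_order_release);
    }

    int main(void)
    {
        struct qnode owner     = { .wait = 0 };
        struct qnode successor = { .wait = 1 };

        /* the successor links itself in, then the owner unlocks and notifies */
        atomic_store_explicit(&owner.next, &successor, memory_order_release);
        notify_next(&owner);

        printf("successor released: %d\n", !atomic_load(&successor.wait));   /* 1 */
        return 0;
    }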

Re: [PATCH v5 1/8] qspinlock: Introducing a 4-byte queue spinlock implementation

2014-03-02 Thread Oleg Nesterov
On 02/26, Waiman Long wrote: > > +void queue_spin_lock_slowpath(struct qspinlock *lock, int qsval) > +{ > + unsigned int cpu_nr, qn_idx; > + struct qnode *node, *next; > + u32 prev_qcode, my_qcode; > + > + /* > + * Get the queue node > + */ > + cpu_nr = smp_processor_i