Re: Adding plain accesses and detecting data races in the LKMM

2019-04-02 Thread Andrea Parri
e after > the rcu_dereference, because it doesn't take into account the address > dependency from the intermediate plain read. Hopefully we will add > such things to the memory model later on. Concentrating on data races > seems like enough for now. > > Some of the ideas

[PATCH v2] openvswitch: fix flow actions reallocation

2019-03-28 Thread Andrea Righi
the requested data. BugLink: https://bugs.launchpad.net/bugs/1813244 Signed-off-by: Andrea Righi --- Changes in v2: - correctly resize to current_size+req_size (thanks to Pravin) net/openvswitch/flow_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/openvswitch

Re: [PATCH -tip v3 04/10] x86/kprobes: Prohibit probing on IRQ handlers directly

2019-03-26 Thread Andrea Righi
reaks one of my tests (which I probe on do_IRQ). > > OK, it seems this patch is a bit redundant, because > I found that these interrupt handler issue has been fixed > by Andrea's commit before merge this patch. > > commit a50480cb6d61d5c5fc13308479407b628b6bc1c5 > Author: And

Re: [PATCH v2 1/1] userfaultfd/sysctl: add vm.unprivileged_userfaultfd

2019-03-21 Thread Andrea Arcangeli
Hello, On Thu, Mar 21, 2019 at 01:43:35PM +, Luis Chamberlain wrote: > On Wed, Mar 20, 2019 at 03:01:12PM -0400, Andrea Arcangeli wrote: > > but > > that would be better be achieved through SECCOMP and not globally.'. > > That begs the question why not use seccomp for t

Re: [PATCH v2 1/1] userfaultfd/sysctl: add vm.unprivileged_userfaultfd

2019-03-20 Thread Andrea Arcangeli
Hello, On Tue, Mar 19, 2019 at 06:28:23PM +, Dr. David Alan Gilbert wrote: > --- > Userfaultfd can be misused to make it easier to exploit existing use-after-free > (and similar) bugs that might otherwise only make a short window > or race condition available. By using userfaultfd to stall a

Re: [PATCH v2 1/1] userfaultfd/sysctl: add vm.unprivileged_userfaultfd

2019-03-20 Thread Andrea Arcangeli
ed users. When this is > > set to zero, only privileged users (root user, or users with the > > CAP_SYS_PTRACE capability) will be able to use the userfaultfd > > syscalls. > > > > Suggested-by: Andrea Arcangeli > > Suggested-by: Mike Rapoport > > Signed-of

Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm

2019-03-16 Thread Andrea Arcangeli
On Sat, Mar 16, 2019 at 05:38:54PM +0800, zhong jiang wrote: > On 2019/3/16 5:39, Andrea Arcangeli wrote: > > On Fri, Mar 08, 2019 at 03:10:08PM +0800, zhong jiang wrote: > >> I can reproduce the issue in arm64 qemu machine. The issue will leave > >> af

Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm

2019-03-15 Thread Andrea Arcangeli
On Fri, Mar 08, 2019 at 03:10:08PM +0800, zhong jiang wrote: > I can reproduce the issue in arm64 qemu machine. The issue will leave after > applying the > patch. > > Tested-by: zhong jiang Thanks a lot for the quick testing! > Meanwhile, I just has a little doubt whether it is necessary to

Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

2019-03-14 Thread Andrea Arcangeli
On Thu, Mar 14, 2019 at 11:58:15AM +0100, Paolo Bonzini wrote: > On 14/03/19 00:44, Andrea Arcangeli wrote: > > Then I thought we can add a tristate so an open of /dev/kvm would also > > allow the syscall to make things more user friendly because > > unprivileged container

[PATCH v2] btrfs: raid56: properly unmap parity page in finish_parity_scrub()

2019-03-14 Thread Andrea Righi
dev/sde - mount it: # mount /dev/sdb /mnt - run btrfs scrub in a loop: # while :; do btrfs scrub start -BR /mnt; done BugLink: https://bugs.launchpad.net/bugs/1812845 Reviewed-by: Johannes Thumshirn Signed-off-by: Andrea Righi --- Changes in v2: - added a better description about this

Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

2019-03-13 Thread Andrea Arcangeli
On Wed, Mar 13, 2019 at 01:01:40PM -0700, Mike Kravetz wrote: > On 3/13/19 11:52 AM, Andrea Arcangeli wrote: > > > > hugetlbfs is more complicated to detect, because even if you inherit > > it from fork(), the services that mounts the fs may be in a different >

Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

2019-03-13 Thread Andrea Arcangeli
h CRIU and KVM), I think Oracle > > will need a one liner change in the Oracle setup to echo into that > > file in addition of running the hugetlbfs mount. > > Hi Andrea, can you explain more in detail the risks of enabling > userfaultfd for unprivileged users? There's no more risk th

Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users

2019-03-13 Thread Andrea Arcangeli
CRIU and KVM), I think Oracle will need a one liner change in the Oracle setup to echo into that file in addition of running the hugetlbfs mount. Note that DPDK host bridge process will also need a one liner change to do a dummy open/close of /dev/kvm to unblock the syscall. Thanks, Andrea

Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()

2019-03-12 Thread Andrea Arcangeli
archs to add the cache flushes on kunmap too, and then remove the cache flushes from the other places like copy_page or we'd waste CPU. Then you'd have the best of both worlds, no double flush and kunmap would be enough. Thanks, Andrea

[PATCH v4] blkcg: prevent priority inversion problem during sync()

2019-03-09 Thread Andrea Righi
/640 Signed-off-by: Andrea Righi --- Changes in v4: - fix a build bug when CONFIG_BLOCK is unset block/blk-cgroup.c | 130 +++ block/blk-throttle.c | 11 ++- fs/fs-writeback.c| 5 ++ fs/sync.c| 8

[PATCH v3] blkcg: prevent priority inversion problem during sync()

2019-03-08 Thread Andrea Righi
/640 Signed-off-by: Andrea Righi --- Changes in v3: - drop sync(2) isolation patches (this will be addressed by another patch, potentially operating at the fs namespace level) - use a per-bdi lock and a per-bdi list instead of a global lock and a global list to save the list of sync(2

Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

2019-03-08 Thread Andrea Arcangeli
no need of setting up any pagetables or to do any TLB flushes (except on 32bit archs if the page is above the direct mapping but it never happens on 64bit archs). Thanks, Andrea

Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

2019-03-08 Thread Andrea Arcangeli
On Fri, Mar 08, 2019 at 04:58:44PM +0800, Jason Wang wrote: > Can I simply call set_page_dirty() before vunmap() in the mmu notifier > callback, or is there any reason that it must be called within vunmap()? I also don't see any problem in doing it before vunmap. As far as the mmu notifier and

Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

2019-03-08 Thread Andrea Arcangeli
tart at the latest in such case. My preference is generally to call gup_fast() followed by immediate put_page() because I always want to drop FOLL_GET from gup_fast as a whole to avoid 2 useless atomic ops per gup_fast. I'll write more about vmap in answer to the other email. Thanks, Andrea

Re: [PATCH v2 0/3] blkcg: sync() isolation

2019-03-08 Thread Andrea Righi
On Fri, Mar 08, 2019 at 12:22:20PM -0500, Josef Bacik wrote: > On Thu, Mar 07, 2019 at 07:08:31PM +0100, Andrea Righi wrote: > > = Problem = > > > > When sync() is executed from a high-priority cgroup, the process is forced > > to > > wait the completion of the

Re: [PATCH v2 3/3] blkcg: implement sync() isolation

2019-03-07 Thread Andrea Righi
On Thu, Mar 07, 2019 at 05:07:01PM -0500, Josef Bacik wrote: > On Thu, Mar 07, 2019 at 07:08:34PM +0100, Andrea Righi wrote: > > Keep track of the inodes that have been dirtied by each blkcg cgroup and > > make sure that a blkcg issuing a sync() can trigger the writeback + wait

Re: [PATCH v2 1/3] blkcg: prevent priority inversion problem during sync()

2019-03-07 Thread Andrea Righi
On Thu, Mar 07, 2019 at 05:10:53PM -0500, Josef Bacik wrote: > On Thu, Mar 07, 2019 at 07:08:32PM +0100, Andrea Righi wrote: > > Prevent priority inversion problem when a high-priority blkcg issues a > > sync() and it is forced to wait the completion of all the writeback I/O >

Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

2019-03-07 Thread Andrea Arcangeli
ap that call set_page_dirty > on the page from the mmu notifier. Agreed, that will solve all issues in vhost context with regard to set_page_dirty, including the case the memory is backed by VM_SHARED ext4. Thanks! Andrea

Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

2019-03-07 Thread Andrea Arcangeli
long term GUP pins, so I'm asking... Thanks! Andrea

Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

2019-03-07 Thread Andrea Arcangeli
ide the page table lock. > > implying it's called just later. > > OK I missed the fact that _end actually calls > mmu_notifier_invalidate_range internally. So that part is fine but the > fact that you are trying to take page lock under VQ mutex and take same > mutex within notif

[PATCH v2 2/3] blkcg: introduce io.sync_isolation

2019-03-07 Thread Andrea Righi
only dirty pages that belong to the cgroup itself (except for the root cgroup that would still be able to write out all pages globally). Signed-off-by: Andrea Righi --- Documentation/admin-guide/cgroup-v2.rst | 9 ++ block/blk-throttle.c| 37

[PATCH v2 3/3] blkcg: implement sync() isolation

2019-03-07 Thread Andrea Righi
behavior is applied: sync() triggers the writeback of any dirty page. Signed-off-by: Andrea Righi --- block/blk-cgroup.c | 47 ++ fs/fs-writeback.c | 52 +++--- fs/inode.c | 1 + include/linux/blk

[PATCH v2 1/3] blkcg: prevent priority inversion problem during sync()

2019-03-07 Thread Andrea Righi
policy could be to adjust the throttling I/O rate using the blkcg with the highest speed from the list of waiters - priority inheritance, kinda). Signed-off-by: Andrea Righi --- block/blk-cgroup.c | 131 +++ block/blk-throttle.c | 11 ++- fs/fs

[PATCH v2 0/3] blkcg: sync() isolation

2019-03-07 Thread Andrea Righi
user 0m0,001s sys 0m0,008s [ Time range goes from 0.7s to 1.6s ] Changes in v2: - fix: properly keep track of sync waiters when a blkcg is writing to many block devices at the same time Andrea Righi (3): blkcg: prevent priority inversion problem during sync() blkcg: introduce io.

Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm

2019-03-06 Thread Andrea Arcangeli
case when it gets a -ESRCH retval. Note that this fork feature is only ever needed in the non-cooperative case, these things never need to happen when userfaultfd is used by an app (or a lib) that is aware that it is using userfaultfd. Thanks, Andrea

Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm

2019-03-05 Thread Andrea Arcangeli
owever that mm is on its way to exit_mmap as soon as the ioctl returns and this only ever happens during race conditions, so the way CRIU monitor works there wasn't anything fundamentally concerning about this detail, despite it being remarkably "strange". Our priority was to keep the fork code

Re: [PATCH v2] mm/memory.c: do_fault: avoid usage of stale vm_area_struct

2019-03-02 Thread Andrea Arcangeli
rdered after up_read(mmap_sem) either. Other than the above detail: Reviewed-by: Andrea Arcangeli Thanks, Andrea

Re: [RFC PATCH] tools/memory-model: Remove (dep ; rfi) from ppo

2019-02-22 Thread Andrea Parri
warning or as a reference to those developers who are quivering to use (dep ; rfi): enjoy it, be careful. Andrea

[PATCH 2/2] ath9K: debugfs: Fix SPUR-DOWN field

2019-02-21 Thread Andrea Greco
From: Andrea Greco SPUR DOWN field return spurup inside of spurdown Signed-off-by: Andrea Greco --- drivers/net/wireless/ath/ath9k/debug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c index

[PATCH 1/2] nl80211: Allow change CW to Ad-Hock network

2019-02-21 Thread Andrea Greco
From: Andrea Greco Add net-link support for change CW in Ad-Hock network Signed-off-by: Andrea Greco --- net/wireless/nl80211.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index d91a408db113..4fcc63fa4380 100644

Re: [PATCH v3 1/7] dump_stack: Support adding to the dump stack arch description

2019-02-20 Thread Andrea Parri
mply that the string is dumped > in the panic path, and you never really know when you're going to panic. > Even if you only write to the string before doing SMP bringup you might > still have another CPU go rogue and panic before then. > > But I probably should have just not added the barrier, it's over > paranoid and will almost certainly never matter in practice. Oh, well, I can only echo you: if you don't care about the stores being _observed_ out of order, you could simply remove the barrier; if you do care, then you need "more paranoid" on the readers side. ;-) Andrea > > cheers

Re: [RFC PATCH] tools/memory-model: Remove (dep ; rfi) from ppo

2019-02-20 Thread Andrea Parri
On Wed, Feb 20, 2019 at 09:57:00AM +, Will Deacon wrote: > On Wed, Feb 20, 2019 at 10:26:04AM +0100, Peter Zijlstra wrote: > > On Tue, Feb 19, 2019 at 06:01:17PM -0800, Paul E. McKenney wrote: > > > On Tue, Feb 19, 2019 at 11:57:37PM +0100, Andrea Parri wrote: > >

Re: [RFC PATCH] tools/memory-model: Remove (dep ; rfi) from ppo

2019-02-20 Thread Andrea Parri
On Wed, Feb 20, 2019 at 10:26:04AM +0100, Peter Zijlstra wrote: > On Tue, Feb 19, 2019 at 06:01:17PM -0800, Paul E. McKenney wrote: > > On Tue, Feb 19, 2019 at 11:57:37PM +0100, Andrea Parri wrote: > > > Remove this subtle (and, AFAICT, unused) ordering: we can add it back,

Re: [PATCH v3 1/7] dump_stack: Support adding to the dump stack arch description

2019-02-19 Thread Andrea Parri
On Mon, Feb 11, 2019 at 03:38:59PM +0100, Petr Mladek wrote: > On Mon 2019-02-11 13:50:35, Andrea Parri wrote: > > Hi Michael, > > > > > > On Thu, Feb 07, 2019 at 11:46:29PM +1100, Michael Ellerman wrote: > > > Arch code can set a "dump stack arch des

[RFC PATCH] tools/memory-model: Remove (dep ; rfi) from ppo

2019-02-19 Thread Andrea Parri
ONCE(*x, 1); smp_store_release(y, 1); } P1(int *x, int *y, int *z) { int r0; int r1; int r2; r0 = READ_ONCE(*y); WRITE_ONCE(*z, r0); r1 = smp_load_acquire(z); r2 = READ_ONCE(*x); } exists (1:r0=1 /\ 1:r2=0) Signed-off-by: Andrea Parri Cc: Alan

[PATCH 2/2] tools/memory-model: Do not use "herd" to refer to "herd7"

2019-02-19 Thread Andrea Parri
Use "herd7" in each such reference. Signed-off-by: Andrea Parri Cc: Alan Stern Cc: Will Deacon Cc: Peter Zijlstra Cc: Boqun Feng Cc: Nicholas Piggin Cc: David Howells Cc: Jade Alglave Cc: Luc Maranget Cc: "Paul E. McKenney" Cc: Akira Yokosawa Cc: Daniel Lustig ---

[PATCH 1/2] tools/memory-model: Fix comment in MP+poonceonces.litmus

2019-02-19 Thread Andrea Parri
The comment should say "Sometimes" for the result. Signed-off-by: Andrea Parri Cc: Alan Stern Cc: Will Deacon Cc: Peter Zijlstra Cc: Boqun Feng Cc: Nicholas Piggin Cc: David Howells Cc: Jade Alglave Cc: Luc Maranget Cc: "Paul E. McKenney" Cc: Akira Yokosaw

[PATCH 0/2] tools/memory-model: Trivialities

2019-02-19 Thread Andrea Parri
Fixes to inline comments, documentation, script usage. Cc: Alan Stern Cc: Will Deacon Cc: Peter Zijlstra Cc: Boqun Feng Cc: Nicholas Piggin Cc: David Howells Cc: Jade Alglave Cc: Luc Maranget Cc: "Paul E. McKenney" Cc: Akira Yokosawa Cc: Daniel Lustig Andrea Parri (2): to

Re: [PATCH -mm -V8] mm, swap: fix race between swapoff and some swap operations

2019-02-19 Thread Andrea Parri
> Fixes: 235b62176712 ("mm/swap: add cluster lock") > Signed-off-by: "Huang, Ying" > Not-Nacked-by: Hugh Dickins > Cc: Paul E. McKenney > Cc: Minchan Kim > Cc: Johannes Weiner > Cc: Tim Chen > Cc: Mel Gorman > Cc: Jérôme Glisse > Cc: M

[PATCH 0/3] blkcg: sync() isolation

2019-02-19 Thread Andrea Righi
user 0m0,001s sys 0m0,008s [ Time range goes from 0.7s to 1.6s ] Andrea Righi (3): blkcg: prevent priority inversion problem during sync() blkcg: introduce io.sync_isolation blkcg: implement sync() isolation Documentation/admin-guide/cgroup-v2.rst | 9 +++ block/blk-cg

[PATCH 3/3] blkcg: implement sync() isolation

2019-02-19 Thread Andrea Righi
behavior is applied: sync() triggers the writeback of any dirty page. Signed-off-by: Andrea Righi --- block/blk-cgroup.c | 47 ++ fs/fs-writeback.c | 52 +++--- fs/inode.c | 1 + include/linux/blk

[PATCH 1/3] blkcg: prevent priority inversion problem during sync()

2019-02-19 Thread Andrea Righi
policy could be to adjust the throttling I/O rate using the blkcg with the highest speed from the list of waiters - priority inheritance, kinda). Signed-off-by: Andrea Righi --- block/blk-cgroup.c | 73 block/blk-throttle.c | 11 +++-- fs/fs

[PATCH 2/3] blkcg: introduce io.sync_isolation

2019-02-19 Thread Andrea Righi
only dirty pages that belong to the cgroup itself (except for the root cgroup that would still be able to write out all pages globally). Signed-off-by: Andrea Righi --- Documentation/admin-guide/cgroup-v2.rst | 9 ++ block/blk-throttle.c| 37

Re: [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting

2019-02-18 Thread Andrea Arcangeli
hat forcefully invokes MMU notifiers and forces host allocations and KVM page faults in order to reallocate the same RAM in the same guest. When there's memory pressure it's up to the host Linux VM to notice there's plenty of MADV_FREE material to free at zero I/O cost before starting swapping. Thanks, Andrea

Re: [RFC PATCH 0/4] Restore change_pte optimization to its former glory

2019-02-18 Thread Andrea Arcangeli
nchmark i could > run ? We could also try a microbenchmark based on ltp/testcases/kernel/mem/ksm/ksm02.c that already should trigger a merge flood and a COW flood during its internal processing. Thanks, Andrea

Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations

2019-02-14 Thread Andrea Arcangeli
On Thu, Feb 14, 2019 at 04:07:37PM +0800, Huang, Ying wrote: > Before, we choose to use stop_machine() to reduce the overhead of hot > path (page fault handler) as much as possible. But now, I found > rcu_read_lock_sched() is just a wrapper of preempt_disable(). So maybe > we can switch to RCU

Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations

2019-02-14 Thread Andrea Arcangeli
ill the idea of stop_machine would be to do those p->swap_map = NULL and everything protected by the swap_lock, should be executed inside the callback that runs like in a UP system to speedup the fast path further. Thanks, Andrea

Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations

2019-02-13 Thread Andrea Arcangeli
ze_kernel or whatever it is called right now, but still RCU) solution isn't preferable. Thanks, Andrea

[tip:perf/core] kprobes: Prohibit probing on bsearch()

2019-02-13 Thread tip-bot for Andrea Righi
Commit-ID: 02106f883cd745523f7766d90a739f983f19e650 Gitweb: https://git.kernel.org/tip/02106f883cd745523f7766d90a739f983f19e650 Author: Andrea Righi AuthorDate: Wed, 13 Feb 2019 01:15:34 +0900 Committer: Ingo Molnar CommitDate: Wed, 13 Feb 2019 08:16:41 +0100 kprobes: Prohibit probing

Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations

2019-02-12 Thread Andrea Parri
de API), so that here you could instead use preempt-disable + synchronize_rcu{,expedited}(). This LWN article gives an overview of the latest RCU API/semantics changes: https://lwn.net/Articles/777036/. Andrea

Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations

2019-02-11 Thread Andrea Parri
the following LB-like pattern:
    CPU0                    CPU1
    reads p->swap_map       lock(completion)
    lock(completion)        read completion->done
    completion->done++      unlock(completion)
    unlock(completion)      p->swap_map = NULL
where CPU0 must see a non-NULL p->swap_map if CPU1 sees the completion from CPU0. Does this make sense? Andrea

Re: [RFC PATCH v2] blkcg: prevent priority inversion problem during sync()

2019-02-11 Thread Andrea Righi
On Mon, Feb 11, 2019 at 10:39:34AM -0500, Josef Bacik wrote: > On Sat, Feb 09, 2019 at 03:07:49PM +0100, Andrea Righi wrote: > > This is an attempt to mitigate the priority inversion problem of a > > high-priority blkcg issuing a sync() and being forced to wait the >

Re: [RFC PATCH 0/4] Restore change_pte optimization to its former glory

2019-02-11 Thread Andrea Arcangeli
On Mon, Feb 11, 2019 at 02:09:31PM -0500, Jerome Glisse wrote: > Yeah, between do you have any good workload for me to test this ? I > was thinking of running few same VM and having KSM work on them. Is > there some way to trigger KVM to fork ? As the other case is breaking > COW after fork. KVM

Re: [PATCH v3 1/7] dump_stack: Support adding to the dump stack arch description

2019-02-11 Thread Andrea Parri
previously), and so print the > + * uninitialised tail. But the whole string lives in BSS so in > + * practice it should just see NULLs. The comment doesn't say _why_ we need to order these stores: IOW, what will or can go wrong without this order? T

[RFC PATCH v2] blkcg: prevent priority inversion problem during sync()

2019-02-09 Thread Andrea Righi
with any definitive solution. This patch is not a definitive solution either, but it's an attempt to continue addressing this issue and handling the priority inversion problem with sync() in a better way. Signed-off-by: Andrea Righi --- Changes in v2: - fix: use the proper current blkcg

Re: [RFC PATCH] blkcg: prevent priority inversion problem during sync()

2019-02-09 Thread Andrea Righi
On Sat, Feb 09, 2019 at 01:06:33PM +0100, Andrea Righi wrote: ... > +/** > + * blkcg_wb_waiters_on_bdi - check for writeback waiters on a block device > + * @bdi: block device to check > + * > + * Return true if any other blkcg is waiting for writeback on the target > block

[RFC PATCH] blkcg: prevent priority inversion problem during sync()

2019-02-09 Thread Andrea Righi
with any definitive solution. This patch is not a definitive solution either, but it's an attempt to continue addressing the issue and, hopefully, handle the priority inversion problem with sync() in a better way. Signed-off-by: Andrea Righi --- block/blk-cgroup.c | 69

[tip:sched/core] sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()

2019-02-04 Thread tip-bot for Andrea Parri
Commit-ID: c546951d9c9300065bad253ecdf1ac59ce9d06c8 Gitweb: https://git.kernel.org/tip/c546951d9c9300065bad253ecdf1ac59ce9d06c8 Author: Andrea Parri AuthorDate: Mon, 21 Jan 2019 16:52:40 +0100 Committer: Ingo Molnar CommitDate: Mon, 4 Feb 2019 09:13:21 +0100 sched/core: Use READ_ONCE

Re: [RFC PATCH 2/4] mm/mmu_notifier: use unsigned for event field in range struct

2019-02-01 Thread Andrea Arcangeli
On Thu, Jan 31, 2019 at 01:37:04PM -0500, Jerome Glisse wrote: > From: Jérôme Glisse > > Use unsigned for event field in range struct so that we can also set > flags with the event. This patch change the field and introduce the > helper. > > Signed-off-by: Jérôme Glisse

Re: [RFC PATCH 1/4] uprobes: use set_pte_at() not set_pte_at_notify()

2019-02-01 Thread Andrea Arcangeli
On Thu, Jan 31, 2019 at 01:37:03PM -0500, Jerome Glisse wrote: > @@ -207,8 +207,7 @@ static int __replace_page(struct vm_area_struct *vma, > unsigned long addr, > > flush_cache_page(vma, addr, pte_pfn(*pvmw.pte)); > ptep_clear_flush_notify(vma, addr, pvmw.pte); > -

Re: [RFC PATCH 0/4] Restore change_pte optimization to its former glory

2019-02-01 Thread Andrea Arcangeli
On Fri, Feb 01, 2019 at 06:57:38PM -0500, Andrea Arcangeli wrote: > If it's cleared with ptep_clear_flush_notify, change_pte still won't > work. The above text needs updating with > "ptep_clear_flush". set_pte_at_notify is all about having > ptep_clear_flush only befo

Re: [RFC PATCH 0/4] Restore change_pte optimization to its former glory

2019-02-01 Thread Andrea Arcangeli
pen is that the CPU could write to the page through a TLB fill without page fault while the secondary MMUs still read the old memory in the old readonly page. Thanks, Andrea

Re: [PATCH v2] sched: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()

2019-01-31 Thread Andrea Parri
On Mon, Jan 21, 2019 at 04:52:40PM +0100, Andrea Parri wrote: > move_queued_task() synchronizes with task_rq_lock() as follows: > > move_queued_task() task_rq_lock() > > [S] ->on_rq = MIGRATING [L] rq = task_rq() > WMB (__set_task_cp

Re: [PATCH] afs: Add missing memory barriers in afs_manage_cell()

2019-01-31 Thread Andrea Parri
On Thu, Jan 17, 2019 at 04:31:32PM +0100, Andrea Parri wrote: > On Mon, Nov 26, 2018 at 05:44:12PM +0100, Andrea Parri wrote: > > As the comments for wake_up_bit() and waitqueue_active() point out, > > the barriers are needed to order the clearing of the _FL_NO

Re: [PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook

2019-01-31 Thread Andrea Arcangeli
> invalidate_range() already. > > CC: Benjamin Herrenschmidt > CC: Paul Mackerras > CC: Michael Ellerman > CC: Alistair Popple > CC: Alexey Kardashevskiy > CC: Mark Hairgrove > CC: Balbir Singh > CC: David Gibson > CC: Andrea Arcangeli > CC: Jerome Glisse

[PATCH] MAINTAINERS: Update cgroup entry

2019-01-31 Thread Andrea Parri
Fix wildcard patterns and add cgroup-v2 documentation. Signed-off-by: Andrea Parri --- MAINTAINERS | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 9f64f8d3740ed..a96054c1d870a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3906,9 +3906,10

Re: [LSF/MM TOPIC]: userfaultfd (was: [LSF/MM TOPIC] NUMA remote THP vs NUMA local non-THP under MADV_HUGEPAGE)

2019-01-30 Thread Andrea Arcangeli
d-gigabytes/terabytes regions, async uffd-wp should perform much better. Thanks, Andrea

Re: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants

2019-01-30 Thread Andrea Parri
ior versions of these functions. > > Co-developed-by: Peter Zijlstra (Intel) > Signed-off-by: Elena Reshetova Reviewed-by: Andrea Parri Andrea > --- > Documentation/core-api/refcount-vs-atomic.rst | 24 +--- > arch/x86/include/asm

Re: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants

2019-01-29 Thread Andrea Parri
and, > and this was my conclusion that it should provide this, but I can easily be > wrong > on this. > > Andrea, Peter, could you please comment? Short version: I am not convinced by the above sentence, and I suggest to remove it (as done in http://lkml.kernel.org/r/201901281429

[LSF/MM TOPIC] NUMA remote THP vs NUMA local non-THP under MADV_HUGEPAGE

2019-01-29 Thread Andrea Arcangeli
MAP was rightfully nacked early on and quickly replaced by UFFDIO_COPY which is more optimal to add memory to a mapping in small chunks, but we can't remove memory with UFFDIO_COPY and UFFDIO_REMAP should be as efficient as it gets when it comes to removing memory from a mapping. Thank you, Andrea

Re: [RFC PATCH 0/3] cgroup: fsio throttle controller

2019-01-29 Thread Andrea Righi
On Mon, Jan 28, 2019 at 02:26:20PM -0500, Vivek Goyal wrote: > On Mon, Jan 28, 2019 at 06:41:29PM +0100, Andrea Righi wrote: > > Hi Vivek, > > > > sorry for the late reply. > > > > On Mon, Jan 21, 2019 at 04:47:15PM -0500, Vivek Goyal wrote: > > > On Sat

Re: [RFC PATCH 0/3] cgroup: fsio throttle controller

2019-01-28 Thread Andrea Righi
Hi Vivek, sorry for the late reply. On Mon, Jan 21, 2019 at 04:47:15PM -0500, Vivek Goyal wrote: > On Sat, Jan 19, 2019 at 11:08:27AM +0100, Andrea Righi wrote: > > [..] > > Alright, let's skip the root cgroup for now. I think the point here is > > if we want to provide s

Re: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants

2019-01-28 Thread Andrea Parri
after_ctrl_dep(); > + return true; > +} > + > +return false; There appears to be some white-space damage (here and in other places); checkpatch.pl should point these and other style problems out. Andrea > } >

Re: Plain accesses and data races in the Linux Kernel Memory Model

2019-01-22 Thread Andrea Parri
o suggest s/marked/Marked, s/plain/Plain and similarly for the other sets to be introduced. Andrea C sync-rcu-is-not-idempotent { } P0(int *x, int *y) { int r0; WRITE_ONCE(*x, 1); synchronize_rcu(); synchronize_rcu(); r0 = READ_ONCE(*y); }

[PATCH v2] sched: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()

2019-01-21 Thread Andrea Parri
pu()) to honor this address dependency. Also, mark the accesses to ->cpu and ->on_rq with READ_ONCE()/WRITE_ONCE() to comply with the LKMM. Signed-off-by: Andrea Parri Cc: Ingo Molnar Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Alan Stern Cc: Will Deacon --- Changes in v

Re: [PATCH] sched: Use READ_ONCE()/WRITE_ONCE() in task_cpu()/__set_task_cpu()

2019-01-21 Thread Andrea Parri
On Mon, Jan 21, 2019 at 01:25:26PM +0100, Peter Zijlstra wrote: > On Mon, Jan 21, 2019 at 11:51:21AM +0100, Andrea Parri wrote: > > On Wed, Jan 16, 2019 at 07:42:18PM +0100, Andrea Parri wrote: > > > The smp_wmb() in move_queued_task() (c.f., __set_task_cpu()) pairs with >

Re: [PATCH] kcov: convert kcov.refcount to refcount_t

2019-01-21 Thread Andrea Parri
On Mon, Jan 21, 2019 at 01:29:11PM +0100, Dmitry Vyukov wrote: > On Mon, Jan 21, 2019 at 12:45 PM Andrea Parri > wrote: > > > > On Mon, Jan 21, 2019 at 10:52:37AM +0100, Dmitry Vyukov wrote: > > > > [...] > > > > > > Am I missing som

Re: [PATCH] kcov: convert kcov.refcount to refcount_t

2019-01-21 Thread Andrea Parri
rt > > Suggested-by: Kees Cook > Reviewed-by: David Windsor > Reviewed-by: Hans Liljestrand > Signed-off-by: Elena Reshetova Reviewed-by: Andrea Parri (Same remark about the reference in the commit message. ;-) ) Andrea > --- > kernel/kcov.c | 9 + > 1 file cha

Re: [PATCH] kcov: convert kcov.refcount to refcount_t

2019-01-21 Thread Andrea Parri
dependency? Loads > can hoist across control dependency, no? As you remarked, the doc. says CTRL+RELEASE (so yes, loads can hoist); AFAICR, implementations do comply to this requirement. (FWIW, I sometimes think at this "weird" ordering as a weak "acq_rel", the

[tip:locking/core] tools/memory-model: Model smp_mb__after_unlock_lock()

2019-01-21 Thread tip-bot for Andrea Parri
Commit-ID: 5b735eb1ce481b2f1674a47c0995944b1cb6f5d5 Gitweb: https://git.kernel.org/tip/5b735eb1ce481b2f1674a47c0995944b1cb6f5d5 Author: Andrea Parri AuthorDate: Mon, 3 Dec 2018 15:04:49 -0800 Committer: Ingo Molnar CommitDate: Mon, 21 Jan 2019 11:06:55 +0100 tools/memory-model: Model

Re: [PATCH] sched: Use READ_ONCE()/WRITE_ONCE() in task_cpu()/__set_task_cpu()

2019-01-21 Thread Andrea Parri
On Wed, Jan 16, 2019 at 07:42:18PM +0100, Andrea Parri wrote: > The smp_wmb() in move_queued_task() (c.f., __set_task_cpu()) pairs with > the composition of the dependency and the ACQUIRE in task_rq_lock(): > > move_queued_task() task_rq_lock() > >

Re: [RFC PATCH 0/3] cgroup: fsio throttle controller

2019-01-19 Thread Andrea Righi
On Fri, Jan 18, 2019 at 02:46:53PM -0500, Josef Bacik wrote: > On Fri, Jan 18, 2019 at 07:44:03PM +0100, Andrea Righi wrote: > > On Fri, Jan 18, 2019 at 11:35:31AM -0500, Josef Bacik wrote: > > > On Fri, Jan 18, 2019 at 11:31:24AM +0100, Andrea Righi wrote: > > > >

Re: [RFC PATCH 0/3] cgroup: fsio throttle controller

2019-01-18 Thread Andrea Righi
On Fri, Jan 18, 2019 at 06:07:45PM +0100, Paolo Valente wrote: > > > > On 18 Jan 2019, at 17:35, Josef Bacik > > wrote: > > > > On Fri, Jan 18, 2019 at 11:31:24AM +0100, Andrea Righi wrote: > >> This is a redesign of my old cgroup-io-th

Re: [RFC PATCH 0/3] cgroup: fsio throttle controller

2019-01-18 Thread Andrea Righi
On Fri, Jan 18, 2019 at 11:35:31AM -0500, Josef Bacik wrote: > On Fri, Jan 18, 2019 at 11:31:24AM +0100, Andrea Righi wrote: > > This is a redesign of my old cgroup-io-throttle controller: > > https://lwn.net/Articles/330531/ > > > > I'm resuming this old patch to point

Re: Plain accesses and data races in the Linux Kernel Memory Model

2019-01-18 Thread Andrea Parri
On Fri, Jan 18, 2019 at 10:10:22AM -0500, Alan Stern wrote: > On Thu, 17 Jan 2019, Andrea Parri wrote: > > > > Can the compiler (maybe, it does?) transform, at the C or at the "asm" > > > level, LB1's P0 in LB2's P0 (LB1 and LB2 are reported below)? > >

Re: [PATCH 0/5] sched refcount_t conversions

2019-01-18 Thread Andrea Parri
struct.stack_refcount to refcount_t For the series, please feel free to add: Reviewed-by: Andrea Parri (You may still want to update the references to the 'refcount-vs-atomic' doc. in the commit messages.) Andrea > > fs/exec.c| 4 ++-- > fs/proc/task_nommu.c

Re: [PATCH 1/5] sched: convert sighand_struct.count to refcount_t

2019-01-18 Thread Andrea Parri
it is hopefully soon > in state to be merged to the documentation tree. Just a remark to point out that that document got merged, even though in a different location/format: c.f., b6e859f6cdd1 ("docs: refcount_t documentation") Andrea

Re: [RFC PATCH 0/3] cgroup: fsio throttle controller

2019-01-18 Thread Andrea Righi
On Fri, Jan 18, 2019 at 12:04:17PM +0100, Paolo Valente wrote: > > > > On 18 Jan 2019, at 11:31, Andrea Righi > > wrote: > > > > This is a redesign of my old cgroup-io-throttle controller: > > https://lwn.net/Articles/330531/ > >

[RFC PATCH 1/3] fsio-throttle: documentation

2019-01-18 Thread Andrea Righi
Document the filesystem I/O controller: description, usage, design, etc. Signed-off-by: Andrea Righi --- Documentation/cgroup-v1/fsio-throttle.txt | 142 ++ 1 file changed, 142 insertions(+) create mode 100644 Documentation/cgroup-v1/fsio-throttle.txt diff --git

[RFC PATCH 0/3] cgroup: fsio throttle controller

2019-01-18 Thread Andrea Righi
A: Correct, the tradeoff here is to tolerate I/O bursts during writeback to avoid priority inversion problems in the system. Andrea Righi (3): fsio-throttle: documentation fsio-throttle: controller infrastructure fsio-throttle: instrumentation Documentation/cgroup-v1/fsio-throt

[RFC PATCH 3/3] fsio-throttle: instrumentation

2019-01-18 Thread Andrea Righi
Apply the fsio controller to the opportune kernel functions to evaluate and throttle filesystem I/O. Signed-off-by: Andrea Righi --- block/blk-core.c | 10 ++ include/linux/writeback.h | 7 ++- mm/filemap.c | 20 +++- mm/page-writeback.c

[RFC PATCH 2/3] fsio-throttle: controller infrastructure

2019-01-18 Thread Andrea Righi
This is the core of the fsio-throttle controller: it defines the interface to the cgroup subsystem and implements the I/O measurement and throttling logic. Signed-off-by: Andrea Righi --- include/linux/cgroup_subsys.h | 4 + include/linux/fsio-throttle.h | 43 +++ init/Kconfig

Re: [PATCH] afs: Add missing memory barriers in afs_manage_cell()

2019-01-17 Thread Andrea Parri
On Mon, Nov 26, 2018 at 05:44:12PM +0100, Andrea Parri wrote: > As the comments for wake_up_bit() and waitqueue_active() point out, > the barriers are needed to order the clearing of the _FL_NOT_READY > bit and the waitqueue_active() load; match the implicit barrier in > pre

Re: Plain accesses and data races in the Linux Kernel Memory Model

2019-01-17 Thread Andrea Parri
On Wed, Jan 16, 2019 at 10:36:58PM +0100, Andrea Parri wrote: > [...] > > > The difficulty with incorporating plain accesses in the memory model > > is that the compiler has very few constraints on how it treats plain > > accesses. It can eliminate them, duplicate them, r

Re: Plain accesses and data races in the Linux Kernel Memory Model

2019-01-16 Thread Andrea Parri
**x, int *y, int *b) { int r0; r0 = READ_ONCE(*y); rcu_assign_pointer(*x, b); } exists (0:r0=b /\ 1:r0=1) LB1 and LB2 are data-race free, according to the patch; LB1's "exists" clause is not satisfiable, while LB2's "exists" clause is satisfiable.
