RE: [PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().

2024-05-04 Thread David Laight
From: Waiman Long > Sent: 03 May 2024 23:14 > > > On 5/3/24 17:10, David Laight wrote: > > From: Waiman Long > >> Sent: 03 May 2024 17:00 > > ... > >> David, > >> > >> Could you respin the series based on the latest upstream code?

RE: [PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().

2024-05-03 Thread David Laight
From: Waiman Long > Sent: 03 May 2024 17:00 ... > David, > > Could you respin the series based on the latest upstream code? I've just reapplied the patches to 'master' and they all apply cleanly and diffing the new patches to the old ones gives no differences. So I think they should still apply.

RE: [PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().

2024-05-03 Thread David Laight
From: Waiman Long > Sent: 03 May 2024 17:00 > To: David Laight ; 'linux-kernel@vger.kernel.org' > ker...@vger.kernel.org>; 'pet...@infradead.org' > Cc: 'mi...@redhat.com' ; 'w...@kernel.org' > ; 'boqun.f...@gmail.com' > ; 'Linus Torvalds' ; > 'virtualization@lists

RE: [PATCH next 2/5] locking/osq_lock: Avoid dirtying the local cpu's 'node' in the osq_lock() fast path.

2024-01-02 Thread David Laight
From: Boqun Feng > Sent: 02 January 2024 18:54 > > On Sat, Dec 30, 2023 at 03:49:52PM +0000, David Laight wrote: > [...] > > But it looks odd that osq_unlock()'s fast path uses _release but the very > > similar code in osq_wait_next() uses _acquire. > > > > Th

RE: [PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().

2024-01-02 Thread David Laight
From: Ingo Molnar > Sent: 02 January 2024 09:54 > > > * David Laight wrote: > > > per_cpu_ptr() indexes __per_cpu_offset[] with the cpu number. > > This requires the cpu number be 64bit. > > However the value in osq_lock() comes from a 32bit xchg() and the

RE: [PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().

2024-01-01 Thread David Laight
From: Waiman Long > Sent: 01 January 2024 04:14 ... > You really like micro-optimization. They all add up :-) David

[PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().

2023-12-31 Thread David Laight
knows the decrement has set the high bits to zero and doesn't add a register-register move (or cltq) to zero/sign extend the value. Not massive but saves two instructions. Signed-off-by: David Laight --- kernel/locking/osq_lock.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff
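For illustration only, a user-space sketch of the idea this commit message describes, assuming the queue tail stores the CPU number plus one so that 0 can mean "nobody queued". The names encode_cpu(), decode_cpu_index() and lookup_offset(), and the offsets table, are hypothetical stand-ins and not the kernel's definitions.

```c
/* Hypothetical: a stored value of 0 means "no CPU", so CPU n is stored as n + 1. */
static unsigned int encode_cpu(unsigned int cpu_nr)
{
	return cpu_nr + 1;
}

/*
 * The decrement is done in 32-bit unsigned arithmetic, so the result is
 * already a zero-extended 32-bit value when it is later used as an index.
 * (Assumption: this mirrors the intent of the patch, not its exact code.)
 */
static unsigned int decode_cpu_index(unsigned int encoded_cpu)
{
	return encoded_cpu - 1;
}

/* Index a per-CPU style table with the decoded CPU number. */
static unsigned long lookup_offset(const unsigned long *offsets,
				   unsigned int encoded_cpu)
{
	return offsets[decode_cpu_index(encoded_cpu)];
}
```

Whether a given compiler actually drops the extra move or cltq depends on the compiler version and target; the sketch only shows where the unsigned conversion sits relative to the decrement.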

[PATCH next v2 4/5] locking/osq_lock: Avoid writing to node->next in the osq_lock() fast path.

2023-12-31 Thread David Laight
at can leave it non-NULL check with WARN_ON_ONCE() and NULL if set. Note that without this check the fast path (adding at the list head) doesn't need to access the per-cpu osq_node at all. Signed-off-by: David Laight --- kernel/locking/osq_lock.c | 14 ++ 1 file changed, 10 insertions

[PATCH next v2 3/5] locking/osq_lock: Use node->prev_cpu instead of saving node->prev.

2023-12-31 Thread David Laight
_next() call in the unqueue path. Normally this is exactly the value that the initial xchg() read from lock->tail (used to obtain 'prev'), but can get updated by concurrent unqueues. Both the 'prev' and 'cpu' members of optimistic_spin_node are now unused and can be deleted. Signed-off-b

[PATCH next v2 2/5] locking/osq_lock: Optimise the vcpu_is_preempted() check.

2023-12-31 Thread David Laight
ead save 'prev->cpu' in 'node->prev_cpu' and use that value instead. Update in the osq_lock() 'unqueue' path when 'node->prev' is changed. This is simpler than checking for 'node->prev' changing and caching 'prev->cpu'. Signed-off-by: David Laight --- kernel/locki

[PATCH next v2 1/5] locking/osq_lock: Defer clearing node->locked until the slow osq_lock() path.

2023-12-31 Thread David Laight
Since node->locked cannot be set before the assignment to prev->next it is safe to clear it in the slow path. Signed-off-by: David Laight --- kernel/locking/osq_lock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c

[PATCH next v2 0/5] locking/osq_lock: Optimisations to osq_lock code.

2023-12-31 Thread David Laight
et gcc to convert __per_cpu_offset[cpu - 1] to (__per_cpu_offset - 1)[cpu] (cpu is offset by one) but, in any case, it would still need zero extending in the common case. David Laight (5): 1) Defer clearing node->locked until the slow osq_lock() path. 2) Optimise vcpu_is_preempted() check.

RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-31 Thread David Laight
From: Linus Torvalds > Sent: 30 December 2023 20:59 > > On Sat, 30 Dec 2023 at 12:41, Linus Torvalds > wrote: > > > > UNTESTED patch to just do the "this_cpu_write()" parts attached. > > Again, note how we do end up doing that this_cpu_ptr conversion later > > anyway, but at least it's off the

RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-31 Thread David Laight
From: Linus Torvalds > Sent: 30 December 2023 20:41 > > On Fri, 29 Dec 2023 at 12:57, David Laight wrote: > > > > this_cpu_ptr() is rather more expensive than raw_cpu_read() since > > the latter can use an 'offset from register' (%gs for x86-64). > >

RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-31 Thread David Laight
From: Waiman Long > Sent: 31 December 2023 03:04 > The presence of debug_smp_processor_id in your compiled code is likely > due to the setting of CONFIG_DEBUG_PREEMPT in your kernel config. > > #ifdef CONFIG_DEBUG_PREEMPT >   extern unsigned int debug_smp_processor_id(void); > # define

RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-30 Thread David Laight
From: Ingo Molnar > Sent: 30 December 2023 20:38 > > * David Laight wrote: > > > bool osq_lock(struct optimistic_spin_queue *lock) > > { > > - struct optimistic_spin_node *node = this_cpu_ptr(&osq_node); > > + struct optimistic_spin_node *node = raw_cpu

RE: [PATCH next 0/5] locking/osq_lock: Optimisations to osq_lock code

2023-12-30 Thread David Laight
From: Linus Torvalds > Sent: 30 December 2023 19:41 > > On Fri, 29 Dec 2023 at 12:52, David Laight wrote: > > > > David Laight (5): > > Move the definition of optimistic_spin_node into osf_lock.c > > Clarify osq_wait_next() > > I took these

RE: [PATCH next 5/5] locking/osq_lock: Optimise vcpu_is_preempted() check.

2023-12-30 Thread David Laight
From: Waiman Long > Sent: 30 December 2023 15:57 > > On 12/29/23 22:13, Waiman Long wrote: > > > > On 12/29/23 15:58, David Laight wrote: > >> The vcpu_is_preempted() test stops osq_lock() spinning if a virtual > >> cpu is no longer running. > >

RE: [PATCH next 2/5] locking/osq_lock: Avoid dirtying the local cpu's 'node' in the osq_lock() fast path.

2023-12-30 Thread David Laight
From: Waiman Long > Sent: 30 December 2023 03:20 > > On 12/29/23 17:11, David Laight wrote: > > osq_lock() starts by setting node->next to NULL and node->locked to 0. > > Careful analysis shows that node->next is always NULL on entry. > > > > node->l

RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-30 Thread David Laight
From: Ingo Molnar > Sent: 30 December 2023 11:09 > > > * Waiman Long wrote: > > > On 12/29/23 15:57, David Laight wrote: > > > this_cpu_ptr() is rather more expensive than raw_cpu_read() since > > > the latter can use an 'offset from register' (%gs for

[PATCH next 2/5] locking/osq_lock: Avoid dirtying the local cpu's 'node' in the osq_lock() fast path.

2023-12-29 Thread David Laight
et to zero just before that (along with the assignment to node->prev). Only initialise node->cpu once, after that use its value instead of smp_processor_id() - which is probably a real function call. Should reduce cache-line bouncing a little. Signed-off-by: David Laight --- Re

RE: [PATCH next 2/5] locking/osq_lock: Avoid dirtying the local cpu's 'node' in the osq_lock() fast path.

2023-12-29 Thread David Laight
et to zero just before that (along with the assignment to node->prev). Only initialise node->cpu once, after that use its value instead of smp_processor_id() - which is probably a real function call. Should reduce cache-line bouncing a little. Signed-off-by: David Laight --- kernel/lo

[PATCH next 1/5] locking/osq_lock: Move the definition of optimistic_spin_node into osf_lock.c

2023-12-29 Thread David Laight
struct optimistic_spin_node is private to the implementation. Move it into the C file to ensure nothing is accessing it. Signed-off-by: David Laight --- include/linux/osq_lock.h | 5 - kernel/locking/osq_lock.c | 7 +++ 2 files changed, 7 insertions(+), 5 deletions(-) diff --git

[PATCH next 0/5] locking/osq_lock: Optimisations to osq_lock code

2023-12-29 Thread David Laight
he fields are initialised on the first osq_lock() call. The last patch avoids the cache line reload calling vcpu_is_preempted() by simply saving node->prev->cpu as node->prev_cpu and updating it when node->prev changes. This is simpler than the patch proposed by Waiman. David Laight (5):

[PATCH next 5/5] locking/osq_lock: Optimise vcpu_is_preempted() check.

2023-12-29 Thread David Laight
ead save 'prev->cpu' in 'node->prev_cpu' and use that value instead. Update in the osq_lock() 'unqueue' path when 'node->prev' is changed. This is simpler than checking for 'node->prev' changing and caching 'prev->cpu'. Signed-off-by: David Laight --- kernel/locki

[PATCH next 3/5] locking/osq_lock: Clarify osq_wait_next()

2023-12-29 Thread David Laight
since gcc manages to assume that 'prev != NULL' due to an earlier dereference. Signed-off-by: David Laight --- kernel/locking/osq_lock.c | 23 ++- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c index 55f5db

[PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-29 Thread David Laight
this_cpu_ptr() is rather more expensive than raw_cpu_read() since the latter can use an 'offset from register' (%gs for x86-64). Add a 'self' field to 'struct optimistic_spin_node' that can be read with raw_cpu_read(), initialise on first call. Signed-off-by: David Laight --- kernel/locking

RE: [PATCH 0/4] Section alignment issues?

2023-12-22 Thread David Laight
... > diff --git a/include/linux/init.h b/include/linux/init.h > index 3fa3f6241350..650311e4b215 100644 > --- a/include/linux/init.h > +++ b/include/linux/init.h > @@ -264,6 +264,7 @@ extern struct module __this_module; > #define define_initcall(fn, __stub, __name, __sec) \ >

RE: [PATCH v5 02/15] ring-buffer: Page size per ring buffer

2023-12-21 Thread David Laight
> I think 1kb units is perfectly fine (patch 15 changes to kb units). The > interface says its to define the minimal size of the sub-buffer, not the > actual size. I didn't read that far through :-( David

RE: [PATCH v5 02/15] ring-buffer: Page size per ring buffer

2023-12-21 Thread David Laight
From: Steven Rostedt > Sent: 20 December 2023 13:01 > > On Wed, 20 Dec 2023 08:48:02 +0000 > David Laight wrote: > > > From: Steven Rostedt > > > Sent: 19 December 2023 18:54 > > > From: "Tzvetomir Stoyanov (VMware)" > > > >

RE: [PATCH v5 02/15] ring-buffer: Page size per ring buffer

2023-12-20 Thread David Laight
From: Steven Rostedt > Sent: 19 December 2023 18:54 > From: "Tzvetomir Stoyanov (VMware)" > > Currently the size of one sub buffer page is global for all buffers and > it is hard coded to one system page. In order to introduce configurable > ring buffer sub page size, the internal logic should

RE: [PATCH] ring-buffer: Remove 32bit timestamp logic

2023-12-17 Thread David Laight
... > My guess is that *most* 32-bit architectures do not have a 64-bit > cmpxchg - not even the irq-safe one. Does any sparc32 even have a 32-bit cmpxchg? The original versions (which were definitely SMP capable) only had a byte sized atomic exchange that always wrote 0xff. Sparc32 does have

RE: [PATCH] tracing/user_events: align uaddr on unsigned long alignment

2023-09-17 Thread David Laight
From: Clément Léger > Sent: 14 September 2023 14:11 > > enabler->uaddr can be aligned on 32 or 64 bits. If aligned on 32 bits, > this will result in a misaligned access on 64 bits architectures since > set_bit()/clear_bit() are expecting an unsigned long (aligned) pointer. > On architecture that

RE: [PATCH v2] uapi: fix __DECLARE_FLEX_ARRAY for C++

2023-09-11 Thread David Laight
... > Okay, can you please split the patch so they can be backported > separately? Then I'll get them landed, etc. Since the header with just the extra #endif is badly broken on C++ isn't it best to ensure they get back-ported together? So one patch is probably better. David

RE: [PATCH] media: atomisp: silence "dubious: !x | !y" warning

2021-04-20 Thread David Laight
From: Dan Carpenter > Sent: 20 April 2021 11:28 > > On Sat, Apr 17, 2021 at 09:31:32PM +0000, David Laight wrote: > > From: Mauro Carvalho Chehab > > > Sent: 17 April 2021 19:56 > > > > > > Em Sat, 17 Apr 2021 21:06:27 +0530 > > > Ashish

RE: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-20 Thread David Laight
From: Geert Uytterhoeven > Sent: 20 April 2021 08:40 > > Hi Willy, > > On Sat, Apr 17, 2021 at 4:49 AM Matthew Wilcox wrote: > > Replacement patch to fix compiler warning. > > > > 32-bit architectures which expect 8-byte alignment for 8-byte integers > > and need 64-bit DMA addresses (arc, arm,

RE: [PATCH 0/3] x86 disk image and modules initramfs generation

2021-04-20 Thread David Laight
From: H. Peter Anvin > Sent: 20 April 2021 00:03 > > When compiling on a different machine than the runtime target, > including but not limited to simulators, it is rather handy to be able > to produce a bootable image. The scripts for that in x86 are > relatively old, and assume a BIOS system.

RE: [PATCH 05/15] x86: Implement function_nocfi

2021-04-19 Thread David Laight
From: Rasmus Villemoes > Sent: 19 April 2021 09:40 > > On 17/04/2021 00.28, Kees Cook wrote: > > On Fri, Apr 16, 2021 at 03:06:17PM -0700, Andy Lutomirski wrote: > > >> The > >> foo symbol would point to whatever magic is needed. > > > > No, the symbol points to the jump table entry. Direct

RE: [PATCH 05/15] x86: Implement function_nocfi

2021-04-19 Thread David Laight
From: Andy Lutomirski > Sent: 18 April 2021 01:12 .. > Slightly more complicated: > > struct opaque_symbol; > extern struct opaque_symbol entry_SYSCALL_64; > > The opaque_symbol variant avoids any possible confusion over the weird > status of arrays in C, and it's hard to misuse, since struct >

RE: [PATCH v6 03/10] KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing

2021-04-19 Thread David Laight
From: wangyanan (Y) > Sent: 19 April 2021 07:40 > > Hi Paolo, > > On 2021/4/17 21:23, Paolo Bonzini wrote: > > On 30/03/21 10:08, Yanan Wang wrote: > >> In addition to function of CLOCK_MONOTONIC, flag CLOCK_MONOTONIC_RAW can > >> also shield possible impact of NTP, which can provide more

RE: [PATCH] media: atomisp: silence "dubious: !x | !y" warning

2021-04-17 Thread David Laight
From: Mauro Carvalho Chehab > Sent: 17 April 2021 19:56 > > Em Sat, 17 Apr 2021 21:06:27 +0530 > Ashish Kalra escreveu: > > > Upon running sparse, "warning: dubious: !x | !y" is brought to notice > > for this file. Logical and bitwise OR are basically the same in this > > context so it doesn't
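As a small illustration of the point quoted above: since `!x` and `!y` each evaluate to 0 or 1, the bitwise and logical forms give the same result, and differ only in that `||` may skip evaluating its right operand. The helper names below are hypothetical and not taken from the atomisp driver.

```c
/* Bitwise OR of two 0/1 values: both operands are always evaluated. */
static int either_is_zero_bitwise(int x, int y)
{
	return !x | !y;
}

/* Logical OR: same result, but y is not examined when x is already 0. */
static int either_is_zero_logical(int x, int y)
{
	return !x || !y;
}
```

The two only behave differently when the operands have side effects, which plain variable reads like these do not.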

RE: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-17 Thread David Laight
From: Matthew Wilcox > Sent: 17 April 2021 03:45 > > Replacement patch to fix compiler warning. ... > static inline dma_addr_t page_pool_get_dma_addr(struct page *page) > { > - return page->dma_addr; > + dma_addr_t ret = page->dma_addr[0]; > + if (sizeof(dma_addr_t) >

RE: [PATCH 2/2] mm: Indicate pfmemalloc pages in compound_head

2021-04-17 Thread David Laight
From: Matthew Wilcox (Oracle) > Sent: 17 April 2021 00:07 > > The net page_pool wants to use a magic value to identify page pool pages. > The best place to put it is in the first word where it can be clearly a > non-pointer value. That means shifting dma_addr up to alias with ->index, > which

RE: [PATCH 05/15] x86: Implement function_nocfi

2021-04-17 Thread David Laight
From: Kees Cook > Sent: 16 April 2021 23:28 > > On Fri, Apr 16, 2021 at 03:06:17PM -0700, Andy Lutomirski wrote: > > On Fri, Apr 16, 2021 at 3:03 PM Borislav Petkov wrote: > > > > > > On Fri, Apr 16, 2021 at 02:49:23PM -0700, Sami Tolvanen wrote: > > > > __nocfi only disables CFI checking in a

RE: [PATCH] x86/uaccess: small optimization in unsafe_copy_to_user()

2021-04-17 Thread David Laight
From: Al Viro On Behalf Of Al Viro > Sent: 16 April 2021 20:44 > On Fri, Apr 16, 2021 at 12:24:13PM -0700, Eric Dumazet wrote: > > From: Eric Dumazet > > > > We have to loop only to copy u64 values. > > After this first loop, we copy at most one u32, one u16 and one byte. > > Does it actually

RE: [PATCH 00/13] [RFC] Rust support

2021-04-17 Thread David Laight
From: Peter Zijlstra > Sent: 17 April 2021 12:17 ... > > (i'd argue this is C being broken; promoting only as far as int, when > > assigning to an unsigned long is Bad, but until/unless either GCC fixes > > that or the language committee realises that being stuck in the 1970s > > is Bad, people

RE: [PATCH 1/3] arm64: ptrace: Add is_syscall_success to handle compat

2021-04-17 Thread David Laight
From: Mark Rutland > Sent: 16 April 2021 14:35 .. > @@ -51,13 +48,7 @@ static inline void syscall_set_return_value(struct > task_struct *task, > struct pt_regs *regs, > int error, long val) > { > - if (error)

RE: Bogus struct page layout on 32-bit

2021-04-17 Thread David Laight
From: Grygorii Strashko > Sent: 16 April 2021 10:27 ... > Sry, for delayed reply. > > The TI platforms am3/4/5 (cpsw) and Keystone 2 (netcp) can do only 32bit DMA > even in case of LPAE > (dma-ranges are used). > Originally, as I remember, CONFIG_ARCH_DMA_ADDR_T_64BIT has not been selected >

RE: [PATCH 00/13] [RFC] Rust support

2021-04-17 Thread David Laight
.. > The more you make it look like (Kernel) C, the easier it is for us C > people to actually read. My eyes have been reading C for almost 30 years > by now, they have a lexer built in the optical nerve; reading something > that looks vaguely like C but is definitely not C is an utterly painful >

RE: [PATCH 00/13] [RFC] Rust support

2021-04-17 Thread David Laight
From: Peter Zijlstra > Sent: 16 April 2021 15:19 > > On Fri, Apr 16, 2021 at 02:07:49PM +0100, Wedson Almeida Filho wrote: > > On Fri, Apr 16, 2021 at 01:24:23PM +0200, Peter Zijlstra wrote: > > > > int perf_event_task_enable(void) > > > { > > > + DEFINE_MUTEX_GUARD(event_mutex,

RE: [PATCH 1/5] scsi: BusLogic: Fix missing `pr_cont' use

2021-04-17 Thread David Laight
From: Maciej W. Rozycki > Sent: 16 April 2021 11:49 > > On Thu, 15 Apr 2021, Joe Perches wrote: > > > In patch 2, vscnprintf should probably be used to make sure it's > > 0 terminated. > > Why? C99 has this[1]: > > "The vsnprintf function is equivalent to snprintf, with the variable >
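A user-space sketch of the C99 behaviour being cited, assuming a standard libc: vsnprintf() always NUL-terminates when the size is non-zero, and returns the length the full output would have had. The kernel's vscnprintf() differs in its return value (the number of characters actually stored), which is not modelled here. The wrapper name format_into() is hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Format into buf, truncating if needed.
 * Returns what vsnprintf() returns: the untruncated output length.
 */
static int format_into(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int would_be_len;

	va_start(args, fmt);
	would_be_len = vsnprintf(buf, size, fmt, args);
	va_end(args);

	return would_be_len;
}
```

A caller that uses the return value as an offset into the buffer has to clamp it to size - 1 first, which is the usual reason for preferring a "characters stored" variant.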

RE: [PATCH net-next v2 0/3] introduce skb_for_each_frag()

2021-04-17 Thread David Laight
From: Matteo Croce > Sent: 16 April 2021 23:44 ... > > A more interesting change would be something that generated: > > unsigned int nr_frags = skb_shinfo(skb)->nr_frags; > > for (i = 0; i < nr_frags; i++) { > > since that will run faster for most loops. > > But that is ~impossible
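A minimal sketch of the loop shape suggested above, using a hypothetical struct frag_list in place of the real skb_shared_info. Reading the bound into a local once means the compiler does not have to prove that the loop body leaves nr_frags unchanged before it can keep the bound in a register.

```c
/* Hypothetical stand-in for the fields of skb_shared_info used here. */
struct frag_list {
	unsigned int nr_frags;
	unsigned int frag_len[8];
};

/* Sum the fragment lengths, loading the loop bound exactly once. */
static unsigned int sum_frag_lengths(const struct frag_list *fl)
{
	unsigned int nr_frags = fl->nr_frags;
	unsigned int i, total = 0;

	for (i = 0; i < nr_frags; i++)
		total += fl->frag_len[i];

	return total;
}
```

In a loop this simple the compiler may well hoist the load by itself; the difference matters when the body calls functions or writes through pointers that could alias the count.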

RE: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-17 Thread David Laight
From: Matthew Wilcox > Sent: 16 April 2021 16:28 > > On Thu, Apr 15, 2021 at 08:08:32PM +0200, Jesper Dangaard Brouer wrote: > > See below patch. Where I swap32 the dma address to satisfy > > page->compound having bit zero cleared. (It is the simplest fix I could > > come up with). > > I think

RE: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-16 Thread David Laight
From: Matthew Wilcox > Sent: 15 April 2021 23:22 > > On Thu, Apr 15, 2021 at 09:11:56PM +0000, David Laight wrote: > > Isn't it possible to move the field down one long? > > This might require an explicit zero - but this is not a common > > code path - th

RE: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-15 Thread David Laight
From: Matthew Wilcox > Sent: 15 April 2021 19:22 > > On Thu, Apr 15, 2021 at 08:08:32PM +0200, Jesper Dangaard Brouer wrote: > > +static inline > > +dma_addr_t page_pool_dma_addr_read(dma_addr_t dma_addr) > > +{ > > + /* Workaround for storing 64-bit DMA-addr on 32-bit machines in struct > > +

RE: [PATCH v2 1/2] sparc: explicitly set PCI_IOBASE to 0

2021-04-15 Thread David Laight
From: Niklas Schnelle > Sent: 15 April 2021 13:37 > > Instead of relying on the fallback in asm-generic/io.h which sets > PCI_IOBASE 0 if it is not defined set it explicitly. > > Link: > https://lore.kernel.org/lkml/CAK8P3a3PK9zyeP4ymELtc2ZYnymECoACiigw9Za+pvSJpCk5=g...@mail.gmail.com/ >

RE: [PATCH 00/13] [RFC] Rust support

2021-04-15 Thread David Laight
... > Besides just FP, 128-bit, etc, I remain concerned about just basic > math operations. C has no way to describe the intent of integer > overflow, so the kernel was left with the only "predictable" result: > wrap around. Unfortunately, this is wrong in most cases, and we're left > with entire

RE: [PATCH 00/13] [RFC] Rust support

2021-04-14 Thread David Laight
From: Linus Torvalds > Sent: 14 April 2021 21:22 > > On Wed, Apr 14, 2021 at 1:10 PM Matthew Wilcox wrote: > > > > There's a philosophical point to be discussed here which you're skating > > right over! Should rust-in-the-linux-kernel provide the same memory > > allocation APIs as the

RE: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-14 Thread David Laight
From: Matthew Wilcox > Sent: 14 April 2021 22:36 > > On Wed, Apr 14, 2021 at 09:13:22PM +0200, Jesper Dangaard Brouer wrote: > > (If others want to reproduce). First I could not reproduce on ARM32. > > Then I found out that enabling CONFIG_XEN on ARCH=arm was needed to > > cause the issue by

RE: [PATCH 2/2] ptrace: is_syscall_success: Add syscall return code handling for compat task

2021-04-14 Thread David Laight
From: Oleg Nesterov > Sent: 14 April 2021 17:56 > > On 04/14, David Laight wrote: > > > > From: Oleg Nesterov > > > Sent: 14 April 2021 16:08 > > > > > > Add audit maintainers... > > > > > > On 04/14, He Zhe wrote: > >

RE: [PATCH 2/2] ptrace: is_syscall_success: Add syscall return code handling for compat task

2021-04-14 Thread David Laight
From: Oleg Nesterov > Sent: 14 April 2021 16:08 > > Add audit maintainers... > > On 04/14, He Zhe wrote: > > > > When 32-bit userspace application is running on 64-bit kernel, the 32-bit > > syscall return code would be changed from u32 to u64 in regs_return_value > > and then changed to s64.

RE: [PATCH v2 3/3] rseq: optimise rseq_get_rseq_cs() and clear_rseq_cs()

2021-04-14 Thread David Laight
From: Eric Dumazet > Sent: 14 April 2021 17:00 ... > > Repeated unsafe_get_user() calls are crying out for an optimisation. > > You get something like: > > failed = 0; > > copy(); > > if (failed) goto error; > > copy(); > > if (failed) goto error; > > Where

RE: [RFC][PATCH] locking: Generic ticket-lock

2021-04-14 Thread David Laight
From: Peter Zijlstra > Sent: 14 April 2021 13:56 > > > I've tested it on csky SMP*4 hw (860) & riscv SMP*4 hw (c910) and it's okay. > > W00t :-) > > > Hope you can keep > > typedef struct { > > union { > > atomic_t lock; > > struct __raw_tickets { > >

RE: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-14 Thread David Laight
> Doing this fixes it: > > +++ b/include/linux/types.h > @@ -140,7 +140,7 @@ typedef u64 blkcnt_t; > * so they don't care about the size of the actual bus addresses. > */ > #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT > -typedef u64 dma_addr_t; > +typedef u64 __attribute__((aligned(sizeof(void *
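A user-space sketch of what the quoted typedef change does, assuming GCC or Clang (the aligned attribute is a compiler extension, and on a typedef it can lower as well as raise alignment). The names ptr_aligned_u64 and struct example_page are hypothetical, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/* A 64-bit integer whose alignment is tied to the pointer size. */
typedef uint64_t __attribute__((aligned(sizeof(void *)))) ptr_aligned_u64;

/* Hypothetical layout: a 32-bit field followed by the 64-bit address. */
struct example_page {
	uint32_t flags;
	ptr_aligned_u64 dma_addr;
};

static size_t dma_addr_alignment(void)
{
	return __alignof__(ptr_aligned_u64);
}

static size_t dma_addr_offset(void)
{
	return offsetof(struct example_page, dma_addr);
}
```

On a 32-bit target that normally aligns 64-bit integers to 8 bytes, this would place dma_addr at offset 4 instead of 8; on a 64-bit target nothing changes.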

RE: [PATCH 1/2] audit: Add syscall return code handling for compat task

2021-04-14 Thread David Laight
From: He Zhe > Sent: 14 April 2021 09:02 > > When 32-bit userspace application is running on 64-bit kernel, the 32-bit > syscall return code would be changed from u32 to u64 in regs_return_value > and then changed to s64. Hence the negative return code recorded by audit > would end up being a big
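The widening problem described above can be reproduced in user space. This is only a sketch with hypothetical helper names: it shows a 32-bit return value being zero-extended to 64 bits and then reinterpreted as signed, next to the sign-extending conversion a compat-aware path would need.

```c
#include <stdint.h>

/* u32 -> u64 -> s64: the upper 32 bits are zero, so -2 becomes 4294967294. */
static int64_t widen_zero_extended(uint32_t ret32)
{
	uint64_t as_u64 = ret32;

	return (int64_t)as_u64;
}

/*
 * u32 -> s32 -> s64: the sign bit is propagated, so -2 stays -2.
 * (The u32 -> s32 conversion is implementation-defined before C23;
 * GCC and Clang use two's complement wrapping.)
 */
static int64_t widen_sign_extended(uint32_t ret32)
{
	return (int64_t)(int32_t)ret32;
}
```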

RE: [PATCH v1 1/2] powerpc/bitops: Use immediate operand when possible

2021-04-14 Thread David Laight
From: Segher Boessenkool > Sent: 14 April 2021 16:19 ... > > Could the kernel use GCC builtin atomic functions instead ? > > > > https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html > > Certainly that should work fine for the simpler cases that the atomic > operations are meant to

RE: [PATCH rdma-next 00/10] Enable relaxed ordering for ULPs

2021-04-14 Thread David Laight
From: Tom Talpey > Sent: 14 April 2021 15:16 > > On 4/12/2021 6:48 PM, Jason Gunthorpe wrote: > > On Mon, Apr 12, 2021 at 04:20:47PM -0400, Tom Talpey wrote: > > > >> So the issue is only in testing all the providers and platforms, > >> to be sure this new behavior isn't tickling anything that

RE: [PATCH] asm-generic/io.h: Silence -Wnull-pointer-arithmetic warning on PCI_IOBASE

2021-04-14 Thread David Laight
From: Niklas Schnelle > Sent: 14 April 2021 13:35 > > On Tue, 2021-04-13 at 14:12 +0000, David Laight wrote: > > From: Arnd Bergmann > > > Sent: 13 April 2021 14:40 > > > > > > On Tue, Apr 13, 2021 at 3:06 PM David Laight > > > wrote: > >

RE: [PATCH v2 3/3] rseq: optimise rseq_get_rseq_cs() and clear_rseq_cs()

2021-04-14 Thread David Laight
From: Arjun Roy > Sent: 13 April 2021 23:04 > > On Tue, Apr 13, 2021 at 2:19 PM David Laight wrote: > > > > > If we're special-casing 64-bit architectures anyways - unrolling the > > > 32B copy_from_user() for struct rseq_cs appears to be roughly 5-10% >

RE: [PATCH v2 3/3] rseq: optimise rseq_get_rseq_cs() and clear_rseq_cs()

2021-04-13 Thread David Laight
> If we're special-casing 64-bit architectures anyways - unrolling the > 32B copy_from_user() for struct rseq_cs appears to be roughly 5-10% > savings on x86-64 when I measured it (well, in a microbenchmark, not > in rseq_get_rseq_cs() directly). Perhaps that could be an additional > avenue for

RE: [PATCH] MIPS: Fix strnlen_user access check

2021-04-13 Thread David Laight
From: Thomas Bogendoerfer > Sent: 13 April 2021 16:19 > > On Tue, Apr 13, 2021 at 12:37:25PM +0000, David Laight wrote: > > From: Thomas Bogendoerfer > > > Sent: 13 April 2021 12:15 > > ... > > > > The __access_ok() is noted with `Ensure that the

RE: [PATCH 3/3] rseq: optimise for 64bit arches

2021-04-13 Thread David Laight
From: Mathieu Desnoyers > Sent: 13 April 2021 15:22 ... > > David > > > >> So I suppose that if we're going to #ifdef this, we might as well do the > >> whole thing. > >> > >> Mathieu; did I forget a reason why this cannot work? > > The only difference it brings on 32-bit is that the

RE: [PATCH] asm-generic/io.h: Silence -Wnull-pointer-arithmetic warning on PCI_IOBASE

2021-04-13 Thread David Laight
From: Arnd Bergmann > Sent: 13 April 2021 14:40 > > On Tue, Apr 13, 2021 at 3:06 PM David Laight wrote: > > > > From: Arnd Bergmann > > > Sent: 13 April 2021 13:58 > > ... > > > The remaining ones (csky, m68k, sparc32) need to be inspected > >

RE: [PATCH] asm-generic/io.h: Silence -Wnull-pointer-arithmetic warning on PCI_IOBASE

2021-04-13 Thread David Laight
From: Arnd Bergmann > Sent: 13 April 2021 13:58 ... > The remaining ones (csky, m68k, sparc32) need to be inspected > manually to see if they currently support PCI I/O space but in > fact use address zero as the base (with large resources) or they > should also turn the operations into a NOP. I'd

RE: [PATCH] asm-generic/io.h: Silence -Wnull-pointer-arithmetic warning on PCI_IOBASE

2021-04-13 Thread David Laight
From: Arnd Bergmann > Sent: 13 April 2021 13:27 > > On Tue, Apr 13, 2021 at 1:54 PM Niklas Schnelle > wrote: > > > > When PCI_IOBASE is not defined, it is set to 0 such that it is ignored > > in calls to the readX/writeX primitives. While mathematically obvious > > this triggers clang's

RE: [PATCH] MIPS: Fix strnlen_user access check

2021-04-13 Thread David Laight
From: Thomas Bogendoerfer > Sent: 13 April 2021 12:15 ... > > The __access_ok() is noted with `Ensure that the range [addr, addr+size) > > is within the process's address space`. Does the range checked by > > __access_ok() on MIPS is [addr, addr+size]. So if we want to use > > access_ok(s, 1),

RE: [PATCH] riscv: locks: introduce ticket-based spinlock implementation

2021-04-13 Thread David Laight
From: Catalin Marinas > Sent: 13 April 2021 11:45 ... > This indeed needs some care. IIUC RISC-V has similar restrictions as arm > here, no load/store instructions are allowed between LR and SC. You > can't guarantee that the compiler won't spill some variable onto the > stack. You can probably

RE: [PATCH 3/3] rseq: optimise for 64bit arches

2021-04-13 Thread David Laight
From: Peter Zijlstra > Sent: 13 April 2021 10:10 > > On Tue, Apr 13, 2021 at 12:36:57AM -0700, Eric Dumazet wrote: > > From: Eric Dumazet > > > > Commit ec9c82e03a74 ("rseq: uapi: Declare rseq_cs field as union, > > update includes") added regressions for our servers. > > > > Using

RE: [PATCH 2/6] staging: media: intel-ipu3: preferred __aligned(size) over __attribute__aligned(size)

2021-04-13 Thread David Laight
From: sakari.ai...@linux.intel.com > Sent: 13 April 2021 10:56 > > Hi David, > > On Tue, Apr 13, 2021 at 07:40:12AM +0000, David Laight wrote: > > From: Mitali Borkar > > > Sent: 12 April 2021 00:09 > > > > > > This patch fixes the

RE: [PATCH v6 1/1] use crc32 instead of md5 for hibernation e820 integrity check

2021-04-13 Thread David Laight
From: Chris von Recklinghausen > Sent: 12 April 2021 20:51 ... > > This is not about BIOS bugs. Hibernation is deep suspend/resume > > grafted onto cold boot, and it is perfectly legal for the firmware to > > present a different memory map to the OS after a cold boot. It is > > Linux that decides

RE: [PATCH] MIPS: Fix strnlen_user access check

2021-04-13 Thread David Laight
From: Jinyang He > Sent: 13 April 2021 02:16 > > > On Mon, Apr 12, 2021 at 11:02:19AM +0800, Tiezhu Yang wrote: > >> On 04/11/2021 07:04 PM, Jinyang He wrote: > >>> Commit 04324f44cb69 ("MIPS: Remove get_fs/set_fs") brought a problem for > >>> strnlen_user(). Jump out when checking access_ok()

RE: [PATCH 1/1] mm: Fix struct page layout on 32-bit systems

2021-04-13 Thread David Laight
From: Matthew Wilcox > Sent: 12 April 2021 19:24 > > On Sun, Apr 11, 2021 at 11:43:07AM +0200, Jesper Dangaard Brouer wrote: > > Could you explain your intent here? > > I worry about @index. > > > > As I mentioned in other thread[1] netstack use page_is_pfmemalloc() > > (code copy-pasted below

RE: [PATCH net-next v2 0/3] introduce skb_for_each_frag()

2021-04-13 Thread David Laight
From: Matteo Croce > Sent: 12 April 2021 01:38 > > Introduce skb_for_each_frag, a helper macro to iterate over the SKB frags. The real question is why, the change is: - for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + skb_for_each_frag(skb, i) { The existing code isn't

RE: [PATCH 2/6] staging: media: intel-ipu3: preferred __aligned(size) over __attribute__aligned(size)

2021-04-13 Thread David Laight
From: Mitali Borkar > Sent: 12 April 2021 00:09 > > This patch fixes the warning identified by checkpatch.pl by replacing > __attribute__aligned(size) with __aligned(size) > > Signed-off-by: Mitali Borkar > --- > .../staging/media/ipu3/include/intel-ipu3.h | 74 +-- > 1 file

RE: [PATCH 5/5] compat: consolidate the compat_flock{,64} definition

2021-04-12 Thread David Laight
From: Arnd Bergmann > Sent: 12 April 2021 12:26 > > On Mon, Apr 12, 2021 at 12:54 PM David Laight wrote: > > From: David Laight > Sent: 12 April 2021 10:37 > > ... > > > I'm guessing that compat_pid_t is 16 bits? > > > So the native 32bit v

RE: [PATCH 5/5] compat: consolidate the compat_flock{,64} definition

2021-04-12 Thread David Laight
From: David Laight > Sent: 12 April 2021 10:37 ... > I'm guessing that compat_pid_t is 16 bits? > So the native 32bit version has an unnamed 2 byte structure pad. > The 'packed' removes this pad from the compat structure. > > AFAICT (apart from mips) the __ARCH_COMPAT_FLOCK_PA

RE: consolidate the flock uapi definitions

2021-04-12 Thread David Laight
From: Arnd Bergmann > Sent: 12 April 2021 11:04 > > On Mon, Apr 12, 2021 at 10:55 AM Christoph Hellwig wrote: > > > > Hi all, > > > > currently we deal with the slight differences in the various architecture > > variants of the flock and flock64 structures in a very crufty way. This > > series

RE: [PATCH 5/5] compat: consolidate the compat_flock{,64} definition

2021-04-12 Thread David Laight
From: Christoph Hellwig > Sent: 12 April 2021 09:56 > > Provide a single common definition for the compat_flock and > compat_flock64 structures using the same tricks as for the native > variants. An extra define is added for the packing required on x86. > ... > /* > - * IA32 uses 4 byte

RE: Candidate Linux ABI for Intel AMX and hypothetical new related features

2021-04-12 Thread David Laight
From: Len Brown > Sent: 11 April 2021 20:07 ... > Granted, if you have an application that is statically linked and run > on new hardware > and new OS, it can still fail. That also includes anything compiled and released as a program binary that must run on older Linux installations. Such

RE: Bogus struct page layout on 32-bit

2021-04-10 Thread David Laight
From: Matthew Wilcox > Sent: 10 April 2021 03:43 > On Sat, Apr 10, 2021 at 06:45:35AM +0800, kernel test robot wrote: > > >> include/linux/mm_types.h:274:1: error: static_assert failed due to > > >> requirement > '__builtin_offsetof(struct page, lru) == __builtin_offsetof(struct folio, > lru)'

RE: [PATCH rdma-next 00/10] Enable relaxed ordering for ULPs

2021-04-10 Thread David Laight
From: Tom Talpey > Sent: 09 April 2021 18:49 > On 4/9/2021 12:27 PM, Haakon Bugge wrote: > > > > > >> On 9 Apr 2021, at 17:32, Tom Talpey wrote: > >> > >> On 4/9/2021 10:45 AM, Chuck Lever III wrote: > On Apr 9, 2021, at 10:26 AM, Tom Talpey wrote: > > On 4/6/2021 7:49 AM, Jason

RE: static_branch/jump_label vs branch merging

2021-04-10 Thread David Laight
From: David Malcolm > Sent: 09 April 2021 14:49 ... > With the caveat that my knowledge of GCC's middle-end is mostly about > implementing warnings, rather than optimization, I did some > experimentation, with gcc trunk on x86_64 FWIW. > > Given: > > int __attribute__((pure)) foo(void); > > int

RE: [PATCH net v1] Revert "lan743x: trim all 4 bytes of the FCS; not just 2"

2021-04-09 Thread David Laight
From: Sven Van Asbroeck > Sent: 08 April 2021 19:35 ... > - buffer_length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING; > + buffer_length = netdev->mtu + ETH_HLEN + ETH_FCS_LEN + > RX_HEAD_PADDING; I'd try to write the lengths in the order they happen, so: buffer_length =

RE: [PATCH v4 1/1] use crc32 instead of md5 for hibernation e820 integrity check

2021-04-09 Thread David Laight
From: Chris von Recklinghausen > Sent: 08 April 2021 11:46 > > Suspend fails on a system in fips mode because md5 is used for the e820 > integrity check and is not available. Use crc32 instead. > > Prior to this patch, MD5 is used only to create a digest to ensure integrity > of > the region,

RE: [tip: x86/core] x86/retpoline: Simplify retpolines

2021-04-06 Thread David Laight
From: tip-b...@linutronix.de > Sent: 03 April 2021 12:11 ... > Notice that since the longest alternative sequence is now: > > 0: e8 07 00 00 00 callq c <.altinstr_replacement+0xc> > 5: f3 90 pause > 7: 0f ae e8 lfence > a: eb f9

RE: [PATCH v2 04/15] ACPI: table: replace __attribute__((packed)) by __packed

2021-04-01 Thread David Laight
From: Rafael J. Wysocki > Sent: 01 April 2021 14:50 ... > So what exactly is wrong with using "packed"? It is way easier to > understand for a casual reader of the code. Because it is usually wrong! If I have: struct foo { u64 val; } __packed; And then have: u64
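
As a hedged illustration of the concern raised in this message, the sketch below shows what the packed attribute does to a structure holding a single 64-bit member under GCC or Clang. The struct and function names are hypothetical, and uint64_t with __attribute__((packed)) stands in for the kernel's u64 and __packed. Packing drops the structure's alignment to 1, so the compiler must assume any pointer to it may be misaligned, and on targets without unaligned loads it may have to assemble the value from byte accesses.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical names; mirrors "struct foo { u64 val; } __packed;" */
struct foo_packed {
	uint64_t val;
} __attribute__((packed));

/* Same layout without the attribute, for comparison. */
struct foo_plain {
	uint64_t val;
};

/* Packed, but with an explicit promise of 4-byte alignment. */
struct foo_packed_a4 {
	uint64_t val;
} __attribute__((packed, aligned(4)));

/* Alignment is 1 for the packed struct (GCC/Clang behaviour). */
_Static_assert(_Alignof(struct foo_packed) == 1, "packed implies align 1");
_Static_assert(_Alignof(struct foo_packed_a4) == 4, "explicit align kept");
_Static_assert(sizeof(struct foo_packed) == sizeof(struct foo_plain),
	       "no padding to remove here, so packed changes only alignment");

/*
 * The compiler cannot assume 'p' is 8-byte aligned, so this read must be
 * valid for any address; on strict-alignment targets that can cost
 * several byte loads plus shifts instead of one 64-bit load.
 */
static uint64_t read_packed(const struct foo_packed *p)
{
	return p->val;
}

/* Here the compiler may use a single naturally aligned 64-bit load. */
static uint64_t read_plain(const struct foo_plain *p)
{
	return p->val;
}
```

When a structure has no internal padding to remove, as here, packing changes nothing about the layout and only weakens what the compiler may assume about alignment.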

RE: [PATCH v8 3/6] stack: Optionally randomize kernel stack offset each syscall

2021-04-01 Thread David Laight
From: Will Deacon > Sent: 01 April 2021 09:31 ... > > +/* > > + * These macros must be used during syscall entry when interrupts and > > + * preempt are disabled, and after user registers have been stored to > > + * the stack. > > + */ > > +#define add_random_kstack_offset() do {

RE: [PATCH v2 04/15] ACPI: table: replace __attribute__((packed)) by __packed

2021-04-01 Thread David Laight
From: Bjorn Helgaas > Sent: 31 March 2021 18:22 > > On Wed, Mar 31, 2021 at 11:55:08PM +0800, Zhang Rui wrote: > > ... > > > From e18c942855e2f51e814d057fff4dd951cd0d0907 Mon Sep 17 00:00:00 2001 > > From: Zhang Rui > > Date: Wed, 31 Mar 2021 20:34:13 +0800 > > Subject: [PATCH] ACPI: tables:

RE: [PATCH v2 04/15] ACPI: table: replace __attribute__((packed)) by __packed

2021-03-31 Thread David Laight
From: Zhang Rui > Sent: 31 March 2021 16:55 > On Tue, 2021-03-30 at 08:14 +, David Laight wrote: > > From: Zhang Rui > > > Sent: 30 March 2021 09:00 > > > > On Tue, 2021-03-30 at 10:23 +0800, Xiaofei Tan wrote: > > > > > Hi David, > >

RE: [PATCH] blk-mq: fix alignment mismatch.

2021-03-31 Thread David Laight
From: Nathan Chancellor > Sent: 31 March 2021 00:30 > > Hi Jian, > > On Tue, Mar 30, 2021 at 04:02:49PM -0700, Jian Cai wrote: > > This fixes the mismatch of alignments between csd and its use as an > > argument to smp_call_function_single_async, which causes build failure > > when
