Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
On 11/05, Mina Almasry wrote:
> On Tue, Nov 5, 2024 at 1:46 PM Stanislav Fomichev wrote:
> > > > > Also, the information is useless to the user. If the user sees 'frag
> > > > > 128 failed to free'. There is nothing really the user can do to
> > > > > recover at runtime. Only usefulness that could come is for the user to
> > > > > log the error. We already WARN_ON_ONCE on the error the user would not
> > > > > be able to trigger.
> > > >
> > > > I'm thinking from the POV of user application. It might have bugs as
> > > > well and try to refill something that should not have been refilled.
> > > > Having info about which particular token has failed (even just for
> > > > the logging purposes) might have been nice.
> > > >
> > > Yeah, it may have been nice. On the flip side it complicates calling
> > > sock_devmem_dontneed(). The userspace need to count the freed frags in
> > > its input, remove them, skip the leaked one, and re-call the syscall.
> > > On the flipside the userspace gets to know the id of the frag that
> > > leaked but the usefulness of the information is slightly questionable
> > > for me. :shrug:
> >
> > Right, because I was gonna suggest for this patch, instead of having
> > a separate extra loop that returns -E2BIG (since this loop is basically
> > mostly cycles wasted assuming most of the calls are gonna be well behaved),
> > can we keep num_frags freed as we go and stop and return once
> > we reach MAX_DONTNEED_FRAGS?
> >
> > for (i = 0; i < num_tokens; i++) {
> > 	for (j ...) {
> > 		netmem_ref netmem ...
> > 		...
> > 	}
> > 	num_frags += tokens[i].token_count;
> > 	if (num_frags > MAX_DONTNEED_FRAGS)
> > 		return ret;
> > }
> >
> > Or do you still find it confusing because userspace has to handle that?
>
> Ah, I don't think this will work, because it creates this scenario:
>
> - user calls SO_DEVMEM_DONTNEED passing 1030 tokens.
> - Kernel returns 500 freed.
> - User doesn't know whether:
> (a) The remaining 530 tokens are all attached to the last
> tokens.count and that's why the kernel returned early, or
> (b) the kernel leaked 530 tokens because it could not find any of them
> in the sk_user_frags.
>
> In (a) the user is supposed to recall SO_DEVMEM_DONTNEED on the
> remaining 530 tokens, but in (b) the user is not supposed to do that
> (the tokens have leaked and there is nothing the user can do to
> recover).

I kinda feel like people will still write code against internal limits
anyway? At least that's what we did with the internal version of your
code: you know that you can't return more than 128 tokens per call, so
you don't even try. If you get an error, or ret != the expected length,
you kill the connection. It seems like there is no graceful recovery
from that?

Regarding your (a) vs (b) example, you can try to call DONTNEED another
time for both cases and either get non-zero and make some progress or
get 0 and give up?

> The current interface is more simple. The kernel either returns an
> error (nothing has been freed): recall SO_DEVMEM_DONTNEED on all the
> tokens after resolving the error, or,
>
> the kernel returns a positive value which means all the tokens have
> been freed (or unrecoverably leaked), and the userspace must not call
> SO_DEVMEM_DONTNEED on this batch again.

Totally agree that it's more simple. But my worry is that we now
essentially waste a bunch of CPU looping over the tokens and testing
for a condition that's not gonna happen in well-behaved applications.
But maybe I'm overblowing it, idk.

(I'm gonna wait for you to respin before formally sending acks because
it's not clear which series goes where...)
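To make the "write code against internal limits" approach concrete, here is a minimal userspace sketch of how an application might split its tokens so that no single call exceeds the per-call frag limit. It is only an illustration under stated assumptions: `struct dmabuf_token_sketch` is a local stand-in for the uapi `struct dmabuf_token`, `SKETCH_MAX_FRAGS_PER_CALL` assumes the 1024 value of MAX_DONTNEED_FRAGS from the patch under review, and `next_batch_len()` is a hypothetical helper, not anything from the kernel or from the internal code mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local stand-in for the uapi struct dmabuf_token. */
struct dmabuf_token_sketch {
	uint32_t token_start;
	uint32_t token_count;
};

/* Assumed per-call limit, taken from MAX_DONTNEED_FRAGS in the patch. */
#define SKETCH_MAX_FRAGS_PER_CALL 1024u

/*
 * Return how many tokens, starting at index 'first', fit into one call
 * without the summed token_count (or the number of tokens) exceeding
 * the per-call limit.
 *
 * Returns 0 when 'first' is at or past the end of the array, or when
 * the token at 'first' alone is larger than the limit.
 */
static size_t next_batch_len(const struct dmabuf_token_sketch *tokens,
			     size_t num_tokens, size_t first)
{
	uint64_t frags = 0;
	size_t n = 0;

	assert(tokens != NULL || num_tokens == 0);

	while (first + n < num_tokens && n < SKETCH_MAX_FRAGS_PER_CALL) {
		uint64_t next = frags + tokens[first + n].token_count;

		if (next > SKETCH_MAX_FRAGS_PER_CALL)
			break;
		frags = next;
		n++;
	}
	return n;
}
```

A caller would loop, passing `tokens + first` with a length of `n * sizeof(*tokens)` to each SO_DEVMEM_DONTNEED call and advancing `first` by `n`. A return of 0 before the end of the array means a single token is larger than the limit and cannot be submitted as is.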
Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
On Tue, Nov 5, 2024 at 1:46 PM Stanislav Fomichev wrote:
> > > > Also, the information is useless to the user. If the user sees 'frag
> > > > 128 failed to free'. There is nothing really the user can do to
> > > > recover at runtime. Only usefulness that could come is for the user to
> > > > log the error. We already WARN_ON_ONCE on the error the user would not
> > > > be able to trigger.
> > >
> > > I'm thinking from the POV of user application. It might have bugs as
> > > well and try to refill something that should not have been refilled.
> > > Having info about which particular token has failed (even just for
> > > the logging purposes) might have been nice.
> > >
> > Yeah, it may have been nice. On the flip side it complicates calling
> > sock_devmem_dontneed(). The userspace need to count the freed frags in
> > its input, remove them, skip the leaked one, and re-call the syscall.
> > On the flipside the userspace gets to know the id of the frag that
> > leaked but the usefulness of the information is slightly questionable
> > for me. :shrug:
>
> Right, because I was gonna suggest for this patch, instead of having
> a separate extra loop that returns -E2BIG (since this loop is basically
> mostly cycles wasted assuming most of the calls are gonna be well behaved),
> can we keep num_frags freed as we go and stop and return once
> we reach MAX_DONTNEED_FRAGS?
>
> for (i = 0; i < num_tokens; i++) {
> 	for (j ...) {
> 		netmem_ref netmem ...
> 		...
> 	}
> 	num_frags += tokens[i].token_count;
> 	if (num_frags > MAX_DONTNEED_FRAGS)
> 		return ret;
> }
>
> Or do you still find it confusing because userspace has to handle that?

Ah, I don't think this will work, because it creates this scenario:

- user calls SO_DEVMEM_DONTNEED passing 1030 tokens.
- Kernel returns 500 freed.
- User doesn't know whether:
(a) The remaining 530 tokens are all attached to the last
tokens.count and that's why the kernel returned early, or
(b) the kernel leaked 530 tokens because it could not find any of them
in the sk_user_frags.

In (a) the user is supposed to recall SO_DEVMEM_DONTNEED on the
remaining 530 tokens, but in (b) the user is not supposed to do that
(the tokens have leaked and there is nothing the user can do to
recover).

The current interface is more simple. The kernel either returns an
error (nothing has been freed): recall SO_DEVMEM_DONTNEED on all the
tokens after resolving the error, or,

the kernel returns a positive value which means all the tokens have
been freed (or unrecoverably leaked), and the userspace must not call
SO_DEVMEM_DONTNEED on this batch again.

--
Thanks,
Mina
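The two-outcome contract described above can be written down as a small decision helper. This is a minimal sketch, not code from the patch: `enum dontneed_action` and `dontneed_next_action()` are hypothetical names, `ret` models the raw kernel return value discussed in the thread (through libc's setsockopt() an error would instead show up as -1 with errno set), and treating a return of 0 the same as a positive count is an assumption, since the thread only spells out the error and positive cases.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum dontneed_action {
	DONTNEED_RESUBMIT_ALL,	/* error: nothing was freed, retry the whole batch */
	DONTNEED_BATCH_CONSUMED	/* tokens were freed or leaked: never resubmit */
};

/*
 * Map a raw SO_DEVMEM_DONTNEED result to what userspace may do next.
 * 'ret' is a negative errno on failure, otherwise the number of frags
 * the kernel reports as freed. '*freed' receives that count (0 on error).
 */
static enum dontneed_action dontneed_next_action(int ret, unsigned int *freed)
{
	assert(freed != NULL);

	if (ret < 0) {
		/* Nothing was freed; the batch is still intact. */
		*freed = 0;
		return DONTNEED_RESUBMIT_ALL;
	}

	/*
	 * Every token in the batch has been consumed. Any shortfall
	 * against the expected frag count is an unrecoverable leak.
	 */
	*freed = (unsigned int)ret;
	return DONTNEED_BATCH_CONSUMED;
}
```

The caller never needs to work out which tokens were partially handled: the batch is either untouched or gone, which is the simplification being argued for here.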
Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
On Wed, Oct 30, 2024 at 7:33 AM Stanislav Fomichev wrote:
>
> On 10/29, Mina Almasry wrote:
> > Check we're going to free a reasonable number of frags in token_count
> > before starting the loop, to prevent looping too long.
> >
> > Also minor code cleanups:
> > - Flip checks to reduce indentation.
> > - Use sizeof(*tokens) everywhere for consistency.
> >
> > Cc: Yi Lai
> >
> > Signed-off-by: Mina Almasry
> >
> > ---
> >  net/core/sock.c | 46 --
> >  1 file changed, 28 insertions(+), 18 deletions(-)
> >
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 7f398bd07fb7..8603b8d87f2e 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1047,11 +1047,12 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
> >
> >  #ifdef CONFIG_PAGE_POOL
> >
> > -/* This is the number of tokens that the user can SO_DEVMEM_DONTNEED in
> > +/* This is the number of frags that the user can SO_DEVMEM_DONTNEED in
> >   * 1 syscall. The limit exists to limit the amount of memory the kernel
> > - * allocates to copy these tokens.
> > + * allocates to copy these tokens, and to prevent looping over the frags for
> > + * too long.
> >   */
> > -#define MAX_DONTNEED_TOKENS 128
> > +#define MAX_DONTNEED_FRAGS 1024
> >
> >  static noinline_for_stack int
> >  sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
> > @@ -1059,43 +1060,52 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
> >  	unsigned int num_tokens, i, j, k, netmem_num = 0;
> >  	struct dmabuf_token *tokens;
> >  	netmem_ref netmems[16];
> > +	u64 num_frags = 0;
> >  	int ret = 0;
> >
> >  	if (!sk_is_tcp(sk))
> >  		return -EBADF;
> >
> > -	if (optlen % sizeof(struct dmabuf_token) ||
> > -	    optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
> > +	if (optlen % sizeof(*tokens) ||
> > +	    optlen > sizeof(*tokens) * MAX_DONTNEED_FRAGS)
> >  		return -EINVAL;
> >
> > -	tokens = kvmalloc_array(optlen, sizeof(*tokens), GFP_KERNEL);
> > +	num_tokens = optlen / sizeof(*tokens);
> > +	tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
> >  	if (!tokens)
> >  		return -ENOMEM;
> >
> > -	num_tokens = optlen / sizeof(struct dmabuf_token);
> >  	if (copy_from_sockptr(tokens, optval, optlen)) {
> >  		kvfree(tokens);
> >  		return -EFAULT;
> >  	}
> >
> > +	for (i = 0; i < num_tokens; i++) {
> > +		num_frags += tokens[i].token_count;
> > +		if (num_frags > MAX_DONTNEED_FRAGS) {
> > +			kvfree(tokens);
> > +			return -E2BIG;
> > +		}
> > +	}
> > +
> >  	xa_lock_bh(&sk->sk_user_frags);
> >  	for (i = 0; i < num_tokens; i++) {
> >  		for (j = 0; j < tokens[i].token_count; j++) {
> >  			netmem_ref netmem = (__force netmem_ref)__xa_erase(
> >  				&sk->sk_user_frags, tokens[i].token_start + j);
> >
> > -			if (netmem &&
> > -			    !WARN_ON_ONCE(!netmem_is_net_iov(netmem))) {
> > -				netmems[netmem_num++] = netmem;
> > -				if (netmem_num == ARRAY_SIZE(netmems)) {
> > -					xa_unlock_bh(&sk->sk_user_frags);
> > -					for (k = 0; k < netmem_num; k++)
> > -						WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
> > -					netmem_num = 0;
> > -					xa_lock_bh(&sk->sk_user_frags);
> > -				}
> > -				ret++;
>
> [..]
>
> > +			if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
> > +				continue;
>
> Any reason we are not returning explicit error to the callers here?
> That probably needs some mechanism to signal which particular one failed
> so the users can restart?

Only because I can't think of a simple way to return an array of frags
failed to DONTNEED to the user.

Also, this error should be extremely rare or never hit really. I don't
know how we end up not finding a netmem here or the netmem is page.
The only way is if the user is malicious (messing with the token ids
passed to the kernel) or if a kernel bug is happening.

Also, the information is useless to the user. If the user sees 'frag
128 failed to free'. There is nothing really the user can do to
recover at runtime. Only usefulness that could come is for the user to
log the error. We already WARN_ON_ONCE on the error the user would not
be able to trigger.

--
Thanks,
Mina
Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
On 10/29, Mina Almasry wrote:
> Check we're going to free a reasonable number of frags in token_count
> before starting the loop, to prevent looping too long.
>
> Also minor code cleanups:
> - Flip checks to reduce indentation.
> - Use sizeof(*tokens) everywhere for consistency.
>
> Cc: Yi Lai
>
> Signed-off-by: Mina Almasry
>
> ---
>  net/core/sock.c | 46 --
>  1 file changed, 28 insertions(+), 18 deletions(-)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 7f398bd07fb7..8603b8d87f2e 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1047,11 +1047,12 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
>
>  #ifdef CONFIG_PAGE_POOL
>
> -/* This is the number of tokens that the user can SO_DEVMEM_DONTNEED in
> +/* This is the number of frags that the user can SO_DEVMEM_DONTNEED in
>   * 1 syscall. The limit exists to limit the amount of memory the kernel
> - * allocates to copy these tokens.
> + * allocates to copy these tokens, and to prevent looping over the frags for
> + * too long.
>   */
> -#define MAX_DONTNEED_TOKENS 128
> +#define MAX_DONTNEED_FRAGS 1024
>
>  static noinline_for_stack int
>  sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
> @@ -1059,43 +1060,52 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
>  	unsigned int num_tokens, i, j, k, netmem_num = 0;
>  	struct dmabuf_token *tokens;
>  	netmem_ref netmems[16];
> +	u64 num_frags = 0;
>  	int ret = 0;
>
>  	if (!sk_is_tcp(sk))
>  		return -EBADF;
>
> -	if (optlen % sizeof(struct dmabuf_token) ||
> -	    optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
> +	if (optlen % sizeof(*tokens) ||
> +	    optlen > sizeof(*tokens) * MAX_DONTNEED_FRAGS)
>  		return -EINVAL;
>
> -	tokens = kvmalloc_array(optlen, sizeof(*tokens), GFP_KERNEL);
> +	num_tokens = optlen / sizeof(*tokens);
> +	tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
>  	if (!tokens)
>  		return -ENOMEM;
>
> -	num_tokens = optlen / sizeof(struct dmabuf_token);
>  	if (copy_from_sockptr(tokens, optval, optlen)) {
>  		kvfree(tokens);
>  		return -EFAULT;
>  	}
>
> +	for (i = 0; i < num_tokens; i++) {
> +		num_frags += tokens[i].token_count;
> +		if (num_frags > MAX_DONTNEED_FRAGS) {
> +			kvfree(tokens);
> +			return -E2BIG;
> +		}
> +	}
> +
>  	xa_lock_bh(&sk->sk_user_frags);
>  	for (i = 0; i < num_tokens; i++) {
>  		for (j = 0; j < tokens[i].token_count; j++) {
>  			netmem_ref netmem = (__force netmem_ref)__xa_erase(
>  				&sk->sk_user_frags, tokens[i].token_start + j);
>
> -			if (netmem &&
> -			    !WARN_ON_ONCE(!netmem_is_net_iov(netmem))) {
> -				netmems[netmem_num++] = netmem;
> -				if (netmem_num == ARRAY_SIZE(netmems)) {
> -					xa_unlock_bh(&sk->sk_user_frags);
> -					for (k = 0; k < netmem_num; k++)
> -						WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
> -					netmem_num = 0;
> -					xa_lock_bh(&sk->sk_user_frags);
> -				}
> -				ret++;

[..]

> +			if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
> +				continue;

Any reason we are not returning explicit error to the callers here?
That probably needs some mechanism to signal which particular one failed
so the users can restart?
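For anyone who wants to poke at the new pre-check outside the kernel, the loop the quoted patch adds before taking the xarray lock can be modelled in plain userspace C roughly as follows. This is a sketch under stated assumptions, not the kernel code: `struct dmabuf_token_sketch` is a local stand-in for the uapi `struct dmabuf_token`, `dontneed_precheck()` is a hypothetical name, and the allocation, copy_from_sockptr() and kvfree() steps are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local stand-in for the uapi struct dmabuf_token. */
struct dmabuf_token_sketch {
	uint32_t token_start;
	uint32_t token_count;
};

/* Value taken from the patch under review. */
#define MAX_DONTNEED_FRAGS 1024

/*
 * Userspace model of the pre-check: sum token_count over the whole
 * batch and reject it with -E2BIG as soon as the running total exceeds
 * MAX_DONTNEED_FRAGS. Nothing is freed when the batch is rejected.
 */
static int dontneed_precheck(const struct dmabuf_token_sketch *tokens,
			     size_t num_tokens)
{
	/* 64-bit accumulator, as in the patch, so 32-bit counts cannot wrap. */
	uint64_t num_frags = 0;
	size_t i;

	assert(tokens != NULL || num_tokens == 0);

	for (i = 0; i < num_tokens; i++) {
		num_frags += tokens[i].token_count;
		if (num_frags > MAX_DONTNEED_FRAGS)
			return -E2BIG;
	}
	return 0;
}
```

Because the check runs to completion before any frag is released, a rejected batch leaves the socket state untouched, at the price of one extra pass over the tokens on every call.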
Re: [PATCH net-next v1 6/7] net: fix SO_DEVMEM_DONTNEED looping too long
On 10/30, Mina Almasry wrote:
> On Wed, Oct 30, 2024 at 7:33 AM Stanislav Fomichev wrote:
> >
> > On 10/29, Mina Almasry wrote:
> > > Check we're going to free a reasonable number of frags in token_count
> > > before starting the loop, to prevent looping too long.
> > >
> > > Also minor code cleanups:
> > > - Flip checks to reduce indentation.
> > > - Use sizeof(*tokens) everywhere for consistency.
> > >
> > > Cc: Yi Lai
> > >
> > > Signed-off-by: Mina Almasry
> > >
> > > ---
> > >  net/core/sock.c | 46 --
> > >  1 file changed, 28 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index 7f398bd07fb7..8603b8d87f2e 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -1047,11 +1047,12 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
> > >
> > >  #ifdef CONFIG_PAGE_POOL
> > >
> > > -/* This is the number of tokens that the user can SO_DEVMEM_DONTNEED in
> > > +/* This is the number of frags that the user can SO_DEVMEM_DONTNEED in
> > >   * 1 syscall. The limit exists to limit the amount of memory the kernel
> > > - * allocates to copy these tokens.
> > > + * allocates to copy these tokens, and to prevent looping over the frags for
> > > + * too long.
> > >   */
> > > -#define MAX_DONTNEED_TOKENS 128
> > > +#define MAX_DONTNEED_FRAGS 1024
> > >
> > >  static noinline_for_stack int
> > >  sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
> > > @@ -1059,43 +1060,52 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
> > >  	unsigned int num_tokens, i, j, k, netmem_num = 0;
> > >  	struct dmabuf_token *tokens;
> > >  	netmem_ref netmems[16];
> > > +	u64 num_frags = 0;
> > >  	int ret = 0;
> > >
> > >  	if (!sk_is_tcp(sk))
> > >  		return -EBADF;
> > >
> > > -	if (optlen % sizeof(struct dmabuf_token) ||
> > > -	    optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
> > > +	if (optlen % sizeof(*tokens) ||
> > > +	    optlen > sizeof(*tokens) * MAX_DONTNEED_FRAGS)
> > >  		return -EINVAL;
> > >
> > > -	tokens = kvmalloc_array(optlen, sizeof(*tokens), GFP_KERNEL);
> > > +	num_tokens = optlen / sizeof(*tokens);
> > > +	tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
> > >  	if (!tokens)
> > >  		return -ENOMEM;
> > >
> > > -	num_tokens = optlen / sizeof(struct dmabuf_token);
> > >  	if (copy_from_sockptr(tokens, optval, optlen)) {
> > >  		kvfree(tokens);
> > >  		return -EFAULT;
> > >  	}
> > >
> > > +	for (i = 0; i < num_tokens; i++) {
> > > +		num_frags += tokens[i].token_count;
> > > +		if (num_frags > MAX_DONTNEED_FRAGS) {
> > > +			kvfree(tokens);
> > > +			return -E2BIG;
> > > +		}
> > > +	}
> > > +
> > >  	xa_lock_bh(&sk->sk_user_frags);
> > >  	for (i = 0; i < num_tokens; i++) {
> > >  		for (j = 0; j < tokens[i].token_count; j++) {
> > >  			netmem_ref netmem = (__force netmem_ref)__xa_erase(
> > >  				&sk->sk_user_frags, tokens[i].token_start + j);
> > >
> > > -			if (netmem &&
> > > -			    !WARN_ON_ONCE(!netmem_is_net_iov(netmem))) {
> > > -				netmems[netmem_num++] = netmem;
> > > -				if (netmem_num == ARRAY_SIZE(netmems)) {
> > > -					xa_unlock_bh(&sk->sk_user_frags);
> > > -					for (k = 0; k < netmem_num; k++)
> > > -						WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
> > > -					netmem_num = 0;
> > > -					xa_lock_bh(&sk->sk_user_frags);
> > > -				}
> > > -				ret++;
> >
> > [..]
> >
> > > +			if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
> > > +				continue;
> >
> > Any reason we are not returning explicit error to the callers here?
> > That probably needs some mechanism to signal which particular one failed
> > so the users can restart?
>
> Only because I can't think of a simple way to return an array of frags
> failed to DONTNEED to the user.

I'd expect the call to return as soon as it hits the invalid frag
entry (plus the number of entries that it successfully refilled up to
the invalid one). But too late I guess.

> Also, this error should be extremely rare or never hit really. I don't
> know how we end up not finding a netmem here or the netmem is page.
> The only way is if the user is malicious (messing with the token ids
> passed to the kernel) or if a kernel bug is happening.

I do hit this error with 1500 mtu, so it would've been nice