Re: [PATCH bpf] samples/bpf: kbuild: add CONFIG_SAMPLE_BPF Kconfig

2019-10-02 Thread Björn Töpel
On Thu, 3 Oct 2019 at 01:14, Ivan Khoronzhuk wrote: > > On Wed, Oct 02, 2019 at 09:41:15AM +0200, Björn Töpel wrote: > >On Wed, 2 Oct 2019 at 03:49, Masahiro Yamada > > wrote: > >> > >[...] > >> > Yes, the BPF samples require clang/LLVM with BPF support to build. Any > >> > suggestion on a good wa

Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF

2019-10-02 Thread Masami Hiramatsu
On Mon, 30 Sep 2019 11:31:29 -0700 Kees Cook wrote: > On Sat, Sep 28, 2019 at 07:37:27PM -0400, Steven Rostedt wrote: > > On Wed, 28 Aug 2019 21:07:24 -0700 > > Alexei Starovoitov wrote: > > > > > > > > This won’t make me much more comfortable, since CAP_BPF lets it do an > > > > ever-growing

Re: [PATCH] tcp: add tsval and tsecr to TCP_INFO

2019-10-02 Thread William Dauchy
On Thu, Oct 3, 2019 at 1:14 AM Eric Dumazet wrote: > I would rather use a new getsockopt() to fetch this specific data, > instead of making TCP_INFO bigger for everyone :/ > > ss command can dump millions of sockets in busy hosts, we need to be > careful of TCP_INFO size. Thanks Eric for your adv

[PATCH net-next] mlxsw: PCI: Send EMAD traffic on a separate queue

2019-10-02 Thread Ido Schimmel
From: Petr Machata Currently mlxsw distributes sent traffic among all the available send queues. That includes control traffic as well as EMADs, which are used for configuration of the device. However because all the queues have the same traffic class of 3, they all end up being directed to the

Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes

2019-10-02 Thread Jiri Pirko
Thu, Oct 03, 2019 at 04:34:22AM CEST, dsah...@gmail.com wrote: >On 10/2/19 12:21 PM, Jiri Pirko wrote: This patch adds an "in hardware" indication to IPv4 routes, so that users will have better visibility into the offload process. In the future IPv6 will be extended with this indicat

Re: [patch net-next v2 11/15] netdevsim: implement proper devlink reload

2019-10-02 Thread Jiri Pirko
Thu, Oct 03, 2019 at 02:38:12AM CEST, jakub.kicin...@netronome.com wrote: >On Wed, 2 Oct 2019 18:12:27 +0200, Jiri Pirko wrote: >> From: Jiri Pirko >> >> During devlink reload, all driver objects should be reinstantiated with >> the exception of devlink instance and devlink resources and params.

Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes

2019-10-02 Thread Ido Schimmel
On Wed, Oct 02, 2019 at 08:34:22PM -0600, David Ahern wrote: > On 10/2/19 12:21 PM, Jiri Pirko wrote: > >>> This patch adds an "in hardware" indication to IPv4 routes, so that > >>> users will have better visibility into the offload process. In the > >>> future IPv6 will be extended with this indic

Re: [RFC PATCH net-next 00/15] Simplify IPv4 route offload API

2019-10-02 Thread Ido Schimmel
On Wed, Oct 02, 2019 at 08:17:59PM +0200, Jiri Pirko wrote: > Wed, Oct 02, 2019 at 10:40:48AM CEST, ido...@idosch.org wrote: > >From: Ido Schimmel > > > >Today, whenever an IPv4 route is added or deleted a notification is sent > >in the FIB notification chain and it is up to offload drivers to dec

Re: [RFC PATCH net-next 02/15] ipv4: Notify route after insertion to the routing table

2019-10-02 Thread Ido Schimmel
On Wed, Oct 02, 2019 at 07:34:35PM -0600, David Ahern wrote: > On 10/2/19 2:40 AM, Ido Schimmel wrote: > > @@ -1269,14 +1269,19 @@ int fib_table_insert(struct net *net, struct > > fib_table *tb, > > new_fa->tb_id = tb->tb_id; > > new_fa->fa_default = -1; > > > > - err = call_fib_entry_

[PATCH net-next v2] net/rds: Use DMA memory pool allocation for rds_header

2019-10-02 Thread Ka-Cheong Poon
Currently, RDS calls ib_dma_alloc_coherent() to allocate a large piece of contiguous DMA coherent memory to store struct rds_header for sending/receiving packets. The memory allocated is then partitioned into struct rds_header. This is not necessary and can be costly at times when memory is fragm

[PATCH net] tcp: fix slab-out-of-bounds in tcp_zerocopy_receive()

2019-10-02 Thread Eric Dumazet
Apparently a refactoring patch brought a bug, that was caught by syzbot [1] Original code was correct, do not try to be smarter than the compiler :/ [1] BUG: KASAN: slab-out-of-bounds in tcp_zerocopy_receive net/ipv4/tcp.c:1807 [inline] BUG: KASAN: slab-out-of-bounds in do_tcp_getsockopt.isra.0+

Re: [PATCH bpf-next 1/2] bpf/flow_dissector: add mode to enforce global BPF flow dissector

2019-10-02 Thread Andrii Nakryiko
On Wed, Oct 2, 2019 at 6:43 PM Stanislav Fomichev wrote: > > On 10/02, Andrii Nakryiko wrote: > > On Wed, Oct 2, 2019 at 10:35 AM Stanislav Fomichev wrote: > > > > > > Always use init_net flow dissector BPF program if it's attached and fall > > > back to the per-net namespace one. Also, deny inst

Re: [PATCH net v2] ipv6: Handle race in addrconf_dad_work

2019-10-02 Thread David Ahern
On 10/2/19 6:36 PM, Eric Dumazet wrote: > > > On 10/2/19 5:10 PM, David Ahern wrote: >> On 10/2/19 4:36 PM, Eric Dumazet wrote: >>> This might be related to a use of a bonding device, with a mlx4 slave. >>> >> >> does it only happen with bonds? > > All my hosts have bonds, some are just fine wit

Re: [RFC PATCH net-next 12/15] ipv4: Add "in hardware" indication to routes

2019-10-02 Thread David Ahern
On 10/2/19 12:21 PM, Jiri Pirko wrote: >>> This patch adds an "in hardware" indication to IPv4 routes, so that >>> users will have better visibility into the offload process. In the >>> future IPv6 will be extended with this indication as well. >>> >>> 'struct fib_alias' is extended with a new fiel

RE: [PATCH net-next] r8152: Add identifier names for function pointers

2019-10-02 Thread Hayes Wang
Prashant Malani [mailto:pmal...@chromium.org] > Sent: Thursday, October 03, 2019 5:10 AM > To: Hayes Wang > Cc: grund...@chromium.org; netdev@vger.kernel.org; nic_swsd; Prashant > Malani > Subject: [PATCH net-next] r8152: Add identifier names for function pointers > > Checkpatch throws warnings fo

Re: [PATCH bpf-next 1/2] bpf/flow_dissector: add mode to enforce global BPF flow dissector

2019-10-02 Thread Stanislav Fomichev
On 10/02, Andrii Nakryiko wrote: > On Wed, Oct 2, 2019 at 10:35 AM Stanislav Fomichev wrote: > > > > Always use init_net flow dissector BPF program if it's attached and fall > > back to the per-net namespace one. Also, deny installing new programs if > > there is already one attached to the root n

Re: [RFC PATCH net-next 02/15] ipv4: Notify route after insertion to the routing table

2019-10-02 Thread David Ahern
On 10/2/19 2:40 AM, Ido Schimmel wrote: > @@ -1269,14 +1269,19 @@ int fib_table_insert(struct net *net, struct > fib_table *tb, > new_fa->tb_id = tb->tb_id; > new_fa->fa_default = -1; > > - err = call_fib_entry_notifiers(net, event, key, plen, new_fa, extack); > + /* Insert n

Re: [patch net-next v2 11/15] netdevsim: implement proper devlink reload

2019-10-02 Thread Jakub Kicinski
On Wed, 2 Oct 2019 18:12:27 +0200, Jiri Pirko wrote: > From: Jiri Pirko > > During devlink reload, all driver objects should be reinstantiated with > the exception of devlink instance and devlink resources and params. > Move existing devlink_resource_size_get() calls into fib_create() just > bef

Re: [PATCH net v2] ipv6: Handle race in addrconf_dad_work

2019-10-02 Thread Eric Dumazet
On 10/2/19 5:10 PM, David Ahern wrote: > On 10/2/19 4:36 PM, Eric Dumazet wrote: >> This might be related to a use of a bonding device, with a mlx4 slave. >> > > does it only happen with bonds? All my hosts have bonds, some are just fine with your patch, but others are not. > > bond shows IF

Re: [RFC PATCH v2 00/45] Multipath TCP

2019-10-02 Thread Mat Martineau
On Wed, 2 Oct 2019, David Miller wrote: From: Mat Martineau Date: Wed, 2 Oct 2019 16:36:10 -0700 The MPTCP upstreaming community has prepared a net-next RFCv2 patch set for review. Nobody is going to read 45 patches and properly review them. And I do mean nobody. Please make smaller, m

Re: [PATCH net-next v4 0/2] net: stmmac: Enhanced addressing mode for DWMAC 4.10

2019-10-02 Thread David Miller
From: Thierry Reding Date: Wed, 2 Oct 2019 16:52:56 +0200 > From: Thierry Reding > > The DWMAC 4.10 supports the same enhanced addressing mode as later > generations. Parse this capability from the hardware feature registers > and set the EAME (Enhanced Addressing Mode Enable) bit when necessa

Re: [RFC PATCH v2 00/45] Multipath TCP

2019-10-02 Thread David Miller
From: Mat Martineau Date: Wed, 2 Oct 2019 16:36:10 -0700 > The MPTCP upstreaming community has prepared a net-next RFCv2 patch set > for review. Nobody is going to read 45 patches and properly review them. And I do mean nobody. Please make smaller, more reasonable (like 12-20 MAX), patch sets

Re: [PATCH net v2] ipv6: Handle race in addrconf_dad_work

2019-10-02 Thread David Ahern
On 10/2/19 4:36 PM, Eric Dumazet wrote: > This might be related to a use of a bonding device, with a mlx4 slave. > does it only happen with bonds? bond shows IF_READY even though the underlying device is carrier down which seems wrong; if a lower device is not carrier up then DAD does not really

Re: [RFC PATCH v2 00/45] Multipath TCP

2019-10-02 Thread Mat Martineau
On Wed, 2 Oct 2019, Mat Martineau wrote: The MPTCP upstreaming community has prepared a net-next RFCv2 patch set for review. Clone/fetch: https://github.com/multipath-tcp/mptcp_net-next.git (tag: netdev-rfcv2) Browse: https://github.com/multipath-tcp/mptcp_net-next/tree/netdev-rfcv2 Huge

[PATCH bpf-next 1/2] bpf, x86: Small optimization in comparing against imm0

2019-10-02 Thread Daniel Borkmann
Replace 'cmp reg, 0' with 'test reg, reg' for comparisons against zero. Saves 1 byte of instruction encoding per occurrence. The flag results of test 'reg, reg' are identical to 'cmp reg, 0' in all cases except for AF which we don't use/care about. In terms of macro-fusibility in combination with a
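The one-byte saving described above can be checked by hand-assembling both forms. This is a minimal sketch, assuming 32-bit legacy registers and the imm8 form of cmp (opcode 0x83 /7); the helper names are made up for illustration and are not taken from the JIT.

```python
# Hand-assemble 'cmp reg32, 0' and 'test reg32, reg32' to compare sizes.
# Register numbers follow the x86 ModRM encoding; the eight legacy
# registers need no REX prefix.
REGS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
        "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_cmp_zero(reg: str) -> bytes:
    """cmp r/m32, imm8 -> 0x83 /7 ib, with mod=11 (register direct)."""
    modrm = 0xC0 | (7 << 3) | REGS[reg]
    return bytes([0x83, modrm, 0x00])


def encode_test_self(reg: str) -> bytes:
    """test r/m32, r32 -> 0x85 /r, same register in both ModRM fields."""
    modrm = 0xC0 | (REGS[reg] << 3) | REGS[reg]
    return bytes([0x85, modrm])


for name in ("eax", "edi"):
    cmp_bytes = encode_cmp_zero(name)
    test_bytes = encode_test_self(name)
    print(name, cmp_bytes.hex(), test_bytes.hex(),
          "saved:", len(cmp_bytes) - len(test_bytes))
```

The saving comes from dropping the immediate byte; both instructions still need an opcode and a ModRM byte.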

[PATCH bpf-next 2/2] bpf: Add loop test case with 32 bit reg comparison against 0

2019-10-02 Thread Daniel Borkmann
Add a loop test with 32 bit register against 0 immediate: # ./test_verifier 631 #631/p taken loop with back jump to 1st insn, 2 OK Disassembly: [...] 1b: test %edi,%edi 1d: jne 0x0014 [...] Pretty much similar to prior "taken loop with back jump to 1st insn" tes

Re: [PATCH net v2] ipv6: Handle race in addrconf_dad_work

2019-10-02 Thread David Ahern
On 10/2/19 5:11 PM, Eric Dumazet wrote: > > > On 10/2/19 3:36 PM, Eric Dumazet wrote: >> >> >> On 10/2/19 3:33 PM, David Ahern wrote: > >>> >>> I flipped to IF_READY based on addrconf_ifdown and idev checks seeming >>> more appropriate. >>> >> > > Note that IF_READY is set in ipv6_add_dev() if

[RFC PATCH v2 16/45] tcp: clean ext on tx recycle

2019-10-02 Thread Mat Martineau
From: Paolo Abeni Otherwise we will find stray/unexpected/old extensions value on next iteration. On tcp_write_xmit() we can end-up splitting an already queued skb in two parts, via tso_fragment(). The newly created skb can be allocated via the tx cache and the mptcp stack will not be aware of i

[RFC PATCH v2 33/45] mptcp: Implement path manager interface commands

2019-10-02 Thread Mat Martineau
From: Peter Krystad Use the addr_signal flag to indicate to the subflow layer that a local address may be announced, and call subflow_connect() to initiate a secondary subflow. Signed-off-by: Peter Krystad Signed-off-by: Florian Westphal --- net/mptcp/pm.c | 63 +++

[RFC PATCH v2 28/45] mptcp: Add handling of incoming MP_JOIN requests

2019-10-02 Thread Mat Martineau
From: Peter Krystad Process the MP_JOIN option in a SYN packet with the same flow as MP_CAPABLE but when the third ACK is received add the subflow to the MPTCP socket subflow list instead of adding it to the TCP socket accept queue. The subflow is added at the end of the subflow list so it will

[RFC PATCH v2 35/45] mptcp: update per unacked sequence on pkt reception

2019-10-02 Thread Mat Martineau
From: Paolo Abeni So that we keep per unacked sequence number consistent; since we update per msk data, use an atomic64 cmpxchg() to protect against concurrent updates from multiple subflows. Initialize the snd_una at connect()/accept() time. Signed-off-by: Paolo Abeni --- net/mptcp/options.c

[RFC PATCH v2 30/45] mptcp: new sysctl to control the activation per NS

2019-10-02 Thread Mat Martineau
From: Matthieu Baerts New MPTCP sockets will return -ENOPROTOOPT if MPTCP support is disabled for the current net namespace. For security reasons, it is interesting to have a global switch for MPTCP. To start, MPTCP will be disabled by default and only privileged users will be able to modify thi

[PATCH net-next] net: dsa: Allow port mirroring to the CPU port

2019-10-02 Thread Vladimir Oltean
On a regular netdev, putting it in promiscuous mode means receiving all traffic passing through it, whether or not it was destined to its MAC address. Then monitoring applications such as tcpdump can see all traffic transiting it. On Ethernet switches, clearly all ports are in promiscuous mode by

[RFC PATCH v2 10/45] mptcp: add mptcp_poll

2019-10-02 Thread Mat Martineau
From: Florian Westphal Can't use tcp_poll directly: BUG: KASAN: slab-out-of-bounds in tcp_poll+0x17f/0x540 Read of size 4 at addr 88806ac5e50c by task mptcp_connect/2085 Call Trace: tcp_poll+0x17f/0x540 sock_poll+0x152/0x180 Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 31

[RFC PATCH v2 37/45] mptcp: introduce MPTCP retransmission timer

2019-10-02 Thread Mat Martineau
From: Paolo Abeni The timer will be used to schedule retransmission. Its frequency is based on the current subflow RTO estimation and is reset on every una_seq update. The timer is cleared for good by __mptcp_clear_xmit(). Also clean the MPTCP rtx queue before each transmission. Signed-off-by: Paolo

[RFC PATCH v2 34/45] mptcp: Make MPTCP socket block/wakeup ignore sk_receive_queue

2019-10-02 Thread Mat Martineau
The MPTCP-level socket doesn't use sk_receive_queue, so it was possible for mptcp_recvmsg() to remain blocked when there was data ready for it to read. When the MPTCP socket is waiting for additional data and it releases the subflow socket lock, the subflow may have incoming packets ready to proces

[RFC PATCH v2 17/45] mptcp: Add MPTCP to skb extensions

2019-10-02 Thread Mat Martineau
Add enum value for MPTCP and update config dependencies Signed-off-by: Mat Martineau --- include/linux/skbuff.h | 3 +++ include/net/mptcp.h| 16 net/core/skbuff.c | 7 +++ net/mptcp/Kconfig | 1 + 4 files changed, 27 insertions(+) diff --git a/include/linu

[RFC PATCH v2 29/45] mptcp: harmonize locking on all socket operations.

2019-10-02 Thread Mat Martineau
From: Paolo Abeni The locking schema implied by sendmsg(), recvmsg(), etc. requires acquiring the msk's socket lock before manipulating the msk internal status. Additionally, we can't acquire the msk->subflow socket lock while holding the msk lock, due to mptcp_finish_connect(). Many socket ope

[RFC PATCH v2 09/45] mptcp: Handle MP_CAPABLE options for outgoing connections

2019-10-02 Thread Mat Martineau
From: Peter Krystad Add hooks to tcp_output.c to add MP_CAPABLE to an outgoing SYN request for a subflow socket and to the final ACK of the three-way handshake. Use the .sk_rx_dst_set() handler in the subflow proto to capture when the responding SYN-ACK is received and notify the MPTCP connectio

[RFC PATCH v2 07/45] mptcp: Associate MPTCP context with TCP socket

2019-10-02 Thread Mat Martineau
From: Peter Krystad Use ULP to associate a subflow_context structure with each TCP subflow socket. Signed-off-by: Peter Krystad Signed-off-by: Florian Westphal Signed-off-by: Matthieu Baerts --- include/linux/tcp.h | 3 ++ net/mptcp/Makefile | 2 +- net/mptcp/protocol.c | 51

[RFC PATCH v2 40/45] mptcp: implement and use MPTCP-level retransmission

2019-10-02 Thread Mat Martineau
From: Paolo Abeni On a timeout event, schedule a work queue to do the retransmission. The retransmission code closely resembles the sendmsg() implementation and re-uses mptcp_sendmsg_frag, providing a dummy msghdr - for flags' sake - and peeking the relevant dfrag from the rtx head. Signed-off-by: Paolo Ab

[RFC PATCH v2 26/45] mptcp: Add path manager interface

2019-10-02 Thread Mat Martineau
From: Peter Krystad Add enough of a path manager interface to allow sending of ADD_ADDR when an incoming MPTCP connection is created. Capable of sending only a single IPv4 ADD_ADDR option. The 'pm_data' element of the connection sock will need to be expanded to handle multiple interfaces and IPv6

[RFC PATCH v2 36/45] mptcp: queue data for mptcp level retransmission

2019-10-02 Thread Mat Martineau
From: Paolo Abeni Keep the send page fragment on an MPTCP level retransmission queue. The queue entries are allocated inside the page frag allocator, acquiring an additional reference to the page for each list entry. Also switch to a custom page frag refill function, to ensure that the current p

[RFC PATCH v2 24/45] mptcp: allow collapsing consecutive sendpages on the same substream

2019-10-02 Thread Mat Martineau
From: Paolo Abeni If the current sendmsg() lands on the same subflow we used last, we can try to collapse the data. Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 74 +++- 1 file changed, 59 insertions(+), 15 deletions(-) diff --git a/net/mptcp/p

[RFC PATCH v2 21/45] mptcp: Implement MPTCP receive path

2019-10-02 Thread Mat Martineau
Parses incoming DSS options and populates outgoing MPTCP ACK fields. MPTCP fields are parsed from the TCP option header and placed in an skb extension, allowing the upper MPTCP layer to access MPTCP options after the skb has gone through the TCP stack. Outgoing MPTCP ACK values are now populated f

[RFC PATCH v2 11/45] tcp, ulp: Add clone operation to tcp_ulp_ops

2019-10-02 Thread Mat Martineau
If ULP is used on a listening socket, icsk_ulp_ops and icsk_ulp_data are copied when the listener is cloned. Sometimes the clone is immediately deleted, which will invoke the release op on the clone and likely corrupt the listening socket's icsk_ulp_data. The clone operation is invoked immediately

[RFC PATCH v2 04/45] tcp: Define IPPROTO_MPTCP

2019-10-02 Thread Mat Martineau
To open an MPTCP socket with socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP), IPPROTO_MPTCP needs a value that differs from IPPROTO_TCP. The existing IPPROTO numbers mostly map directly to IANA-specified protocol numbers. MPTCP does not have a protocol number allocated because MPTCP packets use the TCP
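From userspace, the socket() call described above might look like the sketch below. It assumes IPPROTO_MPTCP is 262, which is the value mainline Linux later adopted; the value chosen by this RFC is not visible in the truncated preview. The fallback to plain TCP is an illustrative choice and is not part of the patch.

```python
import socket

# Assumption: 262 is the value mainline Linux later adopted; the value
# used by this RFC is not shown in the excerpt above.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket() -> socket.socket:
    """Try MPTCP first, fall back to plain TCP if the kernel refuses."""
    try:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        # Raised when the kernel has no MPTCP support or has it disabled.
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM,
                             socket.IPPROTO_TCP)


sock = open_stream_socket()
print("protocol in use:", sock.proto)
sock.close()
```

Because the protocol number is only a socket() argument and never appears on the wire, an application can probe for MPTCP this way without changing the rest of its code.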

[RFC PATCH v2 06/45] mptcp: Handle MPTCP TCP options

2019-10-02 Thread Mat Martineau
From: Peter Krystad Currently only MPTCP v0 is supported so ignore v1 MP_CAPABLE option. Signed-off-by: Peter Krystad Signed-off-by: Matthieu Baerts Signed-off-by: Florian Westphal --- include/linux/tcp.h | 15 include/net/mptcp.h | 17 + net/ipv4/tcp_input.c | 5 ++ net/ip

[RFC PATCH v2 01/45] tcp: Add MPTCP option number

2019-10-02 Thread Mat Martineau
TCP option 30 is allocated for MPTCP by the IANA. Signed-off-by: Mat Martineau --- include/net/tcp.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/net/tcp.h b/include/net/tcp.h index c9a3f9688223..382e245a7909 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -182,6 +182,7
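To illustrate what the option number is used for, here is a minimal sketch of a TCP option walker that looks for kind 30. The function name and the sample bytes are hypothetical; only the kind values (0 for end-of-list, 1 for no-op, 30 for MPTCP) and the kind/length/value layout are standard.

```python
TCPOPT_EOL = 0     # end of option list
TCPOPT_NOP = 1     # one-byte padding
TCPOPT_MPTCP = 30  # option kind named in the patch description


def find_option(options: bytes, kind: int):
    """Return the payload of the first option of `kind`, or None."""
    i = 0
    while i < len(options):
        opt = options[i]
        if opt == TCPOPT_EOL:
            break
        if opt == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated: no length byte
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length field
        if opt == kind:
            return options[i + 2:i + length]
        i += length
    return None


# MSS option (kind 2), two NOPs, then an MPTCP option with two
# placeholder payload bytes.
sample = bytes([2, 4, 0x05, 0xB4, 1, 1, 30, 4, 0xAB, 0xCD])
print(find_option(sample, TCPOPT_MPTCP))
```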

[RFC PATCH v2 42/45] selftests: mptcp: extend mptcp_connect tool for ipv6 family

2019-10-02 Thread Mat Martineau
From: Florian Westphal At this time socket() will fail when requesting an ipv6 mptcp socket. This is ok for now, as the test script won't request ipv6 tests yet. Tested with a tcp <-> tcp connection. Signed-off-by: Florian Westphal --- .../testing/selftests/net/mptcp/mptcp_connect.c | 17

[RFC PATCH v2 45/45] selftests: mptcp: random ethtool tweaking

2019-10-02 Thread Mat Martineau
From: Florian Westphal Instead of unconditionally disabling TSO in ns3, turn off any of gso/tso/gro in ns3 and/or ns4. This gets us various combinations of GRO/GSO/TSO without a large impact on test time. Signed-off-by: Florian Westphal --- .../selftests/net/mptcp/mptcp_connect.sh | 29 +

[RFC PATCH v2 38/45] mptcp: implement memory accounting for mptcp rtx queue

2019-10-02 Thread Mat Martineau
From: Paolo Abeni Charge the data on the rtx queue to the master MPTCP socket, too. Such memory is uncharged when the data is acked/dequeued. Also account mptcp sockets inuse via a protocol specific pcpu counter. Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 29 +++

[RFC PATCH v2 39/45] mptcp: rework mptcp_sendmsg_frag to accept optional dfrag

2019-10-02 Thread Mat Martineau
From: Paolo Abeni This will simplify mptcp-level retransmission implementation in the next patch. If dfrag is provided by the caller, skip kernel space memory allocation and use data and metadata provided by the dfrag itself. Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 133 ++

[RFC PATCH v2 22/45] mptcp: use sk_page_frag() in sendmsg

2019-10-02 Thread Mat Martineau
From: Paolo Abeni This cleans up the send path a bit and allows better performance. Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 36 ++-- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index

[RFC PATCH v2 44/45] selftests: mptcp: add ipv6 connectivity

2019-10-02 Thread Mat Martineau
From: Florian Westphal prepare for ipv6 mptcp tests. Once someone starts to implement mptcp v6 support, just set ipv6=true in the script and the selftest will attempt to connect via ipv6. Signed-off-by: Florian Westphal --- .../selftests/net/mptcp/mptcp_connect.sh | 77 ---

[RFC PATCH v2 41/45] selftests: mptcp: make tc delays random

2019-10-02 Thread Mat Martineau
From: Florian Westphal Test with less predictable setups: tc qdisc delay is now random, same for reordering and loss. Main motivation is to cover more scenarios without a large increase in test-time. Signed-off-by: Florian Westphal --- .../selftests/net/mptcp/mptcp_connect.sh | 28 ++

[RFC PATCH v2 43/45] selftests: mptcp: add accept/getpeer checks

2019-10-02 Thread Mat Martineau
From: Florian Westphal Check that the result coming from accept matches that of getpeername. For initiator side, check getpeername matches address passed to connect(). At this time, kernel returns the address of the first subflow in the list. Right now, we do not yet implement fate-sharing, i.e.

[RFC PATCH v2 20/45] mptcp: Write MPTCP DSS headers to outgoing data packets

2019-10-02 Thread Mat Martineau
Per-packet metadata required to write the MPTCP DSS option is written to the skb_ext area. One write to the socket may contain more than one packet of data, in which case the DSS option in the first packet will have a mapping covering all of the data in that write. Packets after the first do not ha

[RFC PATCH v2 32/45] mptcp: Add handling of outgoing MP_JOIN requests

2019-10-02 Thread Mat Martineau
From: Peter Krystad Subflow creation may be initiated by the path manager when the primary connection is fully established and a remote address has been received via ADD_ADDR. Create an in-kernel sock and use kernel_connect() to initiate connection. When a valid SYN-ACK is received the new sock

[RFC PATCH v2 05/45] mptcp: Add MPTCP socket stubs

2019-10-02 Thread Mat Martineau
Implements the infrastructure for MPTCP sockets. MPTCP sockets open one in-kernel TCP socket per subflow. These subflow sockets are only managed by the MPTCP socket that owns them and are not visible from userspace. This commit allows a userspace program to open an MPTCP socket with: sock = soc

[RFC PATCH v2 12/45] mptcp: Create SUBFLOW socket for incoming connections

2019-10-02 Thread Mat Martineau
From: Peter Krystad Add subflow_request_sock type that extends tcp_request_sock and add an is_mptcp flag to tcp_request_sock to distinguish them. Override the listen() and accept() methods of the MPTCP socket proto_ops so they may act on the subflow socket. Override the conn_request() and syn_recv

[RFC PATCH v2 14/45] mptcp: Add shutdown() socket operation

2019-10-02 Thread Mat Martineau
From: Peter Krystad Call shutdown on all subflows in use on the given socket, or on the fallback socket. Signed-off-by: Peter Krystad --- net/mptcp/protocol.c | 32 1 file changed, 32 insertions(+) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index

[RFC PATCH v2 02/45] net: Make sock protocol value checks more specific

2019-10-02 Thread Mat Martineau
SK_PROTOCOL_MAX is only used in two places, for DECNet and AX.25. The limits have more to do with those protocol definitions than they do with the data type of sk_protocol, so remove SK_PROTOCOL_MAX and use U8_MAX directly. Signed-off-by: Mat Martineau --- include/net/sock.h | 1 - net/a

[RFC PATCH v2 08/45] tcp: Expose tcp struct and routine for MPTCP

2019-10-02 Thread Mat Martineau
From: Peter Krystad tcp_request_sock_ipv4_ops and tcp_v4_init_sock(). This function is needed for MPTCP subflow initialization. Signed-off-by: Peter Krystad --- include/net/tcp.h | 3 +++ net/ipv4/tcp_ipv4.c | 4 ++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/include/ne

[RFC PATCH v2 23/45] mptcp: sendmsg() do spool all the provided data

2019-10-02 Thread Mat Martineau
From: Paolo Abeni This makes mptcp sendmsg() behaviour more consistent and improves xmit performance. Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 126 --- 1 file changed, 71 insertions(+), 55 deletions(-) diff --git a/net/mptcp/protocol.c b/n

[RFC PATCH v2 31/45] mptcp: add basic kselftest for mptcp

2019-10-02 Thread Mat Martineau
From: Florian Westphal Add mptcp_connect tool: xmit two files back and forth between two processes. Wrapper script tests that data was transmitted without corruption. The "-c" command line option for mptcp_connect.sh is there for debugging: The script will use tcpdump to create one .pcap file pe

[RFC PATCH v2 27/45] mptcp: Add ADD_ADDR handling

2019-10-02 Thread Mat Martineau
From: Peter Krystad Add handling for sending and receiving the ADD_ADDR, ADD_ADDR6, and RM_ADDR suboptions. Signed-off-by: Peter Krystad --- include/linux/tcp.h | 2 + include/net/mptcp.h | 7 +++ net/mptcp/options.c | 115 +++ net/mptcp/protocol.

[RFC PATCH v2 25/45] tcp: Check for filled TCP option space before SACK

2019-10-02 Thread Mat Martineau
The SACK code would potentially add four bytes to the expected TCP option size even if all option space was already used. Signed-off-by: Mat Martineau --- net/ipv4/tcp_output.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 3de804531231
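The arithmetic behind this fix can be sketched as follows. The constant names mirror the kernel's (40 bytes of option space, a 4-byte aligned SACK header, 8 bytes per block), but the function itself is a hypothetical illustration; the exact check added by the patch is not shown in the preview.

```python
MAX_TCP_OPTION_SPACE = 40       # bytes of options a TCP header can carry
TCPOLEN_SACK_BASE_ALIGNED = 4   # NOP, NOP, kind, length
TCPOLEN_SACK_PERBLOCK = 8       # two 32-bit sequence numbers per block


def sack_option_size(used: int, wanted_blocks: int) -> int:
    """Bytes of SACK option that fit after `used` bytes of other options."""
    remaining = MAX_TCP_OPTION_SPACE - used
    # Without room for the header plus one block, emit no SACK at all
    # instead of charging the 4-byte header against a full option area.
    if remaining < TCPOLEN_SACK_BASE_ALIGNED + TCPOLEN_SACK_PERBLOCK:
        return 0
    fit = (remaining - TCPOLEN_SACK_BASE_ALIGNED) // TCPOLEN_SACK_PERBLOCK
    blocks = min(wanted_blocks, fit)
    return TCPOLEN_SACK_BASE_ALIGNED + blocks * TCPOLEN_SACK_PERBLOCK


for used in (0, 12, 32, 40):
    print(used, sack_option_size(used, 4))
```

This matters more with MPTCP because its options can consume most of the 40 bytes, which was rare for plain TCP.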

[RFC PATCH v2 13/45] mptcp: Add key generation and token tree

2019-10-02 Thread Mat Martineau
From: Peter Krystad Generate the local keys, IDSN, and token when creating a new socket. Introduce the token tree to track all tokens in use using a radix tree with the MPTCP token itself as the index. Will be used to obtain the MPTCP parent socket to handle incoming joins. Signed-off-by: Peter
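A rough sketch of the key, token and IDSN relationship, assuming the MPTCP v0 derivation from RFC 6824: the token is the most significant 32 bits of SHA-1(key) and the IDSN is the least significant 64 bits. A Python dict stands in for the kernel's radix tree, and the class and method names are hypothetical.

```python
import hashlib
import secrets


def derive_token_and_idsn(key: bytes):
    """Derive the 32-bit token and 64-bit IDSN from a 64-bit key."""
    digest = hashlib.sha1(key).digest()         # 20 bytes
    token = int.from_bytes(digest[:4], "big")   # most significant 32 bits
    idsn = int.from_bytes(digest[-8:], "big")   # least significant 64 bits
    return token, idsn


class TokenTable:
    """Token -> connection map; a stand-in for the token radix tree."""

    def __init__(self):
        self._by_token = {}

    def new_connection(self, conn):
        # Regenerate the key until its token is not already in use.
        while True:
            key = secrets.token_bytes(8)
            token, idsn = derive_token_and_idsn(key)
            if token not in self._by_token:
                self._by_token[token] = conn
                return key, token, idsn

    def lookup(self, token):
        return self._by_token.get(token)


table = TokenTable()
key, token, idsn = table.new_connection("msk-1")
print(hex(token), table.lookup(token))
```

Indexing by token is what lets an incoming MP_JOIN, which carries only the token, be matched to its parent connection.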

[RFC PATCH v2 18/45] tcp: Prevent coalesce/collapse when skb has MPTCP extensions

2019-10-02 Thread Mat Martineau
The MPTCP extension data needs to be preserved as it passes through the TCP stack. Make sure that these skbs are not appended to others during coalesce or collapse, so the data remains associated with the payload of the given skb. Signed-off-by: Mat Martineau --- include/net/mptcp.h | 10 +

[RFC PATCH v2 15/45] mptcp: Add setsockopt()/getsockopt() socket operations

2019-10-02 Thread Mat Martineau
From: Peter Krystad set/getsockopt behaviour with multiple subflows is undefined. Therefore, for now, we return -EOPNOTSUPP unless we're in fallback mode. Signed-off-by: Peter Krystad Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 67 1 file

[RFC PATCH v2 19/45] tcp: Export low-level TCP functions

2019-10-02 Thread Mat Martineau
MPTCP will make use of tcp_send_mss() and tcp_push() when sending data to specific TCP subflows. Signed-off-by: Mat Martineau --- include/net/tcp.h | 3 +++ net/ipv4/tcp.c| 6 +++--- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index dd

[RFC PATCH v2 00/45] Multipath TCP

2019-10-02 Thread Mat Martineau
The MPTCP upstreaming community has prepared a net-next RFCv2 patch set for review. Clone/fetch: https://github.com/multipath-tcp/mptcp_net-next.git (tag: netdev-rfcv2) Browse: https://github.com/multipath-tcp/mptcp_net-next/tree/netdev-rfcv2 With CONFIG_MPTCP=y, a socket created with IPPROTO_M

[RFC PATCH v2 03/45] sock: Make sk_protocol a 16-bit value

2019-10-02 Thread Mat Martineau
Match the 16-bit width of skbuff->protocol. Fills an 8-bit hole so sizeof(struct sock) does not change. Signed-off-by: Mat Martineau --- include/net/sock.h | 4 ++-- include/trace/events/sock.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/net/sock.h b/i

[PATCH net-next] net: dsa: sja1105: Add support for port mirroring

2019-10-02 Thread Vladimir Oltean
Amazingly, of all features, this does not require a switch reset. Tested with: tc qdisc add dev swp2 clsact tc filter add dev swp2 ingress matchall skip_sw \ action mirred egress mirror dev swp3 tc filter show dev swp2 ingress tc filter del dev swp2 ingress pref 49152 Signed-off-by: Vlad

Re: [PATCH bpf-next 1/2] bpf/flow_dissector: add mode to enforce global BPF flow dissector

2019-10-02 Thread Andrii Nakryiko
On Wed, Oct 2, 2019 at 10:35 AM Stanislav Fomichev wrote: > > Always use init_net flow dissector BPF program if it's attached and fall > back to the per-net namespace one. Also, deny installing new programs if > there is already one attached to the root namespace. > Users can still detach their BP

Re: [PATCH net-next v2] net/rds: Log vendor error if send/recv Work requests fail

2019-10-02 Thread David Miller
From: Sudhakar Dindukurti Date: Tue, 1 Oct 2019 16:33:14 -0700 > Log vendor error if work requests fail. Vendor error provides > more information that is used for debugging the issue. > > Signed-off-by: Sudhakar Dindukurti > Acked-by: Santosh Shilimkar Applied, thanks.

[RFC net-next 0/2] prevent sync issues with hw offload of flower

2019-10-02 Thread John Hurley
Hi, Putting this out as an RFC built on net-next. It fixes some issues discovered in testing when using the TC API of OvS to generate flower rules and subsequently offloading them to HW. Rules seen contain the same match fields or may be rule modifications run as a delete plus an add. We're seeing ra

[RFC net-next 2/2] net: sched: fix tp destroy race conditions in flower

2019-10-02 Thread John Hurley
Flower has rule HW offload functions available that drivers can choose to register for. For the deletion case, these are triggered after filters have been removed from lookup tables both at the flower level, and the higher cls_api level. With flower running without RTNL locking, this can lead to ra

[RFC net-next 1/2] net: sched: add tp_op for pre_destroy

2019-10-02 Thread John Hurley
It is possible that a race condition may exist when a tcf_proto is destroyed. Several actions occur before the destroy() tcf_proto_op is called so if no higher level locking (e.g. RTNL) is in use then other rules may be received and processed in parallel before the classifier's specific destroy fun

Re: [PATCH] tcp: add tsval and tsecr to TCP_INFO

2019-10-02 Thread Eric Dumazet
On 10/2/19 3:54 PM, William Dauchy wrote: > Hello Eric, > > On Thu, Oct 3, 2019 at 12:33 AM Eric Dumazet wrote: >> On 10/2/19 3:10 PM, William Dauchy wrote: >> Reporting the last recorded values is really not good, >> a packet capture will give you all this information in a non >> racy way. >

Re: [PATCH bpf] samples/bpf: kbuild: add CONFIG_SAMPLE_BPF Kconfig

2019-10-02 Thread Ivan Khoronzhuk
On Wed, Oct 02, 2019 at 09:41:15AM +0200, Björn Töpel wrote: On Wed, 2 Oct 2019 at 03:49, Masahiro Yamada wrote: [...] > Yes, the BPF samples require clang/LLVM with BPF support to build. Any > suggestion on a good way to address this (missing tools), better than > the warning above? After t

Re: [PATCH net v2] ipv6: Handle race in addrconf_dad_work

2019-10-02 Thread Eric Dumazet
On 10/2/19 3:36 PM, Eric Dumazet wrote: > > > On 10/2/19 3:33 PM, David Ahern wrote: >> >> I flipped to IF_READY based on addrconf_ifdown and idev checks seeming >> more appropriate. >> > Note that IF_READY is set in ipv6_add_dev() if all these conditions are true : if (netif_running(dev)

Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF

2019-10-02 Thread Steven Rostedt
On Wed, 2 Oct 2019 17:18:21 + Alexei Starovoitov wrote: > >> It's an interesting idea, but I don't think it can work. > >> Please see bpf_trace_printk implementation in kernel/trace/bpf_trace.c > >> It's a lot more than string printing. > > > > Well, trace_printk() is just string printing.

Re: [PATCH] tcp: add tsval and tsecr to TCP_INFO

2019-10-02 Thread William Dauchy
Hello Eric, On Thu, Oct 3, 2019 at 12:33 AM Eric Dumazet wrote: > On 10/2/19 3:10 PM, William Dauchy wrote: > Reporting the last recorded values is really not good, > a packet capture will give you all this information in a non > racy way. Thank you for your quick answer. In my use case I use it

Re: [PATCH net v2] ipv6: Handle race in addrconf_dad_work

2019-10-02 Thread Eric Dumazet
On 10/2/19 3:33 PM, David Ahern wrote: > On 10/2/19 4:21 PM, Eric Dumazet wrote: >> o syzbot this time, but complete lack of connectivity on some of my test >> hosts. >> >> Incoming IPv6 packets go to ip6_forward() (!!!) and are dropped there. > > what does 'ip -6 addr sh' show when it is in t

Re: [PATCH net v2] ipv6: Handle race in addrconf_dad_work

2019-10-02 Thread David Ahern
On 10/2/19 4:21 PM, Eric Dumazet wrote: > o syzbot this time, but complete lack of connectivity on some of my test > hosts. > > Incoming IPv6 packets go to ip6_forward() (!!!) and are dropped there. what does 'ip -6 addr sh' show when it is in this state? Any idea of the order of events? > > T

Re: [PATCH] tcp: add tsval and tsecr to TCP_INFO

2019-10-02 Thread Eric Dumazet
On 10/2/19 3:10 PM, William Dauchy wrote: > tsval and tsecr are useful in some cases to diagnose TCP issues from the > sender point of view where unexplained RTT values are seen. Getting the > timestamps from both ends will help understand those issues more > easily. > Reporting the last r

Re: [PATCH net v2] ipv6: Handle race in addrconf_dad_work

2019-10-02 Thread Eric Dumazet
On 10/2/19 3:13 PM, David Ahern wrote: > On 10/2/19 3:23 PM, Eric Dumazet wrote: >> >> >> On 10/2/19 2:08 PM, Eric Dumazet wrote: >>> >>> >>> On 10/1/19 11:18 AM, Eric Dumazet wrote: On 9/30/19 8:28 PM, David Ahern wrote: > From: David Ahern > > Rajendra reported a ke

Re: [PATCH net v2] ipv6: Handle race in addrconf_dad_work

2019-10-02 Thread David Ahern
On 10/2/19 3:23 PM, Eric Dumazet wrote: > > > On 10/2/19 2:08 PM, Eric Dumazet wrote: >> >> >> On 10/1/19 11:18 AM, Eric Dumazet wrote: >>> >>> >>> On 9/30/19 8:28 PM, David Ahern wrote: From: David Ahern Rajendra reported a kernel panic when a link was taken down: [ 687

[PATCH] tcp: add tsval and tsecr to TCP_INFO

2019-10-02 Thread William Dauchy
tsval and tsecr are useful in some cases to diagnose TCP issues from the sender point of view where unexplained RTT values are seen. Getting the timestamps from both ends will help understand those issues more easily. Signed-off-by: William Dauchy --- include/uapi/linux/tcp.h | 3 +++ net/ip

Re: [PATCH 1/3] docs: fix some broken references

2019-10-02 Thread Paul Walmsley
On Tue, 24 Sep 2019, Mauro Carvalho Chehab wrote: > There are a number of documentation files that got moved or > renamed. update their references. > > Signed-off-by: Mauro Carvalho Chehab > --- > Documentation/devicetree/bindings/cpu/cpu-topology.txt| 2 +- > Documentation/devicetree/bindi

Re: [PATCH net] ipv6: drop incoming packets having a v4mapped source address

2019-10-02 Thread Florian Westphal
Eric Dumazet wrote: > > > @@ -223,6 +223,16 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff > > > *skb, struct net_device *dev, > > > if (ipv6_addr_is_multicast(&hdr->saddr)) > > > goto err; > > > > > > + /* While RFC4291 is not explicit about v4mapped addresses > >

[PATCH] rt2x00: remove input-polldev.h header

2019-10-02 Thread Dmitry Torokhov
The driver does not use input subsystem so we do not need this header, and it is being removed, so stop pulling it in. Signed-off-by: Dmitry Torokhov --- drivers/net/wireless/ralink/rt2x00/rt2x00.h | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00.h b/

[PATCH v2 bpf-next 1/7] selftests/bpf: undo GCC-specific bpf_helpers.h changes

2019-10-02 Thread Andrii Nakryiko
Having GCC provide its own bpf-helper.h is not the right approach and is going to be changed. Undo bpf_helpers.h change before moving bpf_helpers.h into libbpf. Acked-by: Song Liu Acked-by: Ilya Leoshkevich Acked-by: John Fastabend Signed-off-by: Andrii Nakryiko --- tools/testing/selftests/bp

[PATCH v2 bpf-next 5/7] libbpf: move bpf_{helpers,endian,tracing}.h into libbpf

2019-10-02 Thread Andrii Nakryiko
Move bpf_helpers.h, bpf_tracing.h, and bpf_endian.h into libbpf. Ensure they are installed along the other libbpf headers. Also, adjust selftests and samples include path to include libbpf now. Signed-off-by: Andrii Nakryiko --- samples/bpf/Makefile | 2 +- tools/li

[PATCH v2 bpf-next 3/7] selftests/bpf: adjust CO-RE reloc tests for new bpf_core_read() macro

2019-10-02 Thread Andrii Nakryiko
To allow adding a variadic BPF_CORE_READ macro with slightly different syntax and semantics, define CORE_READ in CO-RE reloc tests, which is a thin wrapper around low-level bpf_core_read() macro, which in turn is just a wrapper around bpf_probe_read(). Acked-by: John Fastabend Acked-by: Song Liu

[PATCH v2 bpf-next 4/7] selftests/bpf: split off tracing-only helpers into bpf_tracing.h

2019-10-02 Thread Andrii Nakryiko
Split-off PT_REGS-related helpers into bpf_tracing.h header. Adjust selftests and samples to include it where necessary. Signed-off-by: Andrii Nakryiko --- samples/bpf/map_perf_test_kern.c | 1 + samples/bpf/offwaketime_kern.c| 1 + samples/bpf/sampleip_kern.c

[PATCH v2 bpf-next 7/7] selftests/bpf: add BPF_CORE_READ and BPF_CORE_READ_STR_INTO macro tests

2019-10-02 Thread Andrii Nakryiko
Validate BPF_CORE_READ correctness and handling of up to 9 levels of nestedness using cyclic task->(group_leader->)*->tgid chains. Also add a test of maximum-depth BPF_CORE_READ_STR_INTO() macro. Acked-by: John Fastabend Acked-by: Song Liu Signed-off-by: Andrii Nakryiko --- .../selftests/bpf/

[PATCH v2 bpf-next 2/7] selftests/bpf: samples/bpf: split off legacy stuff from bpf_helpers.h

2019-10-02 Thread Andrii Nakryiko
Split off a few legacy things from bpf_helpers.h into separate bpf_legacy.h file: - load_{byte|half|word}; - remove extra inner_idx and numa_node fields from bpf_map_def and introduce bpf_map_def_legacy for use in samples; - move BPF_ANNOTATE_KV_PAIR into bpf_legacy.h. Adjust samples and selftests
