Avoid the extra step of setting the limit from the cmdline and do it
directly in the program.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
samples/sockmap/sockmap_user.c | 7 +++
1 file changed, 7 insertions(+)
diff --git a/samples/sockmap/sockmap_user.c b/samples/s
-by: John Fastabend <john.fastab...@gmail.com>
---
samples/sockmap/sockmap_user.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index bae85f8..0d8950f 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/s
Report bytes/sec sent as well as total bytes. Useful to get a rough
idea of how different configurations and usage patterns perform with
sockmap.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
samples/sockmap/sockmap_user.c | 37 -
1 file c
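As a rough userspace sketch of the bytes/sec reporting described above (the function names here are mine, not the sample's actual code):

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helpers: the sample computes its rate inline; these
 * names are illustrative only. */
static double elapsed_sec(const struct timeval *start,
			  const struct timeval *end)
{
	return (end->tv_sec - start->tv_sec) +
	       (end->tv_usec - start->tv_usec) / 1000000.0;
}

static double rate_mbps(long long total_bytes, double seconds)
{
	if (seconds <= 0.0)
		return 0.0;
	return (double)total_bytes / (1024.0 * 1024.0) / seconds;
}
```

Calling gettimeofday() before and after the send loop gives the two timevals; total bytes over elapsed time gives the MB/s figure alongside the total.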
get many GBps of data which helps exercise the
sockmap code.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
samples/sockmap/sockmap_user. |0
samples/sockmap/sockmap_user.c | 55
2 files changed, 39 insertions(+), 16 del
Add a base test that does not use BPF hooks to test the baseline case.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
samples/sockmap/sockmap_user.c | 26 +-
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/samples/sockmap/sockmap_us
supported, but more can be added as
needed.
The new help argument gives the following,
Usage: ./sockmap --cgroup
options:
--help -h
--cgroup -c
--rate -r
--verbose -v
--iov_count -i
--length -l
--test -t
Signed-off-by: John Fastabend <j
, the reporting could be better, etc. But,
IMO let's push this now rather than sit on it for weeks until I get
time to do the above improvements.
---
John Fastabend (7):
bpf: refactor sockmap sample program update for arg parsing
bpf: add sendmsg option for testing BPF programs
bpf
in the future.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
samples/sockmap/sockmap_user.c | 142 +---
1 file changed, 103 insertions(+), 39 deletions(-)
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 7cc9d22..5
ff-by: Alexei Starovoitov <a...@kernel.org>
> ---
LGTM, I'll drop it on my test systems and start running with it.
Although I don't have any Variant 1 code to test, it seems that
is being covered by others.
Thanks!
Acked-by: John Fastabend <john.fastab...@gmail.com>
On 01/03/2018 02:25 AM, Jesper Dangaard Brouer wrote:
> This patch only introduce the core data structures and API functions.
> All XDP enabled drivers must use the API before this info can used.
>
> There is a need for XDP to know more about the RX-queue a given XDP
> frame has arrived on.
> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>
> Acked-by: Alexei Starovoitov <a...@kernel.org>
> ---
LGTM
Acked-by: John Fastabend <john.fastab...@gmail.com>
Cc: intel-wired-...@lists.osuosl.org
> Cc: Björn Töpel <bjorn.to...@intel.com>
> Cc: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
> Cc: Paul Menzel <pmen...@molgen.mpg.de>
> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>
> Reviewed-by: Paul Menzel <pmen...@molgen.mpg.de>
> ---
Same here. LGTM.
Acked-by: John Fastabend <john.fastab...@gmail.com>
>
> Cc: intel-wired-...@lists.osuosl.org
> Cc: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
> Cc: Alexander Duyck <alexander.du...@gmail.com>
> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>
> ---
Looked a bit for reset paths that might be missed but didn
and
not obvious in my opinion.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
kernel/bpf/sockmap.c | 11 +--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 5ee2e41..1712d31 100644
--- a/kernel/bpf/sockmap.c
On 01/03/2018 03:41 PM, Cong Wang wrote:
> On Wed, Jan 3, 2018 at 10:09 AM, John Fastabend
> <john.fastab...@gmail.com> wrote:
>> On 01/02/2018 08:41 PM, Cong Wang wrote:
>>> Hi, John
>>>
>>> While reviewing your ptr_ring fix again today, it looks like
This was added for some work that was eventually factored out but the
helper call was missed. Remove it now and add it back later if needed.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
kernel/bpf/sockmap.c | 8
1 file changed, 8 deletions(-)
diff --git a/kern
The sockmap infrastructure is only aware of TCP sockets at the
moment. In the future we plan to add UDP. In both cases CONFIG_NET
should be built-in.
So let's only build sockmap if CONFIG_INET is enabled.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
include/linux
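For illustration only, the build-time gating described above looks roughly like this in C preprocessor terms (the real symbol comes from Kconfig, not a #define; names are hypothetical):

```c
/* Stand-in for the Kconfig symbol; set by the build system in a
 * real kernel tree, hard-coded here so the sketch is self-contained. */
#define CONFIG_INET 1

#ifdef CONFIG_INET
static int sockmap_enabled(void)
{
	return 1;	/* TCP (and later UDP) sockets available */
}
#else
static int sockmap_enabled(void)
{
	return 0;	/* no inet support: sockmap compiled out */
}
#endif
```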
On 01/02/2018 08:41 PM, Cong Wang wrote:
> Hi, John
>
> While reviewing your ptr_ring fix again today, it looks like your
> "lockless" qdisc patchset breaks dev->tx_queue_len behavior.
>
> Before your patchset, dev->tx_queue_len is merely an integer to read,
&g
On 01/03/2018 07:50 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 02, 2018 at 04:25:03PM -0800, John Fastabend wrote:
>>>
>>> More generally, what makes this usage safe?
>>> Is there a way to formalize it at the API level?
>>>
>>
>> Right I think
and
not obvious in my opinion.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
kernel/bpf/sockmap.c | 7 +++
1 file changed, 7 insertions(+)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 5ee2e41..dfbbde2 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/soc
On 01/02/2018 03:12 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 02, 2018 at 01:27:23PM -0800, John Fastabend wrote:
>> On 01/02/2018 09:17 AM, Michael S. Tsirkin wrote:
>>> On Tue, Jan 02, 2018 at 07:01:33PM +0200, Michael S. Tsirkin wrote:
>>>> On Tue, Jan 02, 2
On 01/02/2018 10:49 AM, David Miller wrote:
> From: Wei Yongjun
> Date: Wed, 27 Dec 2017 17:05:52 +0800
>
>> When dev_requeue_skb() is called with bluked skb list, only the
> ^^
>
> "bulked"
>
>> first skb of the list will be
On 01/02/2018 09:17 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 02, 2018 at 07:01:33PM +0200, Michael S. Tsirkin wrote:
>> On Tue, Jan 02, 2018 at 11:52:19AM -0500, David Miller wrote:
>>> From: John Fastabend <john.fastab...@gmail.com>
>>> Date: Wed, 27 Dec
mal case checks would suffer some so best to just
allocate an extra pointer.
Reported-by: Jakub Kicinski <jakub.kicin...@netronome.com>
Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
include/linux/ptr
On 12/27/2017 10:29 AM, Cong Wang wrote:
> On Sat, Dec 23, 2017 at 10:57 PM, John Fastabend
> <john.fastab...@gmail.com> wrote:
>> On 12/22/2017 12:31 PM, Cong Wang wrote:
>>> I understand why you had it, but it is just not safe. You don't want
>>> to achieve
On 12/24/2017 07:49 PM, Wei Yongjun wrote:
> When dev_requeue_skb() is called with bluked skb list, only the
> first skb of the list will be requeued to qdisc layer, and leak
> the others without free them.
>
> TCP is broken due to skb leak since no free skb will be considered
> as still in the
On 12/22/2017 12:31 PM, Cong Wang wrote:
> On Thu, Dec 21, 2017 at 7:06 PM, John Fastabend
> <john.fastab...@gmail.com> wrote:
>> On 12/21/2017 04:03 PM, Cong Wang wrote:
>>> __skb_array_empty() is only safe if array is never resized.
>>> pfifo_fast_dequeue() is
at doesn't help. And it is only a
local_bh_disable.
> Fixes: 7bbde83b1860 ("net: sched: drop qdisc_reset from dev_graft_qdisc")
> Reported-by: Jakub Kicinski <jakub.kicin...@netronome.com>
> Cc: John Fastabend <john.fastab...@gmail.com>
> Signed-off-by: Cong Wang <xiyou.
s: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
> Reported-by: Jakub Kicinski <jakub.kicin...@netronome.com>
> Cc: John Fastabend <john.fastab...@gmail.com>
> Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
> ---
> net/sched/sch_generic.c | 3 -
qdisc_qstats_cpu_backlog_dec(qdisc, skb);
>> 631		qdisc_bstats_cpu_update(qdisc, skb);
>> 632		qdisc_qstats_cpu_qlen_dec(qdisc);
>> 633	}
>> 634
>> 635	return skb;
>> 636 }
>
> Yea
On 12/20/2017 01:59 PM, Jakub Kicinski wrote:
> On Wed, 20 Dec 2017 12:09:19 -0800, John Fastabend wrote:
>> RCU grace period is needed for lockless qdiscs added in the commit
>> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
>>
>> It is needed now tha
On 12/20/2017 03:23 PM, Cong Wang wrote:
> On Wed, Dec 20, 2017 at 3:05 PM, John Fastabend
> <john.fastab...@gmail.com> wrote:
>> On 12/20/2017 02:41 PM, Cong Wang wrote:
>>> On Wed, Dec 20, 2017 at 12:09 PM, John Fastabend
>>> <john.fastab...@gmail.com&
On 12/20/2017 02:41 PM, Cong Wang wrote:
> On Wed, Dec 20, 2017 at 12:09 PM, John Fastabend
> <john.fastab...@gmail.com> wrote:
>> RCU grace period is needed for lockless qdiscs added in the commit
>> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
>
On 12/20/2017 12:17 PM, Jakub Kicinski wrote:
> On Wed, 20 Dec 2017 10:04:17 -0800, John Fastabend wrote:
>> On 12/19/2017 10:34 PM, Jakub Kicinski wrote:
>>> On Tue, 19 Dec 2017 22:22:27 -0800, Jakub Kicinski wrote:
>>>>>> I get this:
>>&g
d along with the qdisc itself in:
>> qdisc_destroy->qdisc_free
>>
>> Before miniq, tp was checked in the rcu reader path. In case it was not
>> null, q was processed. In slow patch, tp is freed after rcu grace period:
>> tcf_proto_destroy->kfree_rcu
&g
gt; tcf_proto_destroy->kfree_rcu
>
> I assumed that since q is processed in rcu reader, it is also freed after
> a grace period, but now looking at the code I don't see it happening
> like that.
>
> So I think that change to miniq made the existing race window
> a bit wider and easier to hit.
>
> I believe that calling kfree_rcu by call_rcu should resolve this.
>
Hi,
Just sent a patch to complete qdisc_destroy from rcu callback. This
is needed to resolve a race with the lockless qdisc patches.
But I guess it should fix the miniq issue as well?
Thanks,
John
RCU callback. Otherwise we risk the datapath
adding skbs during removal.
Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
include/net/sch_generic.h | 1 +
net/sched/sch_generic.c | 50
commit 752fbcc33405d6f8249465e4b2c4e420091bb825
Author: Cong Wang <xiyou.wangc...@gmail.com>
Date: Tue Sep 19 13:15:42 2017 -0700
net_sched: no need to free qdisc in RCU callback
gen estimator has been rewritten in commit 1c0d32fde5bd
("net_sched: gen_estimator: complete rewrite of rate estimators"),
the caller no longer needs to wait for a grace period. So this
patch gets rid of it.
Cc: Jamal Hadi Salim <j...@mojatatu.com>
Cc: Eric Dumazet <eduma...@google.com>
Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
Acked-by: Eric Dumazet <eduma...@google.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Thanks,
John
> + { "encoding", CMDL_STR, &fecmode_str, },
> + };
> + int changed;
> + int fecmode;
> + int rv;
> +
> + parse_generic_cmdline(ctx, &changed, cmdline_fec,
> + ARRAY_SIZE(cmdline_fec));
> +
> + if (!fecmode_str)
> + exit_bad_args();
> +
> + fecmode = fecmode_str_to_type(fecmode_str);
> + if (!fecmode)
> + exit_bad_args();
> +
> + feccmd.cmd = ETHTOOL_SFECPARAM;
> + feccmd.fec = fecmode;
> + rv = send_ioctl(ctx, &feccmd);
> + if (rv != 0) {
> + perror("Cannot set FEC settings");
> + return rv;
> + }
> +
> + return 0;
> +}
> +
> #ifndef TEST_ETHTOOL
> int send_ioctl(struct cmd_context *ctx, void *cmd)
> {
> @@ -5000,6 +5116,9 @@ static const struct option {
> " [ ap-shared ]\n"
> " [ dedicated ]\n"
> " [ all ]\n"},
> + { "--show-fec", 1, do_gfec, "Show FEC settings"},
> + { "--set-fec", 1, do_sfec, "Set FEC settings",
> + " [ encoding auto|off|rs|baser ]\n"},
> { "-h|--help", 0, show_usage, "Show this help" },
> { "--version", 0, do_version, "Show version number" },
> {}
> --
> 2.15.1
>
>
--
John W. Linville		Someday the world will need a hero, and you
linvi...@tuxdriver.com		might be all we have. Be ready.
On 12/18/2017 08:31 PM, Cong Wang wrote:
> On Mon, Dec 18, 2017 at 7:58 PM, John Fastabend
> <john.fastab...@gmail.com> wrote:
>> On 12/18/2017 06:20 PM, Cong Wang wrote:
>>> On Mon, Dec 18, 2017 at 5:25 PM, John Fastabend
>>> <john.fastab...@gmail.com>
On 12/18/2017 06:20 PM, Cong Wang wrote:
> On Mon, Dec 18, 2017 at 5:25 PM, John Fastabend
> <john.fastab...@gmail.com> wrote:
>> On 12/18/2017 02:34 PM, Cong Wang wrote:
>>> First, the check of >ring.queue against NULL is wrong, it
>>> is always false
ast use skb_array")
> Reported-by: syzbot <syzkal...@googlegroups.com>
> Cc: John Fastabend <john.fastab...@gmail.com>
> Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>
> ---
> net/sched/sch_generic.c | 8 +++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
sfec below:
> +static int do_sfec(struct cmd_context *ctx)
> +{
> + char *fecmode_str = NULL;
> + struct ethtool_fecparam feccmd;
> + struct cmdline_info cmdline_fec[] = {
> + { "encoding", CMDL_STR, &fecmode_str, },
> + };
> + int changed;
> + int fecmode;
> + int rv;
> +
> + parse_generic_cmdline(ctx, &changed, cmdline_fec,
> + ARRAY_SIZE(cmdline_fec));
Thanks,
John
--
John W. Linville		Someday the world will need a hero, and you
linvi...@tuxdriver.com		might be all we have. Be ready.
On Fri, Dec 15, 2017 at 09:56:51AM +0800, Zhang Kang wrote:
> Use MFLCN for 82599 and X540 HW instead of FCTRL.
>
> Signed-off-by: Zhang Kang <tjbroadr...@163.com>
> Signed-off-by: Gao Wayne <wayne@emc.com>
> Signed-off-by: Wei Net <gw_...@163.com>
Thank
;
> This definitely should _not_ be a side effect of enabling XDP on a device.
>
Agreed, CC Emil and Alex we should restore these settings after the
reconfiguration done to support a queue per core.
.John
I have lived too long.
John
On Mon, Dec 11, 2017 at 02:53:11PM +0100, Michal Kubecek wrote:
> This is still work in progress and only a very small part of the ioctl
> interface is reimplemented but I would like to get some comments before
> the patchset becomes too big and changi
p,dedicated,all
>
> Add -shared onto end of components to specified shared version.
>
> Alternatively, you can specific component bitfield directly using
> ethtool --reset DEVNAME flags %x
>
> Signed-off-by: Scott Branden <scott.bran...@broadcom.com>
--
John W. Linvill
an...@broadcom.com>
> Reviewed-by: Andrew Lunn <and...@lunn.ch>
> Signed-off-by: David S. Miller <da...@davemloft.net>
>
> Signed-off-by: Scott Branden <scott.bran...@broadcom.com>
--
John W. Linville		Someday the world will need a hero, and you
linvi...@tuxdriver.com		might be all we have. Be ready.
57846
> Signed-off-by: Scott Branden <scott.bran...@broadcom.com>
--
John W. Linville		Someday the world will need a hero, and you
linvi...@tuxdriver.com		might be all we have. Be ready.
l mapping? Preferably
your legal name, but definitely something consistent and identifiable.
John
--
John W. Linville		Someday the world will need a hero, and you
linvi...@tuxdriver.com		might be all we have. Be ready.
On 12/12/2017 09:57 AM, Paweł Staszewski wrote:
>
>
> W dniu 2017-12-11 o 23:27, Paweł Staszewski pisze:
>>
>>
>> W dniu 2017-12-11 o 23:15, John Fastabend pisze:
>>> On 12/11/2017 01:48 PM, Paweł Staszewski wrote:
>>>>
>>>>
sage graph:
> https://ibb.co/hU97kG
>
> And there is rising slab_unrecl - Amount of unreclaimable memory used
> for slab kernel allocations
>
>
> Forgot to add that im using hfsc and qdiscs like pfifo on classes.
>
>
Maybe some error case I missed in the qdisc patches; I'm looking into
it.
Thanks,
John
graft operation occurs.
This also removes the logic used to pick the next band to dequeue from
and instead just checks a per priority array for packets from top priority
to lowest. This might need to be a bit more clever but seems to work
for now.
Signed-off-by: John Fastabend <john.fas
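A userspace model of the dequeue order described above: scan the per-priority arrays from the top band down and return the first packet found. All names and the ring layout are illustrative, not the kernel's pfifo_fast code.

```c
#include <stddef.h>

#define PFIFO_FAST_BANDS 3

/* Toy per-band ring: head == tail means empty. */
struct band {
	void **slots;
	int head, tail;
	int size;
};

static void *band_dequeue(struct band *b)
{
	void *p;

	if (b->head == b->tail)
		return NULL;
	p = b->slots[b->head];
	b->head = (b->head + 1) % b->size;
	return p;
}

/* Check each band from highest priority (band 0) to lowest and
 * take the first available entry, as the description above says. */
static void *prio_dequeue(struct band bands[PFIFO_FAST_BANDS])
{
	int i;

	for (i = 0; i < PFIFO_FAST_BANDS; i++) {
		void *p = band_dequeue(&bands[i]);

		if (p)
			return p;
	}
	return NULL;
}
```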
To handle this case add a check when calculating
stats and aggregate the per cpu stats if needed.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
net/sched/sch_mq.c | 35 +++-
net/sched/sch_mqprio.c | 69 +---
2 f
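The per-cpu aggregation described above amounts to summing each cpu's counters when stats are queried; a minimal userspace model (struct and field names illustrative, not the kernel's gnet_stats types):

```c
#define NR_CPUS 4

/* Simplified per-cpu queue stats. */
struct qstats {
	unsigned long qlen;
	unsigned long backlog;
};

/* Aggregate the per-cpu copies into one total, as a stats query
 * on a lockless qdisc would need to do. */
static void qstats_sum(struct qstats *total,
		       const struct qstats percpu[NR_CPUS])
{
	int cpu;

	total->qlen = 0;
	total->backlog = 0;
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		total->qlen += percpu[cpu].qlen;
		total->backlog += percpu[cpu].backlog;
	}
}
```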
This adds a peek routine to skb_array.h for use with qdisc.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
include/linux/skb_array.h | 5 +
1 file changed, 5 insertions(+)
diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index 8621ffd..c7addf3
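The key property of a peek is that it returns the next entry without consuming it. A simplified single-consumer model (this is not the actual skb_array implementation, just a sketch of the semantics):

```c
#include <stddef.h>

/* Toy pointer ring: NULL slot means empty at that position. */
struct ring {
	void **queue;
	int consumer;
	int size;
};

/* Return the entry the consumer would dequeue next, or NULL,
 * without advancing the consumer index. */
static void *ring_peek(struct ring *r)
{
	return r->queue[r->consumer];
}

/* Dequeue: clear the slot and advance the consumer. */
static void *ring_consume(struct ring *r)
{
	void *p = r->queue[r->consumer];

	if (p) {
		r->queue[r->consumer] = NULL;
		r->consumer = (r->consumer + 1) % r->size;
	}
	return p;
}
```

A qdisc uses peek to inspect the head packet (for example to test whether the txq can take it) and only consumes once the send is committed.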
nd letting the
qdisc_destroy operation clean up the qdisc correctly.
Note, a refcnt greater than 1 would cause the destroy operation to
be aborted; however, if this ever happened, the reference to the qdisc
would be lost and we would have a memory leak.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
this case add a check when calculating
stats and aggregate the per cpu stats if needed.
Also exports __gnet_stats_copy_queue() to use as a helper function.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
include/net/gen_stats.h | 3 +++
net/core/gen_stats.c | 9
-off-by: John Fastabend <john.fastab...@gmail.com>
---
include/net/sch_generic.h | 20
net/sched/sch_api.c | 3 ++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4717c4b..2fbae2c9
I can not think of any reason to pull the bad txq skb off the qdisc if
the txq we plan to send this on is still frozen. So check for frozen
queue first and abort before dequeuing either skb_bad_txq skb or
normal qdisc dequeue() skb.
Signed-off-by: John Fastabend <john.fastab...@gmail.
Similar to how gso is handled, use an skb list for skb_bad_tx. This is
required with lockless qdiscs because we may have multiple cores
attempting to push skbs into skb_bad_tx concurrently.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
include/net/sch_generic.h | 2 -
net
at once it's possible to have multiple
sk_buffs here, so we turn gso_skb into a queue.
This should be the edge case and if we see this frequently then
the netdev/qdisc layer needs to back off.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
include/net/sch_generic.h
ate the qdisc object so we don't have
dangling allocations after qdisc init.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
include/net/sch_generic.h | 1 +
net/sched/sch_generic.c | 16
2 files changed, 17 insertions(+)
diff --git a/include/net/sch_ge
it returns true. However in this case all
call sites of sch_direct_xmit will implement a dequeue() and get
a null skb and abort. This trades tracking qlen in the hotpath for
an extra dequeue operation. Overall this seems to be good for
performance.
Signed-off-by: John Fastabend <john.fastab...@gmail.
The per cpu qstats support was added with per cpu bstat support which
is currently used by the ingress qdisc. This patch adds a set of
helpers needed to make other qdiscs that use qstats per cpu as well.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
include/net/sch_gen
doing the enqueue/dequeue operations when tested with
pktgen.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
include/net/sch_generic.h | 1 +
net/core/dev.c | 26 ++
net/sched/sch_generic.c | 30 --
3
Currently __qdisc_run calls qdisc_run_end() but does not call
qdisc_run_begin(). This makes it hard to track pairs of
qdisc_run_{begin,end} across function calls.
To simplify reading these code paths this patch moves begin/end calls
into qdisc_run().
Signed-off-by: John Fastabend <john.fas
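A toy model of the pairing: the single run() entry point takes the busy state, does the work, and releases it in the same function, so begin/end calls can be audited locally. The real kernel uses a per-qdisc running bit/seqcount, not a global int; everything below is illustrative.

```c
/* Global "running" flag standing in for the qdisc busy state. */
static int running;
static int work_done;

/* Try to claim the run; returns 0 if someone else holds it. */
static int run_begin(void)
{
	if (running)
		return 0;
	running = 1;
	return 1;
}

static void run_end(void)
{
	running = 0;
}

/* begin and end are paired inside one function, which is the
 * readability point the patch description above makes. */
static void do_run(void)
{
	if (!run_begin())
		return;
	work_done++;
	run_end();
}
```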
on series
to add lockdep more completely, rather than just in code I
touched.
Comments and feedback welcome.
Thanks,
John
---
John Fastabend (14):
net: sched: cleanup qdisc_run and __qdisc_run semantics
net: sched: allow qdiscs to handle locking
net: sched: remove remaining uses for qdi
ox.com>
> Reviewed-by: Eran Ben Elisha <era...@mellanox.com>
Applied -- sorry for the delay!
John
--
John W. Linville		Someday the world will need a hero, and you
linvi...@tuxdriver.com		might be all we have. Be ready.
lied -- sorry for the delay!
--
John W. Linville		Someday the world will need a hero, and you
linvi...@tuxdriver.com		might be all we have. Be ready.
On Tue, Dec 05, 2017 at 12:25:51PM -0800, Scott Branden wrote:
> Hi Paul,
>
>
> On 17-12-04 01:00 PM, Greenwalt, Paul wrote:
> > John,
> >
> > Can this patch be reverted? As Stephen Hemminger mentioned
> > there is an ABI compatibility issue with this patch:
SO for software crypto. This also merges the IPsec
>> GSO and non-GSO paths to both use validate_xmit_xfrm().
> ...
>
> Code looks generally fine to me. Only thing of note is that this
> adds a new dev_requeue_skb() call site and that might intersect with
> John Fastabend's
On 11/15/2017 09:51 AM, Willem de Bruijn wrote:
> On Wed, Nov 15, 2017 at 10:11 AM, John Fastabend
> <john.fastab...@gmail.com> wrote:
>> On 11/14/2017 04:41 PM, Willem de Bruijn wrote:
>>>> /* use instead of qdisc->dequeue() for all qdiscs queried with ->
On 11/14/2017 05:16 PM, Willem de Bruijn wrote:
> On Mon, Nov 13, 2017 at 3:10 PM, John Fastabend
> <john.fastab...@gmail.com> wrote:
>> Add qdisc qlen helper routines for lockless qdiscs to use.
>>
>> The qdisc qlen is no longer used in the hotpath but it
On 11/14/2017 04:41 PM, Willem de Bruijn wrote:
>> /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
>> static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
>> {
>> - struct sk_buff *skb = sch->gso_skb;
>> + struct sk_buff *skb =
On 11/14/2017 05:56 PM, Willem de Bruijn wrote:
> On Tue, Nov 14, 2017 at 7:11 PM, Willem de Bruijn
> <willemdebruijn.ker...@gmail.com> wrote:
>> On Mon, Nov 13, 2017 at 3:08 PM, John Fastabend
>> <john.fastab...@gmail.com> wrote:
>>> sch_direct_xmit() us
allocated state. For uninitialized bands, that calls spin_lock on an
> uninitialized spinlock from skb_array_cleanup -> ptr_ring_cleanup ->
> ptr_ring_consume.
Nice catch, will fix in next version and also make the above
suggested changes.
Thanks,
John.
Limit the scope of the first patchset to Rx only, and introduce Tx
>>> in a separate patchset.
>>
>>
>> all sounds good to me except above bit.
>> I don't remember people suggesting to split it this way.
>> What's the value of it without tx?
>>
>
>
d then be consolidating traffic from multiple per-cpu
>> queues onto one drain queue.
>
> We're essentially trying to spread the complexity from enqueue to
> different stages such as enqueue/aggregation and rate
> limiting/dequeue. Each stage will have different parallelisms. It
> should work with multi-queue device since txq selection can be the
> same as today. However our concern is that between enqueue and
> aggregation we have a small window which can allow packet oob, which
> is a sacrifice to better concurrency.
>
So OOO will happen when the application cpu migrates, presumably? This is
normally prevented with the skb ooo flag, but it looks like you plan to
violate this somehow. I think a design using ptr_rings/skb_arrays
with bulk dequeue and a good concurrent token bucket ring would
suffice and also not introduce OOO packets.
But I don't completely understand your design, so I might be missing
something.
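For illustration of the token-bucket idea mentioned above (this is my sketch, not code from the thread): tokens refill per tick up to a cap, and a dequeue proceeds only if it can pay its cost.

```c
/* Toy token bucket; field names are illustrative. */
struct tbucket {
	long tokens;
	long max;	/* burst cap */
	long rate;	/* tokens added per tick */
};

/* Refill on each timer tick, clamped to the burst cap. */
static void tb_tick(struct tbucket *tb)
{
	tb->tokens += tb->rate;
	if (tb->tokens > tb->max)
		tb->tokens = tb->max;
}

/* Returns 1 and charges the bucket if the packet may be sent now,
 * 0 if it must wait for more tokens. */
static int tb_consume(struct tbucket *tb, long cost)
{
	if (tb->tokens < cost)
		return 0;
	tb->tokens -= cost;
	return 1;
}
```

A lockless variant would make tokens an atomic counter; the control flow stays the same.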
>>
>> Structure wise this ends up looking not too different from mqprio, the
>> main difference though would be that this would be a classful qdisc
>> and that the virtual qdiscs we have for the traffic classes would need
>> to be replaced with actual qdiscs for handling the "drain" aspect of
>> this.
>
> Structure wise it's similar to mqprio + rate limiting qdisc without
> root lock, and replacing txq/flow level parallelism by cpu level
> parallelism. I'm actually not sure about the similarity with busy
> polling that Eric mentioned since I haven't read the slides yet.
I pushed the lockless qdisc patches again today and will repost when
net-next opens. These plus a lockless version of tbf might be
what you need. At one point I had a lockless tbf; I can probably
dig that up as well if it's useful.
https://www.mail-archive.com/netdev@vger.kernel.org/msg200244.html
Thanks,
John
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 683f6ec..8ab7933 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -206,33 +206,22 @@ static struct sk_buff *dequeue_skb(
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index c7addf3..d0a240e 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -142,6 +142,11 @@ static inli
This adds a peek routine to skb_array.h for use with qdisc.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index 8621ffd..c7addf3 100644
--- a/include/linux/skb_array.h
+++ b/include
clever but seems to work
for now.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 84c4ea1..683f6ec 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -26,6 +26,7 @@
#i
To handle this case add a check when calculating
stats and aggregate the per cpu stats if needed.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index b85885a9..24474d0 100644
--- a/net/sched/sch_mqprio.c
this case add a check when calculating
stats and aggregate the per cpu stats if needed.
Also exports __gnet_stats_copy_queue() to use as a helper function.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff --git a/include/net/gen_stats.h b/include/net/g
Reporting qlen when qlen is per cpu requires aggregating the per
cpu counters. This adds a helper routine for this.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index bad24a9..5824509
Add qdisc qlen helper routines for lockless qdiscs to use.
The qdisc qlen is no longer used in the hotpath but it is reported
via stats query on the qdisc so it still needs to be tracked. This
adds the per cpu operations needed.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
nd letting the
qdisc_destroy operation clean up the qdisc correctly.
Note, a refcnt greater than 1 would cause the destroy operation to
be aborted; however, if this ever happened, the reference to the qdisc
would be lost and we would have a memory leak.
Signed-off-by: John Fastabend <john.fastab...@gmail.com&
I can not think of any reason to pull the bad txq skb off the qdisc if
the txq we plan to send this on is still frozen. So check for frozen
queue first and abort before dequeuing either skb_bad_txq skb or
normal qdisc dequeue() skb.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
Similar to how gso is handled, use an skb list for skb_bad_tx. This is
required with lockless qdiscs because we may have multiple cores
attempting to push skbs into skb_bad_tx concurrently.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff --git a/inclu
The per cpu qstats support was added with per cpu bstat support which
is currently used by the ingress qdisc. This patch adds a set of
helpers needed to make other qdiscs that use qstats per cpu as well.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff
ate the qdisc object so we don't have
dangling allocations after qdisc init.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7c4b96b..7bc2826 100644
--- a/include/net/sch_generic.h
+++ b/in
at once it's possible to have multiple
sk_buffs here, so we turn gso_skb into a queue.
This should be the edge case and if we see this frequently then
the netdev/qdisc layer needs to back off.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff --git a/inclu
it returns true. However in this case all
call sites of sch_direct_xmit will implement a dequeue() and get
a null skb and abort. This trades tracking qlen in the hotpath for
an extra dequeue operation. Overall this seems to be good for
performance.
Signed-off-by: John Fastabend <john.fastab...@gmail.
doing the enqueue/dequeue operations when tested with
pktgen.
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 65d0d25..bb806a0 100644
--- a/include/net/sch_generic.h
+++ b/inclu
s can be pulled here,
https://github.com/cilium/linux/tree/qdisc
Thanks,
John
---
John Fastabend (17):
net: sched: cleanup qdisc_run and __qdisc_run semantics
net: sched: allow qdiscs to handle locking
net: sched: remove remaining uses for qdisc_qlen in xmit path
net: sched:
com>
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
0 files changed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index d1f413f..4eea719 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -113,8 +113,10 @@ int sch_direct_xmit(struct sk_buff
d this as a separate patch series though.
> * Extend the XDP redirect to support explicit allocator/destructor
> functions. Right now, XDP redirect assumes that the page allocator
> was used, and the XDP redirect cleanup path is decreasing the page
> count of the XDP buffer. Thi
,
> struct xdp_pkt *xdp_pkt;
>
> xdp_pkt = convert_to_xdp_pkt(xdp);
> - if (!xdp_pkt)
> + if (unlikely(!xdp_pkt))
> return -EOVERFLOW;
>
> /* Info needed when constructing SKB on remote CPU */
>
Seems OK to me, just curious: is this noticeable in pps benchmarks?
Acked-by: John Fastabend <john.fastab...@gmail.com>
REDIRECT is no longer externally visible.
>>
>> Patchs primary change is to do a namechange from SK_REDIRECT to
>> __SK_REDIRECT
>>
>> Reported-by: Alexei Starovoitov <a...@kernel.org>
>> Signed-off-by: John Fastabend <john.fastab...@gmail.com>
>
Starovoitov <a...@kernel.org>
Signed-off-by: John Fastabend <john.fastab...@gmail.com>
---
include/uapi/linux/bpf.h | 1 -
kernel/bpf/sockmap.c | 16
tools/include/uapi/linux/bpf.h | 3 +--
3 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/