Re: aio poll, io_pgetevents and a new in-kernel poll API V4

2018-01-25 Thread Benjamin LaHaise
> added, which atomically saves and restores the signal mask over the > io_pgetevents system call. It is the logical equivalent to pselect and > ppoll for io_pgetevents. That looks useful. I'll have to look at this in detail. -ben commit a299c474b19107122eae846b53f7

Re: [PATCH 2/6] move _body_io_syscall to the generic syscall.h

2018-01-05 Thread Benjamin LaHaise
On Fri, Jan 05, 2018 at 11:25:17AM -0500, Jeff Moyer wrote: > Christoph Hellwig writes: > > > This way it can be used for the fallback 6-argument version on > > all architectures. > > > > Signed-off-by: Christoph Hellwig > > This is a strange way to do things. However, I was never really sold

[PATCH] flower: check unused bits in MPLS fields

2017-05-01 Thread Benjamin LaHaise
Since several of the netlink attributes used to configure the flower classifier's MPLS TC, BOS and Label fields have additional bits which are unused, check those bits to ensure that they are actually 0 as suggested by Jamal. Signed-off-by: Benjamin LaHaise Cc: David Miller Cc: Jamal
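
A minimal sketch of the kind of check this patch describes, assuming the field widths of the standard MPLS label stack entry (20-bit Label, 3-bit TC, 1-bit BOS). The helper names are hypothetical and are not the identifiers used in cls_flower; the sketch only shows that a value carried in a wider attribute is rejected when any bit above the field width is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field widths of an MPLS label stack entry (RFC 3032 layout). */
#define MPLS_LABEL_BITS 20u
#define MPLS_TC_BITS     3u
#define MPLS_BOS_BITS    1u

/* Hypothetical helper: true if no bit at or above position `bits` is set. */
bool fits_in_bits(uint32_t value, unsigned int bits)
{
    return (value >> bits) == 0;
}

/* Hypothetical per-attribute validators (names are illustrative only). */
bool mpls_label_valid(uint32_t label) { return fits_in_bits(label, MPLS_LABEL_BITS); }
bool mpls_tc_valid(uint8_t tc)        { return fits_in_bits(tc, MPLS_TC_BITS); }
bool mpls_bos_valid(uint8_t bos)      { return fits_in_bits(bos, MPLS_BOS_BITS); }
```

Under these assumptions a TC of 8 or a BOS of 2 would be refused, since both set a bit outside the field.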

Re: [PATCH net-next 0/2] flower: add MPLS matching support

2017-04-25 Thread Benjamin LaHaise
On Tue, Apr 25, 2017 at 08:47:00AM -0400, Jamal Hadi Salim wrote: > On 17-04-25 07:55 AM, Simon Horman wrote: > [..] > > > > I agree something should be done wrt BOS. If the LABEL and TC are to > > be left as-is then I think a similar treatment of BOS - that is masking it > > - makes sense. > > >

Re: [PATCH net-next 0/2] flower: add MPLS matching support

2017-04-24 Thread Benjamin LaHaise
On Mon, Apr 24, 2017 at 08:58:18PM -0400, Jamal Hadi Salim wrote: > On 17-04-24 02:32 PM, David Miller wrote: > > From: Benjamin LaHaise > > > > > Series applied, but in the future: > > > > 1) Put the "v2", "v3", whatever in the init

[PATCH net-next 2/2] cls_flower: add support for matching MPLS fields (v2)

2017-04-22 Thread Benjamin LaHaise
Add support to the tc flower classifier to match based on fields in MPLS labels (TTL, Bottom of Stack, TC field, Label). Signed-off-by: Benjamin LaHaise Signed-off-by: Benjamin LaHaise Reviewed-by: Jakub Kicinski Cc: "David S. Miller" Cc: Simon Horman Cc: Jamal Hadi Salim Cc: Con

[PATCH net-next 0/2] flower: add MPLS matching support

2017-04-22 Thread Benjamin LaHaise
From: Benjamin LaHaise This patch series adds support for parsing MPLS flows in the flow dissector and the flower classifier. Each of the MPLS TTL, BOS, TC and Label fields can be used for matching. v2: incorporate style feedback, move #defines to linux/include/mpls.h Note: this omits Jiri&#
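
For orientation, the four fields named in this cover letter all live in one 32-bit label stack entry. The sketch below assumes the standard layout (Label in the top 20 bits, then 3 bits of TC, 1 bit of BOS, 8 bits of TTL) and an entry already converted to host byte order. The struct and function names are hypothetical and are not taken from the flow dissector.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
struct mpls_fields {
    uint32_t label; /* 20 bits */
    uint8_t  tc;    /* 3 bits  */
    uint8_t  bos;   /* 1 bit   */
    uint8_t  ttl;   /* 8 bits  */
};

/* Split a host-byte-order label stack entry into its four fields. */
struct mpls_fields mpls_decode(uint32_t entry)
{
    struct mpls_fields f;

    f.label = (entry >> 12) & 0xFFFFFu;
    f.tc    = (uint8_t)((entry >> 9) & 0x7u);
    f.bos   = (uint8_t)((entry >> 8) & 0x1u);
    f.ttl   = (uint8_t)(entry & 0xFFu);
    return f;
}
```

A caller reading the entry from a packet would apply ntohl() first; that step is left out here to keep the sketch free of platform headers.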

[PATCH net-next 1/2] flow_dissector: add mpls support (v2)

2017-04-22 Thread Benjamin LaHaise
Add support for parsing MPLS flows to the flow dissector in preparation for adding MPLS match support to cls_flower. Signed-off-by: Benjamin LaHaise Signed-off-by: Benjamin LaHaise Reviewed-by: Jakub Kicinski Cc: "David S. Miller" Cc: Simon Horman Cc: Jamal Hadi Salim Cc: Con

Re: [PATCH net-next 2/2] cls_flower: add support for matching MPLS labels

2017-03-27 Thread Benjamin LaHaise
/get_maintainer.pl to get list of ccs for the patches > you submit. Oops. Adding Jamal to the Cc -- please holler if you want me to resend. -ben > > > > >Signed-off-by: Benjamin LaHaise > >Signed-off-by: Benjamin LaHaise > >Reviewed-by: Simon Horman

[RFC PATCH iproute2 net-next] tc flower: support for matching MPLS

2017-03-27 Thread Benjamin LaHaise
-off-by: Benjamin LaHaise Signed-off-by: Benjamin LaHaise Reviewed-by: Simon Horman Reviewed-by: Jakub Kicinski diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h index 7a69f2a..f1129e3 100644 --- a/include/linux/pkt_cls.h +++ b/include/linux/pkt_cls.h @@ -432,6 +432,11 @@ enum

[PATCH net-next 2/2] cls_flower: add support for matching MPLS labels

2017-03-27 Thread Benjamin LaHaise
Add support to tc flower to match based on fields in MPLS labels (TTL, Bottom of Stack, TC field, Label). Signed-off-by: Benjamin LaHaise Signed-off-by: Benjamin LaHaise Reviewed-by: Simon Horman Reviewed-by: Jakub Kicinski diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux

[PATCH net-next 1/2] flow_dissector: add mpls support

2017-03-27 Thread Benjamin LaHaise
Add support for parsing MPLS flows to the flow dissector in preparation for adding MPLS match support to cls_flower. Signed-off-by: Benjamin LaHaise Signed-off-by: Benjamin LaHaise Reviewed-by: Simon Horman Reviewed-by: Jakub Kicinski diff --git a/include/net/flow_dissector.h b/include/net

[PATCH v2 iproute2] f_flower: don't set TCA_FLOWER_KEY_ETH_TYPE for "protocol all"

2017-01-20 Thread Benjamin LaHaise
ing a rule. Fix this by omitting TCA_FLOWER_KEY_ETH_TYPE if the protocol is set to ETH_P_ALL. Fixes: 488b41d020fb ("tc: flower no need to specify the ethertype") Cc: Jamal Hadi Salim Signed-off-by: Benjamin LaHaise Signed-off-by: Benjamin LaHaise diff --git a/tc/f_flower.c b

[PATCH iproute2] f_flower: don't set TCA_FLOWER_KEY_ETH_TYPE for "protocol all"

2017-01-19 Thread Benjamin LaHaise
is set to ETH_P_ALL. Signed-off-by: Benjamin LaHaise Signed-off-by: Benjamin LaHaise diff --git a/tc/f_flower.c b/tc/f_flower.c index 1dbc532..1f90da3 100644 --- a/tc/f_flower.c +++ b/tc/f_flower.c @@ -527,11 +527,13 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,

Re: use-after-free in sock_wake_async

2015-11-24 Thread Benjamin LaHaise
On Tue, Nov 24, 2015 at 04:30:01PM -0500, Jason Baron wrote: > So looking at this trace I think it's the other->sk_socket that gets > freed and then we call sk_wake_async() on it. > > We could I think grab the socket reference there with unix_state_lock(), > since that is held by unix_release_sock(

Re: [PATCH] fib_semantics: prevent long hash chains in access server config

2008-01-13 Thread Benjamin LaHaise
On Sat, Jan 12, 2008 at 09:38:57PM -0800, David Miller wrote: > And guess why we don't do this? Because it's not part of > the key. Other aspects of the base fib_info and nexthops > provide the uniqueness, not the devindex of the first hop. > > So you'll need to find another way to do this. Ah,

[PATCH] fib_semantics: prevent long hash chains in access server config

2008-01-12 Thread Benjamin LaHaise
This is a patch from a while ago that I'm resending. Basically, in access server configurations, a lot of routes have the same local ip address but on different devices. This fixes the long chains that result from not including the device index in the hash. -ben diff --git a/
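
A sketch of the effect described here, not the patch itself: when many routes share one local address, a hash over the address alone puts them all in one bucket, while mixing in the device index spreads them out. The table size, the folding function and all names below are assumptions made for illustration.

```c
#include <stdint.h>

#define HASH_BUCKETS 256u /* assumed power-of-two table size */

/* Assumed folding step: mix higher bits down into the bucket index. */
uint32_t fold(uint32_t val)
{
    return (val ^ (val >> 7) ^ (val >> 12)) & (HASH_BUCKETS - 1u);
}

/* Address only: every device sharing `addr` lands in the same bucket. */
uint32_t hash_addr_only(uint32_t addr)
{
    return fold(addr);
}

/* Address plus device index: distinct devices can land in distinct buckets. */
uint32_t hash_addr_and_dev(uint32_t addr, uint32_t ifindex)
{
    return fold(addr ^ ifindex);
}
```

With one shared address and device indexes 1 and 2, the second function yields two different buckets where the first yields one.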

announce: Babylon PPP (scalable L2TP) 2.0 beta 1

2008-01-01 Thread Benjamin LaHaise
Hello all, This is an announcement for a new beta release of a decently scalable L2TP stack for Linux. It is based off of the Babylon PPP stack created by SpellCaster back in the '98-'00 timeframe. Right now there are lots of rough edges (especially in terms of documentation), but it works we

[PATCH] don't allow netfilter --setmss to increase mss

2007-12-04 Thread Benjamin LaHaise
, but it's more work than this is. Thoughts? Would it be better to add a new flag? -ben Signed-off-by: Benjamin LaHaise <[EMAIL PROTECTED]> diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c index d40f7e4..411c482 100644 --- a/net/netfilter/xt_TC
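
The behaviour named in the subject line, a --setmss value that may lower but never raise the MSS already in the packet, reduces to taking the smaller of the two values. A sketch with a hypothetical function name (this is not the code in xt_TCPMSS.c):

```c
#include <stdint.h>

/* Return the MSS to write back: the requested value may only lower it. */
uint16_t clamp_mss(uint16_t current_mss, uint16_t requested_mss)
{
    return requested_mss < current_mss ? requested_mss : current_mss;
}
```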

Re: [patch 03/18] drivers/net/ns83820.c: add paramter to disable autonegotiation

2007-08-10 Thread Benjamin LaHaise
On Fri, Aug 10, 2007 at 02:05:13PM -0700, [EMAIL PROTECTED] wrote: > Also added a "disable_autoneg" module argument to completely disable > autoneg on all cards using this driver. ... > [akpm: this is a previously-nacked patch, but the problem is real] Please remove this part of the patch. The et

Re: FSCKED clock sources WAS(Re: [WIP][PATCHES] Network xmit batching

2007-06-25 Thread Benjamin LaHaise
On Mon, Jun 25, 2007 at 12:59:54PM -0400, jamal wrote: > On Thu, 2007-21-06 at 12:55 -0400, Benjamin LaHaise wrote: > > > You should qualify that as 'Old P4 Xeon', as the Core 2 Xeons are leagues > > better. > > The Xeon hardware is not that old - about a y

Re: FSCKED clock sources WAS(Re: [WIP][PATCHES] Network xmit batching

2007-06-21 Thread Benjamin LaHaise
On Thu, Jun 21, 2007 at 12:08:19PM -0400, jamal wrote: > The results in the table for opteron and xeon are swapped when > cutnpasting from a larger test result. So Opteron is the one with better > results. > In any case - off for the day over here. You should qualify that as 'Old P4 Xeon', as the

Re: r8169 tx problem (1s pause with ping)

2007-06-18 Thread Benjamin LaHaise
On Fri, Jun 15, 2007 at 01:33:14AM +1000, David Gundersen wrote: > In the mean-time I'll attach my patch for the r8168-8.001.00 realtek > driver here in case anybody else wants to have a play with it and see if > it helps them out. Out of curiosity, does it work if you just do a single read (ie

r8169 tx problem (1s pause with ping)

2007-06-12 Thread Benjamin LaHaise
Hello folks, I'm seeing something odd with r8169 on FC7: doing a ping -s 1600 alternates between a 1s latency and sub 1ms. Has anyone else seen anything like this? The system in question is an Asus M2A-VM with an onboard RTL8111 (I think). NAPI doesn't seem to make a difference. The kernel

Re: [PATCH] TCP FIN gets dropped prematurely, results in ack storm

2007-05-01 Thread Benjamin LaHaise
On Tue, May 01, 2007 at 02:03:04PM -0400, John Heffner wrote: > Actually, you cannot get in this situation by loss or reordering of > packets, only by corruption of state on one side. It sends the FIN, > which effectively increases the sequence number by one. However, all > later segments it s

Re: [PATCH] TCP FIN gets dropped prematurely, results in ack storm

2007-05-01 Thread Benjamin LaHaise
On Tue, May 01, 2007 at 01:54:03PM -0400, John Heffner wrote: > Looking at your trace, it seems like the behavior of the test system > 192.168.2.2 is broken in two ways. First, like you said it has broken > state in that it has forgotten that it sent the FIN. Once you do that, > the connection

Re: [PATCH] TCP FIN gets dropped prematurely, results in ack storm

2007-05-01 Thread Benjamin LaHaise
On Tue, May 01, 2007 at 09:41:28PM +0400, Evgeniy Polyakov wrote: > Hmm, 2.2 machine in your test seems to behave incorrectly: I am aware of that. However, I think that the loss of certain packets and reordering can result in the same behaviour. What's more, is that this behaviour can occur in

Re: [PATCH] TCP FIN gets dropped prematurely, results in ack storm

2007-05-01 Thread Benjamin LaHaise
On Tue, May 01, 2007 at 08:20:50PM +0400, Evgeniy Polyakov wrote: > > http://www.kvack.org/~bcrl/ack-storm.log . As near as I can tell, a > > similar effect can occur between two Linux boxes if the right packets get > > reordered/dropped during connection teardown. > > Could you archive 24Mb fi

[PATCH] TCP FIN gets dropped prematurely, results in ack storm

2007-05-01 Thread Benjamin LaHaise
Hello, While testing a failover scenario, I managed to trigger an ack storm between a Linux box and another system. Although the cause of this particular ACK storm was due to the other box forgetting that it sent out a FIN (the second node was unaware of the FIN the first sent in its dying gas

[PATCH] fib_info_hashfn leads to long hash chains

2007-04-29 Thread Benjamin LaHaise
filedsfile_find 6366 0.3248 babylond babylond memset Signed-off-by: Benjamin LaHaise <[EMAIL PROTECTED]> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 3dad12e..e790842 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_

Re: TCP connection stops after high load.

2007-04-11 Thread Benjamin LaHaise
On Wed, Apr 11, 2007 at 02:06:31PM -0700, Ben Greear wrote: > For the dup acks, I see nothing *but* dup acks on the wire...going in > both directions interestingly, at greater than 100,000 packets per second. > > I don't mind adding printks...and I've started reading through the code, > but there

Re: [patch 1/4] network dev read_mostly

2007-03-15 Thread Benjamin LaHaise
On Thu, Mar 15, 2007 at 12:25:16AM -0700, David Miller wrote: > Could we obtain %rip relative addressing with the ELF > relocation approach I mentioned? I think we can for some of the objects -- things like slab caches are good candidates if we have the initialization done at init time, which wo

Re: [patch 1/4] network dev read_mostly

2007-03-14 Thread Benjamin LaHaise
On Mon, Mar 12, 2007 at 02:08:18PM -0700, Stephen Hemminger wrote: > For Eric, mark packet type and network device watermarks > as read mostly. The following x86-64 bits might be interesting, as they allow you to completely eliminate the memory access for run time defined constants. Note that re

Re: [PATCH 1/2] avoid OPEN_MAX in SCM_MAX_FD

2007-03-13 Thread Benjamin LaHaise
On Tue, Mar 13, 2007 at 01:39:12AM -0700, Roland McGrath wrote: > The OPEN_MAX constant is an arbitrary number with no useful relation to > anything. Nothing should be using it. This patch changes SCM_MAX_FD to > use NR_OPEN instead of OPEN_MAX. This increases the size of the struct > scm_fp_lis

Re: all syscalls initially taking 4usec on a P4? Re: nonblocking UDPv4 recvfrom() taking 4usec @ 3GHz?

2007-02-20 Thread Benjamin LaHaise
On Tue, Feb 20, 2007 at 08:33:20PM +0100, bert hubert wrote: > I'm investigating this further for other system calls. It might be that my > measurements are off, but it appears even a slight delay between calls > incurs a large penalty. Make sure your system is idle. Userspace bloat means that *l

Re: Extensible hashing and RCU

2007-02-19 Thread Benjamin LaHaise
On Mon, Feb 19, 2007 at 01:26:42PM -0500, Benjamin LaHaise wrote: > On Mon, Feb 19, 2007 at 07:13:07PM +0100, Eric Dumazet wrote: > > So even with a lazy hash function, 89 % of lookups are satisfied with less > > than 6 compares. > > Which sucks, as those are typically goi

Re: Extensible hashing and RCU

2007-02-19 Thread Benjamin LaHaise
On Mon, Feb 19, 2007 at 07:13:07PM +0100, Eric Dumazet wrote: > So even with a lazy hash function, 89 % of lookups are satisfied with less > than 6 compares. Which sucks, as those are typically going to be cache misses (costing many hundreds of cpu cycles). Hash chains fare very poorly under Do

Re: [PATCH -mm 0/10][RFC] aio: make struct kiocb private

2007-01-17 Thread Benjamin LaHaise
On Mon, Jan 15, 2007 at 08:25:15PM -0800, Nate Diller wrote: > the right thing to do from a design perspective. Hopefully it enables > a new architecture that can reduce context switches in I/O completion, > and reduce overhead. That's the real motive ;) And it's a broken motive. Context switch

Re: SKB BUG: Invalid truesize, current git

2006-11-08 Thread Benjamin LaHaise
On Tue, Nov 07, 2006 at 02:57:24PM -0800, David Miller wrote: > > Since pskb_copy tacks on the non-linear bits from the original > > skb, it needs to count them in the truesize field of the new skb. > > > > Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> > > Applied, thanks Herbert. This seems to

SKB BUG: Invalid truesize, current git

2006-11-06 Thread Benjamin LaHaise
Hi all, I managed to get a backtrace for the Invalid truesize bug. The trigger is running LMbench2, but it's rather intermittent. Traffic should be going over the loopback interface, but the main nic on the machine is e1000. Let me know if anyone has any ideas for things to try.

Re: [PATCH?] tcp and delayed acks

2006-08-16 Thread Benjamin LaHaise
On Wed, Aug 16, 2006 at 12:11:12PM -0700, Stephen Hemminger wrote: > > is throttled waiting for ACKs to arrive. The problem is exacerbated when > > the sender is using a small send buffer -- running netperf -C -c -- -s 1024 > > show a miserable 420Kbit/s at essentially 0% CPU usage. Tests over

[PATCH?] tcp and delayed acks

2006-08-16 Thread Benjamin LaHaise
Hello folks, In looking at a few benchmarks (especially netperf) run locally, it seems that tcp is unable to make full use of available CPU cycles as the sender is throttled waiting for ACKs to arrive. The problem is exacerbated when the sender is using a small send buffer -- running netperf -

Re: [PATCH] [e1000]: Remove unnecessary tx_lock

2006-08-08 Thread Benjamin LaHaise
On Wed, Aug 09, 2006 at 10:25:30AM +1000, Herbert Xu wrote: > The problem here is that the TX clean function does not take the lock > (nor do we want it to). It can thus come in while you're transmitting > and empty the queue. That can be solved with sequence numbers -- ie, we keep track of the n

Re: [PATCH] [e1000]: Remove unnecessary tx_lock

2006-08-08 Thread Benjamin LaHaise
On Tue, Aug 08, 2006 at 03:06:07PM -0700, David Miller wrote: > The driver ->hard_start_xmit() method is invoked with the queue > unlocked, so this kind of scheme would not be workable. Looking at e1000, NETIF_F_LLTX is a waste -- the driver takes a spinlock almost immediately after entering. Ta

Re: [PATCH] [e1000]: Remove unnecessary tx_lock

2006-08-08 Thread Benjamin LaHaise
On Fri, Aug 04, 2006 at 04:31:11PM -0700, David Miller wrote: > Yes, it's meant to catch unintended races. > > The queueing layer that calls ->hard_start_xmit() technically has no > need to support NETDEV_TX_BUSY as a return value, since the device > is able to prevent this. > > If we could avoid

Re: problems with e1000 and jumboframes

2006-08-03 Thread Benjamin LaHaise
On Thu, Aug 03, 2006 at 04:49:15PM +0200, Krzysztof Oledzki wrote: > With 1 GB of RAM full 1GB/3GB (CONFIG_VMSPLIT_3G_OPT) seems to be > enough... Nope, you lose ~128MB of RAM for vmalloc space. -ben -- "Time is of no importance, Mr. President, only life is important." Don't Ema

Re: problems with e1000 and jumboframes

2006-08-03 Thread Benjamin LaHaise
On Thu, Aug 03, 2006 at 03:48:39PM +0200, Arnd Hannemann wrote: > However the box is a VIA Epia MII12000 with 1 GB of Ram and 1 GB of swap > enabled, so there should be plenty of memory available. HIGHMEM support > is off. The e1000 nic seems to be an 82540EM, which to my knowledge > should support

Re: [RFC 1/4] kevent: core files.

2006-07-27 Thread Benjamin LaHaise
On Thu, Jul 27, 2006 at 02:44:50PM -0700, Zach Brown wrote: > > >>int kevent_getevents(int event_fd, struct ukevent *events, > >>int min_events, int max_events, > >>struct timeval *timeout); > > > > You've just reinvented io_getevents(). > > Well, that's certainly one

Re: [RFC 1/4] kevent: core files.

2006-07-27 Thread Benjamin LaHaise
On Thu, Jul 27, 2006 at 12:18:42PM -0700, Zach Brown wrote: > The easy part is fixing up the somewhat obfuscated collection call. > Instead of coming in through a multiplexer that magically treats a void > * as a struct kevent_user_control followed by N ukevents (as specified > in the kevent_user_c

Re: [patch 2/3] drivers/net/ns83820.c: add paramter to disable autonegotiation

2006-06-26 Thread Benjamin LaHaise
On Sun, Jun 25, 2006 at 01:44:36AM -0700, [EMAIL PROTECTED] wrote: > > From: Dan Faerch <[EMAIL PROTECTED]> > > Adds "ethtool command" support to driver. Initially 2 commands are > implemented: force fullduplex and toggle autoneg. This part is good, although doing something for copper cards nee

Re: [1/4] kevent: core files.

2006-06-23 Thread Benjamin LaHaise
On Fri, Jun 23, 2006 at 01:54:23PM -0700, David Miller wrote: > From: Benjamin LaHaise <[EMAIL PROTECTED]> > Date: Fri, 23 Jun 2006 16:31:14 -0400 > > > Eh? Nobody has posted any numbers comparing the approaches yet, so this > > is pure handwaving, unless you

Re: [1/4] kevent: core files.

2006-06-23 Thread Benjamin LaHaise
On Sat, Jun 24, 2006 at 01:08:27AM +0400, Evgeniy Polyakov wrote: > On Fri, Jun 23, 2006 at 04:44:42PM -0400, Benjamin LaHaise ([EMAIL > PROTECTED]) wrote: > > > AIO completion approach was designed to be used with process context VFS > > > update. read/write approach can

Re: [1/4] kevent: core files.

2006-06-23 Thread Benjamin LaHaise
On Sat, Jun 24, 2006 at 12:17:17AM +0400, Evgeniy Polyakov wrote: > But now it is implemented as repeated call for the same work, which does > not look like it can be used for any other types of work. Given an iocb, you do not have to return -EIOCBRETRY, instead return -EIOCBQUEUED and then from

Re: [1/4] kevent: core files.

2006-06-23 Thread Benjamin LaHaise
On Fri, Jun 23, 2006 at 01:19:40PM -0700, David Miller wrote: > I completely agree with Evgeniy here. > > There is nothing in the kernel today that provides integrated event > handling. Nothing. So when someone says to use the "existing" stuff, > they need to have their head examined. The exist

Re: [1/4] kevent: core files.

2006-06-23 Thread Benjamin LaHaise
On Fri, Jun 23, 2006 at 11:24:29PM +0400, Evgeniy Polyakov wrote: > What API are you talking about? > There is only epoll(), which is 40% slower than kevent, and AIO, which > works not as state machine, but as repeated call for the same work. > There is also inotify, which allocates new message eac

Re: [1/4] kevent: core files.

2006-06-23 Thread Benjamin LaHaise
On Fri, Jun 23, 2006 at 11:09:34AM +0400, Evgeniy Polyakov wrote: > This patch includes core kevent files: > - userspace controlling > - kernelspace interfaces > - initialisation > - notification state machines We don't need yet another event mechanism in the kernel, so I don't see why the ne

Re: [patch 7/8] lock validator: fix ns83820.c irq-flags bug

2006-06-11 Thread Benjamin LaHaise
> The above code snippet removes the nested unlock-irq, but now the code > is unbalanced, so IMO this patch _adds_ confusion. > > I think the conservative patch for 2.6.17 is the one I have attached. > Unless there are objections, that is what I will forward... This looks reasonable and suffici

Re: TSO and IPoIB performance degradation

2006-03-20 Thread Benjamin LaHaise
On Mon, Mar 20, 2006 at 02:04:07PM +0200, Michael S. Tsirkin wrote: > does not stretch ACKs anymore. RFC 2581 does mention that it might be OK to > stretch ACKs "after careful consideration", and we are seeing that it helps > IP over InfiniBand, so recent Linux kernels perform worse in that respect

Re: [PATCH] scm: fold __scm_send() into scm_send()

2006-03-13 Thread Benjamin LaHaise
On Mon, Mar 13, 2006 at 09:05:31PM +0100, Ingo Oeser wrote: > From: Ingo Oeser <[EMAIL PROTECTED]> > > Fold __scm_send() into scm_send() and remove that interface completely > from the kernel. Whoa, what are you doing here? Uninlining scm_send() is a Bad Thing to do given that scm_send() is in t

Re: [RFC/PATCH] rcuification of ipv4 established and timewait connections

2006-03-09 Thread Benjamin LaHaise
On Thu, Mar 09, 2006 at 01:12:20PM -0800, David S. Miller wrote: > Once we have RCU in place for the TCP hash tables, we have the chance > to code up dynamically sized hash tables. With the current locking, > this is basically impossible, with RCU it can be done. Nice! > So Ben can you work to f

Re: [RFC/PATCH] rcuification of ipv4 established and timewait connections

2006-03-09 Thread Benjamin LaHaise
On Thu, Mar 09, 2006 at 07:25:25PM +0100, Eric Dumazet wrote: > On a second thought, do you think we still need one rwlock per hash chain ? > > TCP established hash table entries: 1048576 (order: 12, 16777216 bytes) > > On this x86_64 machine, we 'waste' 8 MB of ram for those rwlocks. > > With R

Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh

2006-03-09 Thread Benjamin LaHaise
On Thu, Mar 09, 2006 at 07:41:08PM +1100, Nick Piggin wrote: > Considering that local_t has been broken so that basically nobody > is using it, now is a great time to rethink the types before it > gets fixed and people start using it. I'm starting to get more concerned as the per-cpu changes that

Re: [RFC/PATCH] rcuification of ipv4 established and timewait connections

2006-03-09 Thread Benjamin LaHaise
On Thu, Mar 09, 2006 at 01:18:26PM +0300, Evgeniy Polyakov wrote: > Ok, I hacked quite a bit in the patch, but I think nothing major was > changed, basically patch rejects. > And I'm now unable to bind to 0.0.0.0 address, i.e. bind() does not > fail, but all connections are refused. > Bind to machi

Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh

2006-03-08 Thread Benjamin LaHaise
On Wed, Mar 08, 2006 at 02:25:28PM -0800, Ravikiran G Thirumalai wrote: > Then, for the batched percpu_counters, we could gain by using local_t only > for > the UP case. But we will have to have a new local_long_t implementation > for that. Do you think just one use case of local_long_t warrant

Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh

2006-03-08 Thread Benjamin LaHaise
On Wed, Mar 08, 2006 at 01:07:26PM -0800, Ravikiran G Thirumalai wrote: > But on non x86, local_bh_disable() is gonna be cheaper than a cli/atomic op > no? > (Even if they were switched over to do local_irq_save() and > local_irq_restore() from atomic_t's that is). It's still more expensive than

Re: [patch 1/4] net: percpufy frequently used vars -- add percpu_counter_mod_bh

2006-03-08 Thread Benjamin LaHaise
On Wed, Mar 08, 2006 at 12:26:56PM -0800, Ravikiran G Thirumalai wrote: > +static inline void percpu_counter_mod_bh(struct percpu_counter *fbc, long > amount) > +{ > + local_bh_disable(); > + fbc->count += amount; > + local_bh_enable(); > +} Please use local_t instead, then you don't

Re: [RFC/PATCH] rcuification of ipv4 established and timewait connections

2006-03-08 Thread Benjamin LaHaise
On Wed, Mar 08, 2006 at 02:01:04PM +0300, Evgeniy Polyakov wrote: > When I tested RCU for similar change for kevent, but postponed more work > to RCU callback, including socket closing and some attempts to inode > dereferencing, such change forced performance degradation for httperf > benchmark and

Re: [PATCH] x86-64, use page->virtual to get 64 byte struct page

2006-03-08 Thread Benjamin LaHaise
On Wed, Mar 08, 2006 at 10:40:38AM +0100, Andi Kleen wrote: > On Wednesday 08 March 2006 03:38, Benjamin LaHaise wrote: > > > It's hardly that uncommon for pages to cross cachelines or for pages to > > move around CPUs with networking. > > Data? I posted a workloa

Re: [PATCH] x86-64, use page->virtual to get 64 byte struct page

2006-03-07 Thread Benjamin LaHaise
On Tue, Mar 07, 2006 at 07:50:52PM +0100, Andi Kleen wrote: > > My vmlinux has > > 80278382 : > 80278382: 8b 0d 78 ea 41 00 mov 4319864(%rip),%ecx > # 80696e00 > 80278388: 48 89 f8 mov %rdi,%rax > 8027838b:

Re: [PATCH] x86-64, use page->virtual to get 64 byte struct page

2006-03-07 Thread Benjamin LaHaise
On Tue, Mar 07, 2006 at 05:27:37PM +0100, Andi Kleen wrote: > On Wednesday 08 March 2006 00:26, Benjamin LaHaise wrote: > > Hi Andi, > > > > On x86-64 one inefficiency that shows up on profiles is the handling of > > struct page conversion to/from idx and addres

[PATCH] x86-64, use page->virtual to get 64 byte struct page

2006-03-07 Thread Benjamin LaHaise
10.00 9676.82 90.25 90.25 1.528 1.528 87380 16384 16384 10.00 9711.26 90.80 90.80 1.532 1.532 -ben -- "Time is of no importance, Mr. President, only life is important." Don't Email: <[EMAIL PROTECTED]>. Signed-off-by

[PATCH] use wait queue spinlock for the socket spinlock

2006-03-07 Thread Benjamin LaHaise
1.534 87380 16384 16384 10.00 9669.24 90.90 90.90 1.540 1.540 87380 16384 16384 10.00 9676.82 90.25 90.25 1.528 1.528 87380 16384 16384 10.00 9711.26 90.80 90.80 1.532 1.532 -ben Signed-off-by: Benjamin LaHaise

[EXPERIMENTAL] HT aware loopback device (hack, x86-64 only atm)

2006-03-07 Thread Benjamin LaHaise
Hi folks, I'd like to start some discussions on SMP optimizations for the networking stack. The patch below is one such example which changes the loopback device in a way that helps out on workloads like netperf by trying to share more work with the other CPU on an HT system. Basically, if th

Re: [PATCH] avoid atomic op on page free

2006-03-06 Thread Benjamin LaHaise
On Tue, Mar 07, 2006 at 01:04:36PM +1100, Nick Piggin wrote: > I'd say it will turn out to be more trouble than it's worth, for the > miserly cost > avoiding one atomic_inc, and one atomic_dec_and_test on page-local data > that will > be in L1 cache. I'd never turn my nose up at anyone just having

Re: [PATCH] avoid atomic op on page free

2006-03-06 Thread Benjamin LaHaise
On Tue, Mar 07, 2006 at 12:53:27PM +1100, Nick Piggin wrote: > You can't do this because you can't test PageLRU like that. > > Have a look in the lkml archives a few months back, where I proposed > a way to do this for __free_pages(). You can't do it for put_page. Even if we know that we are the

Re: [PATCH] avoid atomic op on page free

2006-03-06 Thread Benjamin LaHaise
On Mon, Mar 06, 2006 at 05:39:41PM -0800, Andrew Morton wrote: > > It's just a simple send() and recv() pair of processes. Networking uses > > pages for the buffer on user transmits. > > You mean non-zero-copy transmits? If they were zero-copy then those pages > would still be on the LRU. Corr

Re: [PATCH] avoid atomic op on page free

2006-03-06 Thread Benjamin LaHaise
On Mon, Mar 06, 2006 at 04:50:39PM -0800, Andrew Morton wrote: > Am a bit surprised at those numbers. > Because userspace has to do peculiar things to get its pages taken off the > LRU. What exactly was that application doing? It's just a simple send() and recv() pair of processes. Networking u

Re: [PATCH] avoid memory barrier bitops in hot paths

2006-03-06 Thread Benjamin LaHaise
On Mon, Mar 06, 2006 at 04:25:32PM -0800, David S. Miller wrote: > Wait... > > what about "test_and_clear_bit()"? > > Most implementations should be doing the light-weight test _first_, > and only do the update if the bit isn't in the state desired. > > I think in such cases we can elide the mem

Re: [PATCH] avoid memory barrier bitops in hot paths

2006-03-06 Thread Benjamin LaHaise
On Tue, Mar 07, 2006 at 12:59:17AM +0100, Eric Dumazet wrote: > I'm not even sure this 'optimization' is valid on UP. It can be, as branch prediction makes the test essentially free. The real answer is that it depends on the CPU, how much pressure there is on the write combining buffers and reo

Re: [RFC/PATCH] rcuification of ipv4 established and timewait connections

2006-03-06 Thread Benjamin LaHaise
On Tue, Mar 07, 2006 at 12:48:23AM +0100, Eric Dumazet wrote: > If I understand your patch correctly, your future plan is to change "struct > inet_ehash_bucket" rwlock_t wlock to a pure spinlock (when ipv6 is > converted to rcu lookups too), because no more read_lock() are expected ? Yes/no... T

Re: [PATCH] avoid memory barrier bitops in hot paths

2006-03-06 Thread Benjamin LaHaise
On Mon, Mar 06, 2006 at 08:29:30PM -0300, Arnaldo Carvalho de Melo wrote: > - clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags); > + if (test_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags)) > + clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags); > > Something like fa
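
The test-before-clear pattern in the quoted diff can be sketched in user space with C11 atomics. This is an analogue under stated assumptions, not the kernel's test_bit()/clear_bit(): the function name is hypothetical, and the relaxed load stands in for the cheap non-atomic test. The atomic read-modify-write is issued only when the bit is actually set.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Clear `bit` in `*flags`, skipping the atomic read-modify-write when a
 * plain load shows the bit is already clear. Returns true if the
 * read-modify-write was issued.
 */
bool clear_flag_if_set(atomic_ulong *flags, unsigned int bit)
{
    unsigned long mask = 1UL << bit;

    /* Fast path: a relaxed load, no locked instruction. */
    if (!(atomic_load_explicit(flags, memory_order_relaxed) & mask))
        return false;

    /* Slow path: the bit looked set, so pay for the atomic update. */
    atomic_fetch_and(flags, ~mask);
    return true;
}
```

Whether the extra branch pays off depends on how often the bit is found set, which is the trade-off the thread goes on to discuss.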

[PATCH] avoid memory barrier bitops in hot paths

2006-03-06 Thread Benjamin LaHaise
This patch removes a couple of memory barriers from atomic bitops that showed up on profiles of netperf. -ben Signed-off-by: Benjamin LaHaise <[EMAIL PROTECTED]> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 00aa80e..dadc84c 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4

[RFC/PATCH] rcuification of ipv4 established and timewait connections

2006-03-06 Thread Benjamin LaHaise
importance, Mr. President, only life is important." Don't Email: <[EMAIL PROTECTED]>. Signed-off-by: Benjamin LaHaise <[EMAIL PROTECTED]> diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index 25f708f..73b05ab 100644 --- a/include/net/inet

[EMAIL PROTECTED]: [PATCH] use fget_light() in net/socket.c]

2006-03-06 Thread Benjamin LaHaise
05.36 82.56 82.56 1.690 1.690 87380 16384 16384 10.00 7979.61 82.50 82.50 1.694 1.694 -- "Time is of no importance, Mr. President, only life is important." Don't Email: <[EMAIL PROTECTED]>. Signed-off-by: Benjamin LaHaise <[EMAIL PROTECTED]

Re: [PATCH 1/8] [I/OAT] DMA memcpy subsystem

2006-03-04 Thread Benjamin LaHaise
On Fri, Mar 03, 2006 at 01:42:20PM -0800, Chris Leech wrote: > +void dma_async_device_unregister(struct dma_device* device) > +{ ... > + kref_put(&device->refcount, dma_async_device_cleanup); > + wait_for_completion(&device->done); > +} This looks like a bug: device is dereferenced after i

Re: Poor performance with MTU 9000

2006-02-28 Thread Benjamin LaHaise
On Tue, Feb 28, 2006 at 09:33:38PM -0500, John Zielinski wrote: > Rick Jones wrote: > >And if you add the test-specific -D option? > > No difference. I have TCP_NODELAY in my Samba config and copying files > from the server is painfully slow. That's what got me started on all > these tests. Y

[PATCH] use fget_light for network syscalls

2006-02-28 Thread Benjamin LaHaise
ubstantial. -ben Signed-off-by: Benjamin LaHaise diff --git a/net/socket.c b/net/socket.c index a00851f..64019f8 100644 --- a/net/socket.c +++ b/net/socket.c @@ -414,6 +414,28 @@ out: return fd; } +static struct socket *sock_from_file(struct file *file, int *err) +{ + struct in

Re: GbE performance

2006-02-28 Thread Benjamin LaHaise
On Tue, Feb 28, 2006 at 09:21:39AM +0100, Dag Bakke wrote: > Got both. Max throughput increased from ~605 Mbps with carefully tuned > options to iperf, to ~650 Mbps with less carefully tuned options to iperf. One other suggestion that might be worth testing: try comparing SMP and UP kernels. I'v

Re: [0/2] Kevent. Network AIO.

2006-02-09 Thread Benjamin LaHaise
On Thu, Feb 09, 2006 at 08:03:26PM +0300, Evgeniy Polyakov wrote: > It is completely different things. > The only common is that they _require_ some kind of notification > mechanism, > but none provide them. > epoll() can not be used for AIO and timers. True, that is a disappointment. There real

Re: [0/2] Kevent. Network AIO.

2006-02-09 Thread Benjamin LaHaise
On Thu, Feb 09, 2006 at 04:56:11PM +0300, Evgeniy Polyakov wrote: > Hello. > > I'm pleased to announce following projects: > > 1/2 - Kevent subsystem. > This subsystem incorporates several AIO/kqueue design notes and ideas. > Kevent can be used both for edge and level notifications. I

Re: [PATCH] af_unix: use shift instead of integer division

2006-02-07 Thread Benjamin LaHaise
On Tue, Feb 07, 2006 at 04:15:31PM +0100, Andi Kleen wrote: > On Tuesday 07 February 2006 15:54, Benjamin LaHaise wrote: > > > + if (size > ((sk->sk_sndbuf >> 1) - 64)) > > + size = (sk->sk_sndbuf >> 1) - 64; > > This is re
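
A small sketch of the clamp from the quoted hunk, in both forms, may help here. The constant 64 and the shift come from the quoted lines; the function names and the use of plain int parameters are assumptions made for illustration. For non-negative buffer sizes the two forms compute the same limit; they can differ only for negative odd values, where signed division rounds toward zero and the result of a right shift is implementation-defined.

```c
/* Clamp size to half the send buffer minus 64 bytes, using division. */
static int clamp_with_division(int size, int sndbuf)
{
    int limit = sndbuf / 2 - 64;
    return size > limit ? limit : size;
}

/* Same clamp using a shift, as in the quoted patch. */
static int clamp_with_shift(int size, int sndbuf)
{
    int limit = (sndbuf >> 1) - 64;
    return size > limit ? limit : size;
}
```

Because the operand is signed, a compiler has to preserve round-toward-zero semantics for the division, which typically costs a few extra instructions compared with the bare shift.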

[PATCH] af_unix: scm: better initialization

2006-02-07 Thread Benjamin LaHaise
bandwidth. Note that we avoid the issues surrounding potentially uninitialized members of the ucred structure by constructing a struct ucred instead of assigning the members individually, which forces the compiler to zero any padding. Signed-off-by: Benjamin LaHaise <[EMAIL PROTECTED]> diff -

[PATCH] af_unix: use shift instead of integer division

2006-02-07 Thread Benjamin LaHaise
ff-by: Benjamin LaHaise <[EMAIL PROTECTED]> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 1b5989b..b57d4d9 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1427,15 +1427,15 @@ static int unix_stream_sendmsg(struct ki while

Re: [2.6 patch] schedule eepro100.c for removal

2006-02-03 Thread Benjamin LaHaise
Where's the hunk to make the eepro100 driver spew messages about being obsolete out upon loading? -ben On Fri, Feb 03, 2006 at 10:32:34PM +0100, Adrian Bunk wrote: > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> > > --- > > This patch was already sent on: > - 18 Jan 2006 > >

Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-29 Thread Benjamin LaHaise
On Sun, Jan 29, 2006 at 07:54:09AM +0100, Eric Dumazet wrote: > Well, I think that might be doable, maybe RCU magic ? > > 1) local_t are not that nice on all archs. It is for the users that matter, and the hooks are there if someone finds it to be a performance problem. > 2) The consolidation p

Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-28 Thread Benjamin LaHaise
On Sat, Jan 28, 2006 at 04:55:49PM -0800, Andrew Morton wrote: > local_t isn't much use until we get rid of asm-generic/local.h. Bloaty, > racy with nested interrupts. The overuse of atomics is horrific in what is being proposed. All the major architectures except powerpc (i386, x86-64, ia64, an

Re: [patch 3/4] net: Percpufy frequently used variables -- proto.sockets_allocated

2006-01-28 Thread Benjamin LaHaise
On Sat, Jan 28, 2006 at 01:28:20AM +0100, Eric Dumazet wrote: > We might use atomic_long_t only (and no spinlocks) > Something like this ? Erk, complex and slow... Try using local_t instead, which is substantially cheaper on the P4 as it doesn't use the lock prefix and act as a memory barrier.
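
As a rough userspace model of the per-CPU counting idea under discussion, the sketch below keeps one plain counter per slot and sums the slots on read. It is not the kernel's local_t API; the struct, the function names, and NR_SLOTS are all hypothetical, and the model assumes each slot is only ever updated by its owning CPU.

```c
#include <stddef.h>

#define NR_SLOTS 4 /* hypothetical number of CPUs */

/* One counter per CPU; each slot is written only by its owner. */
struct slot_counter {
    long count[NR_SLOTS];
};

/* Owner-only increment: a plain add, no atomic read-modify-write. */
static void slot_counter_inc(struct slot_counter *c, size_t cpu)
{
    c->count[cpu]++;
}

/* Readers sum all slots; the total is approximate while writers run. */
static long slot_counter_sum(const struct slot_counter *c)
{
    long total = 0;
    for (size_t i = 0; i < NR_SLOTS; i++)
        total += c->count[i];
    return total;
}
```

A real implementation would also place each slot on its own cache line to avoid false sharing, which this sketch omits for brevity.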

Re: driver skb reuse

2006-01-25 Thread Benjamin LaHaise
On Tue, Jan 24, 2006 at 10:35:06PM +0100, Robert Olsson wrote: > I split alloc_skb in two parts to get the memset() part even done > in the driver just before passing the skb to the stack. > > I did expect a big win but I didn't see any gain. Strange but the code > was bad so it might

Re: driver skb reuse

2006-01-24 Thread Benjamin LaHaise
On Tue, Jan 24, 2006 at 02:23:15PM +0100, Robert Olsson wrote: > etc. In the test below I use my usual lab setup but just let netfilter > drop the packets. We win about 13% in this experiment below. > > Here we process (drop) about 13% more packets when skb's get reused. Instead of doing a comple

Re: [2.6 patch] schedule SHAPER for removal

2006-01-22 Thread Benjamin LaHaise
On Sat, Jan 21, 2006 at 01:48:48AM +0100, Adrian Bunk wrote: > Do we really have to wait the three years between stable Debian releases > for removing an obsolete driver that has always been marked as > EXPERIMENTAL? > > Please be serious. I am completely serious. The traditional cycle of obso
