Hi,

On Monday, December 5, 2011 10:53 CET, "Sebastian Reitenbach" <sebas...@l00-bugdead-prods.de> wrote:
> On Sunday, December 4, 2011 21:01 CET, Mark Kettenis
> <mark.kette...@xs4all.nl> wrote:
>
> > > Date: Sun, 4 Dec 2011 15:10:56 +0100
> > > From: Claudio Jeker <cje...@diehard.n-r-g.com>
> > >
> > > On Sun, Dec 04, 2011 at 01:35:33PM +0100, Sebastian Reitenbach wrote:
> > > > On Sunday, December 4, 2011 13:24 CET, Camiel Dobbelaar
> > > > <c...@sentia.nl> wrote:
> > > >
> > > > > On 4-12-2011 13:01, Sebastian Reitenbach wrote:
> > > > > > the default maximum size of the tcp send and receive buffer used by
> > > > > > the autosizing algorithm is way too small, when trying to get
> > > > > > maximum speed with high bandwidth and high latency connections.
> > > > >
> > > > > I have tweaked SB_MAX on a system too, but it was for UDP.
> > > > >
> > > > > When running a busy Unbound resolver, the recommendation is to bump
> > > > > the receive buffer to 4M or even 8M. See
> > > > > http://unbound.net/documentation/howto_optimise.html
> > > > >
> > > > > Otherwise a lot of queries are dropped when the cache is cold.
> > > > >
> > > > > I don't think there's a magic value that's right for everyone, so a
> > > > > sysctl would be nice. Maybe separate ones for tcp and udp.
> > > > >
> > > > > I know similar sysctl's have been removed recently, and that they are
> > > > > sometimes abused, but I'd say we have two valid use cases now.
> > > > >
> > > > > So I'd love some more discussion. :-)
> > > >
> > > > since they were removed, and there is this keep it simple, and too many
> > > > knobs are bad attitude, which I think is not too bad, I just bumped the
> > > > SB_MAX value.
> > > > If there is consensus that a sysctl would make sense, I'd also look into
> > > > that approach and send a new patch.
> > >
> > > SB_MAX is there to protect your system. It gives an upper bound on how much
> > > memory a socket may allocate. The current value is a compromise. Running
> > > with a huge SB_MAX may make one connection faster but it will cause
> > > resource starvation issues on busy systems.
> > > Sure you can bump it but be aware of the consequences (and it is why I
> > > think we should not bump it at the moment). A proper change needs to
> > > include some sort of resource management that ensures that we do not run
> > > the kernel out of memory.
> >
> > But 256k simply isn't enough for some use cases. Turning this into a
> > sysctl tunable like FreeBSD and NetBSD would be a good idea if you ask
> > me. Yes, people will use it to shoot themselves in the foot. I don't
> > care.
>
> So to be able to shoot myself in the foot without the need to compile the
> kernel, I'll look into adding a sysctl to tweak the maximum size of the
> buffer. Well, depending on time and how fast I figure out how to do that,
> might take some time.
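
The trade-off in the quoted thread can be put into rough numbers. The sketch below is not part of the patch. It is a small user-space illustration of the bandwidth-delay product and of the limit check that sbreserve() applies. The helper names (bdp_bytes, max_throughput_bps, request_fits) and ASSUMED_SB_MAX are hypothetical, and the 256 KB cap is an assumption taken from the "256k" figure mentioned in the thread. It also ignores protocol overhead, so treat the results as estimates only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: cap taken from the "256k" figure quoted in the thread. */
#define ASSUMED_SB_MAX (256ULL * 1024ULL)

/*
 * Bandwidth-delay product: bytes that must be in flight to keep a
 * path of the given speed busy for one round trip.
 */
static inline uint64_t
bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_ms)
{
	/* bits/s * ms / 1000 gives bits per round trip, / 8 gives bytes */
	return ((bits_per_sec * rtt_ms) / 8000ULL);
}

/*
 * Upper bound on throughput in bits/s when at most buf_bytes can be
 * outstanding per round trip. rtt_ms must be non-zero.
 */
static inline uint64_t
max_throughput_bps(uint64_t buf_bytes, uint64_t rtt_ms)
{
	return ((buf_bytes * 8000ULL) / rtt_ms);
}

/*
 * Simplified model of the rejection test in sbreserve():
 * a request of zero bytes or one above the limit is refused.
 */
static inline bool
request_fits(uint64_t cc, uint64_t limit)
{
	return (cc != 0 && cc <= limit);
}
```

With these helpers, a 1 Gbit/s path with a 100 ms round-trip time needs bdp_bytes(1000000000, 100) = 12500000 bytes in flight, while a 262144-byte buffer limits the same path to max_throughput_bps(262144, 100) = 20971520 bits/s, roughly 21 Mbit/s. request_fits(12500000, ASSUMED_SB_MAX) is false, so a buffer of that size cannot be reserved under the assumed limit.
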

Here is a first try at adding such a sysctl. I called it net.inet.ip.sb-max.
Would a better name, maybe under a different hierarchy, make more sense?
The default value, SB_MAX, defined in sys/socketvar.h, did not change.
I used sysctl_int for the sysctl, but I'm not sure whether this is right:
sb_max is u_long in sys/kern/uipc_socket2.c, so maybe sysctl_quad should be
used instead?

Tested and works for me on i386.

It's my first try in kernel land, and I'm no expert with regard to the
network stack, so there may be things I should have done better. Please
comment and let me know.

cheers,
Sebastian

Index: lib/libc/gen/sysctl.3
===================================================================
RCS file: /cvs/src/lib/libc/gen/sysctl.3,v
retrieving revision 1.210
diff -u -r1.210 sysctl.3
--- lib/libc/gen/sysctl.3	9 Dec 2011 16:14:54 -0000	1.210
+++ lib/libc/gen/sysctl.3	25 Dec 2011 13:50:15 -0000
@@ -1210,6 +1210,7 @@
 .It ip Ta porthilast Ta integer Ta yes
 .It ip Ta portlast Ta integer Ta yes
 .It ip Ta redirect Ta integer Ta yes
+.It ip Ta sb-max Ta integer Ta yes
 .It ip Ta sourceroute Ta integer Ta yes
 .It ip Ta stats Ta structure Ta no
 .It ip Ta ttl Ta integer Ta yes
@@ -1517,6 +1518,9 @@
 .Tn IP
 packets,
 and should normally be enabled on all systems.
+.It Li ip.sb-max
+Maximum size of socket buffers. This value is also used by the TCP
+send and receive buffer autosizing algorithm.
 .It Li ip.sourceroute
 Returns 1 when forwarding of source-routed packets is enabled for
 the host.
Index: sbin/sysctl/sysctl.8
===================================================================
RCS file: /cvs/src/sbin/sysctl/sysctl.8,v
retrieving revision 1.162
diff -u -r1.162 sysctl.8
--- sbin/sysctl/sysctl.8	3 Sep 2011 22:59:08 -0000	1.162
+++ sbin/sysctl/sysctl.8	25 Dec 2011 13:50:54 -0000
@@ -228,6 +228,7 @@
 .It net.inet.ip.porthilast Ta integer Ta yes
 .It net.inet.ip.maxqueue Ta integer Ta yes
 .It net.inet.ip.encdebug Ta integer Ta yes
+.It net.inet.ip.sb-max Ta integer Ta yes
 .It net.inet.ip.ipsec-expire-acquire Ta integer Ta yes
 .It net.inet.ip.ipsec-invalid-life Ta integer Ta yes
 .It net.inet.ip.ipsec-pfs Ta integer Ta yes
Index: sys/kern/uipc_socket2.c
===================================================================
RCS file: /cvs/src/sys/kern/uipc_socket2.c,v
retrieving revision 1.52
diff -u -r1.52 uipc_socket2.c
--- sys/kern/uipc_socket2.c	4 Apr 2011 21:11:22 -0000	1.52
+++ sys/kern/uipc_socket2.c	25 Dec 2011 13:51:30 -0000
@@ -50,6 +50,7 @@
  * Primitive routines for operating on sockets and socket buffers
  */
 
+extern int ip_sb_max;
 u_long	sb_max = SB_MAX;		/* patchable */
 
 extern struct pool mclpools[];
@@ -385,6 +386,7 @@
 sbreserve(struct sockbuf *sb, u_long cc)
 {
 
+	sb_max = ip_sb_max;
 	if (cc == 0 || cc > sb_max)
 		return (1);
 	sb->sb_hiwat = cc;
Index: sys/netinet/in.h
===================================================================
RCS file: /cvs/src/sys/netinet/in.h,v
retrieving revision 1.90
diff -u -r1.90 in.h
--- sys/netinet/in.h	6 Jul 2011 01:57:37 -0000	1.90
+++ sys/netinet/in.h	25 Dec 2011 13:51:33 -0000
@@ -655,7 +655,8 @@
 #define	IPCTL_MRTPROTO		34	/* type of multicast */
 #define	IPCTL_MRTSTATS		35
 #define	IPCTL_ARPQUEUED		36
-#define	IPCTL_MAXID		37
+#define	IPCTL_SB_MAX		37	/* int: max socketbuffer size */
+#define	IPCTL_MAXID		38
 
 #define	IPCTL_NAMES { \
 	{ 0, 0 }, \
@@ -695,6 +696,7 @@
 	{ "mrtproto", CTLTYPE_INT }, \
 	{ "mrtstats", CTLTYPE_STRUCT }, \
 	{ "arpqueued", CTLTYPE_INT }, \
+	{ "sb-max", CTLTYPE_INT }, \
 }
 #define	IPCTL_VARS { \
 	NULL, \
@@ -733,7 +735,8 @@
 	NULL, \
 	NULL, \
 	NULL, \
-	&la_hold_total \
+	&la_hold_total, \
+	&ip_sb_max \
 }
 
 /* INET6 stuff */
Index: sys/netinet/ip_input.c
===================================================================
RCS file: /cvs/src/sys/netinet/ip_input.c,v
retrieving revision 1.195
diff -u -r1.195 ip_input.c
--- sys/netinet/ip_input.c	6 Jul 2011 02:42:28 -0000	1.195
+++ sys/netinet/ip_input.c	25 Dec 2011 13:51:39 -0000
@@ -107,6 +107,7 @@
 int	ip_mtudisc = 1;
 u_int	ip_mtudisc_timeout = IPMTUDISCTIMEOUT;
 int	ip_directedbcast = 0;
+int	ip_sb_max = SB_MAX;
 #ifdef DIAGNOSTIC
 int	ipprintfs = 0;
 #endif
@@ -1700,6 +1701,8 @@
 #else
 		return (EOPNOTSUPP);
 #endif
+	case IPCTL_SB_MAX:
+		return (sysctl_int(oldp, oldlenp, newp, newlen, &ip_sb_max));
 	default:
 		if (name[0] < IPCTL_MAXID)
 			return (sysctl_int_arr(ipctl_vars, name, namelen,