Aram Compeau wrote:
> Terry Lambert wrote:
> > Too bad he's sick of networking. There's a lot of interesting code
> > that could be implemented in mainline FreeBSD that would be really
> > beneficial, overall.
> Could you elaborate briefly on what you'd like to see worked on with
> respect to this? I don't want you to spend a lot of time describing
> anything, but I am curious. I don't generally have large blocks of
> spare time, but could work on something steadily with a low flame.
> Cheers,
> Aram
---
LRP
---
I would like FreeBSD to support LRP (Lazy Receiver Processing),
an idea which came from the Scala Server Project at Rice
University.
LRP gets rid of the need to run network processing in kernel
threads in order to get parallel operation on SMP systems: packets
are demultiplexed to their owning socket early, and protocol
processing is deferred until the receiving process actually asks
for the data. So long as the interrupt processing load is balanced,
it's possible to handle interrupts in an overlapped fashion.
Right now, there are four sets of source code: SunOS 4.1.3_U1,
FreeBSD 2.2-BETA, FreeBSD 4.0, FreeBSD 4.3. The first three
are from Rice University. The fourth is from Duke University,
and is a port forward of the 4.0 Rice code.
The Rice code, other than the FreeBSD 2.2-BETA, is unusable.
It mixes in an idea called "Resource Containers" (RESCON),
that is really not very useful (I can go into great detail on
this, if necessary). It also has a restrictive license. The
FreeBSD 2.2-BETA implementation has a CMU MACH style license
(same as some FreeBSD code already has).
The LRP implementation in all these cases is flawed, in that
it assumes that the LRP processing will be universal across
an entire address family, and the experimental implementation
loads a full copy of the AF_INET stack under another family
name. A real integration is tricky: it needs credentials on
accept calls, an attribute on the family struct to indicate
that it's LRP'ed (so that common subsystems can behave very
differently), support for accept filters and other kevents, etc.
LRP gives a minimum of a factor of 3 improvement in connections
per second, without the SYN cache code involved at all, through
an overall reduction in processing latency. It also has the
effect of preventing "receiver livelock".
http://www.cs.rice.edu/CS/Systems/LRP/
http://www.cs.duke.edu/~anderson/freebsd/muse-sosp/readme.txt
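For concreteness, here is a toy user space sketch of the LRP idea
(my own illustration, not code from Rice or Duke; every name in it
is made up). The point is that "interrupt time" work is only a
classify-and-enqueue, and all protocol work happens in the reader's
context:

/*
 * Minimal user-space sketch of LRP: packets are demultiplexed to a
 * per-socket queue at "interrupt" time, but protocol processing is
 * deferred until the owning process actually reads from the socket.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

struct packet {
    TAILQ_ENTRY(packet) link;
    unsigned port;              /* demux key: destination port */
    char payload[64];           /* raw, unprocessed wire data */
};

struct lrp_socket {
    unsigned port;
    TAILQ_HEAD(, packet) inq;   /* private per-socket input queue */
};

/* "Interrupt-time" work: classify and enqueue only; no TCP/IP work. */
static void
early_demux(struct lrp_socket *socks, int nsocks, struct packet *pkt)
{
    for (int i = 0; i < nsocks; i++) {
        if (socks[i].port == pkt->port) {
            TAILQ_INSERT_TAIL(&socks[i].inq, pkt, link);
            return;
        }
    }
    free(pkt);                  /* no listener: drop early, cheaply */
}

/* "Read-time" work: protocol processing runs in the reader's context. */
static int
lrp_recv(struct lrp_socket *so, char *buf, size_t len)
{
    struct packet *pkt = TAILQ_FIRST(&so->inq);

    if (pkt == NULL)
        return -1;              /* would block */
    TAILQ_REMOVE(&so->inq, pkt, link);
    /* ...checksums, reassembly, ACK generation would happen here... */
    strncpy(buf, pkt->payload, len - 1);
    buf[len - 1] = '\0';
    free(pkt);
    return 0;
}

int
main(void)
{
    struct lrp_socket so = { .port = 80 };
    struct packet *pkt = calloc(1, sizeof(*pkt));
    char buf[64];

    TAILQ_INIT(&so.inq);
    pkt->port = 80;
    snprintf(pkt->payload, sizeof(pkt->payload), "GET / HTTP/1.0");
    early_demux(&so, 1, pkt);
    if (lrp_recv(&so, buf, sizeof(buf)) == 0)
        printf("received: %s\n", buf);
    return 0;
}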
----------------
TCP Rate Halving
----------------
I would like to see FreeBSD support TCP Rate Halving, an idea
from the Pittsburgh Supercomputing Center (PSC) at Carnegie
Mellon University (CMU).
These are the people who invented "traceroute".
TCP Rate halving is an alternative to the RFC-2581 Fast Recovery
algorithm for congestion control. It effectively causes the
congestion recovery to be self-clocked by ACKs, which has the
overall effect of avoiding the normal burstiness of TCP recovery
following congestion.
This builds on work by Van Jacobson, J. Hoe, and Sally Floyd.
Their current implementation is for NetBSD 1.3.2.
http://www.psc.edu/networking/rate_halving.html
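A toy model of the self-clocking (my own sketch, not the PSC code;
names are made up): during recovery, send one new segment for every
two ACKs that arrive, so the sending rate is halved gradually
instead of cwnd being cut in one burst-inducing step.

#include <stdio.h>

struct rh_state {
    int ack_credit;   /* ACKs seen since the last transmission */
};

/* Called per arriving ACK during recovery; returns segments to send. */
static int
rate_halving_on_ack(struct rh_state *rh)
{
    if (++rh->ack_credit >= 2) {   /* every second ACK earns a send */
        rh->ack_credit = 0;
        return 1;
    }
    return 0;
}

int
main(void)
{
    struct rh_state rh = { 0 };
    int sent = 0;

    /* 10 ACKs arrive during recovery: 5 segments go out, evenly paced. */
    for (int ack = 1; ack <= 10; ack++) {
        int n = rate_halving_on_ack(&rh);
        sent += n;
        printf("ack %2d -> send %d (total %d)\n", ack, n, sent);
    }
    return 0;
}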
---------------
SACK, FACK, ECN
---------------
Also from PSC at CMU.
SACK and FACK are well known. It's annoying that Luigi Rizzo's
code from 1997 or so was never integrated into FreeBSD.
ECN is an implementation of Explicit Congestion Notification.
http://www.psc.edu/networking/tcp.html
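For reference, the ECN codepoints live in the two low-order bits of
the IP TOS byte (RFC 3168): a router signals congestion by marking
an ECN-capable packet instead of dropping it. A toy marking
function, with made-up names:

#include <stdio.h>

#define ECN_MASK   0x03
#define ECN_NOTECT 0x00   /* transport is not ECN-capable */
#define ECN_ECT1   0x01   /* ECN-capable transport, codepoint 1 */
#define ECN_ECT0   0x02   /* ECN-capable transport, codepoint 0 */
#define ECN_CE     0x03   /* congestion experienced */

/* Router-side decision: mark if the flow is ECN-capable, else drop. */
static int
ecn_mark(unsigned char *tos)
{
    switch (*tos & ECN_MASK) {
    case ECN_ECT0:
    case ECN_ECT1:
        *tos = (*tos & ~ECN_MASK) | ECN_CE;  /* mark, don't drop */
        return 0;
    case ECN_CE:
        return 0;                            /* already marked */
    default:
        return -1;                           /* not ECN-capable: drop */
    }
}

int
main(void)
{
    unsigned char tos = ECN_ECT0;

    printf("mark rc=%d, tos now 0x%02x\n", ecn_mark(&tos), tos);
    tos = ECN_NOTECT;
    printf("mark rc=%d (must drop instead)\n", ecn_mark(&tos));
    return 0;
}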
----
VRRP
----
There is an implementation of a real VRRP for FreeBSD available;
it is in ports.
This is a real VRRP (Virtual Router Redundancy Protocol), not
like the Linux version which uses the multicast mask and thus
loses multicast capability.
There are interesting issues in actual deployment of this code;
specifically, the VMAC that needs to be used in order to
logically separate virtual routers is not really implemented
well, so there are common ARP issues.
There are a couple of projects that one could take on here; by
far, the most interesting (IMO) would be to support multiple
virtual network cards on a single physical network card. Most
of the Gigabit Ethernet cards, and some of the 10/100Mbit cards,
can support multiple MAC addresses (the Intel Gigabit card can
support 16, the Tigon III supports 4, and the Tigon II supports
2).
The work required would be to support the ability to have a
single driver, single NIC, multiple virtual NICs.
There are also interesting issues, like being able to selectively
control ARP response from a VRRP interface which is not the
master interface. This has interesting implications for the
routing code, and for the initialization code, which normally
handles the gratuitous ARP. More information can be found in
the VRRP RFC, RFC-2338.
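One concrete piece of RFC-2338 worth keeping in mind here: the VMAC
is a fixed, well-known address, 00:00:5E:00:01:<VRID>. A trivial
helper to build it (illustrative only):

#include <stdio.h>

/* RFC 2338: IANA OUI 00-00-5E, VRRP constant 00-01, then the VRID. */
static void
vrrp_vmac(unsigned char vrid, unsigned char mac[6])
{
    mac[0] = 0x00; mac[1] = 0x00; mac[2] = 0x5E;
    mac[3] = 0x00; mac[4] = 0x01; mac[5] = vrid;
}

int
main(void)
{
    unsigned char mac[6];

    vrrp_vmac(7, mac);   /* virtual router ID 7 */
    printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
        mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    return 0;
}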
----------
TCP Timers
----------
I've discussed this before in depth. Basically, the timer code
is very poor for a large number of connections, and increasing
the size of the callout wheel is not a real/reasonable answer.
I would like to see the code go back to the BSD 4.2 model, which
is a well known model. There is plenty of prior art in this area,
but the main thing that needs to be taken from BSD 4.2 is per
interval timer lists, so that the list scanning, for the most part,
scans only those timers that have expired (+ 1). Basically, a
TAILQ per interval for fixed interval timers.
A very obvious way to measure the performance improvement here is
to establish a very large number of connections. If you have 4G
of memory in an IA32 machine, you should have no problem getting
to 300,000 connections. If you really work at it, you can go much
higher; I have been able to push this number to 1.6 million
simultaneous connections.
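A toy illustration of the per-interval list idea (my own sketch,
not the BSD 4.2 code; names are made up): because every timer on a
given list has the same fixed interval, tail insertion keeps the
list sorted by expiry, so the periodic scan stops at the first
unexpired entry (the "+ 1" above).

#include <stdio.h>
#include <sys/queue.h>

struct conn_timer {
    TAILQ_ENTRY(conn_timer) link;
    unsigned long expire;       /* tick at which this timer fires */
    int conn_id;
};

TAILQ_HEAD(timer_list, conn_timer);

/* Arm: same interval for everyone, so tail insert keeps expiry order. */
static void
timer_arm(struct timer_list *tl, struct conn_timer *t,
    unsigned long now, unsigned long interval)
{
    t->expire = now + interval;
    TAILQ_INSERT_TAIL(tl, t, link);
}

/* Scan: touches only expired entries, plus the one unexpired head. */
static void
timer_run(struct timer_list *tl, unsigned long now)
{
    struct conn_timer *t;

    while ((t = TAILQ_FIRST(tl)) != NULL && t->expire <= now) {
        TAILQ_REMOVE(tl, t, link);
        printf("conn %d: timer fired at tick %lu\n", t->conn_id, now);
    }
}

int
main(void)
{
    struct timer_list slow;     /* e.g. a 500ms "slow" timer list */
    struct conn_timer a = { .conn_id = 1 }, b = { .conn_id = 2 };

    TAILQ_INIT(&slow);
    timer_arm(&slow, &a, 0, 5);
    timer_arm(&slow, &b, 3, 5);
    timer_run(&slow, 5);        /* fires a (expire 5), leaves b (8) */
    timer_run(&slow, 8);        /* fires b */
    return 0;
}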
---------------
SMP Safe Queues
---------------
For simple queue types, it should be possible to make enqueueing
and dequeuing an intrinsically atomic operation.
This basically means that the queue locking being added to make
the networking code SMP safe is largely unnecessary; it is needed
only because the queue macros themselves do not order their
operations carefully, and so have to be locked around instead of
being intrinsically safe.
In theory, this is also possible for a "counted queue". A
"counted queue" is a necessary construct for RED queueing,
which needs to maintain a moving average for comparison to
the actual queue depth, so that it can do RED (Random Early
Drop) of packets.
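Here is a toy example of ordering instead of locking: a
single-producer/single-consumer ring queue using C11 atomics
(my own sketch; the kernel would use its own memory-barrier
primitives, and a "counted queue" for RED would additionally
keep an atomic depth counter).

#include <stdatomic.h>
#include <stdio.h>

#define QSIZE 8                 /* power of two */

struct spsc_queue {
    void *slot[QSIZE];
    atomic_uint head;           /* consumer index */
    atomic_uint tail;           /* producer index */
};

static int
spsc_enqueue(struct spsc_queue *q, void *item)
{
    unsigned t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&q->head, memory_order_acquire);

    if (t - h == QSIZE)
        return -1;              /* full */
    q->slot[t % QSIZE] = item;
    /* Release: the slot write becomes visible before the new tail. */
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return 0;
}

static void *
spsc_dequeue(struct spsc_queue *q)
{
    unsigned h = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&q->tail, memory_order_acquire);
    void *item;

    if (h == t)
        return NULL;            /* empty */
    item = q->slot[h % QSIZE];
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return item;
}

int
main(void)
{
    struct spsc_queue q = { .head = 0, .tail = 0 };
    int x = 42;

    spsc_enqueue(&q, &x);
    printf("dequeued %d\n", *(int *)spsc_dequeue(&q));
    return 0;
}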
---
WFS
---
Weighted fair share queueing is a method of handling the scheduling
of processes, such that user space programs can be guaranteed CPU
time relative to kernel interrupt processing.
This isn't technically a networking issue. However, if the
programs in user space which are intended to operate on the data
(or, if you are a 5.x purist, the kernel threads in kernel space,
if you pull an Ingo Molnar and cram everything that shouldn't
be in the kernel, into the kernel) do not remove data from
the input processing queue fast enough, you will still suffer
from receiver livelock. Basically, you need to be able to
run the programs with a priority, relative to interrupt processing.
Some of the work that Jon Lemon and Luigi Rizzo have done in
this area is interesting, but it's not sufficient to resolve
the problem (sorry guys). Unfortunately, they don't tend to
run their systems under breaking-point stress, so they don't
see the drop-off in performance that happens at high enough
load. To be able to test this, you would have to have a lab
with the ability to throw a large number of clients against
a large number of servers, with the packets transiting an
application in a FreeBSD box, at close to wire speeds. We
are talking at least 32 clients and servers, unless you have
access to purpose-built code (it's easier to just throw the
machines at it, and be done with it).
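A toy model of the weighted fair share idea (my own sketch; the
names and the slice accounting are made up): each task earns credit
in proportion to its weight every round, and the task with the most
accumulated credit runs next and pays for its slice.

#include <stdio.h>

struct task {
    const char *name;
    int weight;     /* relative share of the CPU */
    int credit;     /* accumulated, unspent share */
};

static struct task *
wfs_pick(struct task *tasks, int ntasks)
{
    struct task *best = NULL;
    int total = 0;

    for (int i = 0; i < ntasks; i++) {
        total += tasks[i].weight;
        tasks[i].credit += tasks[i].weight;    /* earn this round */
        if (best == NULL || tasks[i].credit > best->credit)
            best = &tasks[i];
    }
    best->credit -= total;      /* one slice costs a round's earnings */
    return best;
}

int
main(void)
{
    /* Give the network server twice the share of the batch job. */
    struct task tasks[] = {
        { "server", 2, 0 },
        { "batch",  1, 0 },
    };

    /* Over 6 rounds, "server" runs 4 times, "batch" twice: 2:1. */
    for (int round = 0; round < 6; round++)
        printf("round %d: run %s\n", round, wfs_pick(tasks, 2)->name);
    return 0;
}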
---
---
---
Basically, that's my short list. There are actually a lot more
things that could be done in the networking area; there are things
to do in the routing area, and things to do with RED queueing, and
things to do with resource tuning, etc., and, of course, there's
the bugs that you normally see in the BSD stack only when you try
to do things like open more than 65535 outbound connections from a
single box, etc..
Personally, I'm tired of solving the same problems over and over
again, so I'd like to see the code in FreeBSD proper, so that it
becomes part of the intellectual commons.
-- Terry