> >You never were able to specify a timeout in libpcap; the so-called "read
> >timeout" in libpcap has never been guaranteed to be a timeout (except
> >perhaps when libpcap ran only with BPF, but I suspect that, back then,
> >there *was* no libpcap, that code was just part of tcpdump).
> Ouch. :)
> I just trusted the manpage.

The man page probably described the way BPF works on BSD, but that's not
how other platforms worked.  (It also either incorrectly described the
way "pcap_dispatch()" works, or described the way it worked once upon a
time - it claimed, or at least implied, that a "cnt" argument of 0
caused it to keep reading packets until an error occurred, EOF was
reached (which doesn't happen on live captures), or a timeout occurred,
but, in fact, it only processed one bufferful of packets from the OS;
"pcap_loop()" is what you use if you want to keep reading packets.)

> Anyway. When it breaks on other UNIXes too, it's ok.
> I guess any UNIX can use timeout via select() b/c they support
> BSD sockets.

Yes, that's what Ethereal does (except on BSD, where "select()" doesn't
work on BPF devices, sigh - it doesn't start the timer, so the
"select()" doesn't return until the "store buffer" fills up and is moved
to the "hold buffer" and made available for reading; a workaround is to
put the descriptor in non-blocking mode and, when the "select()" times
out, read from it).
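
That workaround can be sketched as follows; this is a minimal,
self-contained illustration using a plain pipe descriptor rather than a
real BPF device (with libpcap, the descriptor would come from
"pcap_fileno()" and the read would be a "pcap_dispatch()" call), and
"set_nonblocking()"/"poll_and_read()" are names I've made up for the
sketch:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/select.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Wait up to timeout_ms for fd to become readable, then try a read
 * regardless of whether select() reported it readable - on BSD the
 * select() may not fire even though packets sit in the store buffer.
 * Returns bytes read, 0 if nothing was available, -1 on error. */
static ssize_t poll_and_read(int fd, long timeout_ms, void *buf, size_t len)
{
    fd_set readfds;
    struct timeval tv;
    ssize_t n;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    select(fd + 1, &readfds, NULL, NULL, &tv);

    n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;               /* timed out with nothing to read */
    return n;
}
```

Because the descriptor is non-blocking, the read after a timed-out
"select()" cannot hang even if no packets have arrived.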

Ethereal used to do that only on Linux, but it now does it on all
non-BSD platforms; I discovered that the "read timeout" isn't a timeout
on Solaris, it's just a timer for buffering, so that you get more than
one packet per "getmsg()".

> On the other hand it might be better to let the application
> handle this if they have to multiplex inputs anyway.

That's what Etherape does, for example - it hands the libpcap file
descriptor to GTK+ to use in its main loop.

> >So any application that uses the timeout as a way of being sure it can
> >periodically poll for user input is broken.  For example, I just changed
> Sure it is, but there is other usage of timeout.
> Using pcap_next() before ensured (at least on linux as I learned now:)
> that it returns after a given timeout, making possible to say
> "there was no such packet" to the user.
> How do I handle this now?
> Obtaining the fd via fileno() and doing select(), then upon return
> calling pcap_next()?

If you want "pcap_dispatch()" ("pcap_next()" is just a wrapper around
"pcap_dispatch()" with a "cnt" argument of 1) to return after the
timeout expires even if no packets have arrived, that's what you've
always had to do - and it's what you'd still have to do even if we put
"select()"s in *all* the "pcap-XXX.c"s except for BPF and possibly
Ultrix/Digital UNIX (I don't know yet how its timeouts work), unless
you can guarantee that your application will only be used with versions
of libpcap that have those "select()"s.

> >> Thus, checking for EAGAIN error is useless as well.
> >
> >Not true.  The file descriptor could be put into non-blocking mode by
> >the application; that's what the version of Ethereal that uses the GTK+
> >main loop's "poll()" does.
> Well, it's not good idea to mess with data that one doesn't own,
> i.e. setting fd's to nonblocking mode from within application.

Perhaps having a libpcap API to put the pcap_t into nonblocking mode is
the way to handle that - applications that need to arrange that
"pcap_dispatch()" not block (such as applications that do their own
"select()", or do it in, say, a GUI toolkit, and that have to work on
BSD with BPF and thus have to read from the "pcap_t" when a timeout
occurs and *not* have it block if no packets have arrived) would use
that API.
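
A sketch of what such an API might look like - using a stand-in
structure rather than the real (opaque) pcap_t, and with invented names
("my_pcap_setnonblock()" etc.), since no such libpcap API exists as of
this writing.  The point is that the non-blocking flag is a property of
the handle, not (necessarily) of the underlying file descriptor:

```c
#include <unistd.h>
#include <sys/select.h>

struct my_pcap {
    int fd;         /* underlying capture descriptor */
    int nonblock;   /* should dispatch avoid blocking? */
};

/* Set the non-blocking flag on the handle.  The fd itself might be in
 * O_NONBLOCK mode either way (as on BSD, where that is part of the
 * select() workaround); this flag only controls whether dispatch
 * waits for packets.  Returns 0 on success. */
static int my_pcap_setnonblock(struct my_pcap *p, int nonblock)
{
    p->nonblock = !!nonblock;
    return 0;
}

static int my_pcap_getnonblock(const struct my_pcap *p)
{
    return p->nonblock;
}

/* Dispatch sketch: in blocking mode, wait with select() for data (a
 * real pcap-linux.c would apply the read timeout here); in
 * non-blocking mode, skip the wait and just attempt the read. */
static ssize_t my_pcap_dispatch(struct my_pcap *p, void *buf, size_t len)
{
    if (!p->nonblock) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(p->fd, &readfds);
        if (select(p->fd + 1, &readfds, NULL, NULL, NULL) <= 0)
            return -1;
    }
    return read(p->fd, buf, len);
}
```

With the mmapped-capture mechanisms mentioned below, the non-blocking
case would likewise skip the "select()" and go straight to the buffer.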

This would also be useful for either the Linux 2.4 mmapped-capture stuff
or the similar stuff that people are looking at for BSD; with those
capture mechanisms, no "read()" (or "recvfrom()") calls are done; to
wait for a packet to arrive, libpcap would do a "select()" and would
then fetch the packets from the memory-mapped
buffer.  A "non-blocking" "pcap_dispatch()" would, in that case, avoid
doing the "select()".

With such an API, "pcap-linux.c" could do the "select()" if the "pcap_t"
weren't in non-blocking mode (note that the mode of the "pcap_t"
wouldn't necessarily be the mode of the underlying file descriptor; the
underlying file descriptor could be in non-blocking mode regardless of
whether the "pcap_t" was in non-blocking mode, for example) and skip it
if it were in non-blocking mode.

You'd still have to put "select()"s into most of the other "pcap-XXX.c"
files, though, in order to allow portable applications to rely on the
read timer being a timeout that expires even if no packets have arrived.

Part of the problem appears to be that read timeouts serve two
purposes:

        1) at least on some platforms, it's intended to cause more than
           one packet to be returned when reading from the capture
           device - a "read()"/"getmsg()" on the capture device won't
           return the instant a packet arrives, but will wait some
           amount of time to allow more packets to arrive;

        2) on some platforms, it causes the read to return after the
           timer expires even if no packets have arrived.

On BSD (observed behavior) and Digital UNIX (according to the
documentation), it serves purpose 2), and it serves purpose 1) as well,
though perhaps not as well as it could if it didn't also serve purpose
2), because the timer starts when the read is done, not when the first
packet arrives.  (If there's a lot of traffic - which is the case where
the batching is most helpful - a packet will probably arrive shortly
after the read is done, so this costs little in practice.)

On SunOS 5.7 (and probably all other 5.x releases; the 5.x "bufmod" is
probably derived from the 4.x "bufmod", so 4.x may have behaved in the
same fashion), it appears *not* to serve purpose 2), both from
experience and from the man page:

        To ensure that messages do not languish forever in an
        accumulating chunk, bufmod maintains a read timeout.  Whenever
        this timeout expires, the module closes off the current chunk
        and passes it upward.  The module restarts the timeout period
        when it receives a read side data message and a timeout is not
        currently active.  These two rules insure that bufmod minimizes
        the number of chunks it produces during periods of intense
        message activity and that it periodically disposes of all
        messages during slack intervals, but avoids any timeout overhead
        when there is no activity.

(i.e., the timeout starts when a packet arrives, which is when the
"bufmod" STREAMS module will get a read-side data message).  It does,
however, serve purpose 1).

On other platforms, the packet capture mechanism in the OS doesn't
provide any timeout mechanism, so it would have to be implemented with,
say, "select()"; this would serve purpose 2), but not purpose 1).

Some applications that use the underlying mechanism to implement network
protocols in userland (e.g., the reverse ARP daemon on SunOS 4.x and
5.x, various BSDs, and possibly other OSes) don't need either behavior -
the RARP daemon has no reason to wake up until a packet arrives, so it
doesn't need 2), and it probably doesn't see enough traffic to require 1).
Those applications probably don't set a timeout - and, on BSD, would
also have to set "immediate mode", as the BSD BPF mechanism won't supply
packets until either the buffer fills up or a timeout occurs, unless the
BPF device is in "immediate mode":

    BIOCIMMEDIATE  (u_int) Enable or disable ``immediate mode'', based on the
                    truth value of the argument.  When immediate mode is
                    enabled, reads return immediately upon packet reception.
                    Otherwise, a read will block until either the kernel
                    buffer becomes full or a timeout occurs.  This is useful
                    for programs like rarpd(8) which must respond to messages
                    in real time.  The default for a new file is off.
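
Enabling immediate mode is a single ioctl on the BPF descriptor; a
minimal sketch (guarded so it compiles on systems without BPF - the
"bpf_set_immediate()" wrapper name is mine, and error handling is
abbreviated):

```c
#include <sys/ioctl.h>
#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
#include <net/bpf.h>            /* defines BIOCIMMEDIATE */
#endif

/* Enable (on != 0) or disable immediate mode on a BPF descriptor.
 * Returns 0 on success, -1 on failure or on systems without BPF. */
static int bpf_set_immediate(int fd, unsigned int on)
{
#ifdef BIOCIMMEDIATE
    return ioctl(fd, BIOCIMMEDIATE, &on);
#else
    (void)fd;
    (void)on;
    return -1;
#endif
}
```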

Some applications that do packet capture don't need 2) - tcpdump, for
example, has nothing it needs to do periodically even if no traffic
arrives.  Those applications might work better with 1); if so, they're
just out of luck on platforms that don't provide any timeout in the OS's
packet capture mechanism.

Some applications that do packet capture also allow user interaction
while they're doing a capture - Ethereal and Etherape, for example. 
They could use 2) to implement that, by polling for user input, but they
could also do that by having a main loop that does a "select()" (as both
of those do, given that they're both X applications) and selects on the
pcap_t's file descriptor as well as on the connection to the X server.

Other applications might well actually want to know when there's no
traffic on the network, and thus would want 2) as a way of detecting
that; the paper "Implementing A Generalized Tool For Network Monitoring"
by various folks at Network Flight Recorder (hi, Mike, I know you're out
there listening :-)), at

        http://www.nfr.com/forum/publications/LISA-97.htm

says:

        The packet suckers we initially implemented have been based on
        the libpcap[11] packet capture interface.  Libpcap provides a
        generalized packet capture facility atop a number of operating
        system-specific network capture interfaces.  This freed us from
        having to deal with a lot of portability issues.  We did
        discover, however, that some of the available packet capture
        facilities cannot reliably buffer high volumes of bursty
        traffic.  Berkeley packet filter-based packet suckers running on
        a Pentium-200 were unable to handle even moderate network loads. 
        This was a result of a latency interaction between BPF and our
        software: we do more processing than a program like tcpdump,
        and, though our average processing seems to be within the
        performance envelope of the machine, we can't always process the
        packet "immediately," as BPF expects.  To fix the problem, we
        increased internal buffer sizes from their default of 32K to
        256K, a number more appropriate for the amount of RAM available
        in modern computers.  Since the NFR daemon potentially monitors
        multiple interfaces, we performed minor modifications to the way
        blocking and time-outs are performed in BPF.  The original BPF
        time-out is an inter-packet time-out based in the arrival of a
        packet.  If you don't see a packet, you never time out.  We
        modified it to begin the timer with the read() or select()
        timeout, so we can detect periods of no traffic.

I'm not sure to which version of the BPF code

        The original BPF time-out is an inter-packet time-out based in
        the arrival of a packet.  If you don't see a packet, you never
        time out.

is referring; the FreeBSD "sys/net/bpf.c", as of delta 1.1, appears to
implement the same "timer starts when the read starts and, when it
expires, the read finishes, even if there are no packets available"
semantics that BPF currently implements, as does the NetBSD
"sys/net/bpf.c" - neither of those appear to have a timeout that's "an
inter-packet time-out based in the arrival of a packet", where "If you
dont see a packet, you never time out."  (Perhaps I'm missing something
- Mike?)

Now, those versions *do* have a bug where the timer doesn't start with a
"select()", which is the BSD BPF bug I referred to in

        (except on BSD, where "select()" doesn't work on BPF devices,
        sigh - it doesn't start the timer, so the "select()" doesn't
        return until the "store buffer" fills up and is moved to the
        "hold buffer" and made available for reading; a workaround is to
        put the descriptor in non-blocking mode and, when the "select()"
        times out, read from it)

above.  OpenBSD has changes from NFR that fix that bug (and boost the
size of the buffer); FreeBSD has the bigger buffer as well, but not the
"select()" fix.

I'm also curious why this matters for monitoring multiple interfaces,
unless NFR does it with "select()"s on multiple BPF devices and thus
requires BPF to work correctly with "select()".  I'm also not sure what
"[detecting] periods of no traffic" has to do with that.  (Mike?)
-
This is the TCPDUMP workers list. It is archived at
http://www.tcpdump.org/lists/workers/index.html
To unsubscribe use mailto:[EMAIL PROTECTED]?body=unsubscribe
