Re: [tcpdump-workers] libpcap for linux, to_ms redefined

Guy Harris Wed, 18 Sep 2002 12:51:23 -0700

On Thu, Mar 28, 2002 at 09:45:07PM -0700, Phil Wood wrote:
> With the advent of memory mapped ring buffers developed by Alexey Kuznetsov,
> this function could be accommodated.  I treat the value of 'to_ms' in the
> following manner:
> 
>   if (to_ms == 0) return; // if no packet immediately available then return
>                   // to calling program it will poll (good for old
>                   // versions of NFR or programs that have other
>                   // things to do besides capture packets)


And bad for compatibility with other platforms, on which a "to_ms" value
of 0 means "if no packet immediately available, block, and wait as long
as necessary for enough packets to arrive to fill up a chunk".

There are several timeout behaviors that can be provided by various
platforms' native packet capture mechanisms that support timeouts:

    BSD:

        With a non-zero timeout, a read will return if either

                1) enough data arrives to fill up the buffer

        or

                2) the timeout expires, even if no data has arrived.

        With a zero timeout, the read will return only if enough data
        arrives to fill up the buffer, blocking as long as is necessary.

        You can do BIOCIMMEDIATE to cause packets to be delivered as
        soon as they arrive; if combined with a timeout, that *probably*
        means that a read will return if either

                1) a packet arrives

        or

                2) the timeout expires, even if no data has arrived.

    Digital UNIX:

        With a positive timeout, at least as I read the man page, it
        might be the case that a read will return if either

                1) a packet arrives

        or

                2) the timeout expires, even if no data has arrived

        so that batching is presumably done only if packets are arriving
        faster than the application can read them one at a time - i.e.,
        if, before the read wakes up and copies data to userland, more
        packets arrive.  I don't know whether that's the case, however.

        With a zero timeout, the read will return if a packet arrives,
        blocking as long as is necessary.

        With a negative timeout, the read will return immediately, even
        if no packet is available; the value of the timeout is, I infer,
        ignored.

        They say that BIOCIMMEDIATE has no effect as immediate mode
        is always on, which is why I infer that batching isn't done
        BSD-style.

    Windows with WinPcap:

        With a positive timeout, a read will return if either

                1) enough data arrives to fill up the buffer

        or

                2) the timeout expires, even if no data has arrived

        at least as I read the current packet.dll documentation.  The
        buffer size is set with "PacketSetMinToCopy()" on Windows NT
        (NT 4.0, W2K, WXP, W.NETServer); I'm guessing from what the
        documentation says about Windows OT (95/98/Me) that one packet
        is always enough to fill up the buffer on those OSes.

        With a zero timeout, a read will return only if enough data
        arrives to fill up the buffer, blocking as long as is necessary.

        With "PacketSetMinToCopy()" you can presumably get the
        equivalent of BIOCIMMEDIATE.

    SunOS 5.x:

        With a non-zero timeout, a read will return if either

                1) enough data arrives to fill up the buffer

        or

                2) the timeout expires *AND* at least one packet has
                   arrived.  (Yes, this means that you can't use the
                   timeout to break out of a loop and do something else
                   while you're waiting.  Such is life.)

        With a zero timeout, a read will return as soon as a packet
        arrives.

        With the timeout cleared, a read will return only if enough data
        arrives to fill up the buffer, blocking as long as is necessary.
        (libpcap treats a "to_ms" value of 0 as meaning "don't set the
        timeout", which means it's cleared, *not* as "return
        immediately".)

    SunOS 4.x:

        I don't have the "bufmod" man page handy for SunOS 4.x, but I
        *suspect* it's similar to 5.x, as the 5.x "bufmod" is probably
        derived from the 4.x "bufmod".  No guarantees, however.

    SunOS 3.x:

        *Probably* behaves like 4.x.

On OSes whose packet capture mechanism *doesn't* support timeouts, a
read will return if a packet arrives, and will wait indefinitely for
that to happen.

All this means that libpcap cannot, merely by using the underlying OS's
mechanisms:

        guarantee that a read will always return within a certain
        timeout period (the Solaris timeout mechanism only sets the
        timeout for batching of packets);

        guarantee that packets will not be delivered ASAP (some OSes
        don't do batching, and others do only a limited amount of
        batching).
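
To make the differences concrete, here is a small sketch that encodes
three of the behaviors described above (BSD without BIOCIMMEDIATE, SunOS
5.x with a non-zero timeout, and OSes with no timeout at all) as a
predicate answering "would a blocked read return now?".  The enum and
function names are made up for illustration and are not part of libpcap;
the rules are only as reliable as my reading of the man pages above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names: these model the descriptions in this message,
 * they are not any libpcap or OS API. */
enum capture_model {
    MODEL_BSD,        /* BPF, non-zero timeout, no BIOCIMMEDIATE */
    MODEL_SUNOS5,     /* bufmod, non-zero timeout */
    MODEL_NO_TIMEOUT  /* Linux, Irix, HP-UX, etc.: no OS timeout */
};

/*
 * Would a blocked read return, given the current state?
 * Callers should pass a consistent state: buffer_full implies
 * packet_arrived.
 */
bool read_would_return(enum capture_model model, bool buffer_full,
                       bool packet_arrived, bool timer_expired)
{
    switch (model) {
    case MODEL_BSD:
        /* Timer fires even if no data has arrived. */
        return buffer_full || timer_expired;
    case MODEL_SUNOS5:
        /* Timer only matters once at least one packet is queued. */
        return buffer_full || (timer_expired && packet_arrived);
    case MODEL_NO_TIMEOUT:
        /* Returns when a packet arrives; waits indefinitely otherwise. */
        return packet_arrived;
    default:
        assert(0 && "unknown capture model");
        return false;
    }
}
```

With no packets queued and the timer expired, only the BSD model
returns, which is exactly why the timeout can't portably be used to
break out of a read.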

On most if not all of the OSes, however, you *can* do a "select()" or
"poll()" or "WaitFor...()" call on the pcap device/socket/whatever, so
that you *can* multiplex reading packets and doing other things.  (On
BSD, you may have to use a timeout in the "select()" or "poll()", plus
non-blocking I/O, as "select()" or "poll()" on BPF devices doesn't
always work correctly.)
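
As a rough sketch of that kind of multiplexing, the following waits on
two descriptors at once with "select()" and a timeout.  It assumes the
capture descriptor has been obtained from libpcap (e.g. with
"pcap_fileno()") and that "select()" works on it on the platform in
question; "wait_either" and the READY_* flags are names I've invented
for the example.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>   /* read(), write(), pipe() for callers */

#define READY_CAPTURE 1   /* hypothetical flag: capture fd is readable */
#define READY_OTHER   2   /* hypothetical flag: other fd is readable */

/*
 * Wait up to timeout_ms for either descriptor to become readable.
 * Returns a bitmask of READY_* flags, 0 if the timeout expired with
 * nothing readable, or -1 on error (including EINTR).
 */
int wait_either(int capture_fd, int other_fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd = capture_fd > other_fd ? capture_fd : other_fd;
    int ready = 0;

    assert(timeout_ms >= 0);

    FD_ZERO(&rfds);
    FD_SET(capture_fd, &rfds);
    FD_SET(other_fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;

    /* On timeout select() clears the set, so ready stays 0. */
    if (FD_ISSET(capture_fd, &rfds))
        ready |= READY_CAPTURE;
    if (FD_ISSET(other_fd, &rfds))
        ready |= READY_OTHER;
    return ready;
}
```

The caller would read packets only when READY_CAPTURE is set and do its
other work otherwise, without depending on the "to_ms" value at all.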

It would be possible, if people *really* insist on using the timeout for
multiplexing rather than just batching, to make

        1) the platforms with no timeout in the OS (Linux, Irix, HP-UX,
           etc.)

and

        2) the platforms where the timeout can't be used for
           multiplexing as the timer doesn't expire unless you have at
           least one packet (SunOS 5.x)

do a combination of non-blocking I/O and a "select()" or "poll()" with a
timeout.  However

        1) that's overkill for applications that *don't* use the timeout
           for multiplexing

and

        2) it means that people will start relying on it and having all
           sorts of weird problems when their application is run with an
           older version of libpcap

so I'm not strongly inclined to implement that (which is why the Linux
version doesn't do it) and, if we do implement that, I'd want to do it
only if

        1) we do it on *ALL* the platforms where the native OS timeout
           can't be used for multiplexing (not just for Linux)

and

        2) we add a new API to enable that mode, so that applications
           that require that mode have to use the new API and thus won't
           build with older versions of libpcap (rather than merely
           hanging forever, on platforms such as Linux where there is no
           timeout and platforms such as SunOS 5.x where the timeout
           doesn't expire if no packets arrive, with older versions of
           libpcap).
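
For what it's worth, the combination of non-blocking I/O and "poll()"
with a timeout would look roughly like the sketch below.  This is plain
POSIX code on an arbitrary descriptor, not what libpcap does;
"set_nonblocking" and "timed_read" are hypothetical helper names, and a
real implementation would have to cope with each platform's capture
device quirks.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Read from a non-blocking fd, waiting at most timeout_ms for data.
 * Returns the number of bytes read, 0 if the timeout expired with
 * nothing to read, or -1 on error.  (On an ordinary descriptor, 0 can
 * also mean end-of-file; this sketch does not distinguish the two.)
 */
ssize_t timed_read(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    ssize_t got;
    int n;

    assert(timeout_ms >= 0);

    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;               /* timer expired, no data arrived */

    got = read(fd, buf, len);
    if (got < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;               /* spurious wakeup; treat as a timeout */
    return got;
}
```

The timeout here is enforced by "poll()", so it expires whether or not
any packet has arrived, which is the behavior the OS timeout can't
provide on SunOS 5.x or on the platforms with no timeout.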

I would also advocate adding a new API to get "immediate mode", that
being the mode where a read completes as soon as a packet arrives, with
no batching; that'd let us hide all the details of how to request
immediate mode.
-
This is the TCPDUMP workers list. It is archived at
http://www.tcpdump.org/lists/workers/index.html