Re: [tcpdump-workers] bpf/pcap performance

Darren Reed Mon, 12 Apr 2004 16:48:55 -0700

In some email I received from Guy Harris, sie wrote:
> 
> On Apr 12, 2004, at 2:25 AM, Darren Reed wrote:
> 
> > In some email I received from Guy Harris, sie wrote:
> >> On Sun, Apr 11, 2004 at 03:15:30AM +1000, Darren Reed wrote:
> >>> And there's also BPF_MAXBUFFERSIZE.  I see pcap_getbuff() as being
> >>> essential to getting code to work without trial and error by passing
> >>> different sizes to read() to find out what the right size to read
> >>> is, if you're not setting the size yourself.
> >>
> >> But if you're using libpcap, you're not passing anything to read(),
> >> you're letting libpcap do that.
> >
> > Not necessarily.
> >
> > The interface exposed by libpcap is not conducive to good use by
> > C++ applications - main culprit here is pcap_dispatch() but none
> > of the others really help.  Unless all your classes are static
> > classes (which kind of defeats the purpose, in my book.)
> >
> > So whilst pcap is good for discovering interfaces and setting up the
> > sniffing, I've been forced to use pcap_fileno() and read(2) in order
> > to get the application design I want.  Using pcap_next() is too slow
> > on BSDs where you can get multiple packets per read
> 
> So that presumably also applies to "pcap_next_ex()" - which was 
> introduced on a platform (Win32/WinPcap) where you can get multiple 
> packets per read.


The problem with pcap_next_ex() is the man page description:
"reads the next packet and returns a success/failure indication"

This is the first problem.  I don't really know what it does,
so how can I use it properly ?

btw, on examining the code for pcap more, I find a disturbing
name problem: pcap.h (the external interface) documents pcap_pkthdr
as the structure used in the dump files, except if you look in
savefile.c, it uses pcap_sf_pkthdr instead.  To me, this is around
the wrong way, at best.  I understand what is trying to be done
here and that is to make sure the saved file always has the same
format regardless of what is coming in to the pcap_dump() function.
But in my opinion, pcap_dump() should be using pcap_pkthdr to store
things going out (not pcap_sf_pkthdr - do away with this structure).
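
To make the difference between the two headers concrete, here is a minimal
sketch. The names mem_pkthdr, file_pkthdr and to_file_hdr are hypothetical
stand-ins, not libpcap identifiers. They are modelled on my understanding of
pcap_pkthdr (which carries a struct timeval) and pcap_sf_pkthdr (which uses
fixed 32-bit fields), so treat the layout as an assumption rather than the
library source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical mirror of the in-memory header: the size of
 * struct timeval depends on the platform and ABI. */
struct mem_pkthdr {
    struct timeval ts;   /* capture timestamp */
    uint32_t caplen;     /* bytes actually captured */
    uint32_t len;        /* original length on the wire */
};

/* Hypothetical mirror of the on-disk header: every field has a
 * fixed 32-bit width, so the record header is always 16 bytes. */
struct file_pkthdr {
    int32_t  ts_sec;
    int32_t  ts_usec;
    uint32_t caplen;
    uint32_t len;
};

/* Convert the platform-dependent header to the fixed-size form. */
static struct file_pkthdr to_file_hdr(const struct mem_pkthdr *h)
{
    struct file_pkthdr f;

    assert(h != NULL);
    f.ts_sec  = (int32_t)h->ts.tv_sec;   /* may truncate a 64-bit time_t */
    f.ts_usec = (int32_t)h->ts.tv_usec;
    f.caplen  = h->caplen;
    f.len     = h->len;
    return f;
}
```

The conversion step is the reason a second structure exists at all: the
in-memory header cannot be written out verbatim if files are to be readable
across platforms.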

> > (on Linux, it is fine.)
> 
> ....but is "read()" fine on Linux? :-)

Ok, the Linux code continues to use the pcap(3) API :)

> I.e., if you're using "read()", you're already not portable, unless you 
> are, in effect, re-implementing the dispatch loop for all platforms.  
> If your code knows what platform it's using, it could also know to do 
> the BIOCGBLEN ioctl.

In this case, the code is only targeted at the BSD platforms,
so we're not concerned about it not working on Solaris, AIX, etc.
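
For readers unfamiliar with why read(2) on a BPF descriptor can return
several packets at once: the buffer holds a sequence of records, each a
header followed by the captured bytes, padded to a word boundary. A minimal
sketch of walking such a buffer follows. The struct and macro here are
simplified, hypothetical stand-ins for struct bpf_hdr and BPF_WORDALIGN from
<net/bpf.h> (the real header begins with a timestamp), and the handler with
its context pointer is an assumption for illustration, not code from the
application under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for BSD's struct bpf_hdr (hypothetical). */
struct demo_bpf_hdr {
    uint32_t caplen;   /* bytes of packet data captured */
    uint32_t datalen;  /* original packet length on the wire */
    uint16_t hdrlen;   /* header length, including any padding */
};

/* Stand-in for BPF_WORDALIGN: round up to a multiple of sizeof(long). */
#define DEMO_ALIGNMENT sizeof(long)
#define DEMO_WORDALIGN(x) \
    (((x) + (DEMO_ALIGNMENT - 1)) & ~(DEMO_ALIGNMENT - 1))

typedef void (*demo_handler)(void *ctx, const unsigned char *pkt,
                             uint32_t caplen);

/* Example handler: add each packet's captured length to a size_t. */
static void demo_sum_bytes(void *ctx, const unsigned char *pkt,
                           uint32_t caplen)
{
    (void)pkt;
    *(size_t *)ctx += caplen;
}

/* Walk every complete record in buf[0..n) and call fn for each one.
 * Returns the number of packets handed to fn. */
static int demo_walk(const unsigned char *buf, size_t n,
                     demo_handler fn, void *ctx)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL && fn != NULL);
    while (off + sizeof(struct demo_bpf_hdr) <= n) {
        struct demo_bpf_hdr h;

        memcpy(&h, buf + off, sizeof h);      /* avoid unaligned access */
        if (h.hdrlen < sizeof h || off + h.hdrlen + h.caplen > n)
            break;                            /* truncated or corrupt */
        fn(ctx, buf + off + h.hdrlen, h.caplen);
        count++;
        off += DEMO_WORDALIGN((size_t)h.hdrlen + h.caplen);
    }
    return count;
}
```

With the real structures, the same loop would run over the bytes returned by
a single read() on the descriptor from pcap_fileno(), using a buffer of the
size reported by BIOCGBLEN.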

> > Also, I don't believe that the interfaces provided by pcap provide for
> > a slimmed down read execution path.
> 
> What different interfaces would you suggest?

Some of these are light enough weight that they may be inline
functions or even just #defines.

I'm thinking on my feet here, so please don't expect this to be
a perfect API set.  I suppose what I might like to see is:

int pcap_read(pcap_t *)
  Returns -1 for failure else the number of bytes newly stored.
  Reads as much data as is available into a contiguous buffer.
  pcap(3) is responsible for ensuring that it is able to properly
  do this.

char *pcap_databuffer(pcap_t *)
  Returns a pointer to the current read buffer where data
  from pcap_read() is held.  If pcap_read() has not been
  called or has returned 0/-1, then it should return NULL.

int pcap_datalength(pcap_t *)
  Returns the number of bytes available in the current read buffer.
  If pcap_read() has not been called, return 0.

struct pcap_pkthdr *pcap_nextrecord(pcap_t *)
  Successive calls iterate through the buffer of data read,
  returning a pointer to the start of the pcap packet header
  and the packet immediately following it.  When the data
  buffer has been exhausted, return NULL.  If pcap_read() has
  not been called or has returned -1/0, then return NULL.

int pcap_setreadbuffer(pcap_t *, const char *buffer, const int size)
  Provide an external source for pcap to use for reading data
  into.  "size" must be at least the same as what BIOCGBLEN will
  return.

I haven't specified, here, whether or not successive pcap_read()'s
overwrite or append to the buffer.  The implementation may choose, for
example, to allocate a contiguous buffer of 2*BIOCGBLEN but only ever
return up to BIOCGBLEN bytes as valid, allowing successive calls to
pcap_read() to work until a complete buffer of bytes has been read.
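
As a rough illustration of how the read-side functions proposed above might
fit together, here is a toy, in-memory sketch. Nothing in it is libpcap
code: the sk_ names are hypothetical, the "device" is just a byte array, the
record header is a cut-down stand-in for pcap_pkthdr, records are assumed to
be padded to 4 bytes, and each read simply overwrites the buffer (one of the
choices left open above). pcap_setreadbuffer() is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; stands in for struct pcap_pkthdr. */
struct sk_pkthdr {
    uint32_t caplen;  /* bytes of packet data following the header */
    uint32_t len;     /* original length on the wire */
};

/* Hypothetical handle; stands in for pcap_t. */
struct sk_handle {
    const unsigned char *src;  /* pretend capture source */
    size_t src_len;            /* bytes still unread in src */
    uint32_t words[64];        /* contiguous, aligned read buffer */
    int filled;                /* last sk_read() result, 0 before any read */
    size_t cursor;             /* offset used by sk_nextrecord() */
};

/* pcap_read() analogue: -1 on failure, else bytes newly stored. */
static int sk_read(struct sk_handle *p)
{
    size_t n;

    assert(p != NULL);
    if (p->src == NULL) {
        p->filled = -1;
        return -1;
    }
    n = p->src_len < sizeof p->words ? p->src_len : sizeof p->words;
    memcpy(p->words, p->src, n);
    p->src += n;
    p->src_len -= n;
    p->filled = (int)n;
    p->cursor = 0;
    return p->filled;
}

/* pcap_databuffer() analogue: NULL unless the last read stored data. */
static char *sk_databuffer(struct sk_handle *p)
{
    return p->filled > 0 ? (char *)p->words : NULL;
}

/* pcap_datalength() analogue: 0 unless the last read stored data. */
static int sk_datalength(const struct sk_handle *p)
{
    return p->filled > 0 ? p->filled : 0;
}

/* pcap_nextrecord() analogue: next header, or NULL when exhausted. */
static struct sk_pkthdr *sk_nextrecord(struct sk_handle *p)
{
    struct sk_pkthdr *h;
    size_t avail, rec;

    if (p->filled <= 0)
        return NULL;
    avail = (size_t)p->filled - p->cursor;
    if (avail < sizeof *h)
        return NULL;
    h = (struct sk_pkthdr *)((unsigned char *)p->words + p->cursor);
    rec = sizeof *h + (((size_t)h->caplen + 3u) & ~(size_t)3u);
    if (rec > avail)
        return NULL;   /* incomplete trailing record */
    p->cursor += rec;
    return h;
}
```

A caller would then loop on sk_read() and, after each successful read, call
sk_nextrecord() until it returns NULL, with no callback involved.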

I added pcap_setreadbuffer() because an application may be doing its
own buffer/memory management and may wish to influence where the data
is read into for performance reasons.  It may, for example, want to
read the data straight into a shared memory segment or into a region
of memory that overlays a file via mmap.  Another alternative to this
might be to allow a user to supply their own buffer allocation callback,
but this suffers from problems already experienced with pcap_dispatch()
and C++ classes.

What I haven't said here and what may be worthwhile saying, is how
each of the above interacts with pcap_next(), pcap_next_ex() and
pcap_dispatch().  For example, whether or not you are "allowed" to
call pcap_setreadbuffer() whilst in the middle of a pcap_dispatch()
callback. 

Oh, there's no code to back up any of the above, just thoughts :)

Hopefully all of these functions can easily work on both "live" and
"dead" pcap data sources.

> > Well then, maybe we need to give some thought to what we would
> > like in terms of useful BPF device behaviour, including being
> > able to "turn the tap off" (without closing) or similar ?
> 
> ....and some thought to whether the "open" and "turn the tap on" split 
> can be implemented on other platforms.

That's not something I'd considered...

Do we have any feeling for whether or not the likes of IBM are open
to taking changes that enhance BPF whilst maintaining backward
compatibility ?  I've not given it any deep thought for how it
would impact other platforms that use DLPI, for example.

Darren