Brandon,

For curiosity's sake (I have a similar app and am seriously interested
in performance):
- What platform (OS/processor) are you on?
- How did you measure the time to call recvfrom()? Or, perhaps a more
  relevant question: how do you use recvfrom() (what's the surrounding
  code like, do you select() first, etc.)?

I ask because I'm seeing substantially better numbers for recvfrom()
(0.021ms). Granted, that is for a fairly short message, but that doesn't
explain the 1000X performance delta.

recvfrom() itself is relatively cheap; select() is VERY expensive in
comparison (<100K clock cycles for recvfrom() versus >23M clock cycles
for select(), or ~0.021ms versus ~9.7ms on a 2.4GHz Xeon). Note also
that this is for an empty select set with a 0 timeout, so that is JUST
the calling overhead; if it's actually checking fds it will be greater.
If you're on a slower platform (embedded, perhaps, given your company?)
the numbers will obviously be worse; recvfrom() seems to scale almost
directly with clock speed.

Based on the numbers I've seen, I would expect recvfrom() to be at
LEAST as fast as libpcap, if not faster, since libpcap (often) uses
select() (on some platforms :) to check whether the capture device has
data ready. libpcap will benefit somewhat from its ability to bundle
multiple packets into a buffer (on platforms that support that), but
not, I suspect, enough to make up a ~250X performance delta.

I'm also going to agree with Guy. If you have checksum problems and are
still seeing the packets, I would seriously like to find out how you
accomplished that; more likely, you shouldn't see the packets at all.

Below is a short sample of results and a short overview of my
methodology. This is on a single-CPU 2.4GHz Xeon running RH 9.0 with a
stock 2.4.20 kernel.

rdtsc: 1009667286648918 - 1009667286584434 = 64484 ( / one_second) = 0.000021 size 36
rdtsc: 1009679499010456 - 1009679498987684 = 22772 ( / one_second) = 0.000007 size 36
rdtsc: 1009691711003018 - 1009691710985706 = 17312 ( / one_second) = 0.000006 size 36
rdtsc: 1009703923467532 - 1009703923448944 = 18588 ( / one_second) = 0.000006 size 36
rdtsc: 1009716135791420 - 1009716135773620 = 17800 ( / one_second) = 0.000006 size 36
rdtsc: 1009752772553556 - 1009752772447452 = 106104 ( / one_second) = 0.000035 size 36
rdtsc: 1009764984789778 - 1009764984748314 = 41464 ( / one_second) = 0.000014 size 36
rdtsc: 1009777197311290 - 1009777197270442 = 40848 ( / one_second) = 0.000013 size 36
rdtsc: 1009789410262486 - 1009789410220774 = 41712 ( / one_second) = 0.000014 size 36
rdtsc: 1009801622233478 - 1009801622192834 = 40644 ( / one_second) = 0.000013 size 36
rdtsc: 1009813840971578 - 1009813840931126 = 40452 ( / one_second) = 0.000013 size 36
rdtsc: 1009826046989322 - 1009826046959894 = 29428 ( / one_second) = 0.000010 size 36
rdtsc: 1009838259554966 - 1009838259526818 = 28148 ( / one_second) = 0.000009 size 36
rdtsc: 1009850472114698 - 1009850472085270 = 29428 ( / one_second) = 0.000010 size 36
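(The one_second divisor in those printouts is just the CPU clock rate
in Hz; I set it by hand, as shown further below. If you'd rather pull
it programmatically on Linux, a sketch along these lines, scraping the
"cpu MHz" line out of /proc/cpuinfo, ought to work. Untested as
written, cpu_hz_from_proc() is just a name I made up here, and the
/proc/cpuinfo field names aren't guaranteed across architectures or
kernel versions.)

    #include <stdio.h>

    // Hedged sketch: returns the CPU clock in Hz as reported by the
    // "cpu MHz" line of /proc/cpuinfo, or 0.0 on failure. That field
    // is what 2.4-era x86 kernels print; other platforms differ.
    double cpu_hz_from_proc(void)
    {
        FILE *f = fopen("/proc/cpuinfo", "r");
        char line[256];
        double mhz = 0.0;

        if (f == NULL)
            return 0.0;
        while (fgets(line, sizeof(line), f) != NULL) {
            // scanf's whitespace directives soak up the tabs around ':'
            if (sscanf(line, "cpu MHz : %lf", &mhz) == 1)
                break;
        }
        fclose(f);
        return mhz * 1000000.0;  // MHz -> Hz
    }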
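If you want to double check the select() overhead claim on your own
hardware, a minimal standalone sketch like this (a reconstruction for
illustration, not my actual harness; it reuses the same rdtsc() shown
further below) should reproduce the empty-set, zero-timeout number:

    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/types.h>
    #include <unistd.h>

    // Same rdtsc() as further below.
    extern __inline__ unsigned long long int rdtsc()
    {
        unsigned long long int x;
        __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
        return x;
    }

    int main(void)
    {
        fd_set rfds;
        struct timeval tv;
        unsigned long long before, after;
        int i;

        for (i = 0; i < 10; i++) {
            FD_ZERO(&rfds);      // empty fd set...
            tv.tv_sec = 0;       // ...and zero timeout, so this is
            tv.tv_usec = 0;      // JUST the calling overhead
            before = rdtsc();
            select(0, &rfds, NULL, NULL, &tv);
            after = rdtsc();
            printf("select() took %llu cycles\n", after - before);
        }
        return 0;
    }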
Here is a code snippet that shows how I do my timings:

    while(1) {
        // We wouldn't actually use select() in the real app; we use
        // it here to make sure we're timing the recvfrom() call on a
        // socket that has data ready, instead of counting the time
        // recvfrom() blocks waiting for a packet.
        FD_ZERO(&rfds);
        FD_SET(recv_s, &rfds);
        select(recv_s + 1, &rfds, NULL, NULL, NULL);

        // time the actual recvfrom() call
        rdtsc_ret[index] = rdtsc();
        index = !index;
        size = recvfrom(recv_s, buf, sizeof(buf), 0, NULL, NULL);
        rdtsc_ret[index] = rdtsc();

        printf("rdtsc: %lld - %lld = %lld ( / one_second) = %f size %d\n",
               rdtsc_ret[index], rdtsc_ret[!index],
               rdtsc_ret[index] - rdtsc_ret[!index],
               (double)(rdtsc_ret[index] - rdtsc_ret[!index]) /
               (double)one_second, size);
    }

rdtsc() is defined as an assembly snippet that reads the processor
clock (time stamp counter) register on i386 architectures; other
architectures are obviously different. The overhead for calling
{ rdtsc(); index = !index; rdtsc(); } is 84-96 clock cycles, so I just
ignored it here since it's way less than the noise.

    extern __inline__ unsigned long long int rdtsc()
    {
        unsigned long long int x;
        // 0x0f 0x31 is the RDTSC opcode; the 64-bit result comes back
        // in EDX:EAX, which the "=A" constraint maps to a long long.
        __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
        return x;
    }

one_second is defined as the real processor clock speed in Hz. You
need to figure this out yourself (dmesg, cat /proc/cpuinfo on Linux,
etc.):

    double one_second = 3050.905*1000000;

We use rdtsc because gettimeofday() doesn't have enough resolution to
accurately measure a single call like this (gettimeofday()'s resolution
on the same machine as above is slightly worse than 1ms).

On Sun, May 23, 2004 at 06:37:40PM -0700, Brandon Stafford wrote:
> Hello,
>
> I'm writing a server that captures UDP packets and, after some
> manipulation, sends the data out the serial port. Right now, I'm
> using recvfrom(), but it takes 20 ms to execute for each packet
> captured. I know that tcpdump can capture packets much faster than
> 20 ms/packet on the same computer, so I know recvfrom() is running
> into trouble, probably because of bad checksums on the packets.
>
> Is it a good idea to rewrite the server using pcap, or is this likely
> to slow me down even more?
>
> Thanks,
> Brandon

-- 
>-=-=-=-=-=-=-<>-=-=-=-=-=-<>-=-=-=-=-=-<>-=-=-=-=-=-<>-=-=-=-=-=-=-<
             Ryan Mooney                          [EMAIL PROTECTED]
<-=-=-=-=-=-=-><-=-=-=-=-=-><-=-=-=-=-=-><-=-=-=-=-=-><-=-=-=-=-=-=->

-
This is the tcpdump-workers list.
Visit https://lists.sandelman.ca/ to unsubscribe.