Hiya, I've been doing some work on Squid-2.6 to optimise the parser code. I've been working with Mark Nottingham from Yahoo! who has been running various throughput tests on Squid-2.6.
I've been concentrating on the client-side stuff - request parsing, reply parsing/building. There are plenty of other areas of the code which could do with some optimising, but I'm focusing on the client side for now.

The summary is:

  >1k responses: from 4,900 -> ~6,000-6,700
  >4k responses: from 6,500 -> ~8,000-9,000

IIRC, this is measured using a local version of httperf.

My changes have been:

* don't call headersEnd() so often if it can be avoided!
* quite a large rework of the request line parser (which is unfinished, but I believe it parses RFC-compliant HTTP/1.0 & 1.1 requests fine; it doesn't parse HTTP/0.9 requests just for now)
* Some refactoring of clientReadRequest
* Code modifications to not triple-copy the request buffer whilst parsing

Stuff I've seen which should also give a noticeable performance boost in this particular micro benchmark:

* An overhaul of HttpReply so it doesn't double-copy the reply buffer during parsing (which will probably require rewriting the status line parser to not expect a NUL-terminated string, much like what the request line parser was doing..)
* Rethink the Http Header stuff - a lot of the time spent in request parsing/reply building is the memory allocations and array manipulation needed to support HttpHeader.
* See if there's a nice way to combine the initial header write and data buffer into a single write(). More likely, come up with some simple way of reference counting some stuff to build iovecs and feed them to writev().
* Hint to memPoolAlloc/memPoolFree that they shouldn't xfree() certain buffers, such as the buffers being allocated to strings and stmem buffers.

I've been profiling using gprof and perfsuite. Both are statistical; both give different results. I've been using the gprof call graphs as well. I'm using apachebench to do local testing.
Here's what I use:

[EMAIL PROTECTED]:~$ ab -c 10 -n 100000 http://192.168.3.1:3128/squid-internal-static/icons/test.4k

Squid compiled with:

[EMAIL PROTECTED]:~/work/squid/sf/parserwork$ env CFLAGS="-O2 -g -pg -ggdb -fno-inline-functions \
  -fno-inline-functions-called-once --no-inline" ./configure --prefix="/home/adrian/work/squid/run" \
  --enable-storeio="ufs null" --disable-unlinkd --quiet

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  3.57      0.67     0.67  4700715     0.00     0.00  memPoolFree
  3.57      1.33     0.67   200018     0.00     0.00  headersEnd
  2.73      1.84     0.51   100009     0.01     0.04  httpRequestFree
  2.20      2.25     0.41   100009     0.00     0.02  parseHttpRequest
  2.17      2.65     0.41  1000090     0.00     0.00  httpHeaderIdByName
  2.06      3.04     0.39  5901105     0.00     0.00  arrayAppend
  1.80      3.38     0.34  4701505     0.00     0.00  memPoolAlloc
  1.77      3.71     0.33  1301128     0.00     0.00  xstrncpy
  1.74      4.03     0.33   200018     0.00     0.02  clientWriteComplete
  1.69      4.34     0.32  1300117     0.00     0.00  httpHeaderEntryDestroy
  1.61      4.64     0.30  1700186     0.00     0.00  memFreeString
  1.55      4.93     0.29   320996     0.00     0.06  comm_call_handlers
  1.50      5.21     0.28   601188     0.00     0.00  xstrdup
  1.50      5.50     0.28   500056     0.00     0.00  dlinkDelete

Per request (100009 requests in this run), that's roughly 47 memory allocations/frees, 59 array appends, 13 header entry destroys, etc.
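As an aside on the headersEnd() row in the profile above (200018 calls for about 100,000 requests): one way to make repeated calls cheaper, when a request arrives across several reads, is to remember how far the previous scan got. The sketch below is only an illustration under that assumption. headers_end_resume() and its scanned argument are hypothetical names, not Squid's headersEnd() and not what my patch does. It also works on a length-bounded buffer rather than a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not Squid's headersEnd()).
 * Returns the number of bytes up to and including the blank line that ends
 * the header block, or 0 if the blank line has not arrived yet.
 * *scanned records how much of buf earlier calls already examined, so a
 * later call only re-examines the last few bytes plus the new data.
 * buf does not need to be NUL-terminated.
 */
static size_t
headers_end_resume(const char *buf, size_t len, size_t *scanned)
{
    size_t i;

    assert(buf != NULL && scanned != NULL);

    /* Back up 3 bytes so a terminator split across two reads is still found. */
    i = (*scanned > 3) ? *scanned - 3 : 0;

    for (; i + 1 < len; i++) {
        if (buf[i] != '\n')
            continue;
        if (buf[i + 1] == '\n')
            return i + 2;       /* bare "\n\n" */
        if (i + 2 < len && buf[i + 1] == '\r' && buf[i + 2] == '\n')
            return i + 3;       /* "\n\r\n", the tail of "\r\n\r\n" */
    }

    *scanned = len;             /* nothing found; resume near here next time */
    return 0;
}
```

The caller would keep one scanned counter per connection, start it at 0, and reset it once a complete request has been consumed from the buffer. Whether this saves anything measurable depends on how often requests really do arrive in several pieces.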
The same test run, compiled without -pg, run under perfmon/perfsuite:

File Summary
--------------------------------------------------------------------------------
Samples  Self %  Total %  File
    365  14.43%   14.43%  /home/adrian/work/squid/sf/parserwork/src/client_side.c
    353  13.96%   28.39%  /home/adrian/work/squid/sf/parserwork/src/HttpHeader.c
    180   7.12%   35.51%  /home/adrian/work/squid/sf/parserwork/src/MemPool.c
    114   4.51%   40.02%  /home/adrian/work/squid/sf/parserwork/src/comm.c
     98   3.88%   43.89%  /home/adrian/work/squid/sf/parserwork/src/mem.c
     97   3.84%   47.73%  /home/adrian/work/squid/sf/parserwork/lib/Array.c
     89   3.52%   51.25%  /home/adrian/work/squid/sf/parserwork/src/cbdata.c
     86   3.40%   54.65%  /home/adrian/work/squid/sf/parserwork/lib/util.c
     83   3.28%   57.93%  /home/adrian/work/squid/sf/parserwork/src/tools.c
     79   3.12%   61.05%  /home/adrian/work/squid/sf/parserwork/src/String.c
     68   2.69%   63.74%  /home/adrian/work/squid/sf/parserwork/src/store_client.c
     57   2.25%   65.99%  /home/adrian/work/squid/sf/parserwork/src/acl.c

Function Summary
--------------------------------------------------------------------------------
Samples  Self %  Total %  Function
    126   4.98%    4.98%  memPoolFree
     77   3.04%    8.03%  httpHeaderGetEntry
     74   2.93%   10.95%  arrayAppend
     74   2.93%   13.88%  httpHeaderClean
     59   2.33%   16.21%  httpHeaderEntryDestroy
     56   2.21%   18.43%  headersEnd
     54   2.14%   20.56%  memPoolAlloc
     47   1.86%   22.42%  clientWriteComplete
     47   1.86%   24.28%  httpRequestFree
     46   1.82%   26.10%  memFreeString
     41   1.62%   27.72%  comm_call_handlers
     40   1.58%   29.30%  stringClean
     35   1.38%   30.68%  clientSendMoreData
     35   1.38%   32.07%  xstrncpy
     32   1.27%   33.33%  connStateFree
     30   1.19%   34.52%  dlinkDelete

Function:File:Line Summary
--------------------------------------------------------------------------------
Samples  Self %  Total %  Function:File:Line
     38   1.50%    1.50%  httpHeaderClean:/home/adrian/work/squid/sf/parserwork/src/HttpHeader.c:354
     36   1.42%    2.93%  httpHeaderEntryDestroy:/home/adrian/work/squid/sf/parserwork/src/HttpHeader.c:1193
     35   1.38%    4.31%  httpHeaderGetEntry:/home/adrian/work/squid/sf/parserwork/src/HttpHeader.c:555
     28   1.11%    5.42%  ??:??:0
     26   1.03%    6.45%  arrayAppend:/home/adrian/work/squid/sf/parserwork/lib/Array.c:95
     25   0.99%    7.43%  xstrdup:/home/adrian/work/squid/sf/parserwork/lib/util.c:600
     22   0.87%    8.30%  xstrncpy:/home/adrian/work/squid/sf/parserwork/lib/util.c:680
     20   0.79%    9.09%  httpHeaderGetEntry:/home/adrian/work/squid/sf/parserwork/src/HttpHeader.c:551
     19   0.75%    9.85%  memPoolFree:/home/adrian/work/squid/sf/parserwork/src/MemPool.c:326
     17   0.67%   10.52%  arrayAppend:/home/adrian/work/squid/sf/parserwork/lib/Array.c:93
     16   0.63%   11.15%  memPoolFree:/home/adrian/work/squid/sf/parserwork/src/MemPool.c:317
     16   0.63%   11.78%  arrayAppend:/home/adrian/work/squid/sf/parserwork/lib/Array.c:91
     15   0.59%   12.38%  memPoolFree:/home/adrian/work/squid/sf/parserwork/src/MemPool.c:303
     15   0.59%   12.97%  memPoolFree:/home/adrian/work/squid/sf/parserwork/src/MemPool.c:319
     14   0.55%   13.52%  headersEnd:/home/adrian/work/squid/sf/parserwork/src/mime.c:147

Each gives slightly different results but they're all centred around the same functions - memory allocation/free, header creation/deallocation.

I'm going to stop speeding things up, complete the request/reply parser modifications and concentrate on fixing any bugs that pop up. I'm not going to try writing an incremental HTTP parser for now; I'll leave that for Squid-3. I'm mainly doing this to wrap my head around what bits of the code are fast, what bits are slow, and why.

Adrian
