Hiya,

I've been doing some work on Squid-2.6 to optimise the parser code.
I've been working with Mark Nottingham from Yahoo! who has been running various
throughput tests on Squid-2.6.

I've been concentrating on the client-side stuff - request parsing, reply
parsing/building. There's plenty of other areas of the code which could
do with some optimising but I'm focusing on the client side for now.

The summary is:

>1k responses: from 4,900 -> ~6,000-6,700
>4k responses: from 6,500 -> ~8,000-9,000

IIRC, this is measured using a local version of httperf. My changes have been:

* don't call headersEnd() so often if it can be avoided!
* quite a large rework of the request line parser (which is unfinished but
  I believe parses RFC-compliant HTTP/1.0 & 1.1 requests fine; doesn't parse
  HTTP/0.9 requests just for now)
* Some refactoring of clientReadRequest
* Code modifications to not triple-copy the request buffer whilst parsing

Stuff I've seen which should also give a noticeable performance boost in this
particular micro benchmark:

* An overhaul of HttpReply so it doesn't double-copy the reply buffer during
  parsing (which will probably require rewriting the status line parser to not
  expect a NUL-terminated string, much like what the request line parser was
  doing).

* Rethink the Http Header stuff - a lot of the time spent in request
  parsing/reply building is the memory allocations and array manipulation
  needed to support HttpHeader.

* See if there's a nice way to combine the initial header write and data buffer
  into a single write(). More likely, come up with some simple way of reference
  counting some stuff to build iovec's and feed to writev().

* Hint to memPoolAlloc/memPoolFree that they shouldn't xfree() certain buffers,
  such as the buffers being allocated to strings and stmem buffers.
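
As a rough illustration of the writev() idea in the list above, the sketch below hands the reply header and the first data buffer to the kernel in a single call instead of two write()s. It is not Squid code: send_header_and_body() and writev_demo() are hypothetical names, and the reference counting needed to keep both buffers alive until the write completes is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper, not a Squid function: one syscall for both buffers. */
static ssize_t
send_header_and_body(int fd, const char *hdr, size_t hdr_len,
    const char *body, size_t body_len)
{
    struct iovec iov[2];

    assert(hdr != NULL && body != NULL);

    /* iov_base is not const-qualified; writev() only reads from it. */
    iov[0].iov_base = (void *) hdr;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = (void *) body;
    iov[1].iov_len = body_len;
    return writev(fd, iov, 2);
}

/*
 * Round-trip demo through a pipe: sends a small header and body with one
 * writev() and reads them back into out. Returns the number of bytes read,
 * or -1 on error.
 */
static ssize_t
writev_demo(char *out, size_t out_len)
{
    static const char hdr[] = "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\n";
    static const char body[] = "hello";
    int fds[2];
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;
    n = send_header_and_body(fds[1], hdr, strlen(hdr), body, strlen(body));
    close(fds[1]);
    if (n >= 0)
        n = read(fds[0], out, out_len);
    close(fds[0]);
    return n;
}
```

On a non-blocking socket writev() can return a short count, so real code would still have to track how much of each iovec was sent and retry with the remainder.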

I've been profiling using gprof and perfsuite. Both are statistical; both give
different results. I've been using the gprof call graphs as well.

I'm using apachebench to do local testing. Here's what I use:

[EMAIL PROTECTED]:~$ ab -c 10 -n 100000 http://192.168.3.1:3128/squid-internal-static/icons/test.4k

Squid compiled with:

[EMAIL PROTECTED]:~/work/squid/sf/parserwork$ env CFLAGS="-O2 -g -pg -ggdb -fno-inline-functions \
  -fno-inline-functions-called-once --no-inline" ./configure \
  --prefix="/home/adrian/work/squid/run" \
  --enable-storeio="ufs null" --disable-unlinkd --quiet

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  3.57      0.67     0.67  4700715     0.00     0.00  memPoolFree
  3.57      1.33     0.67   200018     0.00     0.00  headersEnd
  2.73      1.84     0.51   100009     0.01     0.04  httpRequestFree
  2.20      2.25     0.41   100009     0.00     0.02  parseHttpRequest
  2.17      2.65     0.41  1000090     0.00     0.00  httpHeaderIdByName
  2.06      3.04     0.39  5901105     0.00     0.00  arrayAppend
  1.80      3.38     0.34  4701505     0.00     0.00  memPoolAlloc
  1.77      3.71     0.33  1301128     0.00     0.00  xstrncpy
  1.74      4.03     0.33   200018     0.00     0.02  clientWriteComplete
  1.69      4.34     0.32  1300117     0.00     0.00  httpHeaderEntryDestroy
  1.61      4.64     0.30  1700186     0.00     0.00  memFreeString
  1.55      4.93     0.29   320996     0.00     0.06  comm_call_handlers
  1.50      5.21     0.28   601188     0.00     0.00  xstrdup
  1.50      5.50     0.28   500056     0.00     0.00  dlinkDelete

Per request, that works out to roughly 47 memory allocations/frees, 59 array
appends, 13 header entry destroys, etc.
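
Those per-request figures fall out of dividing the call counts in the flat profile by the number of requests. A small sketch of the arithmetic, assuming parseHttpRequest is called once per request (100009 calls); calls_per_request() is just an illustrative helper:

```c
#include <assert.h>

/* Calls per request, rounded to the nearest whole number. */
static unsigned long
calls_per_request(unsigned long calls, unsigned long requests)
{
    assert(requests > 0);
    return (calls + requests / 2) / requests;
}
```

For example, calls_per_request(4700715UL, 100009UL) gives the memPoolFree figure, and the same division with 5901105 (arrayAppend) and 1300117 (httpHeaderEntryDestroy) gives the other two.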

The same test run, compiled without -pg, run under perfmon/perfsuite:

File Summary
--------------------------------------------------------------------------------
Samples   Self %  Total %  File

    365   14.43%   14.43%  /home/adrian/work/squid/sf/parserwork/src/client_side.c
    353   13.96%   28.39%  /home/adrian/work/squid/sf/parserwork/src/HttpHeader.c
    180    7.12%   35.51%  /home/adrian/work/squid/sf/parserwork/src/MemPool.c
    114    4.51%   40.02%  /home/adrian/work/squid/sf/parserwork/src/comm.c
     98    3.88%   43.89%  /home/adrian/work/squid/sf/parserwork/src/mem.c
     97    3.84%   47.73%  /home/adrian/work/squid/sf/parserwork/lib/Array.c
     89    3.52%   51.25%  /home/adrian/work/squid/sf/parserwork/src/cbdata.c
     86    3.40%   54.65%  /home/adrian/work/squid/sf/parserwork/lib/util.c
     83    3.28%   57.93%  /home/adrian/work/squid/sf/parserwork/src/tools.c
     79    3.12%   61.05%  /home/adrian/work/squid/sf/parserwork/src/String.c
     68    2.69%   63.74%  /home/adrian/work/squid/sf/parserwork/src/store_client.c
     57    2.25%   65.99%  /home/adrian/work/squid/sf/parserwork/src/acl.c

Function Summary
--------------------------------------------------------------------------------
Samples   Self %  Total %  Function

    126    4.98%    4.98%  memPoolFree
     77    3.04%    8.03%  httpHeaderGetEntry
     74    2.93%   10.95%  arrayAppend
     74    2.93%   13.88%  httpHeaderClean
     59    2.33%   16.21%  httpHeaderEntryDestroy
     56    2.21%   18.43%  headersEnd
     54    2.14%   20.56%  memPoolAlloc
     47    1.86%   22.42%  clientWriteComplete
     47    1.86%   24.28%  httpRequestFree
     46    1.82%   26.10%  memFreeString
     41    1.62%   27.72%  comm_call_handlers
     40    1.58%   29.30%  stringClean
     35    1.38%   30.68%  clientSendMoreData
     35    1.38%   32.07%  xstrncpy
     32    1.27%   33.33%  connStateFree
     30    1.19%   34.52%  dlinkDelete

Function:File:Line Summary
--------------------------------------------------------------------------------
Samples   Self %  Total %  Function:File:Line

     38    1.50%    1.50%  httpHeaderClean:/home/adrian/work/squid/sf/parserwork/src/HttpHeader.c:354
     36    1.42%    2.93%  httpHeaderEntryDestroy:/home/adrian/work/squid/sf/parserwork/src/HttpHeader.c:1193
     35    1.38%    4.31%  httpHeaderGetEntry:/home/adrian/work/squid/sf/parserwork/src/HttpHeader.c:555
     28    1.11%    5.42%  ??:??:0
     26    1.03%    6.45%  arrayAppend:/home/adrian/work/squid/sf/parserwork/lib/Array.c:95
     25    0.99%    7.43%  xstrdup:/home/adrian/work/squid/sf/parserwork/lib/util.c:600
     22    0.87%    8.30%  xstrncpy:/home/adrian/work/squid/sf/parserwork/lib/util.c:680
     20    0.79%    9.09%  httpHeaderGetEntry:/home/adrian/work/squid/sf/parserwork/src/HttpHeader.c:551
     19    0.75%    9.85%  memPoolFree:/home/adrian/work/squid/sf/parserwork/src/MemPool.c:326
     17    0.67%   10.52%  arrayAppend:/home/adrian/work/squid/sf/parserwork/lib/Array.c:93
     16    0.63%   11.15%  memPoolFree:/home/adrian/work/squid/sf/parserwork/src/MemPool.c:317
     16    0.63%   11.78%  arrayAppend:/home/adrian/work/squid/sf/parserwork/lib/Array.c:91
     15    0.59%   12.38%  memPoolFree:/home/adrian/work/squid/sf/parserwork/src/MemPool.c:303
     15    0.59%   12.97%  memPoolFree:/home/adrian/work/squid/sf/parserwork/src/MemPool.c:319
     14    0.55%   13.52%  headersEnd:/home/adrian/work/squid/sf/parserwork/src/mime.c:147

Each gives slightly different results but they're all centred around the same
functions - memory allocation/free, header creation/deallocation.

I'm going to stop speeding things up, complete the request/reply parser
modifications and concentrate on fixing any bugs that pop up. I'm not going to
try writing an incremental HTTP parser for now; I'll leave that for Squid-3.
I'm mainly doing this to wrap my head around what bits of the code are fast,
what bits are slow, and why.




Adrian
