Also, I know that some people have had issues with CONFIG
proxy.config.http.share_server_sessions; you might try changing that
config value to 0 to see if that improves performance.
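
That would be a change in records.config along these lines (assuming you
are currently on the default of 1):

   CONFIG proxy.config.http.share_server_sessions INT 0

followed by a traffic_line -x (or a restart) to pick up the change.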

Brian



On 1/25/12 6:58 PM, "Brian Geffon" <[email protected]> wrote:

>Hi James,
>Can you try the patch in TS-1074
>(https://issues.apache.org/jira/browse/TS-1074) to see if that
>resolves your problem.
>
>
>Brian
>
>On Jan 25, 2012, at 6:36 PM, "Whitaker, James X. -ND"
><[email protected]> wrote:
>
>> Hi,
>>
>> Apologies for the length, but this is a strange one.
>>
>> The one thing that might be suspicious: I have a VM for development.  My
>> development environment is VMWare on a Mac (i7 MacBook Pro), with CentOS
>> 5.7 as the guest OS that hosts ATS.
>>
>> I have scoured over every line of the ATS source code in the last few
>> days, to educate myself and to attempt to answer the following question:
>>
>> Why is every single request going to ATS taking so long to complete?
>> Callgrind and cachegrind show approximately one millisecond of CPU time
>> for one particular example involving a cache hit, but wireshark shows
>> response times that range from 13ms to 130ms or more for end-to-end
>> processing of cache hits, with an average of 50ms.  ATS is apparently
>> going to sleep for a minimum of 10 milliseconds, typically 50 to 80
>> milliseconds, and sometimes as long as 20 seconds (when under load)
>>before
>> wireshark shows ATS issuing either a cached response or a request to a
>> service that is queued up by our plugin.
>>
>> As near as I can tell, ATS is putting each and every request to sleep,
>> rather than just blasting it through.
>>
>> The throughput numbers are very respectable, but each individual request
>> is stunningly slow.
>>
>> This is absolutely baffling.
>>
>> There is no I/O pending, no logging, no peer caches active...  nothing
>> else happening.   No other polling processes, no other requests in the
>> pipeline other than those issued by my ruby script that sends them out
>>one
>> at a time.  I start it up cold (cleared cache, rebooted machine, the
>> works) and have a 512MB ramcache configured.... with 5Gig of RAM
>>allocated
>> to the VM.  I shut off all logging, all peer caching... anything that
>>can
>> hit the disks or introduce any kind of dependency, and tell it to serve
>>up
>> a 100 byte chunk of cached data....
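>>
>> For reference, the relevant records.config lines for this test setup are
>> roughly the following (from memory, so the exact names/values may be a
>> little off):
>>
>>    CONFIG proxy.config.cache.ram_cache.size INT 536870912
>>    CONFIG proxy.config.log.logging_enabled INT 0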
>>
>> I tried recompiling with EAGER_SIGNALLING and other such possibilities,
>> and they ALL increased the latency/lag.
>>
>> A week ago, I was seeing about 5% of requests completing in an
>> acceptable 2.2 milliseconds, but in the last week the fastest time I
>> have seen for processing a cache hit, on an ATS server being polled by
>> my ruby script, is 12.2 milliseconds.  The shortest time shown in
>> wireshark for issuing an http request queued up by our plugin is 36ms
>> after the initial GET coming into ATS, but this delay is usually around
>> 130 milliseconds.  I would expect to see half a millisecond.  When the
>> response comes back over the wire from the http request instigated by
>> our plugin, ATS issues the expected reply to the client in less than a
>> millisecond with no apparent sleep state intervening.
>>
>> I will run back-to-back requests from my polling ruby script, and the
>> CPU usage level is so low it sometimes does not show up in "top".
>>
>> The AVERAGE response time is 40ms for what should be a ram-cached
>> request that I would expect the server to handle in 1ms.  I have thrown
>> hundreds of polling processes and threads at it, and its responses per
>> second and overall throughput are quite respectable....  there is just
>> this bizarre, inexplicable latency where ATS seems to be going to sleep
>> on every single request before doing anything with it, for absolutely no
>> reason.
>>
>> I tried disabling accept threads by setting the count to zero in
>> records.config.
>> I tried every combination of thread counts and ratios I could think of.
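>>
>> For reference, these are the records.config knobs I was toggling (names
>> from memory; the values shown are just one of the combinations I tried):
>>
>>    CONFIG proxy.config.accept_threads INT 0
>>    CONFIG proxy.config.exec_thread.autoconfig INT 0
>>    CONFIG proxy.config.exec_thread.limit INT 2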
>>
>> Looking over iocore/eventsystem/UnixEThread.cc, I notice that it is
>> apparently dequeuing from one queue and putting events on another queue
>> with a TIME TO EXECUTE marker associated with each one, and that this is
>> a priority queue sorted into bins.  Since the contents of these priority
>> queues are time based, with nanosecond granularity, they are changing
>> radically every millisecond; trying to figure out what is in them at any
>> point in time is not proving very easy...  debuggers are not cooperating
>> with me.  This code, and the nanosecond granularity, strikes me as
>> really odd.
>>
>> The only thing I can think of is that there is a clock granularity/jitter
>> problem between the clock samples.  ATS samples the time-of-day clock,
>> causing the re-enqueued priority queue items to suddenly get an
>> execution time stamp that is supposed to be a hundred nanoseconds or so
>> in the past (roughly the time I would estimate it takes to process the
>> re-enqueuing of the priority queue items).  I suspect this time stamp
>> might have jitter in it, causing items just pulled off the queue to go
>> to sleep waiting for their "time" to come around.  This might be a
>> CentOS problem, it might be a VMWare clock problem, it might be a Mac
>> clock problem... it might be a problem with VMWare buffering the
>> time-of-day clock (that seems the most likely to me)....  One thought
>> that comes to mind is that VMWare might be the cause of all the lag, in
>> the form of buffered clocks and only allowing "wake up" style signals
>> every 50ms or something strange like that.  If it was a clock buffering
>> problem, I would expect to see a wide distribution of delays starting
>> with ZERO and ranging up to whatever the clock granularity problem is.
>> However, every request gets a minimum of a 10ms delay.
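>>
>> If it helps, this is the sort of throwaway check I've been meaning to
>> run inside the guest to see whether the time-of-day clock itself steps
>> coarsely or jumps between consecutive reads (my own toy test, nothing
>> from the ATS tree):
>>
>>    #include <stdio.h>
>>    #include <sys/time.h>
>>
>>    /* Sample the time-of-day clock back to back and report the smallest
>>       and largest gap seen between consecutive reads. */
>>    int main(void)
>>    {
>>        struct timeval a, b;
>>        long d, min_d = 1000000000L, max_d = -1000000000L;
>>        int i;
>>        for (i = 0; i < 1000000; i++) {
>>            gettimeofday(&a, NULL);
>>            gettimeofday(&b, NULL);
>>            d = (b.tv_sec - a.tv_sec) * 1000000L
>>              + (b.tv_usec - a.tv_usec);
>>            if (d < min_d) min_d = d;
>>            if (d > max_d) max_d = d;
>>        }
>>        printf("back-to-back gettimeofday delta: min %ldus max %ldus\n",
>>               min_d, max_d);
>>        return 0;
>>    }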
>>
>> There are bunches of timings between 11.9 and 12.3ms, a bunch between
>> 20.78 and 23.7ms, and then the next batch at 31 to 34ms, etc... with
>> clusters between X2.0ms and X4.0ms for each request, where X is a random
>> tens digit.  This suggests a 10ms granularity in a timer or a reference
>> clock.  Maybe VMWare or CentOS only supports 10ms timer granularity when
>> something doesn't get immediately pulled off the queue?
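>>
>> To check the 10ms theory directly, something like this (again just a
>> standalone sketch; compile with -lrt on CentOS 5) should show whether a
>> 1ms sleep inside the guest really takes ~10ms:
>>
>>    #include <stdio.h>
>>    #include <time.h>
>>
>>    int main(void)
>>    {
>>        struct timespec res, req = { 0, 1000000 }, t0, t1;
>>        long elapsed;
>>
>>        /* Reported resolution of the monotonic clock. */
>>        clock_getres(CLOCK_MONOTONIC, &res);
>>        printf("clock_getres: %ld ns\n", res.tv_nsec);
>>
>>        /* Ask for a 1ms sleep and measure how long it really takes; a
>>           10ms scheduler/timer tick would show up as ~10ms here. */
>>        clock_gettime(CLOCK_MONOTONIC, &t0);
>>        nanosleep(&req, NULL);
>>        clock_gettime(CLOCK_MONOTONIC, &t1);
>>        elapsed = (t1.tv_sec - t0.tv_sec) * 1000000000L
>>                + (t1.tv_nsec - t0.tv_nsec);
>>        printf("1ms nanosleep took %ld ns\n", elapsed);
>>        return 0;
>>    }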
>>
>> Another thought I had was turning off chunking mode...  maybe ATS thinks
>> we are not done forming our responses or queries, and is buffering
>> everything up until a timeout tells it to go ahead and dump the
>> "incompletely formed stream data" as chunk-buffered I/O, in an "OK, X
>> milliseconds have passed, just dump your buffers because there is a
>> delay getting the next chunk out" paradigm?  Are these
>> 10,20,30,40,50,60,70ms delays the products of waiting for a chunk
>> buffering timeout?  Turning off chunking mode in records.config dropped
>> the average latency from 42 milliseconds to 36.  I would think it would
>> either make the problem go completely away or not do anything at all.
>> Mystifying.
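>>
>> For the record, "turning off chunking" above means this line in
>> records.config (if I have the name right):
>>
>>    CONFIG proxy.config.http.chunking_enabled INT 0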
>>
>> There are some spots in the code where we call out to
>> TSContCall(contp, TS_EVENT_VCONN_EOS, data); to tell it that a READ is
>> done so that ATS wakes up and continues.  That reduced an annoying
>> latency from 80ms to 15ms.
>>
>> I tried putting in:
>>
>>    TSContCall(TSVIOContGet(mRequestVio), TS_EVENT_VCONN_WRITE_COMPLETE,
>>               mRequestVio);
>>
>> after one call to:
>>
>>    mRequestVio = TSVConnWrite(mVc, httpCont, mServerRequestBufferReader,
>>                               content_length);
>>
>> This didn't do anything at all to the latencies.  Maybe this is the
>> wrong call?
>>
>> I'm rather stumped.  Does anyone there have ideas?  Is the answer "ATS
>> was not designed to run on VMs"?
>>
>> Thank you for reading this far,
>>
>> James Whitaker
>>
>>
