On Wed, 29 Sep 2004, Kazuho Oku wrote:

>From: "Christian Smith" <[EMAIL PROTECTED]>
>> On Tue, 28 Sep 2004, Kazuho Oku wrote:
>
>> >Unfortunately, my Apache module only performs a single SELECT statement
>> >whose WHERE clause can be indexed.
>> >What I am looking for is a way to avoid calling SQLite each time the module
>> >processes an HTTP request (eliminating the FLOCK -> READ -> FUNLOCK done by
>> >SQLite).
>>
>> Is this an actual bottleneck? Against internet latencies, this is likely
>> not to be that big a win.
>>
>> It will not be significantly slower than any other filesystem operation,
>> such as checking for an indicator file, especially if you then remove the
>> indicator file (synchronous disk operations!).
>
>It's not the latency that is the problem.
>The problem is the performance decrease caused by the module I am
>developing.
>
>When Apache (without my module) sends a static file, the disk-related system
>calls it issues are stat(), open(), mmap(), munmap(), and write(), once
>each.
>If my module did not cache database information, it would issue flock()
>twice, plus multiple seek()s and read()s.
>Although I have not benchmarked it, my understanding is that this would
>incur at least some performance penalty.


There is going to be some performance penalty, but if it's only 1% of your
request time, you'll have complicated your design for very little
performance gain.

Always perform benchmarks to back up your assumptions. I'm not saying your
assumptions are necessarily wrong, but just that you should back up your
claims before complicating your design to work round a bottleneck which
may not be there.

AFAIK, flock() is a kernel operation, and seek() and read() should not
touch the disk if the data blocks they need are already cached, so all of
them should be relatively cheap operations (compared with disk IO and
network latencies).
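To put a number on that, something like the following would do. It is only a rough sketch: time_locked_reads() is a made-up helper, the 1 KB read size is arbitrary, and it uses flock() because that is the call you mentioned (SQLite's Unix code actually takes fcntl() locks, so treat the result as an approximation).

```c
#define _DEFAULT_SOURCE /* exposes flock() and clock_gettime() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average wall-clock nanoseconds for one
 * flock(LOCK_SH) -> lseek() -> read() -> flock(LOCK_UN) cycle on the
 * file at `path`. Returns -1.0 if any system call fails. */
double time_locked_reads(const char *path, int iterations)
{
    char buf[1024];
    struct timespec start, end;
    double result = -1.0;

    assert(iterations > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1.0;

    /* Warm-up read so the data blocks are in the page cache. */
    if (read(fd, buf, sizeof buf) < 0)
        goto out;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        if (flock(fd, LOCK_SH) != 0)
            goto out;
        if (lseek(fd, 0, SEEK_SET) == (off_t)-1 ||
            read(fd, buf, sizeof buf) < 0) {
            flock(fd, LOCK_UN);
            goto out;
        }
        if (flock(fd, LOCK_UN) != 0)
            goto out;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    result = ((double)(end.tv_sec - start.tv_sec) * 1e9 +
              (double)(end.tv_nsec - start.tv_nsec)) / iterations;
out:
    close(fd);
    return result;
}
```

Compare the per-cycle figure it returns with the total time Apache takes to serve one request; if the former is a tiny fraction of the latter, caching buys you little.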

As I said, anything in the filesystem you use to communicate cache updates
will require synchronous IO on the writer (create the file) and the reader
(delete the file once cache is updated.) Synchronous IO will likely kill
performance more than cached reads and seeks.
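For comparison, here is roughly what the indicator-file scheme looks like. The helper names are hypothetical. fsync() is where the writer pays for synchronous IO, and the reader's unlink() is a directory update on every reload it triggers:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Writer side (e.g. the CGI that updates the database): create the
 * indicator file and force it to disk. Returns 0 on success. */
int signal_cache_update(const char *indicator_path)
{
    assert(indicator_path != NULL);
    int fd = open(indicator_path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    /* fsync() blocks until the data reaches stable storage:
     * this is the synchronous IO cost on the writer. */
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* Reader side (the Apache module): returns 1 if the indicator existed
 * and was removed (reload the cache), 0 if there was nothing to do,
 * -1 on any other error. unlink() does the check and the removal in
 * one call. */
int consume_cache_update(const char *indicator_path)
{
    assert(indicator_path != NULL);
    if (unlink(indicator_path) == 0)
        return 1;
    return errno == ENOENT ? 0 : -1;
}
```

Note also that with several Apache children, only the first process to unlink() the file sees the signal; the others keep their stale cache, so a real version would need one indicator per child or a different mechanism.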


>This is the reason why I think caching is necessary.
>
>> Does the cache have to be in sync with the current database all the time?
>> i.e. would it be satisfactory to simply check the database once per second,
>> and only update the cache then? This would certainly reduce any SQLite
>> overhead significantly.
>
>Since Apache is a multiprocess server, I do not think rereading the database
>periodically is a good idea.
>Typical usage would be to update the database through a CGI and then access
>the updated data from a different HTTP connection, so I cannot make the
>interval between rereads long.
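For reference, the once-per-second check I suggested could be as simple as the sketch below. The struct and function names are made up, and reload_from_sqlite() in the comment is a placeholder for your SELECT. Each Apache child keeps its own state, so after a write a child may serve stale data for up to `interval` seconds, which is exactly the window you are worried about:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical per-process refresh state for the module's cache. */
struct refresh_state {
    time_t last_refresh; /* 0 means the cache has never been loaded */
    time_t interval;     /* minimum seconds between database reads, e.g. 1 */
};

/* Nonzero if the cache should be reloaded from SQLite at time `now`. */
int refresh_due(const struct refresh_state *s, time_t now)
{
    assert(s != NULL);
    return s->last_refresh == 0 || now - s->last_refresh >= s->interval;
}

/* Record a successful reload, so the next one waits `interval` seconds. */
void mark_refreshed(struct refresh_state *s, time_t now)
{
    assert(s != NULL);
    s->last_refresh = now;
}

/* Per-request use (reload_from_sqlite() is a placeholder):
 *
 *     time_t now = time(NULL);
 *     if (refresh_due(&state, now) && reload_from_sqlite(&cache) == 0)
 *         mark_refreshed(&state, now);
 */
```

A failed reload leaves last_refresh untouched, so the next request simply tries again.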

Approximately how often is the reader being hit?

How often does the writer write?

Is the reader also a CGI binary? If so, the fork/exec will likely be a big
performance drain.
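If it is, the fork/exec overhead is worth measuring the same way. Another rough sketch, assuming /bin/true exists as a trivially cheap program to exec; time_fork_exec() is a made-up name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Average nanoseconds to fork(), exec `prog`, and reap the child.
 * `prog` is assumed to be a trivial program such as /bin/true.
 * Returns -1.0 if fork, exec, or the child itself fails. */
double time_fork_exec(const char *prog, int iterations)
{
    struct timespec start, end;
    assert(prog != NULL && iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1.0;
        if (pid == 0) {
            execl(prog, prog, (char *)NULL);
            _exit(127); /* only reached if exec failed */
        }
        int status;
        if (waitpid(pid, &status, 0) != pid ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return ((double)(end.tv_sec - start.tv_sec) * 1e9 +
            (double)(end.tv_nsec - start.tv_nsec)) / iterations;
}
```

If this per-request figure dwarfs the flock()/read() cost, the SQLite calls are not where the time goes.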

Christian


-- 
    /"\
    \ /    ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
     X                           - AGAINST MS ATTACHMENTS
    / \
