I've noticed a performance issue on a large site that makes heavy use of txt
RewriteMaps.  I'd like to propose an alternative implementation of txt maps
to deal with the issue.

This is how the current implementation works (a code sketch follows the
list):

  - lookup_map() is responsible for extracting data from maps

  - For a txt map, it first attempts to look up the key in the in-memory
    cache, using get_cache_value()

  - get_cache_value() acquires a mutex M before looking in the cache.

  - If get_cache_value() returns a value, lookup_map() returns it.

  - Otherwise, lookup_map() will:

      - Call lookup_map_txtfile() to find the value

      - Store the value in the cache (or an empty string, if no value was
        found) by calling set_cache_value()

  - set_cache_value() acquires M.  It then stores the value in the cache,
    first clearing the map's existing cache if it has become outdated.

  - lookup_map_txtfile() opens the map file, and does a linear search on its
    contents to find the desired value.
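
To summarise that in code, here is a simplified sketch of the control flow.
It is not the actual mod_rewrite source (the real functions take more
arguments); only the shape of the logic, and the points at which the mutex
M is taken, are intended to be accurate:

  /* Simplified stand-ins for the functions named above. */
  const char *get_cache_value(const char *mapfile, const char *key);
  const char *lookup_map_txtfile(const char *mapfile, const char *key);
  void set_cache_value(const char *mapfile, const char *key, const char *val);

  static const char *current_txt_lookup(const char *mapfile, const char *key)
  {
      /* get_cache_value() takes the per-map mutex M around the probe. */
      const char *val = get_cache_value(mapfile, key);
      if (val != NULL) {
          return *val ? val : NULL;    /* "" is a cached "not found" */
      }

      /* Cache miss: open the file and scan it linearly for this one key. */
      val = lookup_map_txtfile(mapfile, key);

      /* set_cache_value() takes M again before storing the result (or ""
       * if nothing was found), clearing an outdated cache first. */
      set_cache_value(mapfile, key, val ? val : "");
      return val;
  }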

The behaviour of this implementation can cause problems at server startup.
This is particularly apparent on a busy server that makes substantial use of
txt RewriteMaps.

At server startup, no map data has been cached.  Suppose we're using the
worker MPM, and that thread T1 in process P1 receives a request which
requires use of a map.  This causes lookup_map_txtfile() to read the value
out of the file, taking time linear in the length of the file.  (It's
possible for the server administrator to optimise this by ordering the map
so that more probable items are earlier in the file; but that's rather
beside the point, and isn't documented anyway.)

Meanwhile, thread T2 in process P1 also needs a value from the map.  T1 and
T2 serialize their attempts to use the cache -- even just to determine
whether the value is present and up-to-date.  This is already potentially
problematic.  The more threads you have per process, the more likely it is
that threads will contend for the mutex protecting a given map's cache.

Worse, T1 and T2 will each open the file, and scan through it for the values
they're interested in.  For a sufficiently long-lived server process, we
will ultimately reach a steady state in which all the data from all maps is
present in the memory cache.  But reaching that steady state involves one
linear-time scan through each file for each line it contains.  That is, the
time needed to cache the data in a single map file is quadratic in the
number of lines in that file.

It's also possible for multiple threads to simultaneously read the same
value from a given file.  While the mutex used by get_cache_value() and
set_cache_value() ensures that the cache isn't damaged by concurrent
accesses, it can happen that two threads set a given key to the same cached
value, one immediately after the other:

  T1 checks for the existence of key K, and finds it absent
  T2 checks for the existence of K, and finds it absent
  T1 reads K's value V from the file
  T2 reads V from the file
  T1 stores V in the cache
  T2 stores the same V in the cache

That doesn't affect the asymptotic complexity, but it's obviously wasted
effort.

Beyond a single process, note that during server startup, all processes are
doing this simultaneously.

This is an enormous amount of I/O to be doing merely to build an in-memory
hash table of the contents of some text files.  With enough maps in use and
a sufficiently high map-lookup workload, server startup becomes extremely
costly.  We have observed machines exhibiting the symptoms of thrash death
during graceful restart, apparently because of this effect.

I propose the following alternative implementation (a rough code sketch
follows the list):

  - On server startup, read all txt (and rnd) maps into memory in their
    entirety, recording the mtime of the map file.  This should be done in
    post_config, so that the cached data is available to all child
    processes.

  - On map lookup:

      - Determine the mtime of the map file

      - If the cache is up-to-date:

          - Acquire a thread rwlock L for this file's cache, in shared mode

          - Read the value from the cache

          - Release L

      - If the cache is outdated, attempt to acquire a thread mutex M for this
        file, without blocking

      - If M was acquired, it's this thread's responsibility to refresh the
        cache:

          - Read the entire file into a new hash

          - Acquire L in exclusive mode

          - Replace the existing cache with the hash just read

          - Release L

          - Release M

      - If M wasn't acquired, some other thread must be busy refreshing the
        cache for this file, so read from the existing cache as if it were
        up-to-date, using L as normal
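
To make the locking discipline concrete, here is a rough sketch of the
preload and lookup paths using APR primitives.  It is illustrative only:
the type, field, and helper names (txt_map_cache, read_whole_map_file(),
and so on) are hypothetical, not existing mod_rewrite code, and error
handling is mostly omitted:

  #include "apr_hash.h"
  #include "apr_file_info.h"
  #include "apr_thread_mutex.h"
  #include "apr_thread_rwlock.h"

  typedef struct {
      const char          *fname;    /* path of the map file             */
      apr_hash_t          *data;     /* key -> value, read from the file */
      apr_time_t           mtime;    /* file mtime when data was built   */
      apr_thread_rwlock_t *rwlock;   /* L: guards reads/swaps of data    */
      apr_thread_mutex_t  *refresh;  /* M: held by the refreshing thread */
      apr_pool_t          *pool;     /* long-lived pool for cached data  */
  } txt_map_cache;

  /* Hypothetical helper: parse the whole map file into a fresh hash. */
  apr_hash_t *read_whole_map_file(const char *fname, apr_pool_t *pool);

  /* Run once per map at post_config time. */
  static apr_status_t preload_map(txt_map_cache *c, apr_pool_t *pconf)
  {
      apr_finfo_t finfo;
      apr_status_t rv = apr_stat(&finfo, c->fname, APR_FINFO_MTIME, pconf);
      if (rv != APR_SUCCESS) {
          return rv;
      }
      apr_thread_rwlock_create(&c->rwlock, pconf);
      apr_thread_mutex_create(&c->refresh, APR_THREAD_MUTEX_DEFAULT, pconf);
      c->pool  = pconf;
      c->data  = read_whole_map_file(c->fname, pconf);
      c->mtime = finfo.mtime;
      return APR_SUCCESS;
  }

  static const char *proposed_lookup(txt_map_cache *c, const char *key,
                                     apr_pool_t *ptemp)
  {
      apr_finfo_t finfo;
      const char *val;

      /* If the file looks newer than the cache, try (without blocking)
       * to become the one thread that refreshes it. */
      if (apr_stat(&finfo, c->fname, APR_FINFO_MTIME, ptemp) == APR_SUCCESS
          && finfo.mtime != c->mtime
          && apr_thread_mutex_trylock(c->refresh) == APR_SUCCESS) {

          /* Build the replacement hash without holding L, so readers of
           * the old data aren't blocked during the file scan. */
          apr_hash_t *fresh = read_whole_map_file(c->fname, c->pool);

          /* Swap it in with L held exclusively. */
          apr_thread_rwlock_wrlock(c->rwlock);
          c->data  = fresh;
          c->mtime = finfo.mtime;
          apr_thread_rwlock_unlock(c->rwlock);

          apr_thread_mutex_unlock(c->refresh);
      }

      /* Whether the cache was already fresh, or another thread is busy
       * refreshing it, read whatever is currently cached, with L held
       * in shared mode. */
      apr_thread_rwlock_rdlock(c->rwlock);
      val = apr_hash_get(c->data, key, APR_HASH_KEY_STRING);
      apr_thread_rwlock_unlock(c->rwlock);

      return val;
  }

The thread locks here are per-process, so after the fork each child would
end up with its own independent copies of L and M; only the pre-built hash
data is intended to be shared, copy-on-write, from the parent.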

I have convinced myself that this scheme is thread-safe, and that where
possible it avoids serializing threads' access to the cache.  It should
scale well to servers with long-lived processes that use many large maps: it
performs the minimum possible amount of I/O, and its memory usage matches
the steady-state memory usage of the current implementation.

This scheme may impair performance in the following situations:

  - Your map files change very frequently.  (I believe this situation is
    extremely unlikely.)

  - You have very many and/or very large map files, and your processes are
    typically short-lived (low MaxRequestsPerChild, for example).  This
    seems more likely than highly-volatile map files, but I'd be inclined to
    suggest to administrators in this situation that they should use dbm
    maps instead: dbm maps can be presumed to have O(log n) lookup time,
    rather than the O(n) of txt maps.  (A conversion example follows this
    list.)
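
For reference, such a conversion is already supported by the stock tools; a
hypothetical example, with placeholder paths:

  $ httxt2dbm -i /path/to/map.txt -o /path/to/map

  # in httpd.conf, instead of "txt:/path/to/map.txt":
  RewriteMap examplemap dbm:/path/to/map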

I'd welcome any feedback anyone could offer on this proposal.  I'm happy to
do the work myself; is there a reasonable chance something along these lines
would be accepted if I do?  If not, what changes would be needed to make
this suitable for httpd?

-- 
Aaron Crane
