cvs commit: modperl-2.0/pod .cvsignore modperl_2.0.pod modperl_design.pod

stas Thu, 27 Dec 2001 04:00:58 -0800
stas        01/12/27 04:02:47

  Added:       src/docs/2.0/user/overview overview.pod
               src/docs/2.0/user/design design.pod
  Removed:     pod      .cvsignore modperl_2.0.pod modperl_design.pod
  Log:
  - move docs from ./pod to docs/user/<appr dir>/
  - ./pod dir is a goner, all docs are now in the modperl-docs rep
  
  Revision  Changes    Path
  1.1                  modperl-docs/src/docs/2.0/user/overview/overview.pod
  
  Index: overview.pod
  ===================================================================
  =head1 NAME
  
  Overview of mod_perl 2.0
  
  =head1 Introduction
  
  mod_perl was introduced in early 1996, and both Perl and Apache have
  changed a great deal since that time. mod_perl has adjusted to both
  along the way over the past 4 and a half years or so using the same
  code base.  Over this course of time, the mod_perl sources have become
  more and more difficult to maintain, in large part because they must
  provide compatibility between the many different flavors of Apache and
  Perl.  Compatibility across these versions and flavors is a more
  difficult goal for mod_perl to reach than for a typical Apache or Perl
  module, since mod_perl reaches a bit deeper into the corners of Apache
  and Perl internals than most.  Discussions of the idea to rewrite
  mod_perl as version 2.0 started in 1998, but never made it much further
  than an idea.  When Apache 2.0 development was underway it became
  clear that a rewrite of mod_perl would be required to adjust to the
  new Apache architecture and API.
  
  Of the many changes happening in Apache 2.0, the one which has the
  most impact on mod_perl is the introduction of threads to the overall
  design.  Threads have been a part of Apache on the win32 side since
  the Apache port was introduced.  The mod_perl port to win32 happened
  in version 1.00b1, released in June of 1997.  This port enabled
  mod_perl to compile and run in a threaded Windows environment, with
  one major caveat: only one concurrent mod_perl request could be
  handled at any given time.  This was due to the fact that Perl did not 
  introduce thread safe interpreters until version 5.6.0, released in
  March of 2000.  Contrary to popular belief, the "thread support"
  implemented in Perl 5.005 (released July 1998) did not make Perl
  thread safe internally.  Well before that version, Perl had the notion
  of "Multiplicity", which allowed multiple interpreter instances in the 
  same process.  However, these instances were not thread safe, that is, 
  concurrent callbacks into multiple interpreters were not supported.
  
  It just so happens that the release of Perl 5.6.0 was nearly at the
  same time as the first alpha version of Apache 2.0.  The development
  of mod_perl 2.0 was underway before those releases, but as both Perl
  5.6.x and Apache 2.0 are reaching stability, mod_perl-2.0 becomes more 
  of a reality.  In addition to the adjustments for threads and Apache
  2.0 API changes, this rewrite of mod_perl is an opportunity to clean
  up the source tree.  This includes both removing the old backward
  compatibility bandaids and building a smarter, stronger and faster
  implementation based on lessons learned over the 4.5 years since
  mod_perl was introduced.
  
  This paper and talk assume basic knowledge of mod_perl 1.xx features
  and will focus only on the differences mod_perl-2.00 will bring.
  
  Note 1: The Apache and mod_perl APIs mentioned in this paper are both in
  an "alpha" state and subject to change.
  
  Note 2: Some of the mod_perl APIs mentioned in this paper do not even
  exist and are subject to be implemented, in which case you would be
  redirected to "Note 1".
  
  =head1 Apache 2.0 Summary
  
  Note: This section will give you a brief overview of the changes in
  Apache 2.0, just enough to understand where mod_perl will fit in.  For
  more details on Apache 2.0 consult the papers by Ryan Bloom.
  
  =head2 MPMs - Multi-Processing Model Modules
  
  In Apache 1.3.x concurrent requests were handled by multiple
  processes, and the logic to manage these processes lived in one place,
  I<http_main.c>, 7200 some odd lines of code.  If Apache 1.3.x is
  compiled on a Win32 system large parts of this source file are
  redefined to handle requests using threads.  Now suppose you want to
  change the way Apache 1.3.x processes requests, say, into a DCE RPC
  listener.  This is possible only by slicing and dicing I<http_main.c>
  into more pieces or by redefining the I<standalone_main> function,
  with a C<-DSTANDALONE_MAIN=your_function> compile time flag.
  Neither is a clean, modular mechanism.
  
  Apache-2.0 solves this problem by introducing I<Multi Processing Model
  modules>, better known as I<MPMs>.  The task of managing incoming
  requests is left to the MPMs, shrinking I<http_main.c> to less than
  500 lines of code.  Several MPMs are included with Apache 2.0 in the
  I<src/modules/mpm> directory:
  
  =over 4
  
  =item prefork
  
  The I<prefork> module emulates 1.3.x's preforking model, where each
  request is handled by a different process.
  
  =item threaded
  
  This MPM implements a hybrid multi-process multi-threaded
  approach based on the I<pthreads> standard.
  
  =item os2/winnt/beos
  
  These MPMs also implement the hybrid multi-process/multi-threaded
  model, with each based on native OS thread implementations.
  
  =item perchild
  
  The I<perchild> MPM is similar to the I<threaded> MPM, but is extended
  with a mechanism which allows requests for a virtual host to be mapped
  to a process running under the user id and group configured for that
  host.  This provides a robust replacement for the I<suexec> mechanism.
  
  =back
  
  =head2 APR - Apache Portable Runtime
  
  Apache 1.3.x has been ported to a very large number of platforms,
  including various flavors of Unix, Win32 and OS/2, and the list goes on.
  However, in 1.3.x there was no clear-cut, pre-designed portability
  layer for third-party modules to take advantage of.  APR provides this
  API layer in a very clean way.  For mod_perl, APR will assist a great
  deal with portability.  Combined with the portability of Perl,
  mod_perl-2.0 needs only to implement a portable build system; the rest
  comes "for free".
  A Perl interface will be provided for certain areas of APR, such as
  the shared memory abstraction, but the majority of APR will be used by 
  mod_perl "under the covers".
  
  =head2 New Hook Scheme
  
  In Apache 1.3, modules were registered using the I<module> structure,
  normally static to I<mod_foo.c>.  This structure contains pointers to
  the command table, config create/merge functions, response handler
  table and function pointers for all of the other hooks, such as
  I<child_init> and I<check_user_id>.  In 2.0, this structure has been
  pruned down to the first three items mentioned, and a new function
  pointer called I<register_hooks> has been added.  It is the job of
  I<register_hooks> to register functions for all other hooks (such as
  I<child_init> and I<check_user_id>).  Not only is hook registration
  now dynamic, it is also possible for modules to register more than one 
  function per hook, unlike 1.3.  The new hook mechanism also makes it
  possible to sort registered functions.  In 1.3, function pointers were
  hardwired into the module structure, and each module structure was
  linked into a list.  Order in 1.3 depended on this list, which could
  be ordered using compile-time and configuration-time settings, but
  that was left to the user.  In 2.0, the add_hook functions accept an
  order preference parameter.  Those commonly used are:
  
  =over 4
  
  =item FIRST
  
  =item MIDDLE
  
  =item LAST
  
  =back
  
  For mod_perl, dynamic registration provides a cleaner way to bypass the
  I<Perl*Handler> configuration.  By simply adding this configuration:
  
   PerlModule Apache::Foo
  
  I<Apache/Foo.pm> can register hooks itself at server startup:
  
   Apache::Hook->add(PerlAuthenHandler => \&authenticate, Apache::Hook::MIDDLE);
   Apache::Hook->add(PerlLogHandler => \&logger, Apache::Hook::LAST);
  
  However, this means that Perl subroutines registered via this
  mechanism will be called for *every* request.  It will be left to that
  subroutine to decide whether to handle or decline the given phase.
  As there is overhead in entering the Perl runtime, it will most likely 
  be to your advantage to continue using I<Perl*Handler> configuration
  to reduce this overhead.  If it is the case that your I<Perl*Handler>
  should be invoked for every request, the hook registration mechanism
  will save some configuration keystrokes.
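  
  The ordering and decline behaviour described above can be sketched in
  plain Perl, outside of Apache.  Everything in this sketch is
  hypothetical: the C<add_hook> and C<run_hook> functions, the numeric
  values of the order constants and the return codes are stand-ins, not
  the Apache or mod_perl API.  It also assumes a phase stops at the first
  callback that does not decline, which is a simplification.
  
  ```perl
  use strict;
  use warnings;
  
  # Hypothetical stand-ins for the order constants and return codes;
  # the values are chosen for this sketch only.
  use constant { FIRST => 0, MIDDLE => 10, LAST => 20 };
  use constant { OK => 0, DECLINED => -1 };
  
  my %hooks;    # phase name => list of [order, seq, coderef]
  my $seq = 0;  # registration counter, keeps equal orders stable
  
  # Register a callback for a phase with an order preference.
  sub add_hook {
      my ($phase, $code, $order) = @_;
      push @{ $hooks{$phase} }, [$order, $seq++, $code];
  }
  
  # Run the callbacks of a phase in order until one does not decline.
  sub run_hook {
      my ($phase, $uri) = @_;
      my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }
                   @{ $hooks{$phase} || [] };
      for my $entry (@sorted) {
          my $rc = $entry->[2]->($uri);
          return $rc unless $rc == DECLINED;
      }
      return DECLINED;
  }
  
  my @trace;
  add_hook(Authen => sub { push @trace, 'late'; return OK }, LAST);
  add_hook(Authen => sub {
      my $uri = shift;
      push @trace, 'private';
      # Called for every request, so decline what is not ours.
      return OK if $uri =~ m{^/private/};
      return DECLINED;
  }, MIDDLE);
  
  run_hook(Authen => '/public/index.html');
  print "@trace\n";
  ```
  
  Although the C<LAST> callback is registered first, the C<MIDDLE>
  callback runs before it, and it is entered even for a URI it has no
  interest in.  That is the per-request overhead discussed above.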
  
  =head2 Configuration Tree
  
  When configuration files are read by Apache 1.3, it hands off the
  parsed text to module configuration directive handlers and discards
  that text afterwards.  With Apache 2.0, the configuration files are
  first parsed into a tree structure, which is then walked to pass data
  down to the modules.  This tree is then left in memory with an API for 
  accessing it at request time.  The tree can be quite useful for other
  modules.  For example, in 1.3, mod_info has its own configuration
  parser and parses the configuration files each time you access it.
  With 2.0 there is already a parse tree in memory, which mod_info can
  then walk to output its information.
  
  If a mod_perl 1.xx module wants access to configuration information,
  there are two approaches.  A module can "subclass" directive handlers, 
  saving a copy of the data for itself, then returning B<DECLINE_CMD> so 
  the other modules are also handed the info.  Or, the
  C<$Apache::Server::SaveConfig> variable can be set to save <Perl>
  configuration in the C<%Apache::ReadConfig::> namespace.  Both methods
  are rather kludgy, so version 2.0 will provide a Perl interface to the
  Apache configuration tree.
  
  =head2 Filtering
  
  Filtering of Perl modules' output has been possible for years since
  tied filehandle support was added to Perl.  There are several modules, 
  such as I<Apache::Filter> and I<Apache::OutputChain> which have been
  written to provide mechanisms for filtering the C<STDOUT> "stream".
  There are several of these modules because no one approach has quite
  been able to offer the ease of use one would expect, which is due
  simply to limitations of the Perl tied filehandle design.  Another
  problem is that these filters can only filter the output of other Perl
  modules. C modules in Apache 1.3 send data directly to the client and
  there is no clean way to capture this stream.  Apache 2.0 has solved
  this problem by introducing a filtering API.  With the baseline I/O
  stream tied to this filter mechanism, any module can filter the output
  of any other module, with any number of filters in between.
  
  =head2 Protocol Modules
  
  Apache 1.3 is hardwired to speak only one protocol, HTTP.  Apache 2.0
  has moved to more of a "server framework" architecture making it
  possible to plug in handlers for protocols other than HTTP.  The
  protocol module design also abstracts the transport layer so protocols 
  such as SSL can be hooked into the server without requiring
  modifications to the Apache source code.  This allows Apache to be
  extended much further than in the past, making it possible to add
  support for protocols such as FTP, SMTP, RPC flavors and the like.
  The main advantage is that protocol plugins can take advantage of
  Apache's portability, process/thread management, configuration
  mechanism and plugin API.
  
  =head1 mod_perl and Threaded MPMs
  
  =head2 Perl 5.6
  
  Thread safe Perl interpreters, also known as "ithreads" (Interpreter
  Threads), provide the mechanism needed for mod_perl to adapt to the
  Apache 2.0 thread architecture.  This mechanism is a compile time
  option which encapsulates the Perl runtime inside of a single
  I<PerlInterpreter> structure.  With each interpreter instance
  containing its own symbol tables, stacks and other Perl runtime
  mechanisms, it is possible for any number of threads in the same
  process to concurrently call back into Perl.  This of course requires
  each thread to have its own I<PerlInterpreter> object, or at least
  that each instance is only accessed by one thread at any given time.
  
  mod_perl-1.xx has only a single I<PerlInterpreter>, which is
  constructed by the parent process, then inherited across the forks to
  child processes.  mod_perl-2.0 has a configurable number of
  I<PerlInterpreters> and two classes of interpreters, I<parent> and
  I<clone>.  A I<parent> is like that in 1.xx, the main interpreter
  created at startup time which compiles any pre-loaded Perl code.
  A I<clone> is created from the parent using the Perl API
  I<perl_clone()> function.  At request time, I<parent> interpreters are 
  only used for making more I<clones>, and it is the I<clones>
  which actually handle requests.  Care is taken by Perl to copy only
  mutable data, which means that no runtime locking is required and
  read-only data such as the syntax tree is shared from the I<parent>.
  
  =head2 New mod_perl Directives for Threaded MPMs
  
  Rather than create a I<PerlInterpreter> per-thread by default,
  mod_perl creates a pool of interpreters.  The pool mechanism helps cut 
  down memory usage a great deal.  As already mentioned, the syntax tree 
  is shared between all cloned interpreters.  If your server is serving
  more than mod_perl requests, having a smaller number of
  PerlInterpreters than the number of threads will clearly cut down on
  memory usage.  Finally, and perhaps the biggest win, is memory reuse.
  That is, as calls are made into Perl subroutines, memory allocations
  are made for variables when they are used for the first time.
  Subsequent use of variables may allocate more memory, e.g. if a
  string needs to hold a longer value than it did before, or an array
  more elements than in the past.  As an optimization, Perl hangs onto
  these allocations, even though their values "go out of scope".  With
  the 1.xx model, random children would be hit with these allocations.
  With 2.0, mod_perl has much better control over which PerlInterpreters
  are used for incoming requests.  The interpreters are stored in two
  linked lists, one for available interpreters and one for busy ones.
  When needed to handle a request, one is taken from the head of the
  available list and put back at the head of that list when done.  This
  means if you have, say, 10 interpreters configured to be cloned at
  startup time, but no more than 5 are ever used concurrently, those 5
  continue to reuse Perl's allocations, while the other 5 remain much
  smaller, but ready to go if the need arises.
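  
  The effect of taking from and returning to the head of the available
  list can be seen in a small simulation.  This is only a sketch: plain
  Perl arrays stand in for the linked lists, hash references stand in
  for interpreters, and C<select_item> and C<release_item> are
  hypothetical names, not mod_perl functions.
  
  ```perl
  use strict;
  use warnings;
  
  # Ten stand-in "interpreters"; each just counts how often it was used.
  my @available = map { +{ id => $_, uses => 0 } } 1 .. 10;
  my @busy;
  
  # Take an item from the head of the available list.
  sub select_item {
      my $item = shift @available;
      push @busy, $item;
      $item->{uses}++;
      return $item;
  }
  
  # Put an item back at the head of the available list.
  sub release_item {
      my $item = shift;
      @busy = grep { $_ != $item } @busy;
      unshift @available, $item;
  }
  
  # Simulate 100 bursts of 5 concurrent requests each.
  for my $burst (1 .. 100) {
      my @in_flight;
      push @in_flight, select_item() for 1 .. 5;
      release_item($_) for @in_flight;
  }
  
  my @used   = grep { $_->{uses} > 0 } @available;
  my @unused = grep { $_->{uses} == 0 } @available;
  printf "%d used, %d never touched\n", scalar @used, scalar @unused;
  ```
  
  Because released items go back to the head, the same five items serve
  every burst and the other five are never selected.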
  
  Various attributes of the pools are configurable with the following
  configuration directives:
  
  =over 4
  
  =item PerlInterpStart
  
  The number of interpreters to clone at startup time.
  
  =item PerlInterpMax
  
  If all running interpreters are in use, mod_perl will clone new
  interpreters to handle the request, up until this number of
  interpreters is reached. When Max is reached, mod_perl will block
  until one becomes available.
  
  =item PerlInterpMinSpare
  
  The minimum number of available interpreters.  To maintain this
  minimum, interpreters are cloned, up to Max, before a request comes in.
  
  =item PerlInterpMaxSpare
  
  mod_perl will throttle down the number of interpreters to this number
  as those in use become available.
  
  =item PerlInterpMaxRequests
  
  The maximum number of requests an interpreter should serve.  The
  interpreter is destroyed when this number is reached and replaced with
  a fresh clone.
  
  =item PerlInterpScope
  
  As mentioned, when a request in a threaded MPM is handled by mod_perl,
  an interpreter must be pulled from the interpreter pool.  The
  interpreter is then only available to the thread that selected it,
  until it is released back into the interpreter pool.
  By default, an interpreter will be held for the lifetime of the
  request, equivalent to this configuration:
  
   PerlInterpScope request
  
  For example, if a PerlAccessHandler is configured, an interpreter will
  be selected before it is run and not released until after the logging
  phase.
  
  Interpreters will be shared across subrequests by default.  However, it
  is possible to configure the interpreter scope to be per-subrequest on
  a per-directory basis:
  
   PerlInterpScope subrequest
  
  With this configuration, an autoindex generated page for example would 
  select an interpreter for each item in the listing that is configured
  with a Perl*Handler.
  
  It is also possible to configure the scope to be per-handler:
  
   PerlInterpScope handler
  
  With this configuration, an interpreter will be selected before
  PerlAccessHandlers are run, and put back immediately afterwards, before
  Apache moves onto the authentication phase.  If a PerlFixupHandler is
  configured further down the chain, another interpreter will be
  selected and again put back afterwards, before PerlResponseHandler is
  run.
  
  For protocol handlers, the interpreter is held for the lifetime of the
  connection.  However, a C protocol module might hook into mod_perl
  (e.g. mod_ftp) and provide a request_rec.  In this case, the default
  scope is that of the request.  Should a mod_perl handler want to
  maintain state for the lifetime of an ftp connection, it is possible
  to do so on a per-virtualhost basis:
  
   PerlInterpScope connection
  
  =back
  
  =head2 Issues with Threading
  
  The Perl "ithreads" implementation ensures that Perl code is thread
  safe, at least with respect to the Apache threads in which it is
  running.  However, it does not ensure that extensions which call into
  third-party C/C++ libraries are thread safe.  In the case of
  non-threadsafe extensions, if it is not possible to fix those
  routines, care will need to be taken to serialize calls into such
  functions (either at the xs or Perl level).
  
  Another issue is that "global" variables are only global to the
  interpreter in which they are created.  Some research has been done on
  the concept of I<solar> variables which are global across all
  interpreter instances.  It has not been decided if this feature would
  best fit built into the Perl core or as an extension, but fear not,
  the feature will be provided in one form or another.
  
  =head1 Thread Item Pool API
  
  As we discussed, mod_perl implements a pool mechanism to manage
  I<PerlInterpreters> between threads.  This mechanism has been
  abstracted into an API known as "tipool", I<Thread Item Pool>.  This
  pool can be used to manage any data structure of which you wish to
  have fewer instances than the number of configured threads.  A good
  example of such a data structure is a database connection handle.
  The I<Apache::DBI> module implements persistent connections for 1.xx,
  but may result in each child maintaining its own connection, when it
  is most often the case that this many connections are never needed
  concurrently.  The TIPool API provides a mechanism to solve this
  problem, consisting of the following methods:
  
  =over 4
  
  =item new
  
  Create a new thread item pool.  This constructor is passed an
  I<Apache::Pool> object, a hash reference to pool configuration parameters,
  a hash reference to pool callbacks and an optional userdata variable
  which is passed to callbacks:
  
   my $tip = Apache::TIPool->new($p,
                                 {Start => 3, Max => 6},
                                 {grow => \&new_connection,
                                  shrink => \&close_connection},
                                 \%my_config);
  
  The configuration parameters, I<Start>, I<Max>, I<MinSpare>, I<MaxSpare>
  and I<MaxRequests> configure the pool for your items, just as the
  I<PerlInterp*> directives do for I<PerlInterpreters>.
  
  The I<grow> callback is called to create new items to be added to the
  pool, I<shrink> is called when an item is removed from the pool.
  
  
  =item pop
  
  This method will return an item from the pool, from the head of the
  available list.  If all current items are busy, and their number is
  less than the configured maximum, a new item will be created by
  calling the configured I<grow> callback.  Otherwise, the I<pop>
  method will block until an item is available.
  
   my $item = $tip->pop;
  
  =item putback
  
  This method gives an item (returned from I<pop>) back to the pool,
  which is pushed into the head of the available list:
  
   $tip->putback($item);
  
  =back
  
  Future improvements will be made to the TIPool API, such as the
  ability to sort the I<available> and I<busy> lists and specify if
  items should be popped and putback to/from the head or tail of the
  list.
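  
  Since I<Apache::TIPool> is not available to experiment with, the
  I<new>, I<pop> and I<putback> semantics described above can be tried
  out with a small stand-in.  The C<My::TIPool> class below is
  hypothetical and single-threaded: it takes no I<Apache::Pool>
  argument, supports only I<Start>, I<Max> and the I<grow> callback, and
  returns C<undef> from I<pop> where the described API would block.
  
  ```perl
  use strict;
  use warnings;
  
  package My::TIPool;   # hypothetical, single-threaded stand-in
  
  sub new {
      my ($class, $config, $callbacks, $userdata) = @_;
      my $self = bless {
          max       => $config->{Max},
          callbacks => $callbacks,
          userdata  => $userdata,
          available => [],
          size      => 0,
      }, $class;
      # Pre-create Start items by calling the grow callback.
      $self->putback($self->_grow) for 1 .. $config->{Start};
      return $self;
  }
  
  # Create one more item through the configured grow callback.
  sub _grow {
      my $self = shift;
      $self->{size}++;
      return $self->{callbacks}{grow}->($self, $self->{userdata});
  }
  
  # Hand out the item at the head of the available list, growing the
  # pool when nothing is available and Max has not been reached.
  sub pop {
      my $self = shift;
      return shift @{ $self->{available} } if @{ $self->{available} };
      return $self->_grow if $self->{size} < $self->{max};
      return undef;   # the described API would block here instead
  }
  
  # Return an item to the head of the available list.
  sub putback {
      my ($self, $item) = @_;
      unshift @{ $self->{available} }, $item;
  }
  
  package main;
  
  my $created = 0;
  my $tip = My::TIPool->new(
      { Start => 2, Max => 3 },
      { grow => sub {
            my ($pool, $data) = @_;
            return $data->{prefix} . '-' . ++$created;
        } },
      { prefix => 'conn' },
  );
  
  my @items;
  push @items, $tip->pop for 1 .. 3;   # the third pop triggers grow
  my $none = $tip->pop;                # Max reached, nothing available
  $tip->putback($items[0]);
  my $again = $tip->pop;               # the item just put back
  
  print "@items | ", (defined $none ? $none : 'undef'), " | $again\n";
  ```
  
  A threaded implementation would need locking around both lists and a
  condition variable for the blocking case, which this sketch leaves out.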
  
  =head2 Apache::DBIPool
  
  Now we will take a look at how to make I<DBI> take advantage of
  I<TIPool> API with the I<Apache::DBIPool> module.  The module
  configuration in httpd.conf will look something like so:
  
   PerlModule Apache::DBIPool
  
   <DBIPool dbi:mysql:db_name>
     DBIPoolStart 10
     DBIPoolMax   20
     DBIPoolMaxSpare 10
     DBIPoolMinSpare 5
     DBIUserName dougm
     DBIPassWord XxXx
   </DBIPool>
  
  The module is loaded using the I<PerlModule> directive just as with
  other modules.  TIPools are then configured using I<DBIPool>
  configuration sections.  The argument given to the container is the
  I<dsn> and within are the pool directives I<Start>, I<Max>,
  I<MaxSpare> and I<MinSpare>.  The I<UserName> and I<PassWord>
  directives will be passed to the I<DBI> I<connect> method.
  There can be any number of I<DBIPool> containers, provided each I<dsn> 
  is different, and/or each container is inside a different
  I<VirtualHost> container.
  
  Now let's examine the source code, keeping in mind this module
  contains the basics and the official release (tbd) will likely contain 
  more details, such as how it hooks into I<DBI.pm> to provide
  transparency the way I<Apache::DBI> currently does.
  
  After pulling in the modules needed, I<Apache::TIPool>,
  I<Apache::ModuleConfig> and I<DBI>, we set up a callback table.  The
  I<new_connection> function will be called when the TIPool needs to add
  a new item and I<close_connection> when an item is being removed from
  the pool.  The I<Apache::Hook> I<add> method registers a
  I<PerlPostConfigHandler> which will be called after Apache has read
  the configuration files.
  
  This handler (our I<init> function) is passed 3 I<Apache::Pool>
  objects and one I<Apache::Server> object.  Each I<Apache::Pool> has a
  different lifetime, the first will be alive until configuration is
  read again, such as during restarts.  The second will be alive until
  logs are re-opened and the third is a temporary pool which is cleared
  before Apache starts serving requests.  Since the DBI connection pool
  is associated with configuration in httpd.conf, we will use that pool.  
  
  The I<Apache::ModuleConfig> I<get> method is called with the
  I<Apache::Server> object to give us the configuration associated with
  the given server.  Next is a while loop which iterates over the
  configuration parsed by the I<DBIPool> directive handler.  The keys of
  this hash are the configured I<dsn>, of which there is one per
  I<DBIPool> configuration section.  The values will be a hash reference
  to the pool configuration, I<Start>, I<Max>, I<MinSpare>, I<MaxSpare>
  and I<MaxRequests>.
  
  A I<new> I<Apache::TIPool> is then constructed, passing it the
  C<$pconf> I<Apache::Pool>, configuration C<$params>, the C<$callbacks>
  table and C<$conn> hash ref.  The I<TIPool> is then saved into the
  C<$cfg> object, indexed by the I<dsn>.
  
  At the time I<Apache::TIPool::new> is called, the I<new_connection>
  callback will be called once for each item configured by I<Start>.
  This callback localizes I<Apache::DBIPool::connect> to a
  code reference which makes the real database connection.
  
  At request time I<Apache::DBIPool::connect> will fetch a database
  handle from the I<TIPool>.  It does so by digging into the
  configuration object associated with the current virtual host to
  obtain a reference to the I<TIPool> object.  It then calls the I<pop>
  method, which will immediately return a database handle if one is
  available.  If all open connections are in use and the current
  number of connections is less than the configured I<Max>, the call to
  I<pop> will result in a call to I<new_connection>.  If I<Max> has
  already been reached, then I<pop> will block until a handle is
  I<putback> into the pool.
  
  Finally, the handle is blessed into the I<Apache::DBIPool::db> class
  which will override the dbd class I<disconnect> method.  The
  overridden I<disconnect> method obtains a reference to the I<TIPool>
  object and passes the handle to its I<putback> method, making it
  available for use by other threads.  Should the Perl code using this
  handle neglect to call the I<disconnect> method, the overridden
  I<connect> method has already registered a cleanup function to make
  sure it is I<putback>.
  
  =head2 Apache::DBIPool Source
  
   package Apache::DBIPool;
  
   use strict;
   use Apache::TIPool ();
   use Apache::ModuleConfig ();
   use DBI ();
  
   my $callbacks = {
      grow => \&new_connection,     #add new connection to the pool
      shrink => \&close_connection, #handle removed connection from pool
   };
  
   Apache::Hook->add(PerlPostConfigHandler => \&init); #called at startup
  
   sub init {
       my($pconf, $plog, $ptemp, $s) = @_;
  
       my $cfg = Apache::ModuleConfig->get($s, __PACKAGE__);
  
       #create a TIPool for each dsn
       while (my($dsn, $params) = each %{ $cfg->{DBIPool} }) {
           my $conn = $params; #the section hash also holds dsn and login
           my $tip = Apache::TIPool->new($pconf, $params, $callbacks, $conn);
           $cfg->{TIPool}->{$dsn} = $tip;
       }
   }
  
   sub new_connection {
       my($tip, $conn) = @_;
  
       #make actual connection to the database
       local *Apache::DBIPool::connect = sub {
           my($class, $drh) = (shift, shift);
           $drh->connect(@_); #remaining args: dbname, username, password, attr
       };
  
       return DBI->connect(@{$conn}{qw(dsn username password attr)});
   }
  
   sub close_connection {
       my($tip, $conn, $dbh) = @_;
       my $driver = (split /:/, $conn->{dsn})[1];
       my $method = join '::', 'DBD', $driver, 'db', 'disconnect';
       $dbh->$method(); #call the real disconnect method
   }
  
   my $EndToken = '</DBIPool>';
  
   #parse <DBIPool dbi:mysql:...>...
  
   sub DBIPool ($$$;*) {
       my($cfg, $parms, $dsn, $cfg_fh) = @_;
       $dsn =~ s/>$//;
  
       $cfg->{DBIPool}->{$dsn}->{dsn} = $dsn;
  
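       while (defined(my $line = <$cfg_fh>)) {
           last if $line =~ m:^\s*\Q$EndToken\E:;
           chomp $line;
           my($name, $value) = split ' ', $line, 2;
           next unless defined $name; #skip blank lines
           #DBIPoolStart => start, DBIUserName => username
           $name =~ s/^DBI(?:Pool)?(\w+)/lc $1/ei;
           $cfg->{DBIPool}->{$dsn}->{$name} = $value;
       }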
   }
  
   sub config {
       my $r = Apache->request;
       return Apache::ModuleConfig->get($r, __PACKAGE__);
   }
  
   #called from DBI::connect
   sub connect {
       my($class, $drh) = (shift, shift);
  
       $drh->{DSN} = join ':', 'dbi', $drh->{Name}, $_[0];
       my $cfg = config();
  
       my $tip = $cfg->{TIPool}->{ $drh->{DSN} };
  
       unless ($tip) {
           #XXX: do a real connect or fallback to Apache::DBI
       }
  
       my $item = $tip->pop; #select a connection from the pool
  
       my $r = Apache->request;
       $r->register_cleanup(sub { #in case disconnect() is not called
           $tip->putback($item);
       });
  
       return bless $item->data, 'Apache::DBIPool::db'; #the dbh
   }
  
   package Apache::DBIPool::db;
  
   our @ISA = qw(DBI::db);
  
   #override disconnect, puts database handle back in the pool
   sub disconnect {
       my $dbh = shift;
       my $tip = config()->{TIPool}->{ $dbh->{DSN} };
       $tip->putback($dbh);
       1;
   }
  
   1;
   __END__
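  
  The section parsing done by the I<DBIPool> directive handler can be
  exercised outside of Apache by reading from an in-memory filehandle.
  The following is a sketch only: C<parse_dbipool> is a hypothetical
  helper, and the mapping of directive names to lower-case keys
  (I<DBIPoolStart> to C<start>, I<DBIUserName> to C<username>) is an
  assumption made so that the keys match those used when connecting.
  
  ```perl
  use strict;
  use warnings;
  
  # Parse one <DBIPool dsn> ... </DBIPool> section from a filehandle.
  # The directive-to-key mapping is an assumption made for this sketch.
  sub parse_dbipool {
      my ($dsn, $fh) = @_;
      $dsn =~ s/>$//;                  # strip the closing bracket
      my %section = (dsn => $dsn);
      while (defined(my $line = <$fh>)) {
          last if $line =~ m{^\s*</DBIPool>};
          chomp $line;
          # split ' ' skips leading whitespace; the limit keeps the
          # value intact even if it contains spaces
          my ($name, $value) = split ' ', $line, 2;
          next unless defined $name;   # skip blank lines
          $name =~ s/^DBI(?:Pool)?(\w+)/lc $1/e;
          $section{$name} = $value;
      }
      return \%section;
  }
  
  my $text = join "\n",
      '  DBIPoolStart 10',
      '  DBIPoolMax   20',
      '  DBIUserName dougm',
      '</DBIPool>',
      '';
  open my $fh, '<', \$text or die "cannot open in-memory file: $!";
  my $section = parse_dbipool('dbi:mysql:db_name>', $fh);
  close $fh;
  
  for my $key (sort keys %$section) {
      print "$key=$section->{$key}\n";
  }
  ```
  
  Note the argument order of C<split>: the pattern comes first, then the
  string, then the optional limit.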
  
  =head1 PerlOptions Directive
  
  A new configuration directive to mod_perl-2.0, I<PerlOptions>,
  provides fine-grained configuration for what were compile-time only
  options in mod_perl-1.xx.  In addition, this directive provides
  control over what class of I<PerlInterpreter> is used for a
  I<VirtualHost> or location configured with I<Location>, I<Directory>, etc.
  
  These are all best explained with examples, first here's how to
  disable mod_perl for a certain host:
  
   <VirtualHost ...>
      PerlOptions -Enable
   </VirtualHost>
  
  
  Suppose one of the hosts does not want to allow users to configure
  I<PerlAuthenHandler>, I<PerlAuthzHandler> or I<PerlAccessHandler> or
  <Perl> sections:
  
   <VirtualHost ...>
      PerlOptions -Authen -Authz -Access -Sections
   </VirtualHost>
  
  Or maybe everything but the response handler:
  
   <VirtualHost ...>
      PerlOptions None +Response
   </VirtualHost>
  
  A common problem with mod_perl-1.xx was the shared namespace between
  all code within the process.  Consider two developers using the same
  server, each of whom wishes to run a different version of a module
  with the same name.  This example will create two I<parent> Perls, one
  for each I<VirtualHost>, each with its own namespace and pointing to a
  different path in C<@INC>:
  
   <VirtualHost ...>
      ServerName dev1
      PerlOptions +Parent
      PerlSwitches -Mblib=/home/dev1/lib/perl
   </VirtualHost>
  
   <VirtualHost ...>
      ServerName dev2
      PerlOptions +Parent
      PerlSwitches -Mblib=/home/dev2/lib/perl
   </VirtualHost>
  
  Or even for a given location, for something like "dirty" cgi scripts:
  
   <Location /cgi-bin>
      PerlOptions +Parent
      PerlInterpMaxRequests 1
      PerlInterpStart 1
      PerlInterpMax 1
      PerlResponseHandler Apache::Registry
   </Location>
  
  This will use a fresh interpreter with its own namespace to handle each
  request.
  
  Should you wish to fine tune Interpreter pools for a given host:
  
   <VirtualHost ...>
      PerlOptions +Clone
      PerlInterpStart 2
      PerlInterpMax 2
   </VirtualHost>
  
  This might be worthwhile in the case where certain hosts have their
  own sets of large-ish modules, used only in each host.  By tuning each
  host to have its own pool, that host will continue to reuse the Perl
  allocations in its specific modules.
  
  In 1.x versions of mod_perl, configured Perl*Handlers which are not a
  fully qualified subroutine name are resolved at request time,
  loading the handler module from disk if needed.  In 2.x, configured
  Perl*Handlers are resolved at startup time.  By default, modules are
  not auto-loaded during startup-time resolution.  It is possible to
  configure this feature with:
  
   PerlOptions +Autoload
  
  Consider this configuration:
  
   PerlResponseHandler Apache::Magick
  
  In this case, I<Apache::Magick> is the package name, and the
  subroutine name will default to I<handler>.  If the I<Apache::Magick>
  module is not already loaded, B<PerlOptions +Autoload> will attempt to
  pull it in at startup time.
  
  =head1 Integration with 2.0 Filtering
  
  The mod_perl-2.0 interface to the Apache filter API is much simpler
  than the C API, hiding most of the details underneath.  Perl filters
  are configured using the I<PerlFilterHandler> directive, for example:
  
   PerlFilterHandler Apache::ReverseFilter
  
  This simply registers the filter, which can then be turned on using
  the core I<AddOutputFilter> directive:
  
   <Location /foo>
      AddOutputFilter Apache::ReverseFilter
   </Location>
  
  The I<Apache::ReverseFilter> handler will now be called for anything
  accessed in the I</foo> url space.  The I<AddOutputFilter> directive takes
  any number of filters, for example, this configuration will first send 
  the output to I<mod_include>, which will in turn pass its output down
  to I<Apache::ReverseFilter>:
  
   AddOutputFilter INCLUDE Apache::ReverseFilter
  
  For our example, I<Apache::ReverseFilter> simply reverses all of the
  output characters and then sends them downstream.  The first argument
  to a filter handler is an I<Apache::Filter> object, which at the
  moment provides two methods I<read> and I<write>.  The I<read> method
  pulls down a chunk of the output stream into the given buffer,
  returning the length read into the buffer.  An optional size argument
  may be given to specify the maximum size to read into the buffer.  If
  omitted, an arbitrary size will fill the buffer, depending on the
  upstream filter. The I<write> method passes data down to the next
  filter.  In our case C<scalar reverse> takes advantage of Perl's
  builtins to reverse the upstream buffer:
  
   package Apache::ReverseFilter;
  
   use strict;
  
   sub handler {
       my $filter = shift;
  
       while ($filter->read(my $buffer, 1024)) {
           $filter->write(scalar reverse $buffer);
       }
  
       return Apache::OK;
   }
  
   1;
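  
  To see what the read/write loop does without a running server, the
  following sketch drives the same loop with a mock object.
  C<MockFilter> is a hypothetical stand-in for I<Apache::Filter>, written
  only for this example; it reads from a string and collects what is
  written.  Note that the output is reversed one chunk at a time, not as
  a whole stream.
  
  ```perl
  use strict;
  use warnings;
  
  # Hypothetical stand-in for Apache::Filter.
  package MockFilter;
  
  sub new {
      my ($class, $input) = @_;
      return bless { in => $input, pos => 0, out => '' }, $class;
  }
  
  # Fill the caller's buffer ($_[1]) with up to $size bytes and
  # return the number of bytes read.
  sub read {
      my ($self, undef, $size) = @_;
      $size = 1024 unless defined $size;
      $_[1] = substr($self->{in}, $self->{pos}, $size);
      $self->{pos} += length $_[1];
      return length $_[1];
  }
  
  # Collect data that would be passed to the next filter.
  sub write {
      my ($self, $data) = @_;
      $self->{out} .= $data;
      return length $data;
  }
  
  package main;
  
  my $filter = MockFilter->new("hello world");
  
  # Same loop as the handler above, with a 5-byte chunk size.
  while ($filter->read(my $buffer, 5)) {
      $filter->write(scalar reverse $buffer);
  }
  
  print $filter->{out}, "\n";
  ```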
  
  =head1 Perl interface to the APR and Apache API
  
  In 1.x, the Perl interface back into the Apache API and data
  structures was done piecemeal.  As functions and structure members
  were found to be useful or new features were added to the Apache API,
  the xs code was written for them here and there.
  
  The goal for 2.0 is to generate the majority of xs code and provide
  thin wrappers where needed to make the API more Perlish.  As part of
  this goal, nearly the entire APR and Apache API, along with their
  public data structures, will be covered from the get-go.  Certain
  functions and structures which are considered "private" to Apache or
  otherwise un-useful to Perl will not be glued.  The API behaves just
  as it did in 1.x, so users of the API will not notice the difference,
  other than the addition of many new methods.  And in the case of
  I<APR>, it is possible to use I<APR> modules outside of Apache, for
  example:
  
   % perl -MAPR -MAPR::UUID -le 'print APR::UUID->new->format'
   b059a4b2-d11d-b211-bc23-d644b8ce0981
  
  The mod_perl generator is a custom suite of modules specifically tuned
  for gluing Apache and allows for complete control over I<everything>,
  providing many possibilities that none of I<xsubpp>, I<swig> or
  I<Inline.pm> is designed to offer.  Advantages to generating the glue
  code include:
  
  =over 4
  
  =item *
  
  Not tied tightly to xsubpp
  
  =item *
  
  Easy adjustment to Apache 2.0 API/structure changes
  
  =item *
  
  Easy adjustment to Perl changes (e.g., Perl 6)
  
  =item *
  
  Ability to "discover" hookable third-party C modules.
  
  =item *
  
  Cleanly take advantage of features in newer Perls
  
  =item *
  
  Optimizations can happen across-the-board in one shot
  
  =item *
  
  Possible to AUTOLOAD XSUBs
  
  =item *
  
  Documentation can be generated from code
  
  =item *
  
  Code can be generated from documentation
  
  =back
  
  =head1 Protocol Modules with mod_perl-2.0
  
  =head2 Apache::Echo
  
  Apache 2.0 ships with an example protocol module, I<mod_echo>, which
  simply reads data from the client and echoes it right back.  Here we'll
  take a look at a Perl version of that module, called I<Apache::Echo>.
  A protocol handler is configured using the
  I<PerlProcessConnectionHandler> directive and we'll use the I<Listen> 
  and I<VirtualHost> directives to bind to a non-standard port B<8084>:
  
   Listen 8084
   <VirtualHost _default_:8084>
       PerlProcessConnectionHandler Apache::Echo
   </VirtualHost>
  
  Apache::Echo is then enabled when starting Apache:
  
   % httpd
  
  And we give it a whirl:
  
   % telnet localhost 8084
   Trying 127.0.0.1...
   Connected to localhost (127.0.0.1).
   Escape character is '^]'.
   hello apachecon
   hello apachecon
   ^]
  
  The code is just a few lines, with the standard I<package>
  declaration and, of course, C<use strict;>.  As with all
  I<Perl*Handler>s, the subroutine name defaults to I<handler>.  However, 
  in the case of a protocol handler, the first argument is not a
  I<request_rec>, but a I<conn_rec> blessed into the
  I<Apache::Connection> class.  We have direct access to the client
  socket via I<Apache::Connection>'s I<client_socket> method.  This
  returns an object blessed into the I<APR::Socket> class.
  
  Inside the echo loop, we attempt to read B<BUFF_LEN> bytes from the
  client socket into the C<$buff> buffer.  The C<$rlen> parameter will
  be set to the number of bytes actually read.  The I<APR::Socket>
  I<recv> method will return an I<apr_status_t> value, but we need only
  check the read length to break out of the loop if it is less than or
  equal to B<0> bytes. If we received some data, it is immediately
  echoed back to the client with the I<APR::Socket> I<send> method.
  If we were unable to echo back the same number of bytes read from the
  client, assume the connection was dropped and break out of the loop.
  Once the client has disconnected, the module returns B<Apache::OK>,
  telling Apache we have handled the connection:
  
   package Apache::Echo;
   
   use strict;
   use Apache::Connection ();
   use APR::Socket ();
   
   use constant BUFF_LEN => 1024;
   
   sub handler {
       my Apache::Connection $c = shift;
       my APR::Socket $socket = $c->client_socket;
   
       my $buff;
   
       for (;;) {
           my $wlen;
           my $rlen = BUFF_LEN;
           $socket->recv($buff, $rlen);
           last if $rlen <= 0;
           $wlen = $rlen;
           $socket->send($buff, $wlen);
           last if $wlen != $rlen;
       }
   
       return Apache::OK;
   }
   
   1;
   __END__
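  
  The length argument to I<recv> and I<send> is both an input (how much
  to transfer) and an output (how much was transferred).  The sketch
  below replays the echo loop against a mock object to show that
  convention.  C<MockSocket> is a hypothetical stand-in for
  I<APR::Socket>, and the small buffer size is chosen only to force
  several iterations.
  
  ```perl
  use strict;
  use warnings;
  
  # Hypothetical stand-in for APR::Socket.
  package MockSocket;
  
  sub new {
      my ($class, $input) = @_;
      return bless { in => $input, out => '' }, $class;
  }
  
  # $_[0] is the caller's buffer; $_[1] is the requested length on entry
  # and the number of bytes actually read on return.
  sub recv {
      my $self = shift;
      my $data = substr($self->{in}, 0, $_[1], '');
      $_[0] = $data;
      $_[1] = length $data;
      return 0;    # stands in for a success status
  }
  
  # Writes the first $_[1] bytes of $_[0]; the mock always writes them all.
  sub send {
      my $self = shift;
      $self->{out} .= substr($_[0], 0, $_[1]);
      return 0;
  }
  
  package main;
  
  use constant BUFF_LEN => 4;
  
  my $socket = MockSocket->new("hello apachecon\n");
  my $buff;
  
  for (;;) {
      my $rlen = BUFF_LEN;
      $socket->recv($buff, $rlen);
      last if $rlen <= 0;            # nothing left to read
      my $wlen = $rlen;
      $socket->send($buff, $wlen);
      last if $wlen != $rlen;        # short write: give up
  }
  
  print $socket->{out};
  ```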
  
  =head2 Apache::CommandServer
  
  Our first protocol handler example took advantage of Apache's server
  framework, but did not tap into any other modules.  The next example
  is based on the example in the "TCP Servers with IO::Socket" section
  of I<perlipc>.  Of course, we don't need I<IO::Socket> since Apache
  takes care of those details for us.  The rest of that example can
  still be used to illustrate implementing a simple text protocol: in
  this case, one where a command is sent by the client to be executed on
  the server side, with results sent back to the client.
  
  The I<Apache::CommandServer> handler will support four commands:
  I<motd>, I<date>, I<who> and I<quit>.  These are probably not
  commands which can be exploited, but should we add such commands,
  we'll want to limit access based on ip address/hostname,
  authentication and authorization.  Protocol handlers need to take care 
  of these tasks themselves, since we bypass the HTTP protocol handler.
  
  As with all I<PerlProcessConnectionHandlers>, we are passed an
  I<Apache::Connection> object as the first argument.  Again, we will be
  directly accessing the client socket via the I<client_socket> method.
  The I<login> subroutine is called to check if access by this client
  should be allowed.  This routine makes up for what we lost with the 
  core HTTP protocol handler bypassed.  First we call the
  I<Apache::RequestRec> I<new> method, which returns a I<request_rec>
  object, just like that which is passed into request time
  I<Perl*Handlers> and returned by the subrequest API methods,
  I<lookup_uri> and I<lookup_file>.  However, this "fake request" does
  not run handlers for any of the phases, it simply returns an object
  which we can use to do that ourselves.  The C<location_merge> method
  is passed the "location" for this request; it will look up the
  <Location> section that matches the given name and merge it with the
  default server configuration.  For example, should we only wish to
  allow access to this server from certain locations:
  
      <Location Apache::CommandServer>
          deny from all
          allow from 10.*
      </Location>
  
  The I<location_merge> method only looks up and merges the
  configuration; we still need to apply it.
  This is done in a I<for> loop, iterating over three methods:
  I<run_access_checker>, I<run_check_user_id> and I<run_auth_checker>.
  These methods will call directly into the Apache functions that invoke
  module handlers for these phases and will return an integer status
  code, such as B<OK>, B<DECLINED> or B<FORBIDDEN>.  If I<run_access_checker>
  returns something other than B<OK> or B<DECLINED>, that status will be
  propagated up to the handler routine and then back up to Apache.
  Otherwise, the access check passed and the loop will break unless
  I<some_auth_required> returns true.  This would be false given the
  previous configuration example, but would be true in the presence of a
  I<require> directive, such as:
  
      <Location Apache::CommandServer>
          deny from all
          allow from 10.*
          require user dougm
      </Location>
  
  Given this configuration, I<some_auth_required> will return true.
  The I<user> method is then called, which will return false if we have
  not yet authenticated.  A I<prompt> utility is called to read the
  username and password, which are then injected into the I<headers_in>
  table using the I<set_basic_credentials> method.  The I<Authorization>
  field in this table is set to a base64 encoded value of the
  username:password pair, exactly the same format a browser would send
  for I<Basic authentication>.  Next time through the loop
  I<run_check_user_id> is called, which will in turn invoke any
  authentication handlers, such as I<mod_auth>.  When I<mod_auth> calls
  the I<ap_get_basic_auth_pw()> API function (as all Basic auth modules
  do), it will get back the username and password we injected.
  If we fail authentication a B<401> status code is returned which we
  propagate up.  Otherwise, authorization handlers are run via
  I<run_auth_checker>.  Authorization handlers normally need the I<user>
  field of the I<request_rec> for their checks and that field was filled
  in when I<mod_auth> called I<ap_get_basic_auth_pw()>.
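  
  For reference, the Basic credential format mentioned above can be
  reproduced with the core I<MIME::Base64> module.  This sketch only
  shows the encoding a browser would send; it is not the implementation
  of I<set_basic_credentials>, and the variable names are illustrative.
  
  ```perl
  use strict;
  use warnings;
  use MIME::Base64 qw(encode_base64 decode_base64);
  
  my ($username, $password) = ('dougm', 'foo');
  
  # Basic credentials are "username:password", base64 encoded.
  # The empty second argument suppresses the trailing newline.
  my $encoded = encode_base64("$username:$password", '');
  my $header  = "Basic $encoded";
  print "$header\n";
  
  # The receiving side reverses the process.
  my ($scheme, $blob) = split ' ', $header, 2;
  my ($user, $pass)   = split /:/, decode_base64($blob), 2;
  print "$user / $pass\n";
  ```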
  
  Provided login is a success, a welcome message is printed and the main
  request loop is entered.  Inside the loop the I<getline> function returns
  just one line of data, with newline characters stripped.  If the
  string sent by the client is in our command table, the command is then
  invoked, otherwise a usage message is sent.  If the command does not
  return B<APR::SUCCESS>, we break out of the loop.  Let's give it a try
  with this configuration:
  
   Listen 8085
   <VirtualHost _default_:8085>
       PerlProcessConnectionHandler Apache::CommandServer
  
       <Location Apache::CommandServer>
           allow from 127.0.0.1
           require user dougm
           satisfy any
           AuthUserFile /tmp/basic-auth
       </Location>
   </VirtualHost>
  
   % telnet localhost 8085
   Trying 127.0.0.1...
   Connected to localhost (127.0.0.1).
   Escape character is '^]'.
   Login: dougm
   Password: foo
   Welcome to Apache::CommandServer
   Available commands: motd date who quit
   motd
   Have a lot of fun...
   date
   Mon Mar 12 19:20:10 PST 2001
   who
   dougm    tty1     Mar 12 00:49
   dougm    pts/0    Mar 12 11:23
   dougm    pts/1    Mar 12 14:08
   dougm    pts/2    Mar 12 17:09
   quit
   Connection closed by foreign host.
  
  =head2 Apache::CommandServer Source
  
   package Apache::CommandServer;
   
   use strict;
   use Apache::Connection ();
   use APR::Socket ();
   
   my @cmds = qw(motd date who quit);
   my %commands = map { $_, \&{$_} } @cmds;
   
   sub handler {
       my Apache::Connection $c = shift;
       my APR::Socket $socket = $c->client_socket;
   
       if ((my $rc = login($c)) != Apache::OK) {
           $socket->send("Access Denied\n");
           return $rc;
       }
   
       $socket->send("Welcome to " . __PACKAGE__ .
                     "\nAvailable commands: @cmds\n");
   
       for (;;) {
           my $cmd;
           next unless $cmd = getline($socket);
   
           if (my $sub = $commands{$cmd}) {
               last unless $sub->($socket) == APR::SUCCESS;
           }
           else {
               $socket->send("Commands: @cmds\n");
           }
       }
   
       return Apache::OK;
   }
   
   sub login {
       my $c = shift;
   
       my $r = Apache::RequestRec->new($c);
       $r->location_merge(__PACKAGE__);
   
       for my $method (qw(run_access_checker run_check_user_id run_auth_checker)) {
           my $rc = $r->$method();
   
           if ($rc != Apache::OK and $rc != Apache::DECLINED) {
               return $rc;
           }
   
           last unless $r->some_auth_required;
   
           unless ($r->user) {
               my $socket = $c->client_socket;
               my $username = prompt($socket, "Login");
               my $password = prompt($socket, "Password");
   
               $r->set_basic_credentials($username, $password);
           }
       }
   
       return Apache::OK;
   }
   
   sub getline {
       my $socket = shift;
       my $line;
       $socket->recv($line, 1024);
       return unless $line;
       $line =~ s/[\r\n]*$//;
       return $line;
   }
   
   sub prompt {
       my($socket, $msg) = @_;
       $socket->send("$msg: ");
       getline($socket);
   }
   
   sub motd {
       my $socket = shift;
       open my $fh, '/etc/motd' or return;
       local $/;
       my $status = $socket->send(scalar <$fh>);
       close $fh;
       return $status;
   }
   
   sub date {
       my $socket = shift;
       $socket->send(scalar(localtime) . "\n");
   }
   
   sub who {
       my $socket = shift;
       $socket->send(scalar `who`);
   }
   
   sub quit {1}
   
   1;
   __END__
  
  =head1 mod_perl-2.0 Optimizations
  
  As mentioned in the introduction, the rewrite of mod_perl gives us the
  chance to build a smarter, stronger and faster implementation based
  on lessons learned over the 4.5 years since mod_perl was introduced.
  There are optimizations which can be made in the mod_perl source code,
  some which can be made in the Perl space by optimizing its syntax
  tree and some a combination of both.  In this section we'll take a
  brief look at some of the optimizations that are being considered.
  
  The details of these optimizations will for the most part be hidden
  from mod_perl users, the exception being that some will only be turned
  on with configuration directives.  The explanation of these
  optimization ideas is best left for the live talk; a few which will
  be overviewed include:
  
  =over 4
  
  =item *
  
  "Compiled" Perl*Handlers
  
  =item *
  
  Method calls faster than subroutine calls!
  
  =item *
  
  `print' enhancements
  
  =item *
  
  Inlined Apache::*.xs calls
  
  =item *
  
  Use of Apache Pools for memory allocations
  
  =item *
  
  Copy-on-write strings
  
  =back
  
  =head1 References
  
  =over 4
  
  =item http://perl.apache.org/
  
  The mod_perl homepage will announce mod_perl-2.0 developments as they
  become available.
  
  =back
  
  =head1 Maintainers
  
  Maintainer is the person(s) you should contact with updates,
  corrections and patches.
  
  Doug MacEachern E<lt>dougm (at) covalent.netE<gt>
  
  =head1 Authors
  
  =over 
  
  =item * Doug MacEachern E<lt>dougm (at) covalent.netE<gt>
  
  =back
  
  =cut
  
  
  
  1.1                  modperl-docs/src/docs/2.0/user/design/design.pod
  
  Index: design.pod
  ===================================================================
  =head1 NAME
  
  mod_perl_design - notes on the design and goals of mod_perl-2.0
  
  =head1 SYNOPSIS
  
   perldoc mod_perl_design
  
  =head1 DESCRIPTION
  
  notes on the design and goals of mod_perl-2.0
  
  =head1 Introduction
  
  In version 2.0 of mod_perl, the basic concept of 1.x still applies:
  
   Provide complete access to the Apache C API via the Perl programming language.
  
  Rather than "porting" mod_perl-1.x to Apache 2.0, mod_perl-2.0 is
  being implemented as a complete re-write from scratch.
  
  For a more detailed introduction and functionality overview, see
  I<modperl_2.0>.
  
  =head1 Interpreter Management
  
  In order to support mod_perl in a multi-threaded environment,
  mod_perl-2.0 will take advantage of Perl's I<ithreads> feature, new to
  Perl version 5.6.0.  This feature encapsulates the Perl runtime inside
  a thread-safe I<PerlInterpreter> structure.  Each thread which needs
  to serve a mod_perl request will need its own I<PerlInterpreter>
  instance.
  
  Rather than create a one-to-one mapping of I<PerlInterpreter>
  per-thread, a configurable pool of interpreters is managed by mod_perl.
  This approach will cut down on memory usage simply by maintaining a
  minimal number of interpreters.  It will also allow re-use of
  allocations made within each interpreter by recycling those which have
  already been used.  This was not possible in the 1.3.x model, where
  each child has its own interpreter and no control over which child
  Apache dispatches the request to.
  
  The interpreter pool is only enabled if Perl is built with -Dusethreads;
  otherwise, mod_perl will behave just as 1.xx, using a single
  interpreter, which is only useful when Apache is configured with the
  prefork mpm.
  
  When the server is started, a Perl interpreter is constructed, compiling 
  any code specified in the configuration, just as 1.xx does.  This
  interpreter is referred to as the "parent" interpreter.  Then, for 
  the number of I<PerlInterpStart> configured, a (thread-safe) clone of the
  parent interpreter is made (via perl_clone()) and added to the pool of
  interpreters.  This clone copies any writeable data (e.g. the symbol
  table) and shares the compiled syntax tree.  From my measurements of a 
  startup.pl including a few random modules:
  
   use CGI ();
   use POSIX ();
   use IO ();
   use SelfLoader ();
   use AutoLoader ();
   use B::Deparse ();
   use B::Terse ();
   use B ();
   use B::C ();
  
  The parent adds 6M to the size of the process; each clone adds less than
  half that size, ~2.3M, thanks to the shared syntax tree.
  
  NOTE: These measurements were made prior to finding memory leaks
  related to perl_clone() in 5.6.0 and the GvSHARED optimization.
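  
  As a back-of-the-envelope illustration of those figures, the sketch
  below estimates the size added for a pool of a given number of clones.
  It assumes the 6M and 2.3M measurements quoted above scale linearly,
  which is a simplification; C<pool_size_mb> is a hypothetical helper.
  
  ```perl
  use strict;
  use warnings;
  
  # Figures quoted above; real numbers depend on the modules loaded.
  my $parent_mb = 6.0;
  my $clone_mb  = 2.3;
  
  # Estimated size added by the parent plus $clones interpreter clones.
  sub pool_size_mb {
      my ($clones) = @_;
      return $parent_mb + $clones * $clone_mb;
  }
  
  printf "%2d clones: %.1fM\n", $_, pool_size_mb($_) for 1, 2, 5, 10;
  ```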
  
  At request time, if any Perl*Handlers are configured, an available
  interpreter is selected from the pool.  As there is a I<conn_rec> and
  I<request_rec> per thread, a pointer is saved in either the
  conn_rec->pool or request_rec->pool, which will be used for the
  lifetime of that request.  For handlers that are called when threads
  are not running (PerlChild{Init,Exit}Handler), the parent interpreter
  is used.  Several configuration directives control the interpreter
  pool management:
  
  =over 4
  
  =item PerlInterpStart
  
  The number of interpreters to clone at startup time.
  
  =item PerlInterpMax
  
  If all running interpreters are in use, mod_perl will clone new
  interpreters to handle the request, up until this number of
  interpreters is reached.  When PerlInterpMax is reached, mod_perl will
  block (via COND_WAIT()) until one becomes available (signaled via
  COND_SIGNAL()).
  
  =item PerlInterpMinSpare
  
  The minimum number of available interpreters.  To maintain this minimum,
  mod_perl will clone interpreters, up to PerlInterpMax, before a request
  comes in.
  
  =item PerlInterpMaxSpare
  
  mod_perl will throttle down the number of interpreters to this number
  as those in use become available.
  
  =item PerlInterpMaxRequests
  
  The maximum number of requests an interpreter should serve.  The
  interpreter is destroyed when the number is reached and replaced with
  a fresh one.
  
  =back
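  
  The interplay of these directives can be modelled with a small,
  single-threaded sketch.  C<InterpPool> and its option names are
  hypothetical; the model ignores PerlInterpMinSpare and returns undef
  where the real server would block, so it should be read as an
  illustration of the policy rather than of mod_perl's code.
  
  ```perl
  use strict;
  use warnings;
  
  package InterpPool;    # hypothetical model of the interpreter pool
  
  sub new {
      my ($class, %conf) = @_;
      my $self = bless { %conf, idle => [], busy => 0, next_id => 1 }, $class;
      $self->_clone for 1 .. $conf{start};
      return $self;
  }
  
  # Total number of interpreters, idle and in use.
  sub size { my $self = shift; return @{ $self->{idle} } + $self->{busy} }
  
  # Add one freshly cloned interpreter to the idle list.
  sub _clone {
      my $self = shift;
      push @{ $self->{idle} }, { id => $self->{next_id}++, served => 0 };
  }
  
  # Take an interpreter for a request, cloning one if none is idle and the
  # pool is below max.  Returns undef where the real server would block.
  sub acquire {
      my $self = shift;
      $self->_clone if !@{ $self->{idle} } && $self->size < $self->{max};
      my $interp = shift @{ $self->{idle} } or return;
      $self->{busy}++;
      return $interp;
  }
  
  # Hand an interpreter back after a request.
  sub release {
      my ($self, $interp) = @_;
      $self->{busy}--;
      if (++$interp->{served} >= $self->{max_requests}) {
          $self->_clone;                       # replace with a fresh one
      }
      elsif (@{ $self->{idle} } < $self->{max_spare}) {
          push @{ $self->{idle} }, $interp;    # keep it as a spare
      }
      # otherwise the interpreter is dropped (throttling down)
  }
  
  package main;
  
  my $pool = InterpPool->new(start => 2, max => 3,
                             max_spare => 2, max_requests => 2);
  
  my @in_use = map { $pool->acquire } 1 .. 3;    # third request forces a clone
  print "in use: @{[ map { $_->{id} } @in_use ]}\n";
  print "a fourth request would block\n" unless $pool->acquire;
  
  $pool->release($_) for @in_use;
  print "idle after release: ", scalar @{ $pool->{idle} }, "\n";
  ```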
  
  =head2 TIPool
  
  The interpreter pool is implemented in terms of a "TIPool" (Thread
  Item Pool), a generic API which can be reused for other data such as
  database connections.  A Perl interface will be provided for the
  I<TIPool> mechanism, which, for example, will make it possible to
  share a pool of DBI connections.
  
  =head2 Virtual Hosts
  
  The interpreter management has been implemented in a way such that
  each VirtualHost can have its own parent Perl interpreter and/or MIP
  (Mod_perl Interpreter Pool).
  It is also possible to disable mod_perl for a given virtual host.
  
  =head2 Further Enhancements
  
  =over 4
  
  =item *
  
  The interpreter pool management could be moved into its own thread.
  
  =item *
  
  A "garbage collector", which could also run in its own thread,
  examining the padlists of idle interpreters and deciding to release
  and/or report large strings, array/hash sizes, etc., that Perl is
  keeping around as an optimization.
  
  =back
  
  =head1 Hook Code and Callbacks
  
  The code for hooking mod_perl in the various phases, including
  Perl*Handler directives is generated by the ModPerl::Code module.
  Access to all hooks will be provided by mod_perl in both the
  traditional Perl*Handler configuration fashion and via dynamic
  registration methods (the ap_hook_* functions).
  
  When a mod_perl hook is called for a given phase, the glue code has an 
  index into the array of handlers, so it knows to return DECLINED right 
  away if no handlers are configured, without entering the Perl runtime
  as 1.xx did.  The handlers are also now stored in an
  apr_array_header_t, which is much lighter and faster than using a
  Perl AV, as 1.xx did.  More importantly, this keeps us out of the Perl
  runtime until we're sure we need to be there.
  
  Perl*Handlers are now "compiled", that is, the various forms of:
  
   PerlResponseHandler MyModule->handler
   # defaults to MyModule::handler or MyModule->handler
   PerlResponseHandler MyModule
   PerlResponseHandler $MyObject->handler
   PerlResponseHandler 'sub { print "foo\n" }'
  
  are only parsed once, unlike 1.xx which parsed every time the handler
  was used.  There will also be an option to parse the handlers at
  startup time.  Note: this feature is currently not enabled with
  threads, as each clone needs its own copy of Perl structures.
  
  A "method handler" is now specified using the `method' sub attribute,
  e.g.
  
   sub handler : method {};
  
  instead of 1.xx's
  
   sub handler ($$) {}
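  
  The following standalone sketch shows what the C<method> attribute
  looks like from Perl space, using the core I<attributes> module to
  inspect it.  C<My::Handler> and the dispatch logic are hypothetical and
  only illustrate how a caller could tell the two calling conventions
  apart.
  
  ```perl
  use strict;
  use warnings;
  use attributes ();
  
  package My::Handler;
  
  # The "method" attribute marks this as a method handler, so the
  # class (or object) arrives as the first argument.
  sub handler : method {
      my ($class, $r) = @_;
      return "$class handling $r";
  }
  
  package main;
  
  # The attribute can be inspected at run time.
  my @attrs = attributes::get(\&My::Handler::handler);
  print "attributes: @attrs\n";
  
  # A dispatcher could use it to decide how to call the handler.
  my $is_method = grep { $_ eq 'method' } @attrs;
  my $result = $is_method
      ? My::Handler->handler('request')
      : My::Handler::handler('request');
  print "$result\n";
  ```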
  
  =head1 Perl interface to the Apache API and Data Structures
  
  In 1.x, the Perl interface back into the Apache API and data
  structures was done piecemeal.  As functions and structure members
  were found to be useful or new features were added to the Apache API,
  the xs code was written for them here and there.
  
  The goal for 2.0 is to generate the majority of xs code and provide
  thin wrappers where needed to make the API more Perlish.  As part of
  this goal, nearly the entire APR and Apache API, along with their
  public data structures, will be covered from the get-go.  Certain
  functions and structures which are considered "private" to Apache or
  otherwise un-useful to Perl will not be glued.
  
  The Apache header tree is parsed into Perl data structures which live
  in the generated I<Apache::FunctionTable> and
  I<Apache::StructureTable> modules.  For example, the following
  function prototype:
  
   AP_DECLARE(int) ap_meets_conditions(request_rec *r);
  
  is parsed into the following Perl structure:
  
    {
      'name' => 'ap_meets_conditions',
      'return_type' => 'int',
      'args' => [
        {
          'name' => 'r',
          'type' => 'request_rec *'
        }
      ],
    },
  
  and the following structure:
  
   typedef struct {
       uid_t uid;
       gid_t gid;
   } ap_unix_identity_t;
  
  is parsed into:
  
    {
      'type' => 'ap_unix_identity_t',
      'elts' => [
        {
          'name' => 'uid',
          'type' => 'uid_t'
        },
        {
          'name' => 'gid',
          'type' => 'gid_t'
        }
      ],
    }
  
  The same is done for the mod_perl source tree, building
  I<ModPerl::FunctionTable> and I<ModPerl::StructureTable>.
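  
  To make the shape of a function-table entry concrete, here is a
  minimal sketch that turns a single prototype into such an entry.  The
  real header parser is not described in this document; the regular
  expressions and the C<parse_prototype> helper are assumptions made for
  illustration and handle only simple, single-line prototypes.
  
  ```perl
  use strict;
  use warnings;
  use Data::Dumper;
  
  # Parse one AP_DECLARE(...) prototype into a function-table entry.
  sub parse_prototype {
      my ($proto) = @_;
      my ($return_type, $name, $arglist) =
          $proto =~ /^\s*AP_DECLARE\(\s*(.+?)\s*\)\s*(\w+)\s*\((.*)\)\s*;\s*$/
          or return;
  
      my @args;
      for my $arg (split /\s*,\s*/, $arglist) {
          next if $arg eq 'void' or $arg eq '';
          # The trailing identifier is the argument name, the rest its type.
          my ($type, $arg_name) = $arg =~ /^(.*?)\s*(\w+)$/ or next;
          push @args, { name => $arg_name, type => $type };
      }
  
      return { name => $name, return_type => $return_type, args => \@args };
  }
  
  my $entry = parse_prototype(
      'AP_DECLARE(int) ap_meets_conditions(request_rec *r);');
  
  $Data::Dumper::Sortkeys = 1;
  print Dumper($entry);
  ```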
  
  Three files are used to drive these Perl structures into the generated
  xs code:
  
  =over 4
  
  =item lib/ModPerl/function.map
  
  Specifies which functions are made available to Perl, along with which
  modules and classes they reside in.  Many functions will map directly
  to Perl, for example the following C code:
  
   static int handler (request_rec *r) {
       int rc = ap_meets_conditions(r);
       ...
  
  maps to Perl like so:
  
   sub handler {
       my $r = shift;
       my $rc = $r->meets_conditions;
   ...
  
  The function map is also used to dispatch Apache/APR functions to thin
  wrappers, rewrite arguments and rename functions to make the API
  more Perlish where applicable.  For example, C code such as:
  
   char uuid_buf[APR_UUID_FORMATTED_LENGTH+1];
   apr_uuid_t uuid;
   apr_uuid_get(&uuid);
   apr_uuid_format(uuid_buf, &uuid);
   printf("uuid=%s\n", uuid_buf);
   
  is remapped to a more Perlish convention:
  
   printf "uuid=%s\n", APR::UUID->new->format;
  
  =item lib/ModPerl/structure.map
  
  Specifies which structures and members of each are made available to
  Perl, along with which modules and classes they reside in.
  
  =item lib/ModPerl/type.map
  
  This file defines how Apache/APR types are mapped to Perl types and
  vice-versa.  For example:
  
   apr_int32_t => SvIV
   apr_int64_t => SvNV
   server_rec  => SvRV (Perl object blessed into the Apache::Server class) 
  
  =back
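  
  As a rough sketch of how the map files could combine, the code below
  derives a Perl-side name from a function-table entry by stripping the
  C prefix and looking up the class of the first argument.  The
  C<%class_for> table and C<perl_name> helper are hypothetical, and the
  class chosen for I<request_rec> is an assumption; the actual map file
  syntax is not shown here.
  
  ```perl
  use strict;
  use warnings;
  
  # Illustrative type-to-class table; the real mappings live in type.map.
  my %class_for = (
      'request_rec *' => 'Apache::RequestRec',
      'server_rec *'  => 'Apache::Server',
  );
  
  # Derive a Perl-side name from a function-table entry.
  sub perl_name {
      my ($entry) = @_;
      (my $method = $entry->{name}) =~ s/^(?:ap|apr)_//;
      my $first = $entry->{args}[0]            or return $method;
      my $class = $class_for{ $first->{type} } or return $method;
      return "${class}::$method";
  }
  
  my $entry = {
      name        => 'ap_meets_conditions',
      return_type => 'int',
      args        => [ { name => 'r', type => 'request_rec *' } ],
  };
  
  my $full_name = perl_name($entry);
  print "$full_name\n";
  ```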
  
  =head2 Advantages to generating XS code
  
  =over 4
  
  =item *
  
  Not tied tightly to xsubpp
  
  =item *
  
  Easy adjustment to Apache 2.0 API/structure changes
  
  =item *
  
  Easy adjustment to Perl changes (e.g., Perl 6)
  
  =item *
  
  Ability to "discover" hookable third-party C modules.
  
  =item *
  
  Cleanly take advantage of features in newer Perls
  
  =item *
  
  Optimizations can happen across-the-board in one shot
  
  =item *
  
  Possible to AUTOLOAD XSUBs
  
  =item *
  
  Documentation can be generated from code
  
  =item *
  
  Code can be generated from documentation
  
  =back
  
  =head2 Lvalue methods
  
  A new feature to Perl 5.6.0 is I<lvalue subroutines>, where the
  return value of a subroutine can be directly modified.  For example,
  rather than the following code to modify the uri:
  
   $r->uri($new_uri);
  
  the same result can be accomplished with the following syntax:
  
   $r->uri = $new_uri;
  
  mod_perl-2.0 will support I<lvalue subroutines> for all methods which
  access Apache and APR data structures.
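  
  The mechanism itself is plain Perl and can be tried outside of
  mod_perl.  In this sketch, C<MockRequest> is a hypothetical class
  standing in for the request object; its C<uri> accessor supports both
  calling styles shown above.
  
  ```perl
  use strict;
  use warnings;
  
  package MockRequest;    # hypothetical stand-in for the request object
  
  sub new { return bless { uri => '/index.html' }, shift }
  
  # Works as a getter, a classic setter, and an lvalue.
  sub uri : lvalue {
      my $self = shift;
      $self->{uri} = shift if @_;
      $self->{uri};
  }
  
  package main;
  
  my $r = MockRequest->new;
  
  $r->uri('/old-style');         # traditional accessor call
  print $r->uri, "\n";
  
  $r->uri = '/lvalue-style';     # assignment to the method's return value
  print $r->uri, "\n";
  ```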
  
  =head1 Filter Hooks
  
  mod_perl will provide two interfaces to filtering, a direct mapping to
  buckets and bucket brigades and a simpler, stream-oriented interface.
  
  Example of the stream oriented interface:
  
   #httpd.conf
   PerlOutputFilterHandler Apache::ReverseFilter
  
   #Apache/ReverseFilter.pm
   package Apache::ReverseFilter;
  
   use strict;
  
   sub handler {
       my $filter = shift;
  
       while ($filter->read(my $buffer, 1024)) {
           $filter->write(scalar reverse $buffer);
       }
  
       return Apache::OK;
   }
  
  =head1 Directive Handlers
  
  mod_perl 1.x provides a mechanism for Perl modules to implement
  first-class directive handlers, but requires an xs file to be
  generated and compiled.  The 2.0 version will provide the same
  functionality, but will not require the generated xs module.
  
  =head1 <Perl> Configuration Sections
  
  The ability to write configuration in Perl will carry over from 1.x,
  but will likely be implemented much differently internally.  The mapping
  of a Perl symbol table should fit cleanly into the new
  I<ap_directive_t> API, unlike the hoop jumping required in 1.x.
  
  =head1 Protocol Module Support
  
  Protocol module support is provided out-of-the-box, as the hooks
  and API are covered by the generated code blankets.  Any functionality
  for assisting protocol modules should be folded back into Apache if
  possible.
  
  =head1 mod_perl MPM
  
  It will be possible to write an MPM (Multi-Processing Module) in Perl.
  mod_perl will provide a mod_perl_mpm.c framework which fits into the
  server/mpm standard convention.  The rest of the functionality needed
  to write an MPM in Perl will be covered by the generated xs code
  blanket.
  
  =head1 Build System
  
  The biggest mess in 1.xx is mod_perl's Makefile.PL.  The majority of its
  logic has been broken down and moved to the Apache::Build module.
  The Makefile.PL will construct an Apache::Build object which will have 
  all the info it needs to generate scripts and Makefiles that
  apache-2.0 needs.  Regardless of what that scheme may be or change to, 
  it will be easy to adapt to with build logic/variables/etc., divorced
  from the actual Makefiles and configure scripts.  In fact, the new
  build will stay as far away from the Apache build system as possible.
  The module library (libmodperl.so or libmodperl.a) is built with as
  little help from Apache as possible, using only the B<INCLUDEDIR>
  provided by I<apxs>.
  
  The new build system will also "discover" XS modules, rather than
  hard-coding the XS module names.  This allows for switchability between
  static and dynamic builds, no matter where the xs modules live in the
  source tree.  This also allows for third-party xs modules to be
  unpacked inside the mod_perl tree and built static without
  modification of the mod_perl Makefiles.
  
  For platforms such as Win32, the build files will be generated
  in much the same way as unix-flavor Makefiles.
  
  =head1 Test Framework
  
  Similar to 1.x, mod_perl-2.0 will provide a 'make test' target to
  exercise as many areas of the API and module features as possible.
  
  The test framework in 1.x, like several other areas of mod_perl, was
  cobbled together over the years.  The goal of 2.0 is to provide a
  test framework that will be usable not only for mod_perl, but for
  third-party Apache::* modules and Apache itself.
  
  =head1 CGI Emulation
  
  As a side-effect of embedding Perl inside Apache and caching
  compiled code, mod_perl has been popular as a CGI accelerator.  In
  order to provide a CGI-like environment, mod_perl must manage areas of
  the runtime which have a longer lifetime than when running under
  mod_cgi.  For example, the B<%ENV> environment variable table, B<END>
  blocks, B<@INC> include paths, etc.
  
  CGI emulation will be supported in 2.0, but done so in a way that it
  is encapsulated in its own handler.  This differs from 1.x, which uses the
  same response handler regardless of whether the module requires CGI
  emulation.  With an I<ithreads> enabled Perl, it will also be possible to
  provide more robust namespace protection.
  
  =head1 Apache::* Library
  
  The majority of the standard Apache::* modules in 1.x will be
  supported in 2.0.  Apache::Registry will likely be replaced with
  something akin to the Apache::PerlRun/Apache::RegistryNG replacement
  prototype that exists in 1.x.  The main goal is that the non-core
  CGI emulation components of these modules are broken into small,
  re-usable pieces to subclass Apache::Registry like behavior.
  
  =head1 Perl Enhancements
  
  As Perl 5.8.0 is currently in development and Perl 6.0 is a long way
  off, it is possible and reasonable to add enhancements to Perl which
  will benefit mod_perl.  While these enhancements do not preclude the
  design of mod_perl-2.0, they will make an impact should they be
  implemented/accepted into the Perl development track.
  
  =head2 GvSHARED
  
  As mentioned, the perl_clone() API will create a thread-safe
  interpreter clone, which is a copy of all mutable data and a shared
  syntax tree.  The copying includes subroutines, each of which takes up
  around 255 bytes, including the symbol table entry.  Multiplying that
  number by, say, 1200 subroutines gives around 300K; times 10 interpreter
  clones, we have 3MB; times 20 clones, 6MB; and so on.  Pure Perl subroutines
  must be copied, as the structure includes the B<PADLIST> of lexical
  variables used within that subroutine.  However, for XSUBs, there is
  no PADLIST, which means that in the general case, perl_clone() will
  copy the subroutine, but the structure will never be written to at
  runtime.  Other common global variables, such as B<@EXPORT> and
  B<@EXPORT_OK>, are built at compile time and never modified during
  runtime.
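
  The estimate above can be reproduced with a few lines of Perl.  The
  figures (255 bytes per subroutine, 1200 subroutines) are the ones
  quoted in the text; the variable names are only illustrative.

  ```perl
  use strict;
  use warnings;

  my $bytes_per_sub = 255;     # approximate size quoted above
  my $sub_count     = 1200;    # example number of subroutines

  # Bytes that perl_clone() would copy for subroutines in one clone.
  my $per_clone = $bytes_per_sub * $sub_count;

  for my $clones (10, 20) {
      my $total = $per_clone * $clones;
      printf "%2d clones: %d bytes (about %.1f MB)\n",
          $clones, $total, $total / (1024 * 1024);
  }
  ```

  The per-clone cost is 306000 bytes, so the total grows linearly with
  the number of interpreter clones.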
  
  Clearly it would be a big win if XSUBs and such global variables were
  not copied.  However, we do not want to introduce locking of these
  structures for performance reasons.  Perl already supports the concept
  of a read-only variable, a flag which is checked whenever a Perl variable
  will be written to.  A patch has been submitted to the Perl
  development track to support a feature known as B<GvSHARED>.  This
  mechanism allows XSUBs and global variables to be marked as shared, so
  perl_clone() will not copy these structures, but rather point to them.
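
  The read-only flag mentioned above can be observed from Perl itself.
  The sketch below uses the core I<Internals::SvREADONLY> function to
  set the flag on a scalar.  It only illustrates the existing flag
  check, not the proposed B<GvSHARED> patch.

  ```perl
  use strict;
  use warnings;

  my $value = 'compile-time data';
  Internals::SvREADONLY($value, 1);    # turn the read-only flag on

  # Any attempt to write now fails the flag check and dies.
  my $ok    = eval { $value = 'changed'; 1 };
  my $error = $ok ? '' : $@;

  print "value is still '$value'\n";
  print "write failed: $error" unless $ok;
  ```

  Because the check happens on every write, data protected this way can
  be read by many interpreters without taking a lock.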
  
  =head2 Shared SvPVX
  
  The string slot of a Perl scalar is known as the B<SvPVX>.  As Perl
  typically manages the string a variable points to, it must make a copy
  of it.  However, it is often the case that these strings are never
  written to.  It would be possible to implement copy-on-write strings
  in the Perl core with little performance overhead.
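
  Copy-on-write can be modelled in pure Perl to show the principle.
  This is a conceptual sketch only: the class I<My::CowString> is
  hypothetical and says nothing about how the Perl core would implement
  the feature in C.

  ```perl
  use strict;
  use warnings;

  {
      package My::CowString;

      # Each object holds a reference to a string buffer.
      sub new {
          my ($class, $text) = @_;
          return bless { buf => \$text }, $class;
      }

      # Cloning shares the buffer instead of copying the string.
      sub clone {
          my $self = shift;
          return bless { buf => $self->{buf} }, ref $self;
      }

      sub get { return ${ $_[0]{buf} } }

      # Writing builds a private buffer first.  For simplicity this
      # always copies; a real implementation would skip the copy when
      # the buffer has no other users.
      sub append {
          my ($self, $more) = @_;
          my $copy = ${ $self->{buf} } . $more;
          $self->{buf} = \$copy;
          return $self;
      }

      # True while both objects point at the same buffer.
      sub shares_with { return $_[0]{buf} == $_[1]{buf} }
  }

  my $original = My::CowString->new('<html>');
  my $clone    = $original->clone;
  my $shared_before = $original->shares_with($clone) ? 1 : 0;

  $clone->append('<body>');
  my $shared_after = $original->shares_with($clone) ? 1 : 0;

  print "shared before write: $shared_before, after: $shared_after\n";
  ```

  Strings that are only read never pay for a copy, which is the case
  the text describes as common.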
  
  =head2 Compile time method lookups
  
  A known disadvantage to Perl method calls is that they are slower than
  direct function calls.  It is possible to resolve method calls at
  compile time, rather than runtime, making method calls just as fast as
  subroutine calls.  However, certain information required for method
  lookups is only known at runtime.  To work around this, compile time
  hints can be used, for example:
  
   my Apache::Request $r = shift;
  
  This tells the Perl compiler to expect an object in the
  I<Apache::Request> class to be assigned to B<$r>.  A patch has already
  been submitted to use this information so method calls can be resolved
  at compile time.
  However, the implementation does not take into account sub-classing of
  the typed object.  Since the mod_perl API consists mainly of methods,
  it would be advantageous to re-visit the patch to find an acceptable
  solution.
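
  The sub-classing problem can be demonstrated without the patch.  In
  the sketch below the classes I<My::Request> and
  I<My::Request::Custom> are hypothetical, and resolving against the
  declared class is simulated with I<can>; it is not the submitted
  implementation.

  ```perl
  use strict;
  use warnings;

  {
      package My::Request;               # hypothetical base class
      sub new  { my $class = shift; return bless {}, $class }
      sub name { return 'My::Request::name' }
  }
  {
      package My::Request::Custom;       # hypothetical subclass
      our @ISA = ('My::Request');
      sub name { return 'My::Request::Custom::name' }
  }

  # The declared type is My::Request, but the object is a subclass.
  my My::Request $r = My::Request::Custom->new;

  # Normal runtime dispatch looks at the object's actual class.
  my $dynamic = $r->name;

  # Resolving once against the declared class picks the base method.
  my $resolved = My::Request->can('name');
  my $static   = $resolved->($r);

  print "runtime dispatch: $dynamic\n";
  print "declared class:   $static\n";
  ```

  The two calls reach different subroutines, so any compile time
  resolution has to cope with objects whose real class is a subclass of
  the declared one.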
  
  =head2 Memory management hooks
  
  Perl has its own memory management system, implemented in terms of
  I<malloc> and I<free>.  As an optimization, Perl will hang onto
  allocations made for variables, for example, the string slot of a
  scalar variable.  If a variable is assigned, for example, a 5k chunk
  of HTML, Perl will not release that memory unless the variable is
  explicitly I<undef>ed.  It would be possible to modify Perl in such a
  way that the management of these strings is pluggable, and Perl could
  be made to allocate from an APR memory pool.  Such a feature would
  maintain the optimization Perl attempts (to avoid malloc/free), but
  would greatly reduce the process size as pool resources are able to be
  re-used elsewhere.
  
  =head2 Opcode hooks
  
  Perl already has internal hooks for optimizing opcode trees (syntax
  tree).  It would be quite possible for extensions to add their own
  optimizations if these hooks were pluggable, for example, optimizing
  calls to I<print>, so they directly call the Apache I<ap_rwrite>
  function, rather than proxy via a I<tied filehandle>.
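
  The proxy path that such an optimization would bypass looks roughly
  like this.  The class I<My::TiedOutput> and the I<fake_rwrite>
  subroutine, which stands in for the Apache I<ap_rwrite> function, are
  hypothetical.

  ```perl
  use strict;
  use warnings;

  my @chunks;

  # Hypothetical stand-in for the Apache ap_rwrite function.
  sub fake_rwrite {
      my ($data) = @_;
      push @chunks, $data;
      return length $data;
  }

  {
      package My::TiedOutput;
      sub TIEHANDLE {
          my ($class, $writer) = @_;
          return bless { writer => $writer }, $class;
      }

      # Every print on the tied handle is routed through this method.
      sub PRINT {
          my $self = shift;
          return $self->{writer}->(join '', @_);
      }
  }

  tie *STDOUT, 'My::TiedOutput', \&fake_rwrite;
  print "Hello, ", "world\n";    # dispatched to My::TiedOutput::PRINT
  untie *STDOUT;

  print "captured: $chunks[0]";
  ```

  Each I<print> costs a tie lookup plus a method call before the data
  reaches the writer, which is the overhead a pluggable opcode hook
  could remove.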
  
  Another possible optimization would be "inlined" XSUB calls.  Perl has
  a generic opcode for calling subroutines, one which does not know the
  number of arguments coming into and being passed out of a subroutine.
  As the majority of mod_perl API methods have known in/out argument
  lists, it would be possible to implement a much faster version of the
  Perl I<pp_entersub> routine.
  
  =head2 Solar variables
  
  Perl global variables inside threaded MPMs are only global to the
  current interpreter clone in which they are running.  A useful feature
  for mod_perl applications would be the concept of a I<solar> variable,
  which is global across all interpreters.  Such a feature would of
  course require mutex locking, something we do not want to introduce
  for normal Perl variables.  It might be possible to again piggy-back
  on the B<SvREADONLY> flag: when it is set, Perl would check for
  another flag, B<SvSOLAR>, which would trigger the proper locking for
  concurrent access to cross-interpreter globals.
  
  =head1 Maintainers
  
  The maintainer is the person you should contact with updates,
  corrections and patches.
  
  Doug MacEachern E<lt>dougm (at) covalent.netE<gt>
  
  =head1 Authors
  
  =over 
  
  =item * Doug MacEachern E<lt>dougm (at) covalent.netE<gt>
  
  =back
  
  =cut