Thanks, Andy, for this analysis, but unfortunately it doesn't come close
to the scale I'm dealing with. I have a couple of thousand Apache and
thttpd processes constantly hitting NFS shares for file stats on over a
terabyte of content (only a very small fraction of which is web
templates). We already have projects slated to migrate other sites that
will double the traffic; that much is definite, and we need to be ready
for the traffic to triple within the next 2-3 years. We are, however,
due for fresh benchmarking, which will have to be done anyway as
development proceeds on our rewrite. The previous benchmarking was
performed a few years ago by the CTO at the time and is no longer
available.
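
(Along the lines of Andy's one-liner below, something like the following
is probably where I'll start for the new numbers; it just compares what
a stat costs us on the NFS mount versus a locally replicated copy. The
paths are placeholders, not our real mounts.)

    #!/usr/bin/perl
    # Rough sketch for the upcoming benchmarking: NFS stat vs. local stat.
    use strict;
    use warnings;
    use Benchmark qw( cmpthese );

    my $nfs_copy   = '/nfs/content/templates/header.tt';  # placeholder NFS path
    my $local_copy = '/var/www/templates/header.tt';      # placeholder local replica

    cmpthese( 500_000, {
        nfs   => sub { my @s = stat $nfs_copy },
        local => sub { my @s = stat $local_copy },
    });
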
Certainly I can see how this thread could be read as preemptive
optimization. That's my fault for not giving the full scope of the issue
and just leaving it at "too much NFS activity." But I don't see it as
preemptive optimization. I see it as an unnecessary call beyond the
initial page load, not much different from frequently re-checking that
my chair still exists after I've sat down in it.
Once it's there and it's in use, it does not require re-validation. If
the chair were to break while I'm sitting in it, the entire process of
sitting down must be restarted: I get up, find a replacement chair and
then sit down again. It's the same thing with templates: if an error is
found in a template, a revision is made, which must be approved; the
revision then replaces the template and the servers are restarted. Think
of the templates less in the light of traditional web pages and more in
the light of Perl modules. Perl doesn't care if a module has changed, or
even if it has been deleted from disk, after it has been loaded. If you
want to pick up a changed library, you (typically) must bounce the
process.
This may sound a bit over the top, but it helps to ensure the integrity
of any code that could be used for processing credit cards. Only a few
people can approve these types of changes, while many people may have
their hands in the development of templates. As a further complication,
those who can approve changes cannot be involved in them beyond
reviewing the revision.

I've really been trying to avoid getting into much detail here; it's
time-consuming and borders on disclosing company policy. I was hoping
that simply stating that this is my need and asking "what is the
accepted approach with TT" would suffice, but it seems there isn't an
"accepted" approach.

For whatever reason, and whether it's accepted by the community or not,
I have a few goals in mind for our redesign that I'm hoping to come
close to meeting with TT. Here are a couple that are relevant to this
topic:
* Mark certain templates as "protected" so they cannot be modified after
being loaded, and reinstate the ability to modify non-sensitive pages
(which mostly eliminates this whole stat issue from my perspective;
statting would once again be needed only for the pages that remain
modifiable).
* Preload selected (primarily the protected) templates in the parent
Apache (1.3) process to ensure that changes can't sneak through as new
Apache children are spawned (a rough sketch of what I have in mind
follows this list).
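
To make that second point concrete, here's a minimal sketch of the kind
of startup.pl I have in mind, assuming mod_perl under Apache 1.3; the
paths and template names are placeholders rather than anything real.
The idea is simply to compile the sensitive templates in the parent
httpd before any children are forked, so every child starts from the
same approved, compiled copies:

    # startup.pl -- sketch only; paths and template names are made up
    use strict;
    use warnings;
    use Template;

    use vars qw( $TT );    # shared with the content handlers

    $TT = Template->new({
        INCLUDE_PATH => '/nfs/content/templates', # placeholder for the NFS mount
        COMPILE_DIR  => '/var/cache/tt',          # keep compiled forms on local disk
        COMPILE_EXT  => '.ttc',
    }) or die Template->error();

    # Force compilation now, in the parent, before any children exist.
    for my $name (qw( checkout/payment.tt checkout/confirm.tt )) {
        $TT->context->template($name);  # dies if the template can't be loaded
    }

    1;

Whether the compiled templates actually stay shared across the children
via copy-on-write is something I'd have to verify as part of the
benchmarking, but at minimum it guarantees the parent loaded an approved
copy before forking.
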
If these two goals in particular can be met with TT, then this issue is
resolved for me as soon as I find out how. Otherwise, I'm left with
locking down *all* template revisions until I come up with an
alternative. From what little I know about TT at this point, it might
mean subclassing Template::Provider, but as I mentioned, I'm new to TT
and I'd really prefer to keep my hands out of there until I become more
familiar with it.
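
For what it's worth, here's my rough guess at what such a subclass might
look like; it's only a sketch, the package and template names are mine,
and I'd welcome corrections from anyone who knows the Provider internals
better. The idea is to let a protected template go through the normal
fetch() once, then pin the compiled result so it is never statted or
reloaded again:

    package My::Provider::Protected;
    # Sketch only: a Template::Provider subclass that loads a "protected"
    # template once and never goes back to disk for it afterwards.
    use strict;
    use warnings;
    use base 'Template::Provider';

    # Template names that must never be re-read after the first load.
    my %PROTECTED = map { $_ => 1 }
                    qw( checkout/payment.tt checkout/confirm.tt );
    my %PINNED;    # name => compiled document, filled in on first fetch

    sub fetch {
        my ($self, $name, @args) = @_;

        # Anything not protected follows the normal stat/reload rules.
        return $self->SUPER::fetch($name, @args)
            if ref $name or not $PROTECTED{$name};

        # Already pinned: hand back the compiled copy without touching disk.
        return ($PINNED{$name}, undef) if exists $PINNED{$name};

        # First request: load it normally, then pin the result.
        my ($data, $error) = $self->SUPER::fetch($name, @args);
        $PINNED{$name} = $data unless $error;
        return ($data, $error);
    }

    1;

It would then be handed to Template via LOAD_TEMPLATES, along the lines
of Template->new({ LOAD_TEMPLATES => [ My::Provider::Protected->new({
INCLUDE_PATH => '/nfs/content/templates' }) ] }). Again, this is just
where my head is at right now, not something I've tested.
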
Locking down template revisions (in part or in whole) is a tiny detail
in the big picture, and it's not being done because I *want* to do it or
because I think it's the best approach (it's certainly not the easiest);
it's being done because I *must* do it to demonstrate a strict auditing
policy over any piece of code involved in a point of sale. We've not
yet settled on our final templating solution; I'm still working through
discovery, and so far TT is the front-runner. This entire issue may come
down to my head being stuck in previous solutions that I really need to
rethink. But I was just looking for a response on how this has been
dealt with previously by experienced TT users (which I think was the
point of the original post on this thread). I joined this thread simply
because it sounded similar to what I will be dealing with.
--
Tim
Andy Wardley wrote:
Andy Lester wrote:
Actually, you'll only have half a million stat calls, which according to my
test below is less than a second of machine overhead per day.
perl -MBenchmark -e 'timethis(432_000, sub { stat $0 })'
timethis 432000: -1 wallclock secs ( 0.18 usr + 0.28 sys = 0.46 CPU)
@ 939130.43/s (n=432000)
Why? $Template::Provider::STAT_TTL is set to 1 (second) by default. That
means that each file is checked once a second, at most, regardless of how
many page impressions you're getting. That's 86k stat() calls per day
(60*60*24), per template used (which I assumed to be 5 in the calculation
above) = 432,000
And even if you were hitting stat() for every template, for every page, 20
million stat() calls is still only approx. 20 seconds of processor overhead
per day. That's pretty cheap.
You mention that you're mounted across NFS, which will certainly make things
a little slower. But if you're looking to speed things up, then replicating
the templates to a local filesystem is going to have a much greater impact
than trying to optimise away stat() calls.
So I think Andy's advice is sound: measure what you're doing, and be
sure that you're optimising the right thing.
I personally suspect that tuning out the stat() calls isn't going to save
you a great deal of time, but I could be wrong. So if you want to reduce the
number of stat calls, simply set STAT_TTL to a higher value.
$Template::Provider::STAT_TTL = 60;
HTH
A