Never mind. After another hour's work, I solved it. For reference, here is
my set of rewrite rules:

   RewriteEngine on
   RewriteLog /var/www/popline/logs/rewrite.log
   #Turn off rewritelog with level 0. 2 is useful/normal.
   RewriteLogLevel 0
   RewriteRule ^/docs$  /docs/index.html
   RewriteRule ^/docs/$ /docs/index.html
   #If this matches, don't do any rewriting
   RewriteRule ^/docs/index.*                   -       [L]
   #If this matches, don't do any rewriting, so error pages come up correctly
   RewriteRule ^/404.shtml                      -       [L]
   #If this matches, don't do any rewriting. For Google sitemap program
   RewriteRule ^/docs/sitemap.*                 -       [L]
   #If the file doesn't exist, rewrite and resubmit
   RewriteCond %{REQUEST_FILENAME}              !-f
   RewriteRule ^/docs/[0-9]{4}/([0-9]{6})\.html /docs/$1        [R,L]
   #If this matches, don't do any rewriting
   RewriteRule ^/docs/[0-9]{4}/[0-9]{6}\.html   -       [L]
   #Note that in RewriteRule below, must use %3F for '?' after
   #'icswppro.dll'. '?' has special meaning in Rewrite substitutions.
   RewriteRule ^/docs/([0-9]{6})$ http://db.jhuccp.org/ics-wpd/exec/icswppro.dll?BU=http://db.jhuccp.org/ics-wpd/exec/icswppro.dll&QF0=DocNo&QI0=$1&TN=Popline&AC=QBE_QUERY&MR=30\%DL=1&&RL=1&&RF=LongRecordDisplay&DF=LongRecordDisplay [P]
   #If this matches, don't do any rewriting
   RewriteRule ^/docs/[0-9]{4}.*                -       [L]
   #NOTE: If you want to do a Google sitemap verify, the next line must
   #   be commented out, so that Apache doesn't return a 200 code
   #   (forwarding it on to db.jhuccp.org) for a non-existent page.
   RewriteRule ^/.*$ http://db.jhuccp.org/ics-wpd/popweb/basic.html [R,L]
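
The pair that answers the original question is the RewriteCond/RewriteRule
that checks for the file: instead of catching the 404 after the fact, it
tests whether the requested file exists before Apache tries to serve it.
One hedged caveat: these rules sit in server or virtual-host context (the
patterns begin with "/"), and the mod_rewrite documentation notes that in
that context %{REQUEST_FILENAME} may still hold the same value as
%{REQUEST_URI}, i.e. the URL path rather than the location on disk. If
so, the !-f test can succeed even for files that do exist. A minimal
sketch that tests the on-disk path explicitly, assuming the /docs/ files
live directly under DocumentRoot (no Alias for /docs), might be:

   # Sketch only: assumes server/virtual-host context and that
   # /docs/NNNN/NNNNNN.html maps to DocumentRoot + the URL path.
   RewriteEngine on
   # Fire only when no such file exists on disk...
   RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
   # ...then send /docs/NNNN/NNNNNN.html to /docs/NNNNNN.
   # R=301 (permanent) instead of plain R (302) is a suggestion,
   # not part of the rules above.
   RewriteRule ^/docs/[0-9]{4}/([0-9]{6})\.html$ /docs/$1 [R=301,L]

With this in place, a request for /docs/0784/045796.html that has no file
behind it should be redirected to /docs/045796, which the existing [P]
rule then proxies to db.jhuccp.org. A permanent redirect may also help
Google replace the stale URLs in its index, though that is up to Google.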

Thanks again for being here if needed.

-Kevin

-----Original Message-----
From: Zembower, Kevin [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 05, 2006 3:20 PM
To: users@httpd.apache.org
Subject: [EMAIL PROTECTED] Help with rewrite for errors?

I have a number of documents in HTML files like this:

www.popline.org/docs/0784/045796.html
www.popline.org/docs/0429/209471.html
www.popline.org/docs/0003/690206.html

For most of these records the link is broken, as it is in these three
examples. This is a result of old files that are still indexed by Google.

However, in these three cases the original document can be found by
removing the 4-digit directory and the '.html' extension, like this:

www.popline.org/docs/045796
www.popline.org/docs/209471
www.popline.org/docs/690206

Because of the nature of our system, these resolve correctly.

Can anyone help me with a set of RewriteRules that will, whenever a 404
error is generated, transform the URL as indicated and resubmit it?

Here are the current Rewrite rules in my system:
   RewriteEngine on
   RewriteLog /var/www/popline/logs/rewrite.log
   #Turn off rewritelog with level 0. 2 is useful/normal.
   RewriteLogLevel 0
   RewriteRule ^/docs$  /docs/index.html
   RewriteRule ^/docs/$ /docs/index.html
   #If this matches, don't do any rewriting
   RewriteRule ^/docs/index.*                   -       [L]
   #If this matches, don't do any rewriting, so error pages come up correctly
   RewriteRule ^/error/.*                       -       [L]
   #If this matches, don't do any rewriting, so error pages come up correctly
   RewriteRule ^/404.shtml                      -       [L]
   #If this matches, don't do any rewriting. For Google sitemap program
   RewriteRule ^/docs/sitemap.*                 -       [L]
   #If this matches, don't do any rewriting
   RewriteRule ^/docs/[0-9]{4}/[0-9]{6}\.html   -       [L]
   #Note that in RewriteRule below, must use %3F for '?' after
   #'icswppro.dll'. '?' has special meaning in Rewrite substitutions.
   RewriteRule ^/docs/([0-9]{6})$ http://db.jhuccp.org/ics-wpd/exec/icswppro.dll?BU=http://db.jhuccp.org/ics-wpd/exec/icswppro.dll&QF0=DocNo&QI0=$1&TN=Popline&AC=QBE_QUERY&MR=30\%DL=1&&RL=1&&RF=LongRecordDisplay&DF=LongRecordDisplay [P]
   #If this matches, don't do any rewriting
   RewriteRule ^/docs/[0-9]{4}.*                -       [L]
   RewriteRule ^/.*$ http://db.jhuccp.org/ics-wpd/popweb/basic.html [R,L]

Here's an example from the current rewrite log of a 404 generation:

10.253.200.90 - - [05/Oct/2006:15:08:01 --0400] [www.popline.org/sid#8275268][rid#82e1570/initial] (2) init rewrite engine with requested uri /docs/0784/045796.html
10.253.200.90 - - [05/Oct/2006:15:08:01 --0400] [www.popline.org/sid#8275268][rid#82e1570/initial] (1) pass through /docs/0784/045796.html
10.253.200.90 - - [05/Oct/2006:15:08:01 --0400] [www.popline.org/sid#8275268][rid#82e2d30/initial/redir#1] (2) init rewrite engine with requested uri /404.shtml
10.253.200.90 - - [05/Oct/2006:15:08:01 --0400] [www.popline.org/sid#8275268][rid#82e2d30/initial/redir#1] (1) pass through /404.shtml

Here's an earlier excerpt from the rewrite log, before I filtered out
the 'HTTP_NOT_FOUND' information:

10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e3760/initial] (2) init rewrite engine with requested uri /docs/0211/772369.html
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e3760/initial] (1) pass through /docs/0211/772369.html
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (2) init rewrite engine with requested uri /error/HTTP_NOT_FOUND.html.var
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (2) rewrite /error/HTTP_NOT_FOUND.html.var -> http://db.jhuccp.org/ics-wpd/popweb/basic.html
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (2) explicitly forcing redirect with http://db.jhuccp.org/ics-wpd/popweb/basic.html
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (1) escaping http://db.jhuccp.org/ics-wpd/popweb/basic.html for redirect
10.253.200.90 - - [04/Oct/2006:11:55:43 --0400] [www.popline.org/sid#8270170][rid#82e5070/initial/redir#1] (1) redirect to http://db.jhuccp.org/ics-wpd/popweb/basic.html [REDIRECT/302]
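
The "(1)" and "(2)" in these log lines are the RewriteLogLevel at which
each message is written, which is why the "init rewrite engine" lines
only appear at level 2. RewriteLog and RewriteLogLevel exist up through
Apache 2.2; Apache 2.4 removed them in favour of per-module log levels in
the regular error log. If this configuration is ever moved to 2.4, a
roughly equivalent setting (a sketch, not tested against this server)
would be:

   # Apache 2.4 only: stands in for RewriteLog + RewriteLogLevel 2.
   # Everything else stays at the default level (warn); mod_rewrite
   # logs up to trace2. Messages go to the ErrorLog, not rewrite.log.
   LogLevel warn rewrite:trace2

Dropping the "rewrite:trace2" part turns the rewrite tracing off again,
which is roughly what RewriteLogLevel 0 does in the rules above.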

My question is not so much how to transform the submitted URL into the
one without the directory and '.html'. Instead, I don't understand how
to detect the 404 condition and then invoke the rewrite rule.

Thanks in advance for all your help and suggestions.

-Kevin

Kevin Zembower
Internet Services Group manager
Center for Communication Programs
Bloomberg School of Public Health
Johns Hopkins University
111 Market Place, Suite 310
Baltimore, Maryland  21202
410-659-6139 

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server
Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: [EMAIL PROTECTED]
   "   from the digest: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]