Hi,

I've implemented an extension to WWWOFFLE that allows WWWOFFLE to execute local
files and use their output instead of their contents. The executed program receives
information from WWWOFFLE in the form of environment variables, CGI-style.
Personally I use this feature for two main purposes.

The first is to bypass so-called "interstitial" or intermediary ads.
For instance, http://www.msnbc.com/news/TECH_Front.asp

redirects you to

http://www.msnbc.com/ads/trans/taDisplay.asp?realURL=/news/TECH_Front.asp&qs=&title=MSNBC+Technology&adf=nbctes&clik=redir

This is a page with a large ad, which also contains a link to the real
content you wanted:

http://www.msnbc.com/news/TECH_Front.asp?ta=y

With my extension to WWWOFFLE, I put the following in the DontGet section of
the configuration file:

http://*.com/ads/*
<http://*msnbc.com/ads/trans/taDisplay.asp?*realURL*>  replacement = /local/bin/msnbc_redirector

I wrote a perl script which I stored in
/var/spool/wwwoffle/html/local/bin/msnbc_redirector with execution permissions
(I've listed the script at the end of this message). The script extracts the
real URL from the URL of the intermediary ad and prints an HTML message (including
headers) which redirects the browser to the real URL.
Now when the msnbc.com site tries to present me with an interstitial ad, I never
see the ad but get the desired content directly!
Different sites using intermediary ads use slightly different URL formats, but
the script can easily be adapted to accommodate these too.
(Note: AMB says that he plans to add a feature to WWWOFFLE that will allow for
(regular expression type) substitutions to be made in URLs in the Alias
section of the configuration file, so there will probably be an alternative
method for this type of ad circumvention in a future version of WWWOFFLE.)
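
As a rough illustration of adapting the script to other URL formats, the
sketch below tries several candidate parameter names instead of only realURL.
The helper name find_real_url and the extra parameter names 'target' and
'dest' are made up for this example; only 'realURL' comes from the msnbc.com
case above. Unlike the script at the end of this message, it only matches a
name at the start of a parameter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of WWWOFFLE or of the original script):
# return the raw value of the first candidate parameter found in the
# query string, or undef if none of the names occurs.
sub find_real_url {
    my ($query_string, @param_names) = @_;
    return unless defined $query_string;
    for my $name (@param_names) {
        # Match the name case-insensitively at the start of a parameter,
        # i.e. at the beginning of the string or right after '&' or ';'.
        if ($query_string =~ m{(?:^|[&;])\Q$name\E=([^&;]+)}i) {
            return $1;
        }
    }
    return;
}

# 'realURL' is the msnbc.com parameter; 'target' and 'dest' are invented.
my $qs    = 'realURL=/news/TECH_Front.asp&qs=&title=MSNBC+Technology';
my $found = find_real_url($qs, qw(realURL target dest));
print defined $found ? "$found\n" : "no match\n";
```

In the real script the query string would come from $ENV{QUERY_STRING}, and
the returned value would still need the host prefix and any site-specific
suffix added, as the msnbc.com script does.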

The second main purpose I use my extension for is doing automatic logins.
I regularly use the internet archive of the New Scientist and this site requires
several mouse clicks and some typing to log in (some browsers can remember the
password for you, but it is still not a one-click process).
I wrote a perl script that in combination with my extension to WWWOFFLE logs me
in automatically and allows me to access the archive with a single mouse click.
Furthermore the script caches the non-persistent cookies necessary to use the
archive. This has some additional benefits: in the past, if I quit my browser
(either intentionally or because of a crash) without logging out, these cookies
would get lost and I'd be locked out of my account for half an hour. I can now
also share a single login session between different browsers, even if they are
on different machines on my LAN.
I'm not claiming that it is a good idea in general to put automated
authentication in the proxy instead of the client (browser). This script simply
solves a particular problem I have with a particular site that I use regularly.

The implementation I wrote of this CGI-style script calling is still incomplete,
but if you want to try it out for yourself, I'm willing to send you a patch
file.
I'm also very interested in hearing from people who have ideas for improvements
and especially new applications for this feature.


Paul A. Rombouts <[EMAIL PROTECTED]>
Vincent van Goghlaan 27
5246 GA  Rosmalen
Netherlands


#!/usr/bin/perl -w

use strict;
use lib ('/var/spool/wwwoffle/html/modules/perl');
require HTMLmessages;

FINDURL: {
    # WWWOFFLE passes the query part of the requested URL CGI-style.
    my $query_string = $ENV{QUERY_STRING};
    if(defined($query_string) && $query_string =~ m{(?i:realurl)=([^&;]+)}) {
        my $realurl=$1;
        # A relative URL needs the host of the original request prepended.
        if ($realurl !~ m{^\w+://}) {
            if(defined(my $host = $ENV{HTTP_HOST})) {
                if ($realurl !~ m{^/}) {$realurl= "/$realurl"}
                $realurl="http://$host$realurl";
            }
            else {last FINDURL}
        }
        # Append ta=y, as in the link to the real content on the ad page.
        if ($realurl =~ m{\?}) {$realurl.='&ta=y'}
        else {$realurl.='?ta=y'}

        HTMLmessage_redirect($realurl);
        exit 0;
    }
}

if(my $url = $ARGV[0]) {
    HTMLmessage_wrongurl($url);
}
else {
    HTMLmessage_internalerror();
    exit 1;
}    

exit 0;


Note: The HTMLmessage_redirect subroutine is defined elsewhere and simply
prints a response that redirects the browser to the URL with the real content.
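
For readers who want a rough idea of what such a subroutine could look like,
here is a minimal sketch. It is not the actual HTMLmessages code: the names
escape_html and build_redirect_response are invented for this example, and it
assumes the script has to print a complete HTTP response (status line,
headers and body), which may differ from what the patch really expects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attribute values.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical stand-in for a redirect helper: build a 302 response whose
# Location header points at $url, plus a small HTML body with a link for
# browsers that do not follow the redirect automatically.
sub build_redirect_response {
    my ($url) = @_;
    $url =~ s/[\r\n]//g;    # keep stray line breaks out of the header
    my $safe = escape_html($url);
    my $body = "<html><head><title>Redirect</title></head>\n"
             . "<body><p>Redirecting to <a href=\"$safe\">$safe</a></p></body></html>\n";
    # Assumption: a full HTTP response, including the status line, is wanted.
    return "HTTP/1.0 302 Found\r\n"
         . "Location: $url\r\n"
         . "Content-Type: text/html\r\n"
         . "Content-Length: " . length($body) . "\r\n"
         . "\r\n"
         . $body;
}

print build_redirect_response('http://www.msnbc.com/news/TECH_Front.asp?ta=y');
```

Building the response as a string first makes it easy to compute the
Content-Length and to inspect the output before printing it.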
