Morten Bo Johansen <[EMAIL PROTECTED]> writes:

> Andrew M. Bishop <[EMAIL PROTECTED]> wrote:
> 
> AMB> Morten Bo Johansen <[EMAIL PROTECTED]> writes:
> 
> >> I can't seem to fetch pages in batch mode from The Internet
> >> Movie Database. In the page that is returned to me, following a
> >> try it seems that the user-agent header has not been passed on
> >> which is a requirement from IMDb. I have indeed set my
> >> user-agent header in wwwoffle.config and to be sure I can see

> If I place an outgoing offline request with IMDb.com and then
> go online to -fetch it, then I get the error page (403) which
> contains among other things these lines:
> 
>     Server: imdb-online-1107.vdc.amazon.com
>     (us.imdb.com)(us.imdb.com)
>     Date: Sun Oct  6 11:47:00 2002
>     IP: 212.54.69.73
>     Browser:
>     Cookie:
>     Url: /Name?wieth,+mogens
>     Method: GET
>     Referrer:
> 
> As you can see the information about my browser has not been
> recorded by their server. The wwwoffled.log file obtained by
> the command given by you doesn't contain any information about
> the User-Agent either.
> 
> Now, if I go online again and refresh the 403-page that I just
> got from the previous -fetch then the user-agent header from my
> wwwoffle.conf is being passed on by wwwoffled and I get the
> information I want.
> 
> If you place an outgoing request using the command
> 
>    $ wwwoffle http://us.imdb.com/Name?finney,+albert
> 
> and then -fetch it, does it work for you?   

No, it does not, and rightly so: placing the request this way does not
add a User-Agent field, which IMDb expects.  The wwwoffle program does
not add a User-Agent of its own.  The CensorHeader section of the
configuration file only censors headers that already exist in the
request; it does not add headers of its own.

When you refresh the page that you got this way, the request comes
from your browser, so it carries the User-Agent header of the browser
that you used to do the refresh.
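
To illustrate the difference between the two cases, here is a minimal
Python sketch.  It is not WWWOFFLE's actual code; the function name
censor_headers, the rules dictionary and the replacement value
"ExampleAgent/1.0" are all hypothetical and exist only for this
example.  It assumes a censor step that may replace or drop headers
already present in a request but never creates new ones.

```python
def censor_headers(headers, rules):
    """Apply censor rules to a list of (name, value) header pairs.

    `rules` maps a lowercase header name to a replacement value, or to
    None to drop the header. A header that is absent from the request
    is never added (hypothetical model, not WWWOFFLE's implementation).
    """
    result = []
    for name, value in headers:
        key = name.lower()
        if key in rules:
            replacement = rules[key]
            if replacement is None:
                continue  # rule says: remove this header entirely
            result.append((name, replacement))  # replace existing value
        else:
            result.append((name, value))  # no rule: pass through unchanged
    return result


# Hypothetical rule: replace whatever User-Agent the client sent.
rules = {"user-agent": "ExampleAgent/1.0"}

# A request made from a browser already has a User-Agent header.
browser_request = [("Host", "us.imdb.com"), ("User-Agent", "Mozilla/5.0")]

# A request placed from the command line has no User-Agent header.
command_line_request = [("Host", "us.imdb.com")]

print(censor_headers(browser_request, rules))
print(censor_headers(command_line_request, rules))
```

In this sketch the browser request leaves with the replaced
User-Agent, while the command-line request still has none, because
there was no existing header for the rule to act on.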

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.7/user.html
