Daniel Watkins wrote:
> I've run into a bit of trouble with my spider script. Thus far, it is
> able to retrieve all of the data off the website that is contained
> within standard HTML, downloading jpg, gif and bmp images that are
> related to the files (this restriction only being set by a lack of
> further definitions by myself). However, I have run into a problem with
> one link that simply points to a folder (www.aehof.org.uk/forum) within
> which is contained a phpBB forum.

It seems to me this page is no different from any other: it has a bunch of 
links that you can follow to get the content. I'm not sure why you want to 
handle it specially, except perhaps to ignore some of the links, and that 
filtering is something you will have to write into your program yourself.
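
As a rough illustration of "follow the links, but skip some of them", here is 
a minimal sketch using only the standard library (Python 3 module names). The 
names LinkCollector, wanted and extract_links are made up for this example, 
and the list of skipped phpBB pages is only a guess at what you might want to 
ignore; adjust it to whatever your spider should leave alone.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))


def wanted(url):
    """Hypothetical filter: skip forum pages that carry no content."""
    skipped = ("posting.php", "login.php", "profile.php")  # assumed names
    return not urlparse(url).path.endswith(skipped)


def extract_links(html, base_url):
    """Return the absolute URLs on a page that pass the wanted() filter."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return [link for link in parser.links if wanted(link)]


# A tiny stand-in for a downloaded forum page.
sample = '<a href="viewforum.php?f=1">General</a> <a href="login.php">Log in</a>'
links = extract_links(sample, "http://www.aehof.org.uk/forum/")
```

In a real spider you would feed extract_links() the HTML you already download 
for every other page, and queue the URLs it returns.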
 
> I've attempted to use 'dircache' but couldn't find a way for it to
> understand web addresses. However, I may not have hit upon the right
> combination of syntax, so may be mistaken. I also considered 'os' but it
> appears to require definition of a particular operating system, which is
> a direction I'd prefer not to take unless I have to. In addition, the
> error messages I received from using 'dircache' traced back into 'os' so
> it is unlikely it would have been suitable for the purpose.

The os module actually hides the differences between operating systems pretty 
well. It has implementations for many operating systems, but the interface you 
see is OS-independent. The correct implementation is chosen under the hood; it 
is not something you need to be concerned with.
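
To show what that looks like in practice, here is a small sketch that runs 
unchanged on Windows, Linux or a Mac. Note that os.listdir() and os.path work 
on directories in the local filesystem, not on web addresses, so this only 
helps with files you have already saved to disk. The list_images helper and 
the sample file names are invented for the example.

```python
import os
import tempfile


def list_images(folder, extensions=(".jpg", ".gif", ".bmp")):
    """Return the paths of image files in a local folder, sorted by name."""
    names = sorted(os.listdir(folder))
    # os.path.join picks the right path separator for the current platform.
    return [os.path.join(folder, name)
            for name in names
            if name.lower().endswith(extensions)]


# Build a throwaway folder so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.jpg", "b.txt", "c.GIF"):
        with open(os.path.join(folder, name), "w"):
            pass  # create an empty file
    found = [os.path.basename(path) for path in list_images(folder)]
```

Nothing in the code names a particular operating system; the os module picks 
the matching implementation when it is imported.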

Kent

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
