Re: [Tutor] can I walk or glob a website?

2011-05-19 Thread Albert-Jan Roskam
From: Marc Tompkins marc.tompk...@gmail.com To: tutor@python.org Sent: Wed, May 18, 2011 9:10:06 PM Subject: Re: [Tutor] can I walk or glob a website? On Wed, May 18, 2011 at 11:21 AM, Marc Tompkins marc.tompk...@gmail.com wrote: On Wed, May 18, 2011

Re: [Tutor] can I walk or glob a website?

2011-05-19 Thread Marc Tompkins
On Thu, May 19, 2011 at 12:25 AM, Albert-Jan Roskam fo...@yahoo.com wrote: Thank you, always useful to study other people's code. I wasn't planning to create a GUI for my app. It was necessary for the purpose - I didn't want all, or even most, of the mp3s on the site, but certainly enough of

[Tutor] can I walk or glob a website?

2011-05-18 Thread Albert-Jan Roskam
Hello, How can I walk (as in os.walk) or glob a website? I want to download all the pdfs from a website (using urllib.urlretrieve), extract certain figures (using pypdf- is this flexible enough?) and make some statistics/graphs from those figures (using rpy and R). I forgot what the process of

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Alan Gauld
Albert-Jan Roskam fo...@yahoo.com wrote How can I walk (as in os.walk) or glob a website? I don't think there is a way to do that via the web. Of course if you have access to the web server's filesystem you can use os.walk to do it as for any other filesystem, but I don't think it's

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Flynn, Stephen (L P - IT)
Gauld Sent: Wednesday, May 18, 2011 10:18 AM To: tutor@python.org Subject: Re: [Tutor] can I walk or glob a website? Albert-Jan Roskam fo...@yahoo.com wrote How can I walk (as in os.walk) or glob a website? I don't think there is a way to do that via the web. Of course if you have access

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Dave Angel
On 01/-10/-28163 02:59 PM, Alan Gauld wrote: Albert-Jan Roskam fo...@yahoo.com wrote How can I walk (as in os.walk) or glob a website? I don't think there is a way to do that via the web. Of course if you have access to the web server's filesystem you can use os.walk to do it as for any other

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Steven D'Aprano
On Wed, 18 May 2011 07:06:07 pm Albert-Jan Roskam wrote: Hello, How can I walk (as in os.walk) or glob a website? If you're on Linux, use wget or curl. If you're on Mac, you can probably install them using MacPorts. If you're on Windows, you have my sympathies. *wink* I want to download

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Albert-Jan Roskam
From: Dave Angel da...@ieee.org To: Alan Gauld alan.ga...@btinternet.com Cc: tutor@python.org Sent: Wed, May 18, 2011 11:51:35 AM Subject: Re: [Tutor] can I walk or glob a website? On 01/-10/-28163 02:59 PM, Alan Gauld wrote: Albert-Jan Roskam fo...@yahoo.com wrote How can I walk

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Peter Otten
Albert-Jan Roskam wrote: How can I walk (as in os.walk) or glob a website? I want to download all the pdfs from a website (using urllib.urlretrieve), extract certain figures (using pypdf- is this flexible enough?) and make some statistics/graphs from those figures (using rpy and R). I forgot

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Alan Gauld
Dave Angel da...@ieee.org wrote Albert-Jan Roskam fo...@yahoo.com wrote How can I walk (as in os.walk) or glob a website? I don't think there is a way to do that via the web. It has to be (more or less) possible. That's what Google does for their search engine. Google trawls the site
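The "trawling" idea discussed here (fetch a page, collect its links, follow the ones on the same site) can be sketched with the standard library alone. This is only an illustrative sketch in Python 3, not code from the thread: the names `LinkCollector` and `split_links` are hypothetical, and the page HTML is assumed to have been fetched already (for example with `urllib.request.urlopen`).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def split_links(html, base_url):
    """Return (pdf_links, page_links) found in one page of HTML.

    Links pointing at other hosts are dropped so that a crawl
    built on this function stays on the starting site.
    """
    collector = LinkCollector(base_url)
    collector.feed(html)
    site = urlparse(base_url).netloc
    pdfs, pages = [], []
    for link in collector.links:
        parts = urlparse(link)
        if parts.netloc != site:
            continue  # different host (or mailto:, javascript:, ...)
        if parts.path.lower().endswith(".pdf"):
            pdfs.append(link)
        else:
            pages.append(link)
    return pdfs, pages
```

A crawl loop would keep a set of already-visited URLs, call `split_links` on each fetched page, queue the returned page links, and download the PDF links. Unlike os.walk, this only finds files that some page actually links to.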

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Prasad, Ramit
Are you going to be doing this seldom enough that the bandwidth used won't be a DoS attack? It will not solve the problem completely, but I know that wget (and probably curl) have speed limiters you can set to help reduce the chances of DoS. If you are using urllib you could look at:
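For a urllib-based script, one simple way to avoid hammering the server is to enforce a minimum pause between requests. The sketch below is an assumption about how that could look in Python 3, not the approach the message goes on to link to; the `Throttle` class and its parameters are hypothetical names. (With the command-line tools, wget's `--wait` and `--limit-rate` options and curl's `--limit-rate` option serve a similar purpose.)

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock               # injectable for testing
        self._sleep = sleep
        self._last = None                 # time of the previous request

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each download, with something like `Throttle(2.0)`, spaces requests at least two seconds apart. It limits the request rate rather than the bandwidth, so it is a courtesy measure and not a guarantee.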

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Albert-Jan Roskam
Hi Steven, From: Steven D'Aprano st...@pearwood.info To: tutor@python.org Sent: Wed, May 18, 2011 1:13:17 PM Subject: Re: [Tutor] can I walk or glob a website? On Wed, 18 May 2011 07:06:07 pm Albert-Jan Roskam wrote: Hello, How can I walk (as in os.walk) or glob a website? If you're

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Albert-Jan Roskam
From: Alan Gauld alan.ga...@btinternet.com To: tutor@python.org Sent: Wed, May 18, 2011 4:40:19 PM Subject: Re: [Tutor] can I walk or glob a website? Dave Angel da...@ieee.org wrote Albert-Jan Roskam fo...@yahoo.com wrote How can I walk (as in os.walk

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Marc Tompkins
On Wed, May 18, 2011 at 2:06 AM, Albert-Jan Roskam fo...@yahoo.com wrote: Hello, How can I walk (as in os.walk) or glob a website? I want to download all the pdfs from a website (using urllib.urlretrieve), extract certain figures (using pypdf- is this flexible enough?) and make some

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Hugo Arts
On Wed, May 18, 2011 at 7:32 PM, Albert-Jan Roskam fo...@yahoo.com wrote: === Thanks for your reply. I tried wget, which seems to be a very handy tool. However, it doesn't work on this particular site. I tried wget -e robots=off -r -nc --no-parent -l6 -A.pdf

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Prasad, Ramit
It's horribly crude, in retrospect, and I'm embarrassed re-reading my code - but if you're interested I can forward it (if only as an example of what _not_ to do.) I would be interested even if the OP is not ;) Ramit Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Marc Tompkins
On Wed, May 18, 2011 at 11:04 AM, Prasad, Ramit ramit.pra...@jpmchase.com wrote: It's horribly crude, in retrospect, and I'm embarrassed re-reading my code - but if you're interested I can forward it (if only as an example of what _not_ to do.) I would be interested even if the OP is not ;)

Re: [Tutor] can I walk or glob a website?

2011-05-18 Thread Marc Tompkins
On Wed, May 18, 2011 at 11:21 AM, Marc Tompkins marc.tompk...@gmail.com wrote: On Wed, May 18, 2011 at 11:04 AM, Prasad, Ramit ramit.pra...@jpmchase.com wrote: It's horribly crude, in retrospect, and I'm embarrassed re-reading my code - but if you're interested I can forward it (if only as