From: Marc Tompkins marc.tompk...@gmail.com
To: tutor@python.org
Sent: Wed, May 18, 2011 9:10:06 PM
Subject: Re: [Tutor] can I walk or glob a website?
On Wed, May 18, 2011 at 11:21 AM, Marc Tompkins marc.tompk...@gmail.com wrote:
On Thu, May 19, 2011 at 12:25 AM, Albert-Jan Roskam fo...@yahoo.com wrote:
Thank you, always useful to study other people's code. I wasn't planning to
create a GUI for my app.
It was necessary for the purpose - I didn't want all, or even most, of the
mp3s on the site, but certainly enough of
Hello,
How can I walk (as in os.walk) or glob a website? I want to download all the
PDFs from a website (using urllib.urlretrieve), extract certain figures (using
pypdf - is this flexible enough?) and make some statistics/graphs from those
figures (using rpy and R). I forgot what the process of
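The download step Albert-Jan describes might look roughly like this in current Python 3 (where urllib.urlretrieve has moved to urllib.request.urlretrieve); the HTML and URLs below are made-up examples, not anything from the thread:

```python
# Sketch only: collect the .pdf links from one fetched page, then pass each
# to urllib.request.urlretrieve. Parsing uses only the standard library.
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of .pdf links found in <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.lower().endswith(".pdf"):
                    # Resolve relative links against the page they came from.
                    self.pdf_links.append(urljoin(self.base_url, value))


def find_pdf_links(html, base_url):
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links


# Downloading would then be one call per link, e.g.:
#   from urllib.request import urlretrieve
#   for url in find_pdf_links(page_html, page_url):
#       urlretrieve(url, url.rsplit("/", 1)[-1])
```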
Albert-Jan Roskam fo...@yahoo.com wrote:
How can I walk (as in os.walk) or glob a website?
I don't think there is a way to do that via the web.
Of course if you have access to the web server's filesystem
you can use os.walk to do it as for any other filesystem,
but I don't think it's
From: Alan Gauld alan.ga...@btinternet.com
Sent: Wednesday, May 18, 2011 10:18 AM
To: tutor@python.org
Subject: Re: [Tutor] can I walk or glob a website?
Albert-Jan Roskam fo...@yahoo.com wrote:
How can I walk (as in os.walk) or glob a website?
I don't think there is a way to do that via the web.
Of course if you have access
On Wed, May 18, 2011, Alan Gauld wrote:
Albert-Jan Roskam fo...@yahoo.com wrote:
How can I walk (as in os.walk) or glob a website?
I don't think there is a way to do that via the web.
Of course if you have access to the web server's filesystem you can use
os.walk to do it as for any other
On Wed, 18 May 2011 07:06:07 pm Albert-Jan Roskam wrote:
Hello,
How can I walk (as in os.walk) or glob a website?
If you're on Linux, use wget or curl.
If you're on Mac, you can probably install them using MacPorts.
If you're on Windows, you have my sympathies.
*wink*
I want to download
From: Dave Angel da...@ieee.org
To: Alan Gauld alan.ga...@btinternet.com
Cc: tutor@python.org
Sent: Wed, May 18, 2011 11:51:35 AM
Subject: Re: [Tutor] can I walk or glob a website?
On Wed, May 18, 2011, Alan Gauld wrote:
Albert-Jan Roskam fo...@yahoo.com wrote:
How can I walk
Albert-Jan Roskam wrote:
How can I walk (as in os.walk) or glob a website? I want to download all
the PDFs from a website (using urllib.urlretrieve), extract certain
figures (using pypdf - is this flexible enough?) and make some
statistics/graphs from those figures (using rpy and R). I forgot
Dave Angel da...@ieee.org wrote:
Albert-Jan Roskam fo...@yahoo.com wrote:
How can I walk (as in os.walk) or glob a website?
I don't think there is a way to do that via the web.
It has to be (more or less) possible. That's what Google does for
their search engine.
Google trawls the site
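What that kind of trawling amounts to can be sketched in a few lines: start from one page, follow links, and stay on the same host. Everything below is illustrative, not code from the thread; the fetch function is passed in so the sketch needs no network access:

```python
# Minimal breadth-first "walk" of a website: visit pages, extract hrefs with a
# crude regex, and enqueue same-host links we haven't seen yet.
import re
from collections import deque
from urllib.parse import urljoin, urlparse

LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first visit of same-host pages; returns the URLs visited."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        try:
            html = fetch(url)
        except Exception:
            continue  # unreachable page: record the visit, follow no links
        for href in LINK_RE.findall(html):
            link = urljoin(url, href)
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

A real crawler would also honour robots.txt and parse HTML properly rather than with a regex; this only shows the walk itself.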
Are you going to be doing this seldom enough that the bandwidth used won't be
a DoS attack?
It will not solve the problem completely, but I know that wget (and probably
curl) have speed limiters you can set to help reduce the chances of a DoS. If you
are using urllib you could look at:
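The speed-limiting idea (wget's --wait, roughly) can be done by hand around urllib with a pause between requests; this is a sketch, and the one-second default is an arbitrary example value:

```python
# Wrap any fetch callable so successive calls are spaced at least `delay`
# seconds apart -- a crude politeness throttle for a crawler.
import time


class PoliteFetcher:
    """Enforce a minimum delay between successive fetches."""

    def __init__(self, fetch, delay=1.0):
        self.fetch = fetch
        self.delay = delay
        self._last = 0.0

    def __call__(self, url):
        wait = self._last + self.delay - time.monotonic()
        if wait > 0:
            time.sleep(wait)  # pause until the delay has elapsed
        self._last = time.monotonic()
        return self.fetch(url)
```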
Hi Steven,
From: Steven D'Aprano st...@pearwood.info
To: tutor@python.org
Sent: Wed, May 18, 2011 1:13:17 PM
Subject: Re: [Tutor] can I walk or glob a website?
On Wed, 18 May 2011 07:06:07 pm Albert-Jan Roskam wrote:
Hello,
How can I walk (as in os.walk) or glob a website?
If you're
From: Alan Gauld alan.ga...@btinternet.com
To: tutor@python.org
Sent: Wed, May 18, 2011 4:40:19 PM
Subject: Re: [Tutor] can I walk or glob a website?
Dave Angel da...@ieee.org wrote:
Albert-Jan Roskam fo...@yahoo.com wrote:
How can I walk (as in os.walk
On Wed, May 18, 2011 at 2:06 AM, Albert-Jan Roskam fo...@yahoo.com wrote:
Hello,
How can I walk (as in os.walk) or glob a website? I want to download all
the PDFs from a website (using urllib.urlretrieve), extract certain figures
(using pypdf - is this flexible enough?) and make some
On Wed, May 18, 2011 at 7:32 PM, Albert-Jan Roskam fo...@yahoo.com wrote:
Thanks for your reply. I tried wget, which seems to be a very handy
tool. However, it doesn't work on this particular site. I tried wget -e
robots=off -r -nc --no-parent -l6 -A.pdf
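The thread never establishes why wget failed on this particular site. One common possibility (a guess, not anything the poster confirmed) is a server that rejects unfamiliar User-Agent strings; with urllib you can set one explicitly. The header value below is just an example:

```python
# Build a request that identifies itself like a browser; some servers refuse
# the default urllib/wget user agents. Whether that applies here is unknown.
from urllib.request import Request


def browser_request(url, user_agent="Mozilla/5.0"):
    """Return a urllib Request carrying a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


# Actually fetching it would then be:
#   from urllib.request import urlopen
#   data = urlopen(browser_request("http://example.com/docs/")).read()
```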
It's horribly crude, in retrospect, and I'm embarrassed re-reading my code -
but if you're interested I can forward it (if only as an example of what
_not_ to do.)
I would be interested even if the OP is not ;)
Ramit
Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
On Wed, May 18, 2011 at 11:04 AM, Prasad, Ramit
ramit.pra...@jpmchase.com wrote:
It's horribly crude, in retrospect, and I'm embarrassed re-reading my
code - but if you're interested I can forward it (if only as an example of
what _not_ to do.)
I would be interested even if the OP is not ;)
On Wed, May 18, 2011 at 11:21 AM, Marc Tompkins marc.tompk...@gmail.com wrote:
On Wed, May 18, 2011 at 11:04 AM, Prasad, Ramit ramit.pra...@jpmchase.com
wrote:
It's horribly crude, in retrospect, and I'm embarrassed re-reading my
code - but if you're interested I can forward it (if only as
18 matches