On Fri, Nov 17, 2023 at 7:02 PM Keith Lofstrom wrote:
> > On Fri, Nov 17, 2023 at 08:26:21AM -0800, Rich Shepard wrote:
> > > I need to download ~15G of data from a web site. Using a PLUG mail list
>
> Apropos of not much, when I first got on this crazy
> [...] I will "soon" install 100/100 Mbps
On Fri, Nov 17, 2023 at 12:43:29PM -0800, Keith Lofstrom wrote:
...
> I "wget-ed" a website, and was soon contacted by a
> panicked/angry sysadmin watching their website brought
> to a crawl because their 5 mbps upload bandwidth was
> clobbered for hours by my scrape of their site. My bad.
When y
> On Fri, Nov 17, 2023 at 08:26:21AM -0800, Rich Shepard wrote:
> > I need to download ~15G of data from a web site. Using a PLUG mail list
Apropos of not much, when I first got on this crazy
internet merry-go-round, the nearest host was UCBVAX
in Berkeley, and we connected with modems. I connec
Correction:
$20/yr for Runbox, not $20/mo.
On Fri, Nov 17, 2023, at 17:32, Kevin Williams wrote:
> Hi Galen,
>
> I myself have been on this journey to migrate my internet accounts registered
> using my Gmail address to my own domain, and use multiple aliases in the form
> of serv...@mydomain.tld.
Hi Galen,
I myself have been on this journey to migrate my internet accounts registered
using my Gmail address to my own domain, and use multiple aliases in the form
of serv...@mydomain.tld.
Over the last year and a half, I have moved about 90 accounts from Gmail. Some
sites allow self-service
Hi,
A smart, but non-sysadmin, non-linux-using friend asks:
"Hey I’ve been interested in getting off Gmail and switching to a mail
service I pay for. And then using it with IMAP on my various devices. Do
you have any knowledge about other services besides Gmail, yahoo, etc?"
I'm pretty sure t
On Fri, 17 Nov 2023, Russell Senior wrote:
Fwiw, I played a little bit with some approaches, unsuccessfully. But, the
problem might yield under a little more pressure. The problem I eventually
encountered and gave up at was that: a) the structure of their site isn't
consistent; and b) there are links with embedded spaces or something.
On Fri, 17 Nov 2023, Bill Barry wrote:
Limiting how deep to recurse is helpful. You may want just the page
you start with and one level down from that.
--level=depth
--level=1 would be a good place to start.
Bill,
Good idea.
Thanks,
Rich
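[A minimal sketch of Bill's suggestion combined with the recursive fetch discussed in this thread; the document/ path is only an illustration, substitute whatever part of the site is actually wanted:

  # fetch the starting page and everything one link level below it;
  # --no-parent keeps wget from climbing above the starting directory
  $ wget --recursive --level=1 --no-parent http://ph-public-data.com/document/
]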
Fwiw, I played a little bit with some approaches, unsuccessfully. But, the
problem might yield under a little more pressure. The problem I eventually
encountered and gave up at was that: a) the structure of their site isn't
consistent; and b) there are links with embedded spaces or something. This
On Fri, Nov 17, 2023 at 3:17 PM Rich Shepard wrote:
>
> On Fri, 17 Nov 2023, Michael Barnes wrote:
>
> > I have used this command string successfully in the past to download
> > complete websites.
> >
> > $ wget --recursive --no-clobber --page-requisites
> > --html-extension --convert-links --restrict-file-names=windows
> > --domains website.com --no-parent website.com
On Fri, 17 Nov 2023, Michael Barnes wrote:
I have used this command string successfully in the past to download
complete websites.
$ wget --recursive --no-clobber --page-requisites
--html-extension --convert-links --restrict-file-names=windows
--domains website.com --no-parent website.com
On Fri, 17 Nov 2023, Keith Lofstrom wrote:
A related question is "how much will the Portland Harbor Superfund Site
need to pay to upload 15 GB to you? How much upload bandwidth do they
have?
Keith,
I don't think anyone knows.
I "wget-ed" a website, and was soon contacted by a panicked/angry
On Fri, Nov 17, 2023 at 08:26:21AM -0800, Rich Shepard wrote:
> I need to download ~15G of data from a web site. Using a PLUG mail list
> thread from 2008 I tried this syntax:
> wget -r --accept *.* http://ph-public-data.com/
A related question is "how much will the Portland Harbor
Superfund Site need to pay to upload 15 GB to you? How much upload bandwidth
do they have?
On Fri, 17 Nov 2023, Michael Ewan wrote:
You may be getting caught by robots.txt, try setting the user agent header,
i.e. -U agent-string
Michael,
The wget man page doesn't inform me how to identify the agent-string. All it
says is:
--user-agent=agent-string
Identify as agent-string to the HTTP server.
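[The agent-string is free-form text sent in the HTTP User-Agent header; servers
that refuse wget's default "Wget/<version>" will often accept a browser-like
value. A made-up example, not from the thread, with any browser string working
just as well:

  # identify as a desktop browser instead of wget's default User-Agent
  $ wget -r -U "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0" http://ph-public-data.com/
]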
I have used this command string successfully in the past to download
complete websites.
$ wget --recursive --no-clobber --page-requisites
--html-extension --convert-links --restrict-file-names=windows
--domains website.com --no-parent website.com
HTH,
Michael
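[Given Keith's point about the site's limited upload bandwidth, a throttled
variant of Michael's command may be kinder to the server; the wait and rate
values below are arbitrary examples, not recommendations from the thread:

  $ wget --recursive --no-clobber --page-requisites \
         --html-extension --convert-links --restrict-file-names=windows \
         --wait=1 --limit-rate=500k \
         --domains website.com --no-parent website.com
  # --wait=1 pauses one second between requests; --limit-rate=500k caps throughput
]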
You may be getting caught by robots.txt, try setting the user agent header,
i.e. -U agent-string
On Fri, Nov 17, 2023 at 8:31 AM Rich Shepard wrote:
> On Fri, 17 Nov 2023, Rich Shepard wrote:
>
> > I need to download ~15G of data from a web site. Using a PLUG mail list
> > thread from 2008 I tried this syntax:
On Fri, 17 Nov 2023, Rich Shepard wrote:
I need to download ~15G of data from a web site. Using a PLUG mail list
thread from 2008 I tried this syntax:
wget -r --accept *.* http://ph-public-data.com/
To clarify, I don't think that I want to use the wget -m (mirror) command
because I don't think
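[For reference, the wget man page describes -m/--mirror as shorthand for a
handful of recursive options, so the practical difference from plain -r is
infinite depth plus timestamping:

  # -m / --mirror is currently equivalent to:
  $ wget -r -N -l inf --no-remove-listing http://ph-public-data.com/
]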
I need to download ~15G of data from a web site. Using a PLUG mail list
thread from 2008 I tried this syntax:
wget -r --accept *.* http://ph-public-data.com/
What was quickly returned is the contents of a ph-public-data.com/
directory:
about/ contact/ document/ file/ whatsnew/
What I want ar
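[One detail worth checking in that command: the unquoted *.* may be expanded
by the shell against files in the current directory before wget ever sees it.
Quoting the pattern, or dropping --accept altogether, is a sketch of a possible
fix rather than a tested one:

  # quote the accept pattern so the shell passes it to wget literally
  $ wget -r --accept '*.*' --no-parent http://ph-public-data.com/
]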