Re: Ignoring robots.txt [was Re: wget default behavior...]

2007-10-17 Thread Tony Godshall
> Tony Godshall wrote: > >> ... Perhaps it should be one of those things that one can do > >> oneself if one must but is generally frowned upon (like making a > >> version of wget that ignores robots.txt). > > > > Damn. I was only joking about ignor

Re: Ignoring robots.txt [was Re: wget default behavior...]

2007-10-17 Thread Micah Cowan
Tony Godshall wrote: >> ... Perhaps it should be one of those things that one can do >> oneself if one must but is generally frowned upon (like making a >> version of wget that ignores robots.txt). > > Damn. I was on

Ignoring robots.txt [was Re: wget default behavior...]

2007-10-17 Thread Tony Godshall
> ... Perhaps it should be one of those things that one can do > oneself if one must but is generally frowned upon (like making a > version of wget that ignores robots.txt). Damn. I was only joking about ignoring robots.txt, but now I'm thinking[1] there may be good reasons to do s

Re: Man pages [Re: ignoring robots.txt]

2007-07-19 Thread Micah Cowan
Christopher G. Lewis wrote: > Micah et al. - > > Just for an FYI - the whole texi->info, texi->html and > (texi->rtf->hlp) is *very* fragile in the windows world. You actually > have to download a *very* old version of makeinfo (1.68, not even o

RE: Man pages [Re: ignoring robots.txt]

2007-07-19 Thread Christopher G. Lewis
x64 or Vista (can't recall off the top of my head). So if it has to go away, so be it. Christopher G. Lewis http://www.ChristopherLewis.com > -Original Message- > From: Micah Cowan [mailto:[EMAIL PROTECTED] > Sent: Thursday, July 19, 2007 1:16 PM > To: WGET@sunsite.

Man pages [Re: ignoring robots.txt]

2007-07-19 Thread Micah Cowan
Daniel Stenberg wrote: > On Wed, 18 Jul 2007, Micah Cowan wrote: > >> The manpage doesn't need to give as detailed explanations as the info >> manual (though, as it's auto-generated from the info manual, this >> could be hard to avoid); but it shoul

Re: ignoring robots.txt

2007-07-19 Thread Andreas Pettersson
Daniel Stenberg wrote: On Wed, 18 Jul 2007, Micah Cowan wrote: The manpage doesn't need to give as detailed explanations as the info manual (though, as it's auto-generated from the info manual, this could be hard to avoid); but it should fully describe essential features. I know GNU project

Re: ignoring robots.txt

2007-07-19 Thread Daniel Stenberg
On Wed, 18 Jul 2007, Micah Cowan wrote: The manpage doesn't need to give as detailed explanations as the info manual (though, as it's auto-generated from the info manual, this could be hard to avoid); but it should fully describe essential features. I know GNU projects for some reason go with

Re: Man pages [Re: ignoring robots.txt]

2007-07-18 Thread Hrvoje Niksic
Micah Cowan <[EMAIL PROTECTED]> writes: >> Converting from Info to man is harder than it may seem. The script >> that does it now is basically a hack that doesn't really work well >> even for the small part of the manual that it tries to cover. > > I'd noticed. :) > > I haven't looked at the scri

Man pages [Re: ignoring robots.txt]

2007-07-18 Thread Micah Cowan
Hrvoje Niksic wrote: > Micah Cowan <[EMAIL PROTECTED]> writes: > >> I think we should either be a "stub", or a fairly complete "manual" >> (and agree that the latter seems preferable); nothing half-way >> between: what we have now is a fairly incomp

Re: ignoring robots.txt

2007-07-18 Thread Hrvoje Niksic
Micah Cowan <[EMAIL PROTECTED]> writes: > I think we should either be a "stub", or a fairly complete "manual" > (and agree that the latter seems preferable); nothing half-way > between: what we have now is a fairly incomplete manual. Converting from Info to man is harder than it may seem. The sc

Re: ignoring robots.txt

2007-07-18 Thread Micah Cowan
Tony Lewis wrote: > Micah Cowan wrote: > >> Don't we already follow typical etiquette by default? Or do you >> mean that to override non-default settings in the rcfile or >> whatnot? > > We don't automatically use a --wait time between requests. I'

RE: ignoring robots.txt

2007-07-18 Thread Tony Lewis
Micah Cowan wrote: > Don't we already follow typical etiquette by default? Or do you mean > that to override non-default settings in the rcfile or whatnot? We don't automatically use a --wait time between requests. I'm not sure what other "nice" options we'd want to make easily available, but th

Re: ignoring robots.txt

2007-07-18 Thread Micah Cowan
any rate: I was thinking of crafting a FAQ entry for dealing with such issues (no-follow, especially, seems to trip users up), but I want to craft it carefully, with ample warnings about what you're doing. :) > All that being said, it wouldn't hurt to have a section in the > docu

RE: ignoring robots.txt

2007-07-18 Thread Tony Lewis
gard to the robots.txt restrictions is a very different situation. It's true that someone could --mirror the site while ignoring robots.txt, but even that is legitimate in many cases. With regard to user agent, many websites customize their output based on the browser that is displaying

Re: ignoring robots.txt

2007-07-18 Thread Micah Cowan
Steven M. Schweda wrote: > From: Josh Williams > >> As far as I can tell, there's nothing in the man page about it. > >It's pretty well hidden. > > -e robots=off > > At this point, I normally just grind my teeth instead of complaining >

Re: ignoring robots.txt

2007-07-18 Thread Steven M. Schweda
From: Josh Williams
> As far as I can tell, there's nothing in the man page about it.
It's pretty well hidden.
    -e robots=off
At this point, I normally just grind my teeth instead of complaining about the differences between the command-line options and the commands in the ".wgetrc" sta
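
For readers landing on this thread, the option mentioned above can be sketched as a small shell helper. This is only an illustration, not something posted to the list: the function name `fetch_ignoring_robots`, the two-second delay, and the example URL are all assumptions. The `--wait` delay is included because the lack of a default wait between requests comes up elsewhere in the thread.

```shell
# Sketch only: fetch_ignoring_robots is a hypothetical helper name, and the
# two-second wait is an arbitrary illustrative value.
fetch_ignoring_robots() {
    # -e robots=off executes the .wgetrc command "robots = off" for this one
    # invocation, so a recursive retrieval no longer honors robots.txt.
    # --wait=2 pauses two seconds between retrievals to go easy on the server.
    wget -e robots=off --wait=2 --recursive "$1"
}

# Example call, commented out so that pasting the block fetches nothing:
# fetch_ignoring_robots "http://example.com/docs/"
```

The same setting can be made persistent by putting the line `robots = off` in `~/.wgetrc`, though doing so per invocation keeps the default behavior intact for everything else.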

Re: ignoring robots.txt

2007-07-18 Thread Josh Williams
On 7/18/07, Maciej W. Rozycki <[EMAIL PROTECTED]> wrote:
> There is no particular reason, so we do.
As far as I can tell, there's nothing in the man page about it.

Re: ignoring robots.txt

2007-07-18 Thread Maciej W. Rozycki
On Wed, 18 Jul 2007, Josh Williams wrote:
> Is there any particular reason we don't have an option to ignore robots.txt?
There is no particular reason, so we do.
Maciej

ignoring robots.txt

2007-07-18 Thread Josh Williams
Is there any particular reason we don't have an option to ignore robots.txt?