-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Tony Lewis wrote:
> Micah Cowan wrote:
> 
>> The manpage doesn't need to give as detailed explanations as the
>> info manual (though, as it's auto-generated from the info manual,
>> this could be hard to avoid); but it should fully describe
>> essential features.
> 
> I can't see any good reason for one set of documentation to be
> different than another. Let the user choose whatever is comfortable.
> Some users may not even know they have a choice between man and info.

It's mentioned in the manpage, though not nearly as strongly as is
typical for GNU projects.

GNU often has "stub" manpages, which say something along the lines of:

  The  full  documentation  for foo is maintained as a Texinfo manual...

and describe how to invoke info.

If, for some reason, we were to decide that we shouldn't have all the
same info in the manpage as exists in the info manual, we should at
least be calling out the fact that much more information, including a
variety of very useful rc commands, is detailed in the info document.

I think the manpage should be either a "stub" or a fairly complete
"manual" (and I agree that the latter seems preferable), with nothing
halfway in between: what we have now is a fairly incomplete manual.

>> While we're on the subject: should we explicitly warn about using
>> such features as robots=off, and --user-agent? And what should
>> those warnings be? Something like, "Use of this feature may help
>> you download files from which wget would otherwise be blocked, but
>> it's kind of sneaky, and web site administrators may get upset and
>> block your IP address if they discover you using it"?
> 
> No, I don't think we should nor do I think use of those features is
> "sneaky".
> 
> With regard to robots.txt, people use it when they don't want
> *automated* spiders crawling through their sites. A well-crafted wget
> command that downloads selected information from a site without
> regard to the robots.txt restrictions is a very different situation.
> It's true that someone could --mirror the site while ignoring
> robots.txt, but even that is legitimate in many cases.
> 
> With regard to user agent, many websites customize their output based
> on the browser that is displaying the page. If one does not set user
> agent to match their browser, the retrieved content may be very
> different than what was displayed in the browser.

Yes, but I meant with specific intent to get around website
restrictions. Certain sites (image galleries, for instance) often
specifically want to force users to access their resources via the web,
and do not wish to allow users to mass-download their resources for
later offline perusal: they want to force the users to come back each
time to use them--especially if the site requires a subscription of some
sort (e.g., porn), or their ad revenue is directly tied to some "Top
100" list and they want to force you to vote (warez, roms). Of course,
if you're downloading warez, the concept of "sneaky" wget options
probably doesn't concern you overly much! :)

Whether getting around such restrictions with --user-agent and -e
robots=off is "sneaky" is debatable, when you're legitimately accessing
content that you could straightforwardly obtain with your web browser
(where "legitimate" in this context probably means you're subscribed to
the image gallery, or own the physical counterparts to the roms you're
downloading, or what have you), but you'll almost certainly be banned
from the site if you happen to be discovered.

Perhaps this is more FAQ territory than manual territory at any rate: I
was thinking of crafting a FAQ entry for dealing with such issues
(no-follow, especially, seems to trip users up), but I want to craft it
carefully, with ample warnings about what you're doing. :)

> All that being said, it wouldn't hurt to have a section in the
> documentation on wget etiquette: think carefully about ignoring
> robots.txt, use --wait to throttle the download if it will be
> lengthy, etc.

I think that may be wise.
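
As a rough sketch of what such a section might suggest (the specific
values below are arbitrary assumptions for illustration, not agreed
recommendations), a user's wgetrc could contain something like:

  # Pause between retrievals, and vary the pause somewhat.
  wait = 2
  random_wait = on
  # Cap the download rate (the value is an arbitrary example).
  limit_rate = 50k
  # Keep honoring robots.txt unless there's a considered reason not to.
  robots = on

The same settings can be given for a single run with --wait,
--random-wait and --limit-rate.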

> Perhaps we can even add a --be-nice option similar to --mirror that
> adjusts options to match the etiquette suggestions.

Don't we already follow typical etiquette by default? Or do you mean
for it to override non-default settings in the rcfile or whatnot?

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGnmul7M8hyUobTrERCFjfAJ4i9bj23B3JdT7BfVdpdGigxU5TFgCfSqPX
SxIOUXW1wNogbRdU2BfwWBI=
=n+Uw
-----END PGP SIGNATURE-----
