Re: wget-1.9 compile error
It seems that Apache's fnmatch.h is shadowing the one from libc. Please remove the former and your build problems should go away.
Re: Using wget to make a static copy of a dynamic shop.
[EMAIL PROTECTED] writes:

> Will wget build me such a copy of the entire site? Fully interlinked
> and spiderable?

Yes, with several "but"s:

1. Your site should be written and interlinked in fairly discernible HTML. No image rollovers linked only through JavaScript. No CSS imports.

2. Banners are usually a problem, although probably not in your case. Since they are off-site, Wget converts them to full links (http://...), but Google shouldn't mind.

3. Wget cannot make the URLs on your site short and nice. It will follow the redirects provided by mod_rewrite, but replacing the links in the HTML pages will be up to you.

The command to make the copy would be something like `wget --mirror --convert-links --html-extension URL'. If your site includes images from another host, you'll probably need to add `--span-hosts -D DOMAIN-TO-SPAN'. See the Info documentation for more details.

> I am thinking of using a tool to turn the dynamic URLs into short
> static URLs, e.g. mydomain/shop.cgi?action=add&templ=cart1 ->
> mydomain/add/cart1. Such dynamic-to-static rewriting can be triggered
> by cron. The indexed static URLs will then be rewritten by
> mod_rewrite. What's a good Linux tool for that string replacement? A
> table of replacements with regular expressions is required:
>
>   action=add&templ=cart1 -> mydomain/add/cart1
>   action=add&templ=cart2 -> mydomain/add/cart2

Different people use different tools. For simple in-place regexp substitutions, the one-liner `perl -pi -e 's/FOO/BAR/g' FILES...' is probably a good choice.
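Such a table-driven replacement can also be done with sed. A self-contained sketch, assuming GNU sed (for `-i`); the file names, sample page, and rules table below are made up for the demonstration:

```shell
#!/bin/sh
# Rewrite dynamic shop URLs to their static forms, driven by a rules table.

mkdir -p mirror
cat > mirror/index.html <<'EOF'
<a href="/shop.cgi?action=add&templ=cart1">cart 1</a>
<a href="/shop.cgi?action=add&templ=cart2">cart 2</a>
EOF

# Rules table: REGEX REPLACEMENT, one pair per line, space-separated.
cat > rules.txt <<'EOF'
/shop\.cgi?action=add&templ=cart1 /add/cart1
/shop\.cgi?action=add&templ=cart2 /add/cart2
EOF

# Apply every rule in place to all mirrored HTML files.
# '@' as the sed delimiter avoids clashing with '/' in the URLs.
while read -r pattern replacement; do
    sed -i "s@${pattern}@${replacement}@g" mirror/*.html
done < rules.txt

cat mirror/index.html
```

After the run, both links point at `/add/cart1` and `/add/cart2` and no `shop.cgi` reference remains — the same job the perl one-liner does, but with the replacement table kept in a separate file, which suits a cron-triggered rewrite.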
Problem with wget 1.9 and question mark at least on windows
Hi,

I tried wget 1.9 for Windows from Heiko Herold (http://xoomer.virgilio.it/hherold/) and the problem with the filters and the question marks remains. On the following page:

http://www.wordtheque.com/owa-wt/new_wordtheque.wcom_literature.literaturea_page?lang=FR&letter=A&source=search&page=1

if I want to download all the web pages containing "FR" or "fr" (after the "?"), it's impossible. But it is possible to download all web pages containing "page" (before the "?"). I tried all the new --restrict-file-names options and that doesn't change anything. Is it due to the Windows version? Is there a way to correct this behavior?

Thanks in advance,
Boris
http://www.lexique.org
http://www.borisnew.org
RE: Problem with wget 1.9 and question mark at least on windows
Also note, I didn't yet compile and publish the MSVC Windows binary for 1.9 - I suppose that was one of the beta binaries.

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

-----Original Message-----
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 23, 2003 12:12 PM
To: Boris New
Cc: [EMAIL PROTECTED]
Subject: Re: Problem with wget 1.9 and question mark at least on windows

Sorry about that; Wget currently applies -R and -A only to file names, not to the query part of the URL. Therefore there is currently no built-in way to do what you want. I do plan to fix this, but Wget 1.9 was too late in the works to add such a feature. The current behavior is due to many people using -R to restrict based on file names and file name extensions; this usage might break if -R also matched the query portion of the URL by default.
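The distinction Hrvoje describes can be illustrated with plain shell parameter expansion: the -A/-R patterns effectively see only the file-name component of the URL, i.e. what is left after stripping the directory part and the query string (the URL below is the one from the report above):

```shell
#!/bin/sh
# Split a URL the way Wget's -A/-R matching effectively sees it:
# only the file-name part, without the ?query suffix.

url='http://www.wordtheque.com/owa-wt/new_wordtheque.wcom_literature.literaturea_page?lang=FR&letter=A&source=search&page=1'

query="${url#*\?}"        # everything after the first '?'
file="${url##*/}"         # strip the directory part
file="${file%%\?*}"       # strip the query string

echo "file:  $file"
echo "query: $query"
```

The file name comes out as `new_wordtheque.wcom_literature.literaturea_page` and the query as `lang=FR&letter=A&source=search&page=1` — a pattern like `-A '*FR*'` is matched against the former only, which is why filtering on "FR" fails while filtering on "page" (part of the file name) works.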
Re: how to unsubscribe?
To unsubscribe, send a message to [EMAIL PROTECTED].
RE: Wget 1.9 has been released
Windows MSVC binary present at http://xoomer.virgilio.it/hherold

Attention if you want to compile your own: the configure.bat.in file is still there - in released packages it is usually renamed to configure.bat already.

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

-----Original Message-----
From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 22, 2003 11:50 PM
To: [EMAIL PROTECTED]
Subject: Wget 1.9 has been released

I've announced the 1.9 release on freshmeat and will send a mail to [EMAIL PROTECTED] shortly. You can get it from ftp.gnu.org or from a mirror site.

ftp://ftp.gnu.org/pub/gnu/wget/wget-1.9.tar.gz

The MD5 checksum of the archive should be:

18ac093db70801b210152dd69b4ef08a  wget-1.9.tar.gz

Again, thanks to everyone who made this release possible by contributing bug reports, help, suggestions, test cases, code, documentation, or support -- in no particular order.

A summary of the user-visible changes since 1.8, borrowed from `NEWS', follows:

* Changes in Wget 1.9.

** It is now possible to specify that the POST method be used for HTTP requests. For example, `wget --post-data="id=foo&data=bar" URL' will send a POST request with the specified contents.

** IPv6 support is available, although it's still experimental.

** The `--timeout' option now also affects DNS lookup and establishing the TCP connection. Previously it only affected reading and writing data. Those three timeouts can be set separately using `--dns-timeout', `--connect-timeout', and `--read-timeout', respectively.

** Download speed shown by the progress bar is based on the data recently read, rather than the average speed of the entire download. The ETA projection is still based on the overall average.

** It is now possible to connect to FTP servers through FWTK firewalls. Set ftp_proxy to an FTP URL, and Wget will automatically log on to the proxy as [EMAIL PROTECTED].
** The new option `--retry-connrefused' makes Wget retry downloads even in the face of refused connections, which are otherwise considered a fatal error.

** The new option `--dns-cache=off' may be used to prevent Wget from caching DNS lookups.

** Wget no longer escapes characters in local file names based on whether they're appropriate in URLs. Escaping can still occur for nonprintable characters or for '/', but no longer for frequent characters such as space. You can use the new option --restrict-file-names to relax or strengthen these rules, which can be useful if you dislike the default or if you're downloading to non-native partitions.

** Handling of HTML comments has been dumbed down to conform to what users expect and other browsers do: instead of being treated as an SGML declaration, a comment is terminated at the first occurrence of --. Use `--strict-comments' to revert to the old behavior.

** Wget now correctly handles relative URIs that begin with //, such as //img.foo.com/foo.jpg.

** Boolean options in `.wgetrc' and on the command line now accept values yes and no along with the traditional on and off.

** It is now possible to specify decimal values for timeouts, waiting periods, and download rate. For instance, `--wait=0.5' now works as expected, as does `--dns-timeout=0.5' and even `--limit-rate=2.5k'.
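Several of the items above have rc-file counterparts. A sketch of a `.wgetrc` fragment using them — the option names here are my assumption that the rc-file commands mirror the command-line flags, so verify each against your wget's documentation before relying on it:

```
# Illustrative ~/.wgetrc fragment -- names assumed, check the manual.
# Booleans now take yes/no as well as on/off:
timestamping = yes
# Decimal values now work for waits, timeouts, and rates:
wait = 0.5
dns_timeout = 0.5
limit_rate = 2.5k
# Disable the new DNS cache:
dns_cache = off
```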
Re: Naughty make install.info, naughty, bad boy...
Hi Hrvoje :)

* Hrvoje Niksic [EMAIL PROTECTED] dixit:
> > I've downloaded and installed wget 1.9 without problems, but when I
> > install something seamlessly, I insist on messing around until I
> > break something... :-)

The problem is that I do that with my *own* software, too XDD

> > The matter is that if you delete 'wget.info' to force recreation,
> > and your makeinfo is more or less recent, you *don't* have
> > wget.info-[0-9] files, since new texinfo's have the default
> > --split-size limit raised from 50k to 300k.
>
> That must be a Makeinfo 4.5 thing. I'm still using 4.3, which has the
> split limit unchanged.

In fact I think that it is a 4.6 thing. But it should not matter at all; the only difference is how many info files are generated.

> I think I originally used the more complex forms because I wanted to
> avoid matching something like wget.info.bak. I'm not sure if there
> was a specific reason for this or if I was just being extra-careful.

You're right, the simpler glob (wget.info*) will match any garbage after the '.info' part :((( Definitely it's not a good idea.

> for file in wget.info wget.info-*[0-9]
> do
>   test -f $file && install -c -m 644 $file ...
> done

This should do, since '$file' won't ever be empty :) It must be done in *both* parts of the surrounding 'if-fi' clause...

> (Of course, it would use $$file and such in the actual Makefile, but
> you get the picture.)

Yes, yes... It's a long story, but I've dealt a lot with makefiles... In fact, the solution I was talking about (using the 'wildcard' function of GNU make, avoiding globbing in for loops, etc...) came from a special generated makefile that had to avoid an empty glob pattern in the 'for' loop. It is not needed here at all: I was blind and didn't even think of the simpler solution you provide O:))

> That way we retain the strictness of only matching wget.info and
> wget.info-numbers, but avoid problems when only wget.info is actually
> generated.

Right :)) If you want, I can prepare the patch for you, also containing a fix for a typo in the documentation.
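The loop under discussion can be tried stand-alone. In the sketch below the directory names are invented for the demonstration (they are not from wget's real Makefile), and the two `printf` lines simulate a makeinfo run that produced a split `wget.info-1` alongside `wget.info`:

```shell
#!/bin/sh
# Stand-alone sketch of the install rule discussed above.

mkdir -p srcdir infodir

# Simulate makeinfo output: with a recent makeinfo (large --split-size)
# only wget.info may exist; older ones also emit wget.info-1, -2, ...
printf 'main\n'  > srcdir/wget.info
printf 'part1\n' > srcdir/wget.info-1

for file in srcdir/wget.info srcdir/wget.info-*[0-9]; do
    # If the glob matches nothing it stays literal, so guard with
    # test -f; the -*[0-9] suffix also keeps wget.info.bak-style
    # garbage out, since only names ending in a digit are admitted.
    test -f "$file" && install -c -m 644 "$file" infodir/
done

ls infodir
```

Both files land in `infodir`; delete `srcdir/wget.info-1` and rerun, and the loop installs `wget.info` alone without tripping over the unmatched glob — the exact case the `test -f` guard is there for.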
BTW, in the documentation there is no information about the new --retry-connrefused option (at least I haven't found it), and obviously no mention of any rcfile equivalent. Am I missing something, or should I wait for 1.9.1?

Thanks a lot for wget, as always (I use it a lot), and if you want me to prepare the patch, just tell me.

Raúl Núñez de Arenas Coronado
-- 
Linux Registered User 88736
http://www.pleyades.net
http://raul.pleyades.net/
Re: Using wget to make a static copy of a dynamic shop.
[EMAIL PROTECTED] writes:

> > Will wget build me such a copy of the entire site? Fully interlinked
> > and spiderable?
>
> The command to make the copy would be something like `wget --mirror
> --convert-links --html-extension URL'.

I started wget with

  wget --mirror --convert-links --html-extension http://mydomain.com/ /home/www/web10/9

It has been running for several hours; top now shows it at 65% of memory, about 300 MByte. How may I let wget make a file-by-file copy of the site? How may I stop it before it runs out of memory?

Thanks, Maggi
Re: Naughty make install.info, naughty, bad boy...
DervishD [EMAIL PROTECTED] writes:

> Right :)) If you want I can prepare the patch for you, containing too
> a typo in the documentation.

I think I'll modify the Makefile. A patch that fixes (or points out) the typo in the documentation would be appreciated, though.

> BTW, in the documentation there is no information about that new
> --retry-connrefused (at least I haven't found it) and obviously no
> mention about any rcfile equivalent; am I missing something or should
> I wait for 1.9.1?

You're not missing anything -- it's an oversight on my part.