Re: wget-1.9 compile error

2003-10-23 Thread Hrvoje Niksic
It seems that Apache's fnmatch.h is shadowing the one from libc.
Please remove the former and your build problems should go away.


Re: Using wget to make a static copy of a dynamic shop.

2003-10-23 Thread Hrvoje Niksic
[EMAIL PROTECTED] writes:

 Will wget build me such a copy of the entire site?  Fully interlinked
 and spiderable?

Yes, with several buts.

1. Your site should be written and interlinked in fairly discernable
   HTML.  No image rollovers linked only through JavaScript.  No CSS
   imports.

2. Banners are usually a problem, although probably not in your case.
   Since they are off-site, Wget converts them to full links
   (http://...), but google shouldn't mind.

3. Wget cannot make the URLs on your site short and nice.  It will
   follow the redirects provided by mod_rewrite, but replacing the
   links in the HTML pages will be up to you.

The command to make the copy would be something like
`wget --mirror --convert-links --html-extension URL'.  If your site
includes images from another host, you'll probably need to add
`--span-hosts -D DOMAIN-TO-SPAN'.  See the info documentation for more
details.
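
For instance, a rough sketch of such an invocation (the host names are
placeholders, not your real domains):

  wget --mirror --convert-links --html-extension \
       --span-hosts -D shop.example.com,images.example.com \
       http://shop.example.com/

Listing both the site itself and the image host in `-D' keeps the
recursion from wandering off to unrelated servers.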

 I am thinking of using a tool to turn the dynamic URLs into short
 static URLs, e.g.
 mydomain/shop.cgi?action=add&templ=cart1  ->  mydomain/add/cart1
 Such a Dynamic2Static rewriting can be triggered by cron.
 The indexed static URLs will be rewritten by mod_rewrite.

 What's a good Linux tool for that string replacement?
 A table for string replacement is required, with regular expressions:
 action=add&templ=cart1  ->  mydomain/add/cart1
 action=add&templ=cart2  ->  mydomain/add/cart2

Different people use different tools.  For simple in-place regexp
substitutions, the one-liner `perl -pi -e 's/FOO/BAR/g' FILES...' is
probably a good choice.
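
For instance, a sketch for one of the rules above (the paths and file
patterns are placeholders, and the pattern would need adjusting if the
saved pages escape & as &amp;):

  perl -pi -e 's!shop\.cgi\?action=add&templ=cart1!add/cart1!g' *.html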


Problem with wget 1.9 and question mark at least on windows

2003-10-23 Thread Boris New
Hi,

I tried wget 1.9 for Windows from Heiko Herold 
(http://xoomer.virgilio.it/hherold/), and the problem with the filters 
and the question marks remains:
On the following page:
http://www.wordtheque.com/owa-wt/new_wordtheque.wcom_literature.literaturea_page?lang=FR&letter=A&source=search&page=1
If I want to download all the webpages whose URLs contain FR or fr 
(after the ?), it's impossible.
But it's possible to download all webpages containing page (before the ?).
I tried all the new --restrict-file-names options and that doesn't 
change anything.

Is it due to the Windows version? Is there a way to correct this behavior?

Thanks in advance,

Boris
http://www.lexique.org
http://www.borisnew.org


RE: Problem with wget 1.9 and question mark at least on windows

2003-10-23 Thread Herold Heiko
Also note that I haven't yet compiled and published the MSVC Windows
binary for 1.9; I suppose the one you tried was one of the beta binaries.
Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

 -Original Message-
 From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 23, 2003 12:12 PM
 To: Boris New
 Cc: [EMAIL PROTECTED]
 Subject: Re: Problem with wget 1.9 and question mark at least 
 on windows
 
 
 Sorry about that; Wget currently applies -R and -A only to file names,
 not to the query part of the URL.  Therefore there is currently no
 built-in way to do what you want.
 
 I do plan to fix this, but Wget 1.9 was too late in the works to add
 such a feature.
 
 The current behavior is due to many people using -R to restrict based
 on file names and file name extensions; this usage might break if -R
 also matched the query portion of the URL by default.
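
 To illustrate with this very site (URL shortened): with something like
 `wget -r -A '*lang=FR*' http://www.wordtheque.com/...', the pattern is
 matched only against the file name part
 (new_wordtheque.wcom_literature.literaturea_page), never against the
 ?lang=FR... query string, so the FR pages cannot be selected that way.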
 


Re: how to unsubscribe?

2003-10-23 Thread Hrvoje Niksic
To unsubscribe, send a message to [EMAIL PROTECTED].


RE: Wget 1.9 has been released

2003-10-23 Thread Herold Heiko
Windows MSVC binary present at
http://xoomer.virgilio.it/hherold

Attention if you want to compile your own: the package still contains
the configure.bat.in file; in released packages that file is usually
already renamed to configure.bat.

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

 -Original Message-
 From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, October 22, 2003 11:50 PM
 To: [EMAIL PROTECTED]
 Subject: Wget 1.9 has been released
 
 
 I've announced the 1.9 release on freshmeat and will send a mail to
 [EMAIL PROTECTED] shortly.  You can get it from ftp.gnu.org or from a mirror
 site.
 
 ftp://ftp.gnu.org/pub/gnu/wget/wget-1.9.tar.gz
 
 The MD5 checksum of the archive should be:
 
 18ac093db70801b210152dd69b4ef08a  wget-1.9.tar.gz
 
 Again, thanks to everyone who made this release possible by
 contributing bug reports, help, suggestions, test cases, code,
 documentation, or support -- in no particular order.
 
 A summary of the user-visible changes since 1.8, borrowed from `NEWS',
 follows:
 
 * Changes in Wget 1.9.
 
 ** It is now possible to specify that POST method be used for HTTP
 requests.  For example, `wget --post-data='id=foo&data=bar' URL' will
 send a POST request with the specified contents.
 
 ** IPv6 support is available, although it's still experimental.
 
 ** The `--timeout' option now also affects DNS lookup and establishing
 the TCP connection.  Previously it only affected reading and writing
 data.  Those three timeouts can be set separately using
 `--dns-timeout', `--connect-timeout', and `--read-timeout',
 respectively.
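
 For instance, a sketch with arbitrary values: `wget --dns-timeout=10
 --connect-timeout=10 --read-timeout=300 URL' allows ten seconds each
 for the DNS lookup and the TCP connection and five minutes for reads.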
 
 ** Download speed shown by the progress bar is based on the data
 recently read, rather than the average speed of the entire download.
 The ETA projection is still based on the overall average.
 
 ** It is now possible to connect to FTP servers through FWTK
 firewalls.  Set ftp_proxy to an FTP URL, and Wget will automatically
 log on to the proxy as [EMAIL PROTECTED].
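
 For example (the proxy host is a placeholder), running
 `ftp_proxy=ftp://proxy.example.com/ wget ftp://ftp.example.org/file'
 routes the FTP retrieval through that proxy.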
 
 ** The new option `--retry-connrefused' makes Wget retry downloads
 even in the face of refused connections, which are otherwise
 considered a fatal error.
 
 ** The new option `--dns-cache=off' may be used to prevent Wget from
 caching DNS lookups.
 
 ** Wget no longer escapes characters in local file names based on
 whether they're appropriate in URLs.  Escaping can still occur for
 nonprintable characters or for '/', but no longer for frequent
 characters such as space.  You can use the new option
 --restrict-file-names to relax or strengthen these rules, which can be
 useful if you dislike the default or if you're downloading to
 non-native partitions.
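
 For instance, `wget --restrict-file-names=windows URL' applies the
 stricter Windows naming rules, which helps when saving to a FAT or
 NTFS partition from a Unix system.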
 
 ** Handling of HTML comments has been dumbed down to conform to what
 users expect and other browsers do: instead of being treated as an SGML
 declaration, a comment is terminated at the first occurrence of -->.
 Use `--strict-comments' to revert to the old behavior.
 
 ** Wget now correctly handles relative URIs that begin with //, such
 as //img.foo.com/foo.jpg.
 
 ** Boolean options in `.wgetrc' and on the command line now accept
 values yes and no along with the traditional on and off.
 
 ** It is now possible to specify decimal values for timeouts, waiting
 periods, and download rate.  For instance, `--wait=0.5' now works as
 expected, as does `--dns-timeout=0.5' and even `--limit-rate=2.5k'.
 


Re: Naughty make install.info, naughty, bad boy...

2003-10-23 Thread DervishD
Hi Hrvoje :)

 * Hrvoje Niksic [EMAIL PROTECTED] dixit:
  I've downloaded and installed wget 1.9 without problems, but when I
  install something seamlessly, I insist on messing around until I
  break something...
 :-)

The problem is that I do that with my *own* software, too XDD
 
  The matter is that if you delete 'wget.info' to force recreation,
  and your makeinfo is more or less recent, you *don't* get the
  wget.info-[0-9] files, since newer texinfo releases have raised the
  default --split-size limit from 50k to 300k.
 That must be a Makeinfo 4.5 thing.  I'm still using 4.3, which has the
 split limit unchanged.

In fact I think that it is a 4.6 thing. But it should not matter
at all; the only difference is how many info files are generated.
 
 I think I originally used the more complex forms because I wanted to
 avoid matching something like wget.info.bak.  I'm not sure if there
 was a specific reason for this or if I was just being extra-careful.

You're right, the simpler glob (wget.info*) will match any
garbage after the '.info' part :((( It's definitely not a good idea.
 
 for file in wget.info wget.info-*[0-9]
 do
   test -f $file && install -c -m 644 $file ...
 done

This should do, since '$file' won't ever be empty :) It must
be done in *both* parts of the surrounding 'if-fi' clause...
 
 (Of course, it would use $$file and such in actual Makefile, but you
 get the picture.)
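
 A rough sketch of how that might read inside the actual Makefile rule,
 keeping the elision from the snippet above (variable escaping as in
 any Makefile recipe):

   for file in wget.info wget.info-*[0-9]; do \
     test -f $$file && install -c -m 644 $$file ...; \
   done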

Yes, yes... It's a long story, but I've dealt a lot with
makefiles... In fact, the solution I was talking about (using the
'wildcard' function of GNU make, avoiding globbing in for loops, etc.)
comes from a special generated makefile that has to avoid an empty glob
pattern in its 'for' loop. It is not needed at all here: I was blind
and didn't even think of the simpler solution you provide O:))
 
 That way we retain the strictness of only matching wget.info and
 wget.info-numbers, but avoid problems when only wget.info is
 actually generated.

Right :)) If you want, I can prepare the patch for you, including
a fix for a typo in the documentation as well. BTW, the documentation
has no information about the new --retry-connrefused option (at least
I haven't found any) and obviously no mention of any rcfile
equivalent; am I missing something, or should I wait for 1.9.1?

Thanks a lot for wget, as always (I use it a lot), and if you
want me to prepare the patch, just say so.

Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736
http://www.pleyades.net  http://raul.pleyades.net/


Re: Using wget to make a static copy of a dynamic shop.

2003-10-23 Thread Hrvoje Niksic
[EMAIL PROTECTED] writes:

 Will wget build me such a copy of the entire site?  Fully interlinked
 and spiderable?

The command to make the copy would be something like
`wget --mirror --convert-links --html-extension URL'.

I started wget with
wget --mirror --convert-links --html-extension http://mydomain.com/  /home/www/web10/9

It has been running for several hours; top now shows it using 65% of
memory, around 300 MB.

How can I make wget do a file-by-file copy of the site?
How can I stop it before it runs out of memory?
Thanks, Maggi



Re: Naughty make install.info, naughty, bad boy...

2003-10-23 Thread Hrvoje Niksic
DervishD [EMAIL PROTECTED] writes:

 Right :)) If you want, I can prepare the patch for you, including
 a fix for a typo in the documentation as well.

I think I'll modify the Makefile.  A patch that fixes (or points out)
the typo in the documentation would be appreciated, though.

 BTW, the documentation has no information about the new
 --retry-connrefused option (at least I haven't found any) and
 obviously no mention of any rcfile equivalent; am I missing
 something, or should I wait for 1.9.1?

You're not missing anything -- it's an oversight on my part.