Re: Feedback feature request
Thanks for the pointer. It looks like a good product, but unfortunately not what I'm after. I really need it to have no dependencies under win32, like wget, so I can just drop the exe in and make it go.

Scott

Alan E [EMAIL PROTECTED] wrote on 22/04/2002 17:06:

> On Mon, Apr 22, 2002 at 04:07:10PM +1000, [EMAIL PROTECTED] wrote:
>
>> definition download points onto LAN servers. What I would like to see in the software is a switch to allow destructive mirroring of an ftp site, where files that no longer exist on the server are deleted from the download target directory. I would appreciate if this type of feature could be included in a release somewhere in the future.
>
> You've just described the original mirror.pl (a PITA to set up), or the python based emirror, which is easy, and can produce all sorts of nice html logs and such if you want. See the emirror project on sourceforge.
>
> -- AlanE
RE: Feedback feature request
Maybe you didn't know that there are (at least) two ways to compile Perl on Win32 into an independent executable: perl2exe and a tool made by ActiveState. Both are commercial.

Heiko Herold

--
-- PREVINET S.p.A.              [EMAIL PROTECTED]
-- Via Ferretto, 1              ph  x39-041-5907073
-- I-31021 Mogliano V.to (TV)   fax x39-041-5907472
-- ITALY

-----Original Message-----
From: [EMAIL PROTECTED]
Sent: Monday, April 22, 2002 9:14 AM
To: [EMAIL PROTECTED]
Subject: Re: Feedback feature request

Thanks for the pointer. It looks like a good product, but unfortunately not what I'm after. I really need it to have no dependencies under win32, like wget, so I can just drop the exe in and make it go.

Scott
spaces in file names
When a URL path component contains a space, is wget supposed to create a file with a space in it, or not?

Regardless of the answer to that question, I can't imagine how the following behavior could be correct:

--04:40:07-- http://www.vieuxmac.com/DOWNLOAD/SYSTEM%20SOFTWARE/System%20Software%206.0.5/S6.0.5%20-%20F6.1.5%20-%20F.sit.bin
           => `System%20Software%206.0.5/S6.0.5 - F6.1.5 - F.sit.bin'

Note that the URL contained both directories and files with spaces in them, and that it escaped all spaces as %20. Note that wget created *directories* with literal %20 in them, but created *files* with literal spaces. Both can't be right.

wget 1.8. Launched like so:

    wget -m -np -nH --cut-dirs=2 http://www.vieuxmac.com/DOWNLOAD/SYSTEM%20SOFTWARE/

--
Jamie Zawinski
[EMAIL PROTECTED] http://www.jwz.org/
[EMAIL PROTECTED] http://www.dnalounge.com/
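The complaint above comes down to one rule: whichever way escapes are handled, directories and files should be handled the same way. A minimal Python sketch of that idea follows. It is not wget's code; the helper name `local_path_components` and its `cut_dirs` parameter (a loose stand-in for `--cut-dirs`) are made up for illustration.

```python
from urllib.parse import urlsplit, unquote


def local_path_components(url, cut_dirs=0):
    """Hypothetical helper: decode every path component by the same rule."""
    # Split the URL path on '/' and drop empty pieces (leading or trailing '/').
    parts = [p for p in urlsplit(url).path.split("/") if p]
    # Skip the leading directories, then percent-decode what is left,
    # treating directory names and the file name identically.
    return [unquote(p) for p in parts[cut_dirs:]]


url = ("http://www.vieuxmac.com/DOWNLOAD/SYSTEM%20SOFTWARE/"
       "System%20Software%206.0.5/S6.0.5%20-%20F6.1.5%20-%20F.sit.bin")

print(local_path_components(url, cut_dirs=2))
# ['System Software 6.0.5', 'S6.0.5 - F6.1.5 - F.sit.bin']
```

Leaving `%20` undecoded everywhere would be equally consistent; the sketch only shows that one rule is applied to all components.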
Re: apache irritations
On Mon, 22 Apr 2002, Jamie Zawinski wrote:

> I know this would be somewhat evil, but can we have a special case in wget to assume that files named ?N=D and index.html?N=D are the same as index.html? I'm tired of those dumb apache sorting directives showing up in my mirrors as if they were real files...

How about using the -R option of wget? A brief test proves -R '*\?[A-Z]=[A-Z]' works as it should.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--+
+ e-mail: [EMAIL PROTECTED], PGP key available +
Re: Validating cookie domains
Ian Abbott [EMAIL PROTECTED] writes:

> I realized it was stupid after I posted it (I was about to leave!) when I remembered cc domains like .de don't need an extra period. I thought maybe a table of exceptions would sort that out

The problem is that new domains appear all the time, and policies change. Any static table is doomed to fail miserably.

> However, that doesn't work for your .fr example.

Nothing works for the .fr example. :-)

> Ye gods! If it was just a reflection of the common 3-letter TLDs such as .com, that would be a reasonable thing to check for.

That's what I eventually implemented.
Re: apache irritations
Maciej W. Rozycki [EMAIL PROTECTED] writes:

> On Mon, 22 Apr 2002, Jamie Zawinski wrote:
>
>> I know this would be somewhat evil, but can we have a special case in wget to assume that files named ?N=D and index.html?N=D are the same as index.html? I'm tired of those dumb apache sorting directives showing up in my mirrors as if they were real files...
>
> How about using the -R option of wget? A brief test proves -R '*\?[A-Z]=[A-Z]' works as it should.

Or maybe the default system wgetrc should ship with something like:

    reject = *?[A-Z]=[A-Z]

Adding new reject patterns will correctly append to this. If the user wanted to nullify that in his `.wgetrc', he'd need to set `reject' to the empty string.
Re: apache irritations
On Mon, 22 Apr 2002, Hrvoje Niksic wrote:

>> How about using the -R option of wget? A brief test proves -R '*\?[A-Z]=[A-Z]' works as it should.
>
> Or maybe the default system wgetrc should ship with something like:
>
>     reject = *?[A-Z]=[A-Z]

Note the difference between strings! -- the backslash before the quotation mark is essential as otherwise it's a glob character.

> Adding new reject patterns will correctly append to this. If the user wanted to nullify that in his `.wgetrc', he'd need to set `reject' to empty string.

Well, I don't think it's sane but adding a *commented-out* reject line with an appropriate annotation to the default system wgetrc looks like a good idea to me.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--+
+ e-mail: [EMAIL PROTECTED], PGP key available +
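The difference between the two pattern strings can be tried out with Python's `fnmatch` module, used here only as a stand-in for shell-style globbing (wget's own matcher may differ in details). One assumption to note: Python's `fnmatch` does not treat a backslash as an escape character, so the sketch writes the literal question mark as `[?]` where the thread writes `\?`. The file names are invented.

```python
from fnmatch import fnmatchcase

# Unescaped '?': a glob wildcard that matches any single character.
WILDCARD = "*?[A-Z]=[A-Z]"
# '[?]': matches only a real question mark (stand-in for '\?' in the thread).
LITERAL = "*[?][A-Z]=[A-Z]"

# Invented example names; only the first two contain a real '?'.
names = ["index.html?N=D", "?M=A", "notesX=Y", "index.html"]

for name in names:
    print(f"{name!r}: wildcard={fnmatchcase(name, WILDCARD)}, "
          f"literal={fnmatchcase(name, LITERAL)}")
```

With the unescaped form, a name such as `notesX=Y` is rejected too, because the `?` happily matches the `s`. The escaped form only fires when a question mark is really there.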
ScanMail Message: To Recipient virus found or matched file blocking setting.
ScanMail for Microsoft Exchange has taken action on the message; please refer to the contents of this message for further details.

Sender = [EMAIL PROTECTED]
Recipient(s) = [EMAIL PROTECTED];
Subject = CELLSPACING
Scanning Time = 04/22/2002 18:06:07
Engine/Pattern = 6.150-1001/267

Action on message: The attachment border.bat matched file blocking settings. ScanMail has taken the Deleted action.

An attachment classified as dangerous has been blocked, or a virus has been found, in mail addressed to you. The sender of this mail was informed automatically. Attachments classified as dangerous include all executable files such as *.exe, *.bat, *.com, *.cmd, *.pif, *.scr. If you need to send or receive such an attachment, compress it into a *.zip archive with Winzip first.
ScanMail Message: To Recipient virus found or matched file blocking setting.
ScanMail for Microsoft Exchange has taken action on the message; please refer to the contents of this message for further details.

Sender = [EMAIL PROTECTED]
Recipient(s) = [EMAIL PROTECTED];
Subject = How are you
Scanning Time = 04/22/2002 18:07:56
Engine/Pattern = 6.150-1001/267

Action on message: The attachment WHAT1.exe matched file blocking settings. ScanMail has taken the Deleted action.
Re: apache irritations
On 22/04/2002 16:38:15 Maciej W. Rozycki wrote:

> On Mon, 22 Apr 2002, Hrvoje Niksic wrote:
>
>>> How about using the -R option of wget? A brief test proves -R '*\?[A-Z]=[A-Z]' works as it should.
>>
>> Or maybe the default system wgetrc should ship with something like:
>>
>>     reject = *?[A-Z]=[A-Z]
>
> Note the difference between strings! -- the backslash before the quotation mark is essential as otherwise it's a glob character.

[A-Z] is a bit extreme, IMHO. How about

    reject = *\?[NMSD]=[AD]
              ^^ literal '?' needed here

> Well, I don't think it's sane but adding a *commented-out* reject line with an appropriate annotation to the default system wgetrc looks like a good idea to me.

A good idea.

--
Csaba Ráduly, Software Engineer
Sophos Anti-Virus
email: [EMAIL PROTECTED]  http://www.sophos.com
US Support: +1 888 SOPHOS 9
UK Support: +44 1235 559933
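To see what the narrower character classes buy, the two patterns can be compared side by side. This is a rough Python sketch using `fnmatch` as a stand-in for wget's globbing; `[?]` replaces `\?` because Python's `fnmatch` has no backslash escape. The list of sort links assumes Apache's index pages use the columns N, M, S, D with order A or D, as the proposed pattern implies, and `report?X=Y` is an invented name.

```python
from fnmatch import fnmatchcase

BROAD = "*[?][A-Z]=[A-Z]"     # any upper-case key and value
NARROW = "*[?][NMSD]=[AD]"    # only the assumed Apache sort keys and orders

# Assumed Apache column-sort links, plus one invented unrelated name.
apache_sort_links = ["?N=D", "?M=A", "?S=A", "?D=A", "index.html?N=D"]
unrelated = "report?X=Y"

for name in apache_sort_links + [unrelated]:
    print(f"{name!r}: broad={fnmatchcase(name, BROAD)}, "
          f"narrow={fnmatchcase(name, NARROW)}")
```

The narrow pattern spares `report?X=Y`, which the broad one would reject. The trade-off is the one raised in the thread: a narrow pattern stops working if the server ever changes its link format.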
Re: apache irritations
On Mon, 22 Apr 2002 [EMAIL PROTECTED] wrote:

>>>     reject = *?[A-Z]=[A-Z]
>>
>> Note the difference between strings! -- the backslash before the quotation mark is essential as otherwise it's a glob character.
>
> [A-Z] is a bit extreme, IMHO. How about
>
>     reject = *\?[NMSD]=[AD]

Hmm, it's too fragile in my opinion. What if a new version of Apache defines a new format?

>               ^^ literal '?' needed here

Exactly -- I've meant the question mark above, of course.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--+
+ e-mail: [EMAIL PROTECTED], PGP key available +
Re: apache irritations
Maciej W. Rozycki wrote:

> Hmm, it's too fragile in my opinion. What if a new version of Apache defines a new format?

I think all of the expressions proposed thus far are too fragile. Consider the following URL:

    http://www.google.com/search?num=100q=%2Bwget+-GNU

The regular expression needs to account for multiple arguments separated by ampersands. It also needs to account for any valid URI character between an equal sign and either the end of the string or an ampersand.

I'm not fluent enough in regular expressions to compose one myself. (Some day I'll absorb all of Friedl's Mastering Regular Expressions, but not today.)

Tony
Re: apache irritations
On Mon, 22 Apr 2002, Tony Lewis wrote:

> I think all of the expressions proposed thus far are too fragile. Consider the following URL:
>
>     http://www.google.com/search?num=100q=%2Bwget+-GNU
>
> The regular expression needs to account for multiple arguments separated by ampersands. It also needs to account for any valid URI character between an equal sign and either the end of the string or an ampersand.

I'm not sure what you are referring to. We are discussing a common problem with the static pages that Apache generates by default as index.html objects for the server's filesystem directories that provide no default page. Any dynamic content should probably be protected by robots.txt and otherwise dealt with by the user specifically, depending on the content.

BTW, wget's accept/reject rules are not regular expressions but simple shell globbing patterns.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--+
+ e-mail: [EMAIL PROTECTED], PGP key available +
segmentation fault on bad url
Hi,

    wget -t 3 -d -r -l 3 -H --random-wait -nd --delete-after -A.jpg,.gif,.zip,.png,.pdf http://http://www.microsoft.com
    DEBUG output created by Wget 1.8.1 on linux-gnu.

    zsh: segmentation fault  wget -t 3 -d -r -l 3 -H --random-wait -nd --delete-after

And that's all.

--
SALIOU Renaud
NoRSfall (icq: 61340098)
[EMAIL PROTECTED]
06.99.75.50.30
Re: apache irritations
Tony Lewis [EMAIL PROTECTED] writes:

> Maciej W. Rozycki wrote:
>
>> Hmm, it's too fragile in my opinion. What if a new version of Apache defines a new format?
>
> I think all of the expressions proposed thus far are too fragile. Consider the following URL:
>
>     http://www.google.com/search?num=100q=%2Bwget+-GNU

That URL will not match the proposed pattern. As Maciej said, Wget's reject feature implements shell-style patterns that are much simpler than regexps. Also, they always match the entire string by default.
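The point about matching the entire string can be illustrated with Python's `fnmatch`, which also anchors a glob at both ends. This is only an analogy to wget's behaviour, not its implementation, and `[?]` again stands in for `\?` since Python's `fnmatch` has no backslash escape.

```python
from fnmatch import fnmatchcase

PATTERN = "*[?][A-Z]=[A-Z]"       # leading '*' absorbs everything before '?'
NO_STAR = "[?][A-Z]=[A-Z]"        # same pattern without the leading '*'

google = "search?num=100q=%2Bwget+-GNU"   # file part of the URL in the thread

# The name must END in '?<upper>=<upper>', so the query URL is left alone.
print(fnmatchcase(google, PATTERN))            # False
print(fnmatchcase("index.html?N=D", PATTERN))  # True

# Without '*', the whole name has to be exactly '?X=Y'.
print(fnmatchcase("index.html?N=D", NO_STAR))  # False
print(fnmatchcase("?N=D", NO_STAR))            # True
```

Because the match covers the whole name, a glob cannot fire on a pattern that merely occurs somewhere in the middle, which is why the long query string above passes through untouched.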
Re: apache irritations
Maciej W. Rozycki wrote:

> I'm not sure what you are referring to. We are discussing a common problem with static pages generated by default by Apache as index.html objects for server's filesystem directories providing no default page.

Really? The original posting from Jamie Zawinski said:

> I know this would be somewhat evil, but can we have a special case in wget to assume that files named ?N=D and index.html?N=D are the same as index.html? I'm tired of those dumb apache sorting directives showing up in my mirrors as if they were real files...

I understood the question to be about URLs containing query strings (which Jamie called sorting directives) showing up as separate files. I thought the discussion was related to that topic. Maybe it diverged from that later in the chain and I missed the change of topic.

I think what Jamie wants is one copy of index.html no matter how many links of the form index.html?N=D appear.

> BTW, wget's accept/reject rules are not regular expressions but simple shell globbing patterns.

OK.

Tony
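The "one copy of index.html" idea could be approached by normalising links before they are queued, rather than rejecting files afterwards. The Python sketch below is purely illustrative and is not something wget offers: `drop_sort_query` is a hypothetical helper, the `[NMSD]=[AD]` shape is assumed from the patterns proposed earlier in the thread, and `example.com` is a placeholder host.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed shape of an Apache column-sort query: key N/M/S/D, order A/D.
SORT_QUERY = re.compile(r"[NMSD]=[AD]")


def drop_sort_query(url):
    """Hypothetical helper: strip the query if it is only a sort directive."""
    parts = urlsplit(url)
    if SORT_QUERY.fullmatch(parts.query):
        parts = parts._replace(query="")
    return urlunsplit(parts)


links = [
    "http://example.com/pub/",
    "http://example.com/pub/?N=D",
    "http://example.com/pub/?M=A",
    "http://www.google.com/search?num=100q=%2Bwget+-GNU",
]

# Collapsing through a set leaves one entry per distinct page.
unique = sorted({drop_sort_query(u) for u in links})
print(unique)
```

The three directory links collapse into one, while the unrelated query URL is kept as it is. A fuller version would also have to decide whether `pub/` and `pub/index.html` count as the same page.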
Wget in Windows Filename Saving Problem
To whom it may concern:

Wget works great, except when you follow a site that has a ? in the URL for a query string: Windows cannot save a file with a ? in its name. What can we do to correct this problem?

Jeff Creamer
Fenwick Technologies, Inc.
Systems Administrator/Programmer
Phone: 304-623-5260 Ext. 16
Email: [EMAIL PROTECTED]
IM: JCreamer23
MSN: JCreamer23
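One common workaround for this kind of problem is to replace the characters Windows does not allow in file names before writing the file. The Python sketch below shows the idea only; it does not describe any wget option. The helper name `windows_safe_name` and the choice of `_` as the replacement are assumptions.

```python
# Characters that Windows does not allow in file names.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def windows_safe_name(name, replacement="_"):
    """Hypothetical helper: swap reserved or control characters for a filler."""
    return "".join(
        replacement if ch in WINDOWS_RESERVED or ord(ch) < 32 else ch
        for ch in name
    )


print(windows_safe_name("index.html?N=D"))   # index.html_N=D
```

A substitution like this can map two different URLs to the same local name, so a real tool would also need a rule for collisions.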
Re: RFE:add tar option
Max Waterman [EMAIL PROTECTED] writes:

> Someone (rudely) suggested it was unacceptable to ask for a 'cc' rather than joining the email list.

That is not the case -- it is perfectly acceptable to post a question and ask for `Cc'. Especially so when you're posting to [EMAIL PROTECTED], an address specifically maintained for users' questions.