can wget be used over a local file system?
I installed a new hard drive. I want to save the web sites I downloaded with wget. The links in the files were automatically changed from pointing at the web sites to pointing within the local file structure. If I just copy the directories over to the new drive, I'll lose all the functionality of the HTML links. I tried using wget to retrieve from the old drive, copy to the new drive, and convert the links, like this: wget -r --convert-links /mnt/hda4/home/kb/www.uuhome.de or wget -r --convert-links file://mnt/hda4/home/kb/www.uuhome.de etc. All I can get is: Unsupported scheme. Is there a way to use wget to do this? TIA. Since I'm not on this list, please CC all responses to include me: Kelley Terry [EMAIL PROTECTED]
Re: can wget be used over a local file system?
Kelley Terry [EMAIL PROTECTED] writes: I installed a new hard drive. I want to save the web sites I downloaded with wget. The links in the files were automatically changed from pointing at the web sites to pointing within the local file structure. If I just copy the directories over to the new drive, I'll lose all the functionality of the HTML links. I tried using wget to retrieve from the old drive, copy to the new drive, and convert the links, like this: [...] Wget doesn't support copying from the file system. But there should be no need to do that just to convert links -- it is perfectly OK to just move the files, as long as you used `--convert-links' when you downloaded the files from the web.
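To see why a converted tree survives a move, here is a rough Python sketch of the idea behind `--convert-links': an absolute URL is rewritten into a path relative to the page that references it, so the link keeps working wherever the directory tree lives. The helper name and the disk-layout assumption (wget mirrors http://host/path to ./host/path) are illustrative only, not wget's actual code.

```python
from urllib.parse import urlparse
import posixpath

def to_relative_link(page_path, link_url):
    """Rewrite an absolute URL as a link relative to the page that
    references it -- roughly what `--convert-links' does at download
    time. Hypothetical helper for illustration only."""
    u = urlparse(link_url)
    # Assumed layout: wget mirrors http://host/path to ./host/path on disk.
    target = posixpath.join(u.netloc, u.path.lstrip("/"))
    page_dir = posixpath.dirname(page_path)
    return posixpath.relpath(target, page_dir)

print(to_relative_link("www.uuhome.de/index.html",
                       "http://www.uuhome.de/pics/a.png"))  # pics/a.png
```

Because the result ("pics/a.png") contains no absolute prefix, copying the whole www.uuhome.de directory to a new drive changes nothing the links depend on.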
Re: wget v1.9 (Windows port) newbie needs help in download files recursively...
Herold Heiko [EMAIL PROTECTED] writes: From: Hrvoje Niksic [mailto:[EMAIL PROTECTED] To get the stable sources that have this bug fixed, you might want to check out the head of the wget-1_9 branch in CVS. Heiko, how about creating a bugfix 1.9 release for Windows? No problem with that, but wouldn't a dot release be better? A dot release will happen anyway. I'm waiting until more bug reports arrive, so that the 1.9.1 code base can get as mature as possible. A dedicated 1.9 branch maintainer would help matters, but no one has volunteered, so I must maintain both branches myself. That doesn't help. I'm not too comfortable with the idea of a Windows binary based on the CVS sources for an already-released version; confusion could easily arise due to different behaviour. There is no difference in behavior. The 1.9 branch contains *only* bug fixes and possibly build changes. Yet I understand your reluctance.
gettext and charsets
This should go to the gettext people, but I couldn't find any mailing list. I've built Wget with NLS support on Win-XP, but the display char-set is wrong. Built with LOCALEDIR=g:/MingW32/share BTW. This is IMHO so ugly. Shouldn't there be a way to set this at runtime (as Lynx does). E.g. have a $WGET_LOCALEDIR and call bindtextdomain() on that. $LANGUAGE doesn't seem to handle drive letters and ':' on the Win32 version of gettext. But the main problem I can solve by e.g. wget -h | iconv -f ISO-8859-1 -t CP850 Isn't there a better way? --gv
Re: gettext and charsets
Gisle Vanem [EMAIL PROTECTED] writes: This should go to the gettext people, but I couldn't find any mailing list. Perhaps you could try posting to [EMAIL PROTECTED]? Failing that, you might want to try at the address of the Free Translation Project and/or the Norwegian national team near you. I'm not sure about the charset issues on Windows. Does gettext detect the presence of GNU iconv? (I assume you have the latter if you have the `iconv' command.) As for the LOCALEDIR, I am not against being able to change it at run time.
Re: gettext and charsets
Hrvoje Niksic [EMAIL PROTECTED] said: I'm not sure about the charset issues on Windows. Does gettext detect the presence of GNU iconv? (I assume you have the latter if you have the `iconv' command.) libintl depends on libiconv: cygcheck wget.exe .. f:\windows\System32\libintl-2.dll f:\windows\System32\libiconv-2.dll Browsing the sources, I found the answer: set OUTPUT_CHARSET=CP850 --gv
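The `iconv' workaround in the thread is a plain byte re-encoding, which can also be done in a few lines of Python. This is a sketch of the same conversion the pipe performs, assuming the gettext output really is ISO-8859-1 and the console wants CP850; the function name is made up for illustration.

```python
def latin1_to_cp850(data: bytes) -> bytes:
    """Re-encode from ISO-8859-1 to the DOS console code page CP850,
    the same conversion as `wget -h | iconv -f ISO-8859-1 -t CP850`."""
    return data.decode("iso-8859-1").encode("cp850")

sample = b"\xe9"                    # 'e' with acute accent in ISO-8859-1
print(latin1_to_cp850(sample))      # b'\x82' -- the same character in CP850
```

The `set OUTPUT_CHARSET=CP850` answer found below is the cleaner fix, since it makes gettext itself emit the console code page instead of re-encoding after the fact.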
About termination of wget and -T option
Hello, I'm trying to use the -T option, as I have to download a file (the result of a CGI) which is big, and very often I cannot download it within 15 minutes. I read that the default is 15 minutes (900 secs), and used -T 1800 to have a timeout of 30 minutes. However, it seems not to work, and the timeout expires anyway after 15 minutes. Can you give me any suggestion? Thanks Luigi Sona
Re: About termination of wget and -T option
Luigi Stefano Sona (lsona) [EMAIL PROTECTED] writes: I'm trying to use the -T option, as I have to download a file (the result of a CGI) which is big, and very often I cannot download it within 15 minutes. The -T option times out only if *no data* is read in the designated period, not if the whole file fails to download in that time. I read that the default is 15 minutes (900 secs), and used -T 1800 to have a timeout of 30 minutes. However, it seems not to work, and the timeout expires anyway after 15 minutes. Could you post the debug output?
RE: About termination of wget and -T option
How do I get debug output? Is there any other way to have a total timeout longer than 15 minutes? Thanks Luigi
Re: About termination of wget and -T option
Luigi Stefano Sona (lsona) [EMAIL PROTECTED] writes: How do I get debug output ? By using the `-d' option. Is there any other way to have a total timeout longer than 15 minutes ? The `-T' option can be used to specify a longer timeout value. However, in many cases, the timeout is not forced by Wget, but by the operating system routines that implement networking. In that case Wget can only retry the retrieval -- which is what it's designed to do. The debug output ought to provide more insight into what Wget might be doing.
RE: About termination of wget and -T option
I'll try with -d and provide output. About -T, you confirm that, in any case, the timeout applies to the start of the answer, not to the finish? Thanks a lot. Luigi
Re: About termination of wget and -T option
Luigi Stefano Sona (lsona) [EMAIL PROTECTED] writes: About -T, you confirm that anyway, the timeout is for the start of the answer, not for the finish? Almost. In fact, the timeout applies whenever the download stalls, at any point when Wget waits for data to arrive. You can think of it this way: Wget reads data in a loop like this one:

    while data_pending:
        read_chunk_from_network
        write_chunk_to_disk

If a read_chunk_from_network step takes more than 15 min, the download is interrupted (and retried). But the whole download can take as long as it takes.
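The loop above can be made concrete as a small Python sketch of the same stall-timeout semantics: the timeout arms each individual read, never the transfer as a whole. This is an illustration of the behavior described, not wget's actual implementation, and the function name is made up.

```python
import socket

def download(sock: socket.socket, out, stall_timeout=900):
    """Read until EOF; abort only if *no* data arrives for
    stall_timeout seconds. The whole transfer may take much longer
    than stall_timeout as long as data keeps trickling in."""
    sock.settimeout(stall_timeout)   # applies to each recv(), not overall
    while True:
        try:
            chunk = sock.recv(8192)
        except socket.timeout:
            raise RuntimeError("stalled: no data for %ds" % stall_timeout)
        if not chunk:                # EOF -- transfer complete
            break
        out.write(chunk)
```

A 2-hour download with steady data never trips a 900-second stall timeout, which is exactly why `-T 1800' cannot extend a limit that something other than Wget (e.g. the CGI or an intermediary) is enforcing.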
Re: Cookie options
* Hrvoje Niksic [EMAIL PROTECTED] [031101 01:25]: Nicolas, I started merging your patch for saving session cookies and need some advice. The patch adds two options: [...] Any thoughts on this? Actually, the more useful option is the one which allows keeping session cookies and lets the web server treat several wget instances as the same client. The second option is useful when a server gives a cookie to authenticate on the website for one week, for example. This option enables wget to always read the same cookie file to access a site. However, this could generate weird behaviour if the data associated with the cookie has been deleted on the server. Your idea about an all-in-one option is quite interesting. Nicolas. -- Can you trust your computer? http://www.gnu.org/philosophy/can-you-trust.fr.html
Re: Cookie options
Nicolas Schodet [EMAIL PROTECTED] writes: * Hrvoje Niksic [EMAIL PROTECTED] [031101 01:25]: Nicolas, I started merging your patch for saving session cookies and need some advice. The patch adds two options: [...] Any thoughts on this? Actually, the more useful option is the one which allows keeping session cookies and lets the web server treat several wget instances as the same client. Agreed. For starters, let's add the `--keep-session-cookies' option. If we get reports asking for special treatment of expired cookies, we can resurrect the other one or incorporate it in a more general keep-cookies or something. Interestingly enough, curl has a `--junk-session-cookies', which indicates that it keeps them by default (?). Daniel, are you still listening? :-)
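The semantics being discussed -- session cookies are normally discarded when the jar is saved, and a keep-session flag overrides that -- can be demonstrated with Python's standard cookie jar, which exposes the same choice through `ignore_discard'. The wrapper name and the cookie values are illustrative; this only mirrors the proposed `--keep-session-cookies' behavior, it is not wget code.

```python
import http.cookiejar

def save_cookies(jar: http.cookiejar.MozillaCookieJar,
                 path: str, keep_session: bool):
    """Session cookies (marked 'discard', no expiry) are dropped on
    save unless explicitly kept -- the --keep-session-cookies idea."""
    jar.save(path, ignore_discard=keep_session)

# Build a session cookie by hand (all values illustrative).
c = http.cookiejar.Cookie(
    0, "sid", "abc123", None, False, "example.com", True, False,
    "/", True, False, None, True,   # expires=None, discard=True => session
    None, None, {})
jar = http.cookiejar.MozillaCookieJar()
jar.set_cookie(c)
```

Saving with keep_session=False writes a file without the "sid" cookie; with keep_session=True the session cookie survives, so a second client loading the file presents itself as the same session.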
Re: Cookie options
* Hrvoje Niksic [EMAIL PROTECTED] [031104 23:06]: Nicolas, I started merging your patch for saving session cookies and need some advice. The patch adds two options: Any thoughts on this? Actually, the more useful option is the one which allows keeping session cookies and lets the web server treat several wget instances as the same client. Agreed. For starters, let's add the `--keep-session-cookies' option. If we get reports asking for special treatment of expired cookies, we can resurrect the other one or incorporate it in a more general keep-cookies or something. It's OK for me; I only use the --keep-session-cookies option. Nicolas. -- Help defend the right to write software, free or not: http://www.gnu.org/philosophy/protecting.fr.html
The patch list
I'm curious... is anyone using the patch list to track development? I'm posting all my changes to that list, and sometimes it feels a lot like talking to myself. :-) Factoid of the day: did you know that, as of this writing, `.wgetrc' supports exactly 100 different options?
Re: Time Stamping and Daylight Savings Time
I am and have been using NTFS since the installation of the OS, on a brand new machine. At 05:40 PM 11/4/2003, Gisle Vanem wrote: Fred Holmes [EMAIL PROTECTED] said: OTOH, if anyone knows how to make Windows stop changing the time stamps, that would be even better. You're using FAT filesystem? Convert to NTFS; it stores filetimes in UTC (as 64-bit, 100 nanosecond steps from 1 jan 1601). --gv
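The NTFS timestamp format mentioned above (64-bit count of 100-nanosecond ticks since 1 Jan 1601, stored in UTC) is easy to convert to a Unix time, and doing so shows why NTFS times are immune to DST shifts: the stored value never changes, only the local-time display does. A minimal sketch; the function name is made up for illustration.

```python
# Offset between the NTFS epoch (1601-01-01 UTC) and the Unix epoch
# (1970-01-01 UTC), expressed in 100-nanosecond "ticks":
# 11644473600 seconds * 10^7 ticks/second.
EPOCH_DELTA_TICKS = 116_444_736_000_000_000

def filetime_to_unix(ft: int) -> float:
    """Convert an NTFS FILETIME (100 ns ticks since 1601-01-01 UTC)
    to Unix seconds. The stored value is UTC, so DST changes only
    affect how it is *displayed*, never the value on disk."""
    return (ft - EPOCH_DELTA_TICKS) / 10_000_000

print(filetime_to_unix(116_444_736_000_000_000))  # 0.0 = 1970-01-01 UTC
```

FAT, by contrast, stores local time directly, which is why file times there appear to jump by an hour when DST flips.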
FTP time-stamping and time zones
That other thread, Time Stamping and Daylight Savings Time, reminded me of an issue that I have been carrying around for quite a while and which I thought would be worth clarifying and maybe also sorting out.

For a customer of mine I keep FTP mirrors of data files; the customer is located in Europe, and the data files are fetched in the U.S., let's say New England. As there is no proper, standard way to obtain the exact time stamp of a remote file through FTP, the respective directory listing is parsed instead. But as FTP directory listings do not include the time zone the files live in, (I think) wget just assumes the local time zone is identical to the remote one. Am I right about this?

As long as this assumption is applied consistently, it is probably the best that can be done. But it leads to a situation where wget creates files with time stamps much older than the files actually are. E.g., let's assume we now have 07:00 in Europe and 01:00 in New England, and the file we are going to retrieve from New England over to Europe has a New England local time stamp of 00:00. When wget retrieves the file from New England to Europe, it is still time-stamped by wget as 00:00, but that should rather be a Europe-local 06:00. A couple of hours later *I* am going to be asked this: we can see from the log files that wget only retrieved the files at Europe-local 07:00, although they apparently were already ready to be picked up at 00:00. Why is that?

I wouldn't mind making wget believe (maybe through setting the environment variable TZ) that it actually lives in New England, although it lives here in Europe. Would that be a reasonable approach or rather nonsense? JH
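The arithmetic in the example above can be sketched in Python: a naive listing timestamp is reinterpreted in the remote zone and then converted to the local one. The fixed UTC offsets (-5 for New England in winter, +1 for Central Europe) and the function name are assumptions for illustration -- wget itself simply assumes the remote zone equals the local one, as discussed above.

```python
from datetime import datetime, timedelta, timezone

def remote_listing_time_to_local(naive_ts: datetime,
                                 remote_utc_offset_hours: float,
                                 local_utc_offset_hours: float) -> datetime:
    """FTP listings carry no time zone, so a naive remote timestamp
    must be reinterpreted: attach the remote zone, then convert to
    the local one. Illustrative sketch only."""
    remote = timezone(timedelta(hours=remote_utc_offset_hours))
    local = timezone(timedelta(hours=local_utc_offset_hours))
    return naive_ts.replace(tzinfo=remote).astimezone(local)

# 00:00 in New England (UTC-5) is 06:00 in Central Europe (UTC+1):
ts = remote_listing_time_to_local(datetime(2003, 11, 4, 0, 0), -5, +1)
print(ts.strftime("%H:%M"))  # 06:00
```

This matches the email's figures: the file stamped 00:00 remotely was in fact ready at 06:00 European time, a full hour before the 07:00 retrieval recorded in the logs.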
Re: The patch list
Hrvoje Niksic wrote: I'm curious... is anyone using the patch list to track development? I'm posting all my changes to that list, and sometimes it feels a lot like talking to myself. :-) I read the introductory stuff to see what's changed, but I never extract the patches from the messages. From my perspective, the introductory stuff plus a list of affected files would be sufficient. Tony
Re: Time Stamping and Daylight Savings Time
At 07:24 PM 11/4/2003, Hrvoje Niksic wrote: It continues to amaze me how many people use Wget on Windows. Anyway, thanks for the detailed bug report. I would love to learn Linux and a whole bunch of computer stuff, but there are only so many hours in a day. I'm not an IT guy, just a worker who has to learn the computer for himself and figure out the most efficient way to get stuff done, where efficiency includes the cost of capital and learning curves as well. Many thanks to all who contribute to a very fine product. I had messed with a couple of GUI site-snagging programs and found them lacking, and asked for a better recommendation on a local discussion list (WAMU ComputerGuys). A gal by the name of Vicky Staubly recommended WGET, and the rest, as they say, is history. v/r Fred Holmes
Re: Time Stamping and Daylight Savings Time
At 07:24 PM 11/4/2003, Hrvoje Niksic wrote: Until then, if old files really never change, could you simply use `-nc'? Yes, that will do it quite nicely. I missed that one. I'll try it tomorrow, but a simple condition like that should work well. Thanks for your help. Fred Holmes