wget and timestamping
I am using wget 1.8.2-15 on a Linux system. I use it to fetch data files from old VMS systems that we still use to take data.

The problem seems to be this. VMS associates two dates with each file: the creation date, which is set when the file is made and never changed afterwards, and the modification date, which is updated whenever the file is modified. When I poll the VMS system for a file using wget's time-stamping (wget -N), the date returned seems to be the creation date, not the modification date.

For example, a user sets up a run file on Monday afternoon, but the run is not started until Tuesday morning. wget fetches the file on Monday evening, before the run starts. On Tuesday morning the run starts and data is stored in the file. But when wget runs on Tuesday evening, the file is not copied again, because its creation date has not changed. The local and remote files have different sizes, which (after reading the man page) I thought should make wget fetch the file anyway. It does not.

Can anyone shed some light on what is going on, and how to remedy this problem?
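For reference, the `-N` rule as the wget manual's time-stamping section describes it can be written as a small decision function: download if there is no local copy, or if the local and remote sizes differ (whatever the time stamps say), or otherwise if the remote copy is newer. This is a hypothetical restatement of the documented rule, not wget's own code; `FileInfo` and `should_download` are made-up names. Applied to the Monday/Tuesday scenario it says the file should be fetched again, so the behaviour described above does not match the documented rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    """Hypothetical record of what the client knows about one copy of a file."""
    size: int     # size in bytes
    mtime: float  # time stamp, seconds since the epoch


def should_download(local: Optional[FileInfo], remote: FileInfo) -> bool:
    """Apply the -N rule as the wget manual describes it (not wget's source)."""
    # No local copy yet: always download.
    if local is None:
        return True
    # Sizes differ: download regardless of the time stamps.
    if local.size != remote.size:
        return True
    # Same size: download only if the remote copy is newer.
    return remote.mtime > local.mtime


if __name__ == "__main__":
    # Monday's copy vs. Tuesday's listing: the remote date is still the
    # (unchanged) creation date, but the file has grown.
    monday_local = FileInfo(size=1024, mtime=1000.0)
    tuesday_remote = FileInfo(size=4096, mtime=1000.0)
    print(should_download(monday_local, tuesday_remote))  # True
```

The size check comes before the date comparison because, per the manual, a size mismatch overrides the time stamps entirely.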
RE: Wget - relative links within a script call aren't followed
This has been discussed several times in the past. A complete solution would need a LOT of work (for a start, a complete JavaScript engine would be necessary), and there are several semantic problems as well: for example, if a picture is loaded only on mouseover, without a preload, we still would not get it, since there is no mouse. A very partial, incomplete solution might be possible, but frankly that would be an ugly hack.
Heiko

--
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED] [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax

> -----Original Message-----
> From: Fred Holmes [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 15, 2004 3:09 PM
> To: Herold Heiko; 'Raydeen A. Gallogly'; [EMAIL PROTECTED]
> Subject: RE: Wget - relative links within a script call aren't followed
>
> It surely would be nice if some day WGET could support JavaScript. Is that something to put on the "wish list", or is it substantially impossible to implement? Do folks use JavaScript to load images in order to thwart 'bots such as WGET?
>
> I run into the same problem regularly, and simply create a series of lines in a batch file that download each of the images by explicit filename. Very doable, but it requires manual setup, rather than having WGET follow the links automatically. This will test for and download files that are known to be there, but won't find files that are newly added.
>
> Thanks,
>
> Fred Holmes
>
> At 05:07 AM 3/15/2004, Herold Heiko wrote:
> >No way, sorry.
> >wget does not support JavaScript, so there is no way to have it follow that kind of link.
> >Heiko
> >
> >> -----Original Message-----
> >> From: Raydeen A. Gallogly [mailto:[EMAIL PROTECTED]
> >> Sent: Friday, March 12, 2004 4:20 PM
> >> To: [EMAIL PROTECTED]
> >> Subject: Wget - relative links within a script call aren't followed
> >>
> >> I'm new to Wget but have learned a lot in the last week. We are successfully running Wget to mirror a website that sits on the other side of a firewall within our own agency. We can retrieve all relative links from existing HTML files, except those that are contained within a script.
> >>
> >> For example, this is an excerpt from a script call that loads an image within an HTML document, and it is not being followed:
> >> MM_preloadImages('pix/lats_but_lite.gif',)
> >>
> >> The only fix for this problem that we have been able to implement so far is to have the webmaster of the site we want to mirror create a small HTML file named 'wgetfixes.html' and link to it from the home page using style (display:none;) so that users won't see it. Within that file, they list individually all the files that they are calling from within their scripts, using the following syntax: -- this works fine, but I'm hopeful that there is a better way using a switch within Wget.
> >>
> >> Thanks for any input; it is truly appreciated. - Raydeen
> >>
> >> Raydeen Gallogly
> >> Web Manager
> >> NYS Department of Health, Wadsworth Center
> >> http://www.wadsworth.org
> >> email: [EMAIL PROTECTED]
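wget cannot execute JavaScript, but a very partial workaround in the spirit of the "ugly hack" mentioned above, and of Fred's explicit file lists, is to scan a saved HTML page for string literals passed to `MM_preloadImages(...)`, resolve them against the page's URL, and write them one per line to a file that `wget -i` can read. This is a hypothetical sketch, not a wget feature: it assumes the paths appear as quoted literals inside that one function call, it misses any path built at run time, and the page URL in the example is made up.

```python
import re
from urllib.parse import urljoin

# One MM_preloadImages(...) call; group 1 is the raw argument text.
# Assumes no literal ')' appears inside the arguments.
PRELOAD_CALL = re.compile(r"MM_preloadImages\s*\(([^)]*)\)")
# A single- or double-quoted string literal; group 2 is its contents.
STRING_LITERAL = re.compile(r"""(['"])(.*?)\1""")


def preload_urls(html: str, page_url: str) -> list[str]:
    """Return absolute URLs for the literal paths passed to MM_preloadImages.

    Duplicates and empty strings are skipped; order of first appearance is kept.
    """
    urls: list[str] = []
    for call in PRELOAD_CALL.finditer(html):
        for literal in STRING_LITERAL.finditer(call.group(1)):
            path = literal.group(2)
            if not path:
                continue  # e.g. a trailing '' argument
            url = urljoin(page_url, path)  # resolve relative to the page
            if url not in urls:
                urls.append(url)
    return urls


if __name__ == "__main__":
    sample = "<body onload=\"MM_preloadImages('pix/lats_but_lite.gif','pix/other.gif')\">"
    # Hypothetical page URL; use the real URL of the page the HTML came from.
    for url in preload_urls(sample, "http://www.example.org/dept/index.html"):
        print(url)
```

Redirecting the printed list to a file (say `urls.txt`) and running `wget -i urls.txt` would then fetch those images. Unlike a hand-maintained list, re-running the scan picks up newly added preload paths, but only those written as plain literals.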
RE: Wget - relative links within a script call aren't followed
It surely would be nice if some day WGET could support JavaScript. Is that something to put on the "wish list", or is it substantially impossible to implement? Do folks use JavaScript to load images in order to thwart 'bots such as WGET?

I run into the same problem regularly, and simply create a series of lines in a batch file that download each of the images by explicit filename. Very doable, but it requires manual setup, rather than having WGET follow the links automatically. This will test for and download files that are known to be there, but won't find files that are newly added.

Thanks,

Fred Holmes
RE: Wget - relative links within a script call aren't followed
No way, sorry. wget does not support JavaScript, so there is no way to have it follow that kind of link.
Heiko