Re: Batch files in DOS

2006-06-05 Thread Tobias Tiederle
Hi,

[EMAIL PROTECTED] wrote:
> I'm trying to mirror about 100 servers (small fanfic sites) using
> wget --recursive --level=inf -Dblah.com, blah.com,blah.com some_address
> However, when I run the batch file, it stops reading after a while;
> apparently my command has too many characters.  Is there some other
> way I should be doing this, or a workaround?
You can put all the options for wget in a wgetrc file.
Set an environment variable called "WGETRC" which points to the full
pathname of your wgetrc file.
For the available wgetrc options, see
http://www.gnu.org/software/wget/manual/wget.html#Wgetrc-Commands

For your example, the wgetrc file would read:
recursive=1
reclevel=inf
domains=blah.com,blah.com,blah.com

then start wget with:
wget some_address

TT


Re: Windows Title Bar

2006-04-18 Thread Tobias Tiederle

Hi,

I'm using start [1]. That way I can specify the title, have it run in
the background, and adjust the priority. If you want to use it in a
batch file, you can specify /WAIT so the script waits for it to finish.

[Derek already got this, forgot to cc the list]

TT

[1] Built-in command, usable from cmd.exe or batch files.
START ["title"] [/Dpath] [/I] [/MIN] [/MAX] [/SEPARATE | /SHARED] [/LOW 
| /NORMAL | /HIGH | /REALTIME | /ABOVENORMAL | /BELOWNORMAL] [/WAIT] 
[/B] [command/program] [parameters]


Derek Parnell wrote:
I'd like to be able to exactly specify the title that appears on the 
console title bar (Windows environment of course). Currently the application 
uses the URL that is being fetched, but I'd like to specify it myself. Is there 
a way to do this now or does this have to be an enhancement?


Something like ...

  wget --title="News Server #1" http://www.etc.com/latest_news.html

So that "News Server #1" appears as the console title rather than the URL 
(or its possible redirect).


current wget crashes when using -c

2006-03-22 Thread Tobias Tiederle
Hi,

The current trunk build crashes when trying to continue a download.
Builds from the tags WGET_1_10, WGET_1_10_1 and WGET_1_10_2 run correctly.

Build environment:
Windows XP
Visual Studio 2005
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
OpenSSL 0.9.8a

The following output is produced:

---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---

F:\temp\>wget.exe --debug -vc
http://download.microsoft.com/download/2/4/3/243865fc-c896-497e-9a66-bcc3f596741e/directx_feb2006_redist.exe
Setting --verbose (verbose) to 1
Setting --continue (continue) to 1
DEBUG output created by Wget 1.10+devel on Windows-MSVC.

--15:24:15-- 
http://download.microsoft.com/download/2/4/3/243865fc-c896-497e-9a66-bcc3f596741e/directx_feb2006_redist.exe

F:\temp\>wget.exe --version
GNU Wget 1.10+devel

---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---

Regards Tobias


Re: Download all the necessary files and linked images

2006-03-09 Thread Tobias Tiederle
Hi,

Jean-Marc MOLINA wrote:
> I have another opinion about that limitation. Could it be considered a
> bug? In the "Types of Files" section of the manual we can read: « Note
> that these two options do not affect the downloading of html files; Wget
> must load all the htmls to know where to go at all -- recursive retrieval
> would make no sense otherwise. » It means the accept and reject options
> don't work on HTML files. But I think they should because, especially in
> this case, you deliberately have to exclude them. Excluding them makes
> sense. So I don't really know what to do: consider the problem a bug, a
> new feature to implement, or an existing feature that should be redesigned.
> It's pretty tricky.

I just set up my compile environment for Wget again.
When I did regex support, I had the same problem with exclusion, so I
introduced a new parameter "--follow-excluded-html".
It is of course on by default, but you can turn it off with
--no-follow-excluded-html.

See attached patch for current trunk.

TT
Index: trunk/src/init.c
===
--- trunk/src/init.c   (revision 2133)
+++ trunk/src/init.c   (working copy)
@@ -146,6 +146,7 @@
 #endif
   { "excludedirectories", &opt.excludes,   cmd_directory_vector },
   { "excludedomains",  &opt.exclude_domains,   cmd_vector },
+  { "followexcluded", &opt.followexcluded, cmd_boolean },
   { "followftp",   &opt.follow_ftp,cmd_boolean },
   { "followtags",  &opt.follow_tags,   cmd_vector },
   { "forcehtml",   &opt.force_html,cmd_boolean },
@@ -277,6 +278,7 @@
 
   opt.cookies = true;
   opt.verbose = -1;
+  opt.followexcluded = 1;
   opt.ntry = 20;
   opt.reclevel = 5;
   opt.add_hostdir = true;
Index: trunk/src/main.c
===
--- trunk/src/main.c   (revision 2133)
+++ trunk/src/main.c   (working copy)
@@ -158,6 +158,7 @@
 { "exclude-directories", 'X', OPT_VALUE, "excludedirectories", -1 },
 { "exclude-domains", 0, OPT_VALUE, "excludedomains", -1 },
 { "execute", 'e', OPT__EXECUTE, NULL, required_argument },
+{ "follow-excluded-html", 0, OPT_BOOLEAN, "followexcluded", -1 },
 { "follow-ftp", 0, OPT_BOOLEAN, "followftp", -1 },
 { "follow-tags", 0, OPT_VALUE, "followtags", -1 },
 { "force-directories", 'x', OPT_BOOLEAN, "dirstruct", -1 },
@@ -611,6 +612,9 @@
   -X,  --exclude-directories=LIST  list of excluded directories.\n"),
 N_("\
   -np, --no-parent don't ascend to the parent directory.\n"),
+  N_("\
+   --follow-excluded-html  turns on downloading of excluded files for\n\
+   inspection (this is the default).\n"),
 "\n",
 
 N_("Mail bug reports and suggestions to <[EMAIL PROTECTED]>.\n")
Index: trunk/src/recur.c
===
--- trunk/src/recur.c   (revision 2133)
+++ trunk/src/recur.c   (working copy)
@@ -511,13 +511,14 @@
   && !(has_html_suffix_p (u->file)
   /* The exception only applies to non-leaf HTMLs (but -p
  always implies non-leaf because we can overstep the
- maximum depth to get the requisites): */
-  && (/* non-leaf */
+ maximum depth to get the requisites): 
+ No exception if the user specified no-follow-excluded */
+  && (opt.followexcluded && (/* non-leaf */
   opt.reclevel == INFINITE_RECURSION
   /* also non-leaf */
   || depth < opt.reclevel - 1
   /* -p, which implies non-leaf (see above) */
-  || opt.page_requisites)))
+  || opt.page_requisites))))
 {
   if (!acceptable (u->file))
{
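
The condition this patch changes can be hard to follow in diff form. Below is a minimal C sketch of the same decision, written as a standalone function. The names sketch_options, filters_apply, has_html_suffix and SKETCH_INFINITE_RECURSION are hypothetical stand-ins rather than wget's real identifiers, and the suffix check is deliberately simpler than wget's has_html_suffix_p (case-sensitive, .html and .htm only).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for wget's INFINITE_RECURSION constant. */
#define SKETCH_INFINITE_RECURSION (-1)

/* Hypothetical subset of wget's option struct. */
struct sketch_options {
  bool follow_excluded;   /* --follow-excluded-html (default: true) */
  int reclevel;           /* maximum recursion depth */
  bool page_requisites;   /* -p */
};

/* Simplified suffix test: case-sensitive, ".html" and ".htm" only. */
bool has_html_suffix(const char *file)
{
  const char *dot;

  assert(file != NULL);
  dot = strrchr(file, '.');
  return dot != NULL
         && (strcmp(dot, ".html") == 0 || strcmp(dot, ".htm") == 0);
}

/* Returns true if the accept/reject rules should be checked for FILE. */
bool filters_apply(const char *file, int depth,
                   const struct sketch_options *opt)
{
  bool non_leaf, exempt;

  assert(file != NULL && opt != NULL);

  if (file[0] == '\0')
    return false;   /* directory: there is no file name to match */

  non_leaf = opt->reclevel == SKETCH_INFINITE_RECURSION
             || depth < opt->reclevel - 1
             || opt->page_requisites;

  /* HTML files are exempt from the rules only while follow_excluded is on. */
  exempt = has_html_suffix(file) && opt->follow_excluded && non_leaf;
  return !exempt;
}
```

With follow_excluded set, a non-leaf page.html bypasses the accept/reject rules as before; with it cleared (the --no-follow-excluded-html case), the same file is checked like any other.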


Re: Get complete page?

2006-03-01 Thread Tobias Tiederle
Hi Juman,

first execute this command: wget --help
It is of utmost importance to read the output of that command carefully!

Then you might try:
wget --page-requisites --convert-links --span-hosts --html-extension
--no-directories --execute=robots=off [URL]
or
wget -pkHEnd --execute=robots=off [URL]

TT

juman wrote:
> When using Mozilla or IE you can right-click on a page and choose "Save
> Page As..." and then select to save the complete page which creates a
> html file and a folder containing all the pictures for the page. The
> links in the page for the pictures are also rewritten to create a
> complete localized version of the page... Is there some smart way to do
> the same with wget?
>
> /juman


Re: How do I prevent wget from creating index.html?C=M;O=A ?

2005-11-07 Thread Tobias Tiederle

Evert Meulie wrote:

Hi!

Thanks for the reply. Since I have no control over the server from 
which I'm pulling the mirror AND I do not want to live with these 
files ( 8-)  ), I was wondering whether there's a way to exclude 
certain file names, so that I can exclude the index.html?* wildcard...?

AFAIK there's no way (with official releases) to do this.
I have a regex patch for 1.9.1 lying around on my system, but it's not 
included in current wget releases (because it used PCRE instead of GNU 
regex/C library regex).

Last I heard, regex support is planned for 1.11.
(If you mirror this site often, why not use a script and delete them 
afterwards?)


Regards TT
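
The "delete them afterwards" idea could look roughly like the C sketch below. It assumes a POSIX system (dirent.h) and that the unwanted files are saved under names beginning with "index.html?"; the function names is_listing_variant and remove_listing_variants are made up for this illustration, and the cleanup handles a single directory without recursing.

```c
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True for names such as "index.html?C=M;O=A", false for plain
   "index.html" and for everything else. */
bool is_listing_variant(const char *name)
{
  static const char prefix[] = "index.html?";

  assert(name != NULL);
  return strncmp(name, prefix, sizeof prefix - 1) == 0;
}

/* Removes matching files from DIR (not recursive).  Returns the number
   of files removed, or -1 if DIR cannot be opened. */
int remove_listing_variants(const char *dir)
{
  DIR *d;
  struct dirent *entry;
  char path[4096];
  int removed = 0;

  assert(dir != NULL);
  d = opendir(dir);
  if (d == NULL)
    return -1;

  while ((entry = readdir(d)) != NULL) {
    int n;

    if (!is_listing_variant(entry->d_name))
      continue;
    n = snprintf(path, sizeof path, "%s/%s", dir, entry->d_name);
    if (n < 0 || (size_t) n >= sizeof path)
      continue;   /* path would not fit in the buffer; skip it */
    if (remove(path) == 0)
      removed++;
  }
  closedir(d);
  return removed;
}
```

Calling remove_listing_variants on each mirrored directory after wget finishes would drop the sorted-listing copies while leaving the plain index.html in place.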


Re: How do I prevent wget from creating index.html?C=M;O=A ?

2005-11-07 Thread Tobias Tiederle

Evert Meulie wrote:
I'm using wget to mirror (part of) a site. This site contains a couple 
of directories which do not have an index.html in them, just a bunch of 
various files. When wget hits this dir, it creates:

index.html
index.html?C=M;O=A
index.html?C=M;O=D
index.html?C=N;O=A
index.html?C=N;O=D
index.html?C=S;O=A
index.html?C=S;O=D
It seems your server is configured to send a directory listing if no 
index.html is found.
By the looks of it the listing is sortable (Modified/Name/Size 
Ascending/Descending).



How do I prevent wget from doing so? I'm currently using the following:
wget -np -nH --cut-dirs=3 --mirror 
http://some.domain.com/folder/folder/folder/folder
If you want to get the files in this directory, I think you have to live 
with them.

Otherwise it should suffice to use --exclude-directories (-X) to exclude the directory.

Regards TT


Re: regex in wget, it is difficult to implement?

2005-05-27 Thread Tobias Tiederle

Oliver Schulze L. wrote:


Hi,
Would it be too difficult to implement this?
I'm thinking of passing an argument to a regex function that returns
true or false, and then deciding whether to download the file. Any pointers
on where to look in the code?


Yes, I know where to look; I did a regex patch for 1.9.1+cvs.
I'm currently not at home, but I could post an updated diff for the current 
CVS version on Monday morning.


Regards TT
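
The "regex function that returns true or false" idea from the question can be sketched in a few lines of C. This is only an illustration: it uses the POSIX regex.h interface (regcomp/regexec) as a stand-in, not the PCRE calls used in the actual patch, and the names url_filter_init and should_download are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Compiles the exclusion PATTERN once, as a POSIX extended regex.
   Returns true on success; on success the caller should eventually
   release RE with regfree(). */
bool url_filter_init(regex_t *re, const char *pattern)
{
  assert(re != NULL && pattern != NULL);
  return regcomp(re, pattern, REG_EXTENDED | REG_NOSUB) == 0;
}

/* Returns true if NAME should be downloaded, that is, if it does
   NOT match the exclusion pattern compiled into RE. */
bool should_download(const regex_t *re, const char *name)
{
  assert(re != NULL && name != NULL);
  /* regexec returns 0 on a match, non-zero otherwise. */
  return regexec(re, name, 0, NULL, 0) != 0;
}
```

The pattern is compiled once at startup and the predicate is then called per candidate file or directory, which mirrors the compile-then-match split that a PCRE-based version would also need.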



Re: NTLM authentication in CVS

2005-04-07 Thread Tobias Tiederle
Herold Heiko wrote:

>3) As expected msvc still throws compiler error on http.c and retr.c, (bad)
>workaround: disable optimization. Anybody with a cl.exe newer than
>Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 12.00.8804 for 80x86
>can comment if this is needed with newer versions, too ?
>
I'm using
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 13.00.9466 for 80x86

The only notable output while compiling (with your other two patches
applied) is:
\Vc7\PlatformSDK\Include\WinSock.h(689) : warning C4005: 'NO_ADDRESS' : macro redefinition
host.c(59) : see previous definition of 'NO_ADDRESS'
http.c(514) : warning C4090: 'function' : different 'const' qualifiers
http.c(532) : warning C4090: 'function' : different 'const' qualifiers
http.c(710) : warning C4090: 'function' : different 'const' qualifiers

Tobias


wget regex patch

2005-04-06 Thread Tobias Tiederle
Hello,

after reading so much about regex support for wget (especially the lack
of it) and experiencing myself how annoying it can be if you have
downloaded a hundred /thumbs/ directories, I tried to implement regex
support myself.
I used the PCRE library from http://www.pcre.org which was pretty easy to
use, given the fact that I never ever touched a single line of C (or
C++) code before.
Unfortunately I don't know jack about autoconf, makefiles etc.
The patch in its current form is only useful with MSVC as I didn't alter
any other makefiles.
I hope someone can do that for me and include the pcre license from
http://www.pcre.org/license.txt

As you can see, pcre.h and pcre.lib need to be somewhere the compiler can
find them, and HAVE_REGEX needs to be defined.
Files and directories are ignored if they match the regex given on the
command line. For the syntax, see wget --help.
The patch was made against the current CVS code.
Hope this helps somehow.

Tobias
diff -ruwb wget-regex2/src/ftp.c wget-regex3/src/ftp.c
--- wget-regex2/src/ftp.c   Sat Apr 02 02:41:04 2005
+++ wget-regex3/src/ftp.c   Wed Apr 06 18:55:24 2005
@@ -1749,7 +1749,11 @@
 return res;
   /* First: weed out that do not conform the global rules given in
  opt.accepts and opt.rejects.  */
+#ifdef HAVE_REGEX 
+  if (opt.accepts || opt.rejects || opt.exclregfile)
+#else
   if (opt.accepts || opt.rejects)
+#endif /* HAVE_REGEX */
 {
   f = start;
   while (f)
diff -ruwb wget-regex2/src/init.c wget-regex3/src/init.c
--- wget-regex2/src/init.c  Sun Mar 20 17:07:38 2005
+++ wget-regex3/src/init.c  Wed Apr 06 19:37:13 2005
@@ -137,6 +137,10 @@
 #endif
   { "excludedirectories", &opt.excludes,   cmd_directory_vector },
   { "excludedomains",  &opt.exclude_domains,   cmd_vector },
+#ifdef HAVE_REGEX  
+  { "excluderegexdir", &opt.exclregdir,cmd_string },
+  { "excluderegexfile", &opt.exclregfile,  cmd_string },
+#endif /* HAVE_REGEX */
   { "followftp",   &opt.follow_ftp,cmd_boolean },
   { "followtags",  &opt.follow_tags,   cmd_vector },
   { "forcehtml",   &opt.force_html,cmd_boolean },
@@ -1367,6 +1371,12 @@
   xfree_null (opt.sslcertkey);
   xfree_null (opt.sslcertfile);
 #endif /* HAVE_SSL */
+#ifdef HAVE_REGEX
+  xfree_null (opt.exclregdir_c);
+  xfree_null (opt.exclregfile_c);
+  xfree_null (opt.exclregdir);
+  xfree_null (opt.exclregfile);
+#endif /* HAVE_REGEX */
   xfree_null (opt.bind_address);
   xfree_null (opt.cookies_input);
   xfree_null (opt.cookies_output);
diff -ruwb wget-regex2/src/main.c wget-regex3/src/main.c
--- wget-regex2/src/main.c  Tue Mar 22 15:20:02 2005
+++ wget-regex3/src/main.c  Wed Apr 06 19:03:56 2005
@@ -68,6 +68,10 @@
 /* On GNU system this will include system-wide getopt.h. */
 #include "getopt.h"
 
+#ifdef HAVE_REGEX
+#include <pcre.h>
+#endif /* HAVE_REGEX */
+
 #ifndef PATH_SEPARATOR
 # define PATH_SEPARATOR '/'
 #endif
@@ -176,6 +180,10 @@
 { "egd-file", 0, OPT_VALUE, "egdfile", -1 },
 { "exclude-directories", 'X', OPT_VALUE, "excludedirectories", -1 },
 { "exclude-domains", 0, OPT_VALUE, "excludedomains", -1 },
+#ifdef HAVE_REGEX
+{ "exclude-regex-dirs", 0, OPT_VALUE, "excluderegexdir", -1 },
+{ "exclude-regex-files", 0, OPT_VALUE, "excluderegexfile", -1 },
+#endif
 { "execute", 'e', OPT__EXECUTE, NULL, required_argument },
 { "follow-ftp", 0, OPT_BOOLEAN, "followftp", -1 },
 { "follow-tags", 0, OPT_VALUE, "followtags", -1 },
@@ -591,6 +599,12 @@
   -D,  --domains=LIST  comma-separated list of accepted domains.\n"),
 N_("\
 --exclude-domains=LIST  comma-separated list of rejected domains.\n"),
+#ifdef HAVE_REGEX  
+N_("\
+   --exclude-regex-dirs=PATTERN   pattern of directories to reject.\n"),
+   N_("\
+   --exclude-regex-files=PATTERN  pattern of files to reject.\n"),
+#endif /* HAVE_REGEX */
 N_("\
--follow-ftpfollow FTP links from HTML documents.\n"),
 N_("\
@@ -647,6 +661,7 @@
   int i, ret, longindex;
   int nurl, status;
   int append_to_log = 0;
+  const char *error;  
 
   i18n_initialize ();
 
@@ -819,6 +834,40 @@
   exit (1);
 }
 #endif
+
+#ifdef HAVE_REGEX
+  if (opt.exclregdir)
+{  
+  opt.exclregdir_c = pcre_compile(
+opt.exclregdir,   /* the pattern */
+0,/* default options */
+&error,   /* for error message */
+&i,   /* for error offset */
+NULL);/* use default character tables */   
+  
+  if (opt.exclregdir_c == NULL)
+{  
+  printf (_("Directory RegEx compilation failed at offset %d: %s\n"), i, error);
+  exit (1);
+}
+}
+
+if (opt.exclregfile)
+{  
+  opt.exclregfile_c = pcre_compile(
+opt.exclregfile,   /* the pattern */
+0,/* default options */
+&error,