Re: Bug using recursive get and stdout

2007-04-17 Thread Steven M. Schweda
   A quick search at "http://www.mail-archive.com/wget@sunsite.dk/" for
"-O" found:

  http://www.mail-archive.com/wget@sunsite.dk/msg08746.html
  http://www.mail-archive.com/wget@sunsite.dk/msg08748.html

   The way "-O" is implemented, there are all kinds of things which are
incompatible with it, "-r" among them.
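
   One possible workaround, offered only as a rough sketch: let "-r" write
its files into a scratch directory as usual, then concatenate that tree to
stdout and remove it. The function names stream_tree and fetch_to_stdout
are made up for illustration, and the sketch assumes a system that provides
"mktemp -d". The wget options used ("-q", "-r", "-P") are the ordinary
quiet, recursive and directory-prefix options.

```shell
#!/bin/sh
# Hypothetical helper: write every regular file under directory $1 to
# stdout (in whatever order find reports them), then delete the directory.
stream_tree() {
    find "$1" -type f -exec cat {} +
    rm -rf "$1"
}

# Hypothetical wrapper: fetch URL $1 recursively into a scratch directory,
# then stream the downloaded files to stdout.
fetch_to_stdout() {
    tmpdir=$(mktemp -d) || return 1
    wget -q -r -P "$tmpdir" "$1"
    stream_tree "$tmpdir"
}

# Example use (left commented out so sourcing this file touches no network):
# fetch_to_stdout http://www.zdziarski.com/ > out
```

   This still goes through the disk, so it may not suit a very high page
rate; pointing the scratch directory at a memory-backed file system might
reduce that cost.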



   Steven M. Schweda   [EMAIL PROTECTED]
   382 South Warwick Street    (+1) 651-699-9818
   Saint Paul  MN  55105-2547


Bug using recursive get and stdout

2007-04-17 Thread Jonathan A. Zdziarski

Greetings,

I stumbled across a bug yesterday and reproduced it in both v1.8.2 and v1.10.2.

Apparently, recursive get tries to open the file for reading after
downloading, to download subsequent files. Problem is, when used with
-O - to deliver to stdout, it cannot open that file, so you get the
output below (note the "No such file or directory" error). In 1.10,
it appears that they removed this error message, but wget still fails
to recursively fetch.


I realize it seems like there wouldn't be much reason to send more
than one page to stdout, but I'm feeding it all into a statistical
filter to classify website data, so it doesn't really matter to the
filter. Do you know of any workaround for this, other than opening
the files after reading (which won't scale with thousands per minute)?


Thanks!

$ wget -O - -r http://www.zdziarski.com > out
--15:40:06--  http://www.zdziarski.com/
   => `-'
Resolving www.zdziarski.com... done.
Connecting to www.zdziarski.com[209.51.159.242]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24,275 [text/html]

100%[>] 24,275   163.49K/s    ETA 00:00


15:40:06 (163.49 KB/s) - `-' saved [24275/24275]

www.zdziarski.com/index.html: No such file or directory

FINISHED --15:40:06--
Downloaded: 24,275 bytes in 1 files





Jonathan