Re: Bug report: 1) Small error 2) Improvement to Manual

Ian Abbott Mon, 21 Jan 2002 10:21:59 -0800

On 21 Jan 2002 at 14:56, Thomas Lussnig wrote:

> >Why not just open the wgetrc file in text mode using
> >fopen(name, "r") instead of "rb"? Does that introduce other
> >problems?
> I think it has to do with comments because the defeinition is that 
> starting with '#'  the rest of the line
> is ignored. And an line ends with '\n' or the end of the file and not 
> with and spezial charakter '\0' that
> mean for me that to abort the reading of an textfile when zero isfound 
> mean's incorrect parsing.


(N.B. the control-Z character would be '\032', not '\0'.)

So maybe just mention in the documentation that the wgetrc file is
considered to be a plain text file, whatever that means for the
system Wget is running on. Maybe mention peculiaries of
DOS/Windows, etc.

In general, it is more portable to read or write native text files
in text mode as it performs whatever local conversions are
necessary to make reads and writes of text files appear like UNIX
i.e. each line of text terminated by a newline '\n'). In binary
mode, what you get depends on the system (Mac text files have lines
terminated by carriage return ('\r') for example, and some systems
(VMS?) don't even have line termination characters as such.)

In the case of Wget, log files are already written in text mode. I
think wgetrc needs to be read in text mode and that's an easy
change.

In the case of the --input-file option, ideally the input file
should be read in text mode unless the --force-html option is used,
in which case it should be read in the same mode as when parsing
other locally-stored HTML files.

Wget stores retrieved files in binary mode but the mode used when
reading those locally-stored files is less precise (not that it
makes much difference for UNIX). It uses open() (not fopen()) and
read() to read those files into memory (or uses mmap() to map them
into memory space if supported). The DOS/Windows version of open()
allows you to specify text or binary mode, defaulting to text mode,
so it looks like the Windows version of Wget saves html files in
binary mode and reads them back in in text mode! Well whatever -
the HTML parser still seems to work okay on Windows, probably
because HTML isn't that fussy about line-endings anyway!

So to support --input-file portably (not the --force-html version),
the get_urls_file() function in url.c should probably call a new
function read_file_text() (or read_text_file() instead of
read_file() as it does at the moment. For UNIX-type systems, that
could just fall back to calling read_file().

The local HTML file parsing stuff should probably be left well
alone but possibly add some #ifdef code for Windows to open the
file in binary mode, though there may be differences between
compilers for that.

Re: Bug report: 1) Small error 2) Improvement to Manual

Reply via email to