On Mon, Mar 17, 2008 at 4:41 PM, Micah Cowan <[EMAIL PROTECTED]> wrote:
>  Is that true? I thought wget actually read the input file in a streaming
>  fashion.

If that is the case, then I think it's possible to add links to the
list while wget is already running.

>  I don't expect that a single session's database would get frequent
>  reuse, though. However, it probably _would_ be used repeatedly while
>  you're working on a specific session; in that case, it's useful to have
>  the binary format.

A session database! :D So I have misunderstood this database thing. I
thought it was something like a central repository in the user's home
(like .wget-history) that records all the links that have been
downloaded, with all their meta-information. Maybe a better name is a
project file, or a session file, but calling it a database would be
too much ... :D. For session information, an ini file is
sufficient IMO.

>  However, it's important to be able to parse the file, even if there is
>  some corruption or malformed information in some places--and especially,
>  if it is truncated (Wget abruptly killed).

YAML is safe for this, I think. libyaml implements a YAML scanner.
If the scanner fails at some point in the session file, we can treat
everything from that point onward as invalid. And since the YAML is
composed of line-by-line information, the worst we will get is one
missing line of information, instead of losing all the information in
the file.
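
A rough sketch of that recovery idea, in Python. Note that this does
not use libyaml: it is a hand-rolled reader for flat "key: value"
lines (a tiny subset of YAML), and the function name and field names
are made up for illustration. It keeps every complete line and stops
at the first line that is malformed or cut off.

```python
def read_session_lines(text):
    """Parse flat 'key: value' lines; stop at the first bad line.

    Returns (entries, ok). ok is False when parsing stopped early,
    e.g. because the file was truncated in the middle of a line.
    """
    entries = {}
    for line in text.splitlines(keepends=True):
        # A line without its trailing newline was cut off mid-write.
        if not line.endswith("\n"):
            return entries, False
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = stripped.partition(": ")
        if not sep or not key:
            # Malformed line: treat everything from here on as invalid.
            return entries, False
        entries[key] = value
    return entries, True


# Hypothetical session content, truncated in the middle of the last line.
sample = "url: http://example.com/a\nstatus: done\nsize: 12"
entries, ok = read_session_lines(sample)
```

Here the two complete lines survive and only the cut-off "size" line
is dropped.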

To prevent losing data, wget has to write to the session file
frequently, but frequent writes will burden the hard disk. I wonder
if a memory-mapped file can help with this.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2044.html
says that a memory-mapped file has the feature of "Automatic file data
synchronization and cache from the OS". If the wget process is suddenly
killed, the task of synchronizing memory and disk content will be done
by the OS, CMIIW, so we won't lose any data.
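
To make the memory-mapped idea concrete, here is a minimal sketch
using Python's mmap module (wget itself would use mmap()/msync() in C).
The file name, the fixed region size and the write_record() helper are
assumptions for illustration only. As far as I know, writes to a
shared mapping land in the OS page cache, so they should survive the
process being killed; a power failure or kernel crash is a different
story unless the mapping has been flushed.

```python
import mmap
import os
import tempfile

SESSION_SIZE = 4096  # hypothetical fixed size of the mapped region


def write_record(mapped, offset, record):
    """Copy one text record into the mapping; return the next offset."""
    data = record.encode("utf-8")
    end = offset + len(data)
    if end > len(mapped):
        raise ValueError("session region is full")
    mapped[offset:end] = data  # plain memory write, no write() syscall
    return end


# Create a session file of the desired size (filled with zero bytes).
path = os.path.join(tempfile.mkdtemp(), "session.dat")
with open(path, "wb") as f:
    f.truncate(SESSION_SIZE)

with open(path, "r+b") as f:
    mapped = mmap.mmap(f.fileno(), SESSION_SIZE)
    offset = write_record(mapped, 0, "url: http://example.com/a\n")
    offset = write_record(mapped, offset, "status: done\n")
    mapped.flush()  # ask the OS to write dirty pages to disk now
    mapped.close()

# Read the records back through ordinary file I/O.
with open(path, "rb") as f:
    on_disk = f.read(offset).decode("utf-8")
```

The flush() call is the expensive part, so it could be done only every
few records rather than after each one.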

>  Still, I imagine the problem is easily fixed by placing some line at the
>  end of the file to indicate completion.

A wget completion timestamp would fit that.
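
Something like this, as a sketch: the last line of the session file
is a timestamp, and a reader treats the session as finished only if
that line is present and parses. The "completed: " field name and the
ISO timestamp format are my assumptions, not anything wget defines.

```python
from datetime import datetime

MARKER = "completed: "  # hypothetical field name for the final line


def completion_time(text):
    """Return the completion timestamp, or None if the file has none."""
    lines = text.splitlines()
    # An empty file or a missing final newline means an unfinished write.
    if not lines or not text.endswith("\n"):
        return None
    last = lines[-1]
    if not last.startswith(MARKER):
        return None
    try:
        return datetime.fromisoformat(last[len(MARKER):])
    except ValueError:
        return None  # the marker line itself is corrupted


finished = "url: http://example.com/a\ncompleted: 2008-03-17T16:41:00\n"
killed = "url: http://example.com/a\nstatus: do"
```

With these samples, completion_time(finished) gives a datetime and
completion_time(killed) gives None.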

As for libyaml's stability: even though it's alpha-quality software
at version 0.0.1, it is already distributed with its stable
counterpart PyYAML (which is implemented in Python [1]), so I think it
is usable. By the time this session database feature of wget gets
implemented, libyaml could have reached its production release, so both
can work well together, I guess.

[1] The binary distribution of PyYAML includes libyaml, which can be
used as a faster alternative parser.
