On Fri, Jul 26, 2013 at 02:30:00PM -0400, Andrew Cady wrote:
> Incidentally, the former maintainer of wget, Micah Cowan, actually
> started working on a wget "competitor" (so to speak) based on a plugin
> architecture designed around this concept:
Thanks for the mention. :)

Not plugins; it's based on building the entire application as a big,
easily customizable, shell-style pipeline, so every logical module in
the code is a distinct program. The design was targeted specifically at
being able to do the sorts of things Tim mentioned - saving metadata to
a database, handling different content types in different ways, parsing
links out of JavaScript or PDF files, etc. Niwt currently does none of
those things, I think. :)

The Niwt feature analogous to Wget's proposed --rename-output is
--transform-name, which accepts an arbitrary shell command that each
name is piped through for transformation (so you could use perl, or
just sed...).

In some cases, chained execution is used instead of pipelining; this
proved to be a convenient way to write HTTP header filters: translate
the headers into CGI-style environment variables, and have each filter
modify the headers by changing the environment and then executing the
next filter in the chain.

> > http://micah.cowan.name/2011/02/13/computers/software-development/announcement-niwt-nifty-integrated-web-tools/
> > http://niwt.addictivecode.org/
>
> I haven't really looked into it -- in fact I didn't know it was actually
> released until I searched for it just now (I just remembered Micah
> saying he was going to work on it). At a glance, it looks to be very
> flexible, but also very incomplete.

It wasn't released. If you're looking at the 0.1 tarballs, those are
extremely early versions. The Mercurial sources (at the bottom of
http://niwt.addictivecode.org/InstallingNiwt ) are more recent. The
latest sources in Mercurial are still more than a year old, but they do
a lot of what I wanted Niwt to do.

One thing it still doesn't do is recursive fetching, which is obviously
a big feature of Wget's. Recursion in a pipeline is tricky, but I have
the design done for it, just not an implementation yet (Niwt already
has some features requiring recursive logic, such as HTTP redirects).

The engine it's based on to manage the shell-like pipeline - Plines -
is written entirely in sh, which presents performance problems. I chose
sh because it was the most convenient language to prototype this stuff
in, but the time has come to write a more streamlined version. I want
to do that before I implement recursive fetching. I'm also considering
rewriting it so that the "pipelines" can use internal modules as well
as external programs, so that the default cases can be much more
efficient while still allowing every single bit of logic to be
hot-swapped with a custom command.

> Niwt apparently uses "an HTTP-based protocol" to communicate between
> plugins.

Yeah; basically HTTP plus extra headers to communicate information down
the pipeline, and mandatory "chunked" transfer-encoding so that the
pipe doesn't have to be terminated between messages.

-mjc
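
For illustration, a --transform-name invocation might look like the
sketch below. Only the flag's behavior (piping each output name through
a shell command) is described above; the exact command-line syntax,
including whether niwt takes a URL argument wget-style, is an
assumption.

    # Hypothetical invocation: lower-case every saved filename by
    # piping each name through tr. The trailing URL argument is assumed.
    niwt --transform-name 'tr A-Z a-z' http://example.com/Some/PAGE.html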
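
A single header filter in the chained-execution scheme could look
something like this minimal sh sketch. The CGI-style HTTP_* variable
naming follows the description above, but the specifics (which headers
are set, and that the rest of the chain arrives as the filter's
arguments) are assumptions, not Niwt's actual code:

    #!/bin/sh
    # One filter in a header-filter chain (all names illustrative).
    # Headers arrive as CGI-style environment variables; the remainder
    # of the chain is assumed to be passed as this script's arguments.

    # Example transformation: force a charset on text/html responses.
    case "$HTTP_CONTENT_TYPE" in
        text/html)
            HTTP_CONTENT_TYPE='text/html; charset=UTF-8'
            export HTTP_CONTENT_TYPE
            ;;
    esac

    # Hand off to the next filter, which inherits the modified
    # environment; exec avoids leaving an extra process behind.
    exec "$@"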
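
As for the HTTP-based protocol between modules, one message on the pipe
might look roughly like the following. The chunked transfer-encoding
(chunk sizes in hex, terminated by a zero-length chunk) is as described
above; the extra header name is made up purely for illustration:

    HTTP/1.1 200 OK
    Content-Type: text/html
    Transfer-Encoding: chunked
    X-Niwt-Example: hypothetical extra pipeline header

    f
    <html>hi</html>
    0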