On Fri, Jul 26, 2013 at 02:30:00PM -0400, Andrew Cady wrote:
> Incidentally, the former maintainer of wget, Micah Cowan, actually
> started working on a wget "competitor" (so to speak) based on a plugin
> architecture designed around this concept:
Thanks for the mention. :)

Not plugins; it's based on building the entire application as a big,
easily customizable, shell-style pipeline, so every logical module in
the code is a distinct program. The design was targeted specifically at
being able to do the sorts of things Tim mentioned - saving metadata to
a database, handling different content types in different ways, parsing
links out of JavaScript or PDF files, etc. Niwt currently does none of
those things, I think. :)

The Niwt feature analogous to Wget's proposed --rename-output is
--transform-name, which accepts an arbitrary shell command that each
name is piped through for transformation (so you could use perl, or
just sed...).

In some cases, chained execution is used instead of pipelining; this
proved to be a convenient way to write HTTP header filters: translate
the headers into CGI-style environment variables, and have each filter
modify the headers by changing the environment and then executing the
next filter in the chain.

> > http://micah.cowan.name/2011/02/13/computers/software-development/announcement-niwt-nifty-integrated-web-tools/
> > http://niwt.addictivecode.org/
>
> I haven't really looked into it -- in fact I didn't know it was actually
> released until I searched for it just now (I just remembered Micah
> saying he was going to work on it). At a glance, it looks to be very
> flexible, but also very incomplete.

It wasn't released. If you're looking at the 0.1 tarballs, those are
extremely early versions. The Mercurial sources (at the bottom of
http://niwt.addictivecode.org/InstallingNiwt ) are more recent. The
latest sources in Mercurial are still more than a year old, but they do
a lot of what I wanted Niwt to do.

One thing it still doesn't do is recursive fetching, which is obviously
a big feature of Wget's. Recursion in a pipeline is tricky, but I have
the design done for it, just not an implementation yet (Niwt already
has some features requiring recursive logic, such as HTTP redirects).

The engine it's based on to manage the shell-like pipeline - Plines -
is written entirely in sh, which presents performance problems. I chose
sh because it was the most convenient language to prototype this stuff
in, but the time has come to write a more streamlined version. I want
to do that before I implement recursive fetching. I'm also considering
rewriting it so that the "pipelines" can use internal modules as well
as external programs, so that the default cases can be much more
efficient while still allowing every single bit of logic to be
hot-swapped with a custom command.

> Niwt apparently uses "an HTTP-based protocol" to communicate between
> plugins.

Yeah; basically HTTP plus extra headers to communicate information down
the pipeline, and mandatory "chunked" transfer-encoding so that the
pipe doesn't have to be terminated between messages.

-mjc
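
For illustration, a --transform-name invocation might look like the
sketch below. Only the flag's behavior (piping each output name through
a shell command) is described above; the exact command-line syntax,
including whether niwt takes a URL argument wget-style, is an
assumption.

    # Hypothetical invocation: lower-case every saved filename by
    # piping each name through tr. The trailing URL argument is assumed.
    niwt --transform-name 'tr A-Z a-z' http://example.com/Some/PAGE.html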
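
A single header filter in the chained-execution scheme could look
something like this minimal sh sketch. The CGI-style HTTP_* variable
naming follows the description above, but the specifics (which headers
are set, and that the rest of the chain arrives as the filter's
arguments) are assumptions, not Niwt's actual code:

    #!/bin/sh
    # One filter in a header-filter chain (all names illustrative).
    # Headers arrive as CGI-style environment variables; the remainder
    # of the chain is assumed to be passed as this script's arguments.

    # Example transformation: force a charset on text/html responses.
    case "$HTTP_CONTENT_TYPE" in
        text/html)
            HTTP_CONTENT_TYPE='text/html; charset=UTF-8'
            export HTTP_CONTENT_TYPE
            ;;
    esac

    # Hand off to the next filter, which inherits the modified
    # environment; exec avoids leaving an extra process behind.
    exec "$@"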
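
As for the HTTP-based protocol between modules, one message on the pipe
might look roughly like the following. The chunked transfer-encoding
(chunk sizes in hex, terminated by a zero-length chunk) is as described
above; the extra header name is made up purely for illustration:

    HTTP/1.1 200 OK
    Content-Type: text/html
    Transfer-Encoding: chunked
    X-Niwt-Example: hypothetical extra pipeline header

    f
    <html>hi</html>
    0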