Daniel Barrett wrote:
Well, a script doesn't need human-readability. :-) Trust me, this is
not hard. I did it a few years ago with minimal difficulty (using a
couple of Emacs macros, if memory serves).
If you recall, the decision is that a novice has volunteered to take over as a way to learn.
Bill Horne wrote:
the result of mirroring a site would be a lot of separate HTML files,
one for each link on the site. Is this not correct?
You'll get a lot of separate files, yes. What's in those files is
something you need to see for yourself.
--
Rich P.
I need to copy the contents of a wiki into static pages, so please
recommend a good web-crawler that can download an existing site into
static content pages. It needs to run on Debian 6.0.
Bill
--
Bill Horne
339-364-8487
Bill Horne wrote:
I need to copy the contents of a wiki into static pages, so please
recommend a good web-crawler that can download an existing site into
static content pages. It needs to run on Debian 6.0.
Remember that I wrote how wikis have a spate of problems? This is the
biggest one.
On 1/7/2014 6:49 PM, Bill Horne wrote:
I need to copy the contents of a wiki into static pages, so please
recommend a good web-crawler that can download an existing site into
static content pages. It needs to run on Debian 6.0.
wget -k -m -np http://mysite
is what I used to use. -k converts the links in the downloaded pages so
they point at the local copies instead of back to the live site.
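Spelled out with long options (http://mysite is just a placeholder), that
is roughly:

# --convert-links (-k): rewrite links to point at the local copies
# --mirror (-m): recursive, infinite depth, with timestamping
# --no-parent (-np): never climb above the starting directory
wget --convert-links --mirror --no-parent http://mysite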
Matthew Gillen wrote:
wget -k -m -np http://mysite
I've tried this. It's messy at best. Wiki pages aren't static HTML.
They're dynamically generated and they come with all sorts of style
sheets and embedded scripts. Yes, you can get the text but it'll be text
as rendered by a wiki.
Daniel Barrett wrote:
For instance, you can write a simple script to hit Special:AllPages
(which links to every article on the wiki), and dump each page to HTML
with curl or wget. (Special:AllPages displays only N links at a time, so
the script has to walk through its pagination links.)
Yes, but that's not human-readable. It's a dynamically generated page.
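For concreteness, here is a rough sketch of the Special:AllPages approach
Dan describes. It is untested, assumes the wiki answers at http://mysite
with plain /index.php?title=... links (adjust the pattern for pretty
/wiki/Title URLs), and ignores the pagination problem:

#!/bin/bash
# Rough sketch only: scrape the article links from Special:AllPages,
# then save each article as a flat HTML file under pages/.
base="http://mysite/index.php"
mkdir -p pages
curl -s "${base}?title=Special:AllPages" |
  grep -o 'href="/index.php?title=[^"&]*"' |
  sed 's|^href="/index.php?title=||; s|"$||' |
  sort -u |
  while read -r title; do
    # skip the Special: pages themselves
    case "$title" in Special:*) continue ;; esac
    wget -q -O "pages/${title//\//_}.html" "${base}?title=${title}"
  done

A real version would also follow the "next page" link on Special:AllPages
and probably skip the Talk: and File: namespaces.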
Matthew Gillen wrote:
wget -k -m -np http://mysite
I create an emergency backup static version of dynamic sites using:
wget -q -N -r -l inf -p -k --adjust-extension http://mysite
The option -m is equivalent to -r -N -l inf --no-remove-listing, but
I didn't want --no-remove-listing (I don't need the leftover .listing files).
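Side by side, with the same placeholder URL, that works out to roughly:

# what -m alone would have meant:
#   wget -r -N -l inf --no-remove-listing http://mysite
# versus the hand-picked set, which drops --no-remove-listing and adds
# page requisites (-p), link rewriting (-k), and .html extensions:
wget -q -N -r -l inf -p -k --adjust-extension http://mysite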
Hi Bill,
GPL-licensed HTTrack Website Copier works well (http://www.httrack.com/).
I have not tried it on a MediaWiki site, but it's pretty adept at copying
websites including dynamically generated websites.
They say: "It allows you to download a World Wide Web site from the
Internet to a local directory, building recursively all directories,
getting HTML, images, and other files from the server to your computer."
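If you drive it from the command line, the invocation looks roughly like
this (http://mysite and the output path are placeholders; check
httrack --help for the filter syntax):

# mirror the site into /tmp/mysite-static, staying on the same host
httrack "http://mysite/" -O /tmp/mysite-static "+http://mysite/*" -v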
Also, I just discovered a MediaWiki extension written by Tim Starling that
may suit your needs. As the name implies, it's for dumping to HTML.
http://www.mediawiki.org/wiki/Extension:DumpHTML
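If memory serves, it installs as a maintenance script that you point at an
output directory, something along these lines (the exact flags are
documented on that page; treat these as guesses to verify):

# run from the wiki's installation directory
cd extensions/DumpHTML
php dumpHTML.php -d /var/www/wiki-static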
As for processing the XML produced by export or MediaWiki dump tools,
here is info on that XML schema
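The XML itself usually comes from Special:Export or from the stock
maintenance script; a minimal sketch, assuming shell access to the wiki's
installation directory:

# dump only the current revision of every page as XML
php maintenance/dumpBackup.php --current > wiki-current.xml
# or dump the full history
php maintenance/dumpBackup.php --full > wiki-full.xml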
Plus one for HTTrack. I used it a couple of months ago to convert a
terrible Joomla hacked site to HTML. It was a pain to use at first,
like having to use Firefox, but it worked as advertised.
Hope that helps.
On Tue, Jan 7, 2014 at 10:34 PM, Greg Rundlett (freephile)
g...@freephile.com wrote: