I created a static mirror of my blog using wget to download the files. There was one wget feature I had to work around. When wget is downloading recursively and finds a URL that ends in a slash, for example http://example.com/html/, it creates a directory locally and saves the contents of that URL to a file called index.html. This worked fine for me in every case but one: the RSS feed of my blog. I'm using WordPress 1.5 to generate my blog, and by default WordPress makes the RSS feed URL look like http://example.com/feed/. Sites usually have Apache configured to search for index.xml as well as index.html, so Apache finds index.xml and sends it down to wget. Wget creates a directory and stores Apache's response in that directory in a file called index.html.

Note also that, I believe because I'm using wget's --convert-links option, the page that linked to the RSS feed now links to http://example.com/feed/index.html.

I think wget's current behavior would work fine; the only problem is that my blog feed would now have an .html suffix instead of an .xml suffix. I *think* news aggregators would be fine with this. However, it would confuse humans, since .html usually indicates a human-viewable format and .xml a format meant for computer programs.

This could possibly be changed by reading the Content-Type header that comes back from the Apache server when requesting this XML file. I don't know if it's true in every circumstance, but I did check the headers that WordPress generates, and a text/xml content type is returned from Apache. Could wget, for URLs that end in a slash, read the Content-Type header and, if it's text/xml, create index.xml instead of index.html inside the directory it creates?
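To make the request concrete, here is a rough Python sketch of the naming rule I have in mind. It is not wget's code and says nothing about how wget is structured internally. The function name local_index_path, the suffix table, and the host-prefixed path layout are all my own assumptions for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical table: which suffix to give the index file for a
# given Content-Type. Only text/xml was actually observed from my
# WordPress install; the other entries are assumptions.
INDEX_SUFFIX_BY_TYPE = {
    "text/html": ".html",
    "text/xml": ".xml",
    "application/xml": ".xml",
    "application/rss+xml": ".xml",
}


def local_index_path(url, content_type):
    """Return a local path for url, picking the index suffix by type."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Drop parameters such as "; charset=UTF-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if path.endswith("/"):
        # Fall back to index.html for unknown types, which matches
        # the behavior described above.
        path += "index" + INDEX_SUFFIX_BY_TYPE.get(mime, ".html")
    return parts.netloc + path


print(local_index_path("http://example.com/feed/", "text/xml; charset=UTF-8"))
print(local_index_path("http://example.com/html/", "text/html"))
```

With this rule the feed would land in example.com/feed/index.xml, while ordinary pages would still be saved as index.html.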

I've already got a workaround that works 95% for me in my circumstances, and 100% for the public viewers of the static mirror of my blog. I change the link to the RSS feed inside WordPress so that it's the same as what the link should look like after the blog has been mirrored. Then, when wget finds that link to the RSS feed in one of the web pages it has downloaded, it ignores the link because the mirror is on a different machine than the blog itself. I have to run wget a second time to put the RSS feed XML file in the right place in the mirror I downloaded. Then I upload all the mirror files via FTP to the public web server. The only problem is that I can't test things with the RSS feed on my local machine: I have to go into WordPress and change the RSS feed link every time I want to test something locally.
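For anyone who wants to reproduce the two passes, here is roughly what they look like as a small Python helper that only builds the wget command lines. The function name mirror_commands and its parameters are placeholders I made up for this message, and the exact options I run may differ. --recursive, --convert-links and --output-document are standard wget options.

```python
def mirror_commands(blog_url, feed_url, feed_dest):
    """Build the argument lists for the two wget passes.

    All three arguments are placeholders: the blog's start URL, the
    feed URL on the blog machine, and the local file inside the
    mirror where the feed should end up.
    """
    # Pass 1: mirror the blog. The feed link points at another host,
    # so the recursive crawl does not follow it.
    first_pass = ["wget", "--recursive", "--convert-links", blog_url]
    # Pass 2: fetch the feed by itself and write it to the exact
    # file name the rewritten link expects.
    second_pass = ["wget", "--output-document", feed_dest, feed_url]
    return [first_pass, second_pass]


for cmd in mirror_commands(
    "http://example.com/",
    "http://example.com/feed/",
    "example.com/feed/index.xml",
):
    print(" ".join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) to actually run it. As far as I can tell, the directory for the feed file has to exist before the second pass.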

Please CC me on all responses; I'm not subscribed to this mailing list.

Thanks.

--
Levander's Yabbering!
http://home.mindspring.com/~levander


