As an addendum, I'd like to add that I plan to have the feed available on 
the Toolserver by the end of this week. The feed will be produced by a 
cronjob that copies recently added links out of the externallinks table.
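To make that concrete, here is a minimal sketch of what such a cronjob 
could look like, in Python. The database host, the join against 
recentchanges (externallinks itself carries no timestamps), and the output 
file name are all assumptions for illustration, not the actual script:

    #!/usr/bin/env python
    # Hypothetical cron script: dump external links from pages changed
    # in the last hour into a flat file that crawlers can fetch.
    import pymysql

    conn = pymysql.connect(host="sql.toolserver.org",   # assumed DB host
                           db="enwiki_p",
                           read_default_file="~/.my.cnf")
    cur = conn.cursor()
    # Pages edited recently may have gained new links; join their page
    # ids (el_from) against externallinks to pull the URLs (el_to).
    cur.execute("""
        SELECT DISTINCT el.el_to
        FROM externallinks el
        JOIN recentchanges rc ON rc.rc_cur_id = el.el_from
        WHERE rc.rc_timestamp > DATE_FORMAT(NOW() - INTERVAL 1 HOUR,
                                            '%Y%m%d%H%i%S')
    """)
    with open("recent_external_links.txt", "w") as feed:
        for (url,) in cur:
            # el_to is stored as a blob, so decode before writing
            feed.write(url.decode("utf-8") + "\n")
    conn.close()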

From: Kevin Brown
Sent: Monday, September 19, 2011 1:02 PM
To: wikitech-l@lists.wikimedia.org
Subject: Status Update on Archive Links Extension

ArchiveLinks was created as a GSoC project to address the problem of linkrot 
on Wikipedia. Articles often cite or link to external URLs, but anything 
can happen to content on other sites -- if it moves, changes, or simply 
vanishes, the value of the citation is lost. ArchiveLinks rewrites external 
links in Wikipedia articles so that a '[cached]' link appears immediately 
after each one, pointing to the web archiving service of your choice. The 
extension can also preserve the exact time the link was added, so for 
services that archive multiple versions of content (such as the Internet 
Archive) it will link to a copy of the page made around the time the 
article was written.
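The extension itself is PHP, but the time-pinned '[cached]' link is easy to 
illustrate: the Internet Archive's Wayback Machine accepts a YYYYMMDDhhmmss 
timestamp in the URL and serves the snapshot closest to it. A sketch in 
Python (not the extension's actual code):

    from datetime import datetime

    WAYBACK = "https://web.archive.org/web/{ts}/{url}"

    def cached_url(url, added):
        # Pin the Wayback link to the moment the link was added to the
        # article; the archive serves the closest snapshot it has.
        ts = added.strftime("%Y%m%d%H%M%S")
        return WAYBACK.format(ts=ts, url=url)

    # e.g. a link added on 19 September 2011:
    print(cached_url("http://example.com/article",
                     datetime(2011, 9, 19, 13, 2)))
    # https://web.archive.org/web/20110919130200/http://example.com/article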

ArchiveLinks also publishes a feed of recently added external links via the 
API, so your favorite remote service can crawl them in a timely fashion. We 
have been talking with the Internet Archive about this; they are eager to 
get a list of recent external links from Wikipedia, since they believe our 
community will probably be linking to some of the most important and useful 
content on the web.
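As a sketch of how a remote service might consume the feed (the list module 
name 'recentexternallinks' below is a placeholder -- the real parameter the 
extension exposes may differ):

    import json
    import time
    import urllib.request

    API = "https://en.wikipedia.org/w/api.php"
    # Placeholder module name; the actual list name may differ.
    FEED = API + "?action=query&list=recentexternallinks&format=json"

    def crawl(url):
        print("would archive", url)   # stand-in for the real crawler

    def poll(interval=300):
        # Fetch the feed every `interval` seconds and hand each newly
        # reported URL to the crawler.
        while True:
            with urllib.request.urlopen(FEED) as resp:
                data = json.load(resp)
            for entry in data.get("query", {}).get("recentexternallinks", []):
                crawl(entry["url"])
            time.sleep(interval)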

ArchiveLinks also contains a simple spidering system, in case you want to 
cache the links yourself and display them through MediaWiki.
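The core of such a spider can be very small. A Python illustration only -- 
the extension's own spider is part of its PHP codebase, and the cache 
directory and hashing scheme here are assumptions:

    import hashlib
    import os
    import urllib.request

    CACHE_DIR = "cache"   # assumed location MediaWiki would serve from

    def spider(url):
        # Fetch the page and store the raw body under a name derived
        # from the URL, so a cached copy can be looked up later.
        os.makedirs(CACHE_DIR, exist_ok=True)
        body = urllib.request.urlopen(url, timeout=30).read()
        name = hashlib.sha1(url.encode("utf-8")).hexdigest()
        with open(os.path.join(CACHE_DIR, name), "wb") as f:
            f.write(body)
        return name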

We completed almost all of our planned features 
(https://secure.wikimedia.org/wikipedia/mediawiki/wiki/User:Kevin_Brown/ArchiveLinks/UserStories)
and the next step is to campaign to get the extension adopted on Wikipedia. 
A lot of people are enthusiastic about the concept, but we will likely get 
more input on exactly what the "cached" link should look like, and it will 
take some time to get a security review. At the same time, we are working 
with the Internet Archive to set up a test site for them to crawl the feed 
(perhaps from the Toolserver, before it is deployed on Wikipedia). Once the 
feed is set up on the Toolserver, the Internet Archive will start archiving 
all links that appear in it. That will leave producing the cached link in 
the deployed version of MediaWiki as the last step toward fixing linkrot 
everywhere it is possible.


(Thanks to Neil Kandalgaonkar for writing the majority of this email). 

