On 10/21/2010 12:06 PM, Martin Langhoff wrote:
> Unfortunately, there is a clear need to organise a facility to
> audit/edit the wikipedia snapshots we have and "repack" the archive.
>
> Do we have any easy way to do this?

I'm the wrong person to answer this question, but the activity's archive production system does already have support for an article blacklist (and indeed many articles were excluded from the current bundles). I don't know who is in possession of this list, or exactly who took responsibility for producing the most recent version. Nonetheless, excluding articles is "easy".

Actually editing article text is not something we have attempted AFAIK. Ideally, I think, we would fix textual problems upstream as they are discovered. The most recent available snapshots for English and Spanish are 10-14 days old, so this strategy does create a delay, during which time things can continue to change.

In general, I believe that auditing Wikipedia is a fool's errand. There are 3.5 million articles in English Wikipedia, growing by over a thousand a day. Spanish Wikipedia has >650,000 articles. If people want to create snapshots containing only whitelisted articles, that's fine, but many of the links will be broken and the amount of information will be much reduced.

--Ben
_______________________________________________
Sugar-devel mailing list
Sugar-devel@lists.sugarlabs.org
http://lists.sugarlabs.org/listinfo/sugar-devel