On 16/12/10 23:10, Joseph Reagle wrote:
> On Wednesday, December 15, 2010, Tim Starling wrote:
>> There were some changes made to the page text that weren't represented
>> in diff_log, specifically changing certain camel-case links to free
>> links.
> It appears my problems were related to some CR/LF issues not round-tripping
> between diff and patch, but I hope to be able to address that. And yes, in
> addition to some of the CamelCase issues, I expect another problem is that if
> a page is blanked "Describe the new page here." will reappear outside of the
> diff_log.

I don't think that will be a problem. But there are other problems that
I've encountered.

UseMod had a deletion feature. It turns out to be easy enough to skip
deleted pages, since they don't have a corresponding entry in rclog.

It also had an admin-only rename feature, which optionally fixed links in
all pages. This accounts for the free link changes I was seeing earlier.
And it had a link replacement feature which could be invoked without a
page move. These features were rarely used because of the arcane
interface; usually people just moved pages by copying and pasting. But
during the free-link conversion, a lot of pages were renamed using the
admin-only feature.

All these admin-only features were unlogged, but it turns out to be
possible to reconstruct page moves, because when a page was moved, its
name was updated in rclog but not in diff_log. By finding the first
diff_log entry with the new name, you can roughly work out when the page
moves were done.

Anyway, I'm developing a script which will import the dump into a
modified MediaWiki instance, the idea being that I can then export XML
from it. Once it works, I'll upload the XML somewhere. I'm not sure when
that will be.

-- 
Tim Starling

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
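
P.S. For anyone who wants to try the page-move reconstruction described above, here is a rough Python sketch of the idea. It is not the import script itself. The `Entry` record, the `reconstruct_moves` function, and the assumption that an edit can be matched between rclog and diff_log by an identical timestamp are all hypothetical, made up for illustration; the real log formats may need a different join key.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical, simplified log record: one edit to one page."""
    timestamp: int
    page: str


def reconstruct_moves(
    rclog: list[Entry], diff_log: list[Entry]
) -> dict[tuple[str, str], Optional[int]]:
    """Map (old_name, new_name) to a rough move time.

    Assumption: the same edit carries the same timestamp in both logs.
    A move rewrote the page name in rclog but not in diff_log, so an
    edit whose two names differ was made before the move. The first
    diff_log entry under the new name is the first edit after the move,
    which gives an approximate upper bound on when the move happened.
    Chained renames (A to B to C) are not untangled here.
    """
    # Name recorded in rclog for each edit, keyed by timestamp.
    rc_name_at = {e.timestamp: e.page for e in rclog}

    # Earliest diff_log timestamp at which each page name appears.
    first_seen: dict[str, int] = {}
    for e in sorted(diff_log):
        first_seen.setdefault(e.page, e.timestamp)

    moves: dict[tuple[str, str], Optional[int]] = {}
    for e in diff_log:
        new_name = rc_name_at.get(e.timestamp)
        if new_name is not None and new_name != e.page:
            # None means no edit under the new name exists in diff_log yet.
            moves[(e.page, new_name)] = first_seen.get(new_name)
    return moves


if __name__ == "__main__":
    rclog = [Entry(100, "Foo Bar"), Entry(200, "Foo Bar"), Entry(300, "Foo Bar")]
    diff_log = [Entry(100, "FooBar"), Entry(200, "FooBar"), Entry(300, "Foo Bar")]
    print(reconstruct_moves(rclog, diff_log))
```

The result only bounds the move time from above: the move happened some time between the last edit logged under the old name and the first edit logged under the new one.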