I would be very interested in option 3, or even in a monthly dump of that month's revisions. Since I have built tools for the XML dumps, keeping the format unchanged would suit me (and WikiTrust).
I would find incremental dumps (with occasional, yearly, full dumps) much easier to manage than full dumps.

Luca

On Thu, Mar 31, 2011 at 2:27 PM, Yuvi Panda <yuvipa...@gmail.com> wrote:
> Hi, I'm a student planning on doing GSoC this year on mediawiki.
> Specifically, I'd like to work on data dumps.
>
> I'm writing this to gauge what would be useful to the research
> community. Several ideas thrown about include:
> 1. JSON Dumps
> 2. Sqlite Dumps
> 3. Daily dumps of revisions in last 24 hours
> 4. Dumps optimized for very fast import into various external storage
> and smaller size (diffs)
> 5. JSON/CSV for Special:Import and Special:Export
>
> Would any of these be useful? Or is there anything else that I'm
> missing, that you would consider much more useful?
>
> Feedback would be invaluable :)
>
> Thanks :)
> --
> Yuvi Panda T
> http://yuvi.in/blog
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l