[WikiEN-l] exporting sets of pages
Hi,

I wasn't sure whether this was the appropriate mailing list for this question. If not, pointers to the correct one would be appreciated.

I would like to retrieve pages that contain, say, a Drugbox. The following URL lists all pages that contain this infobox:

http://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Template:Drugbox&namespace=0&limit=5000&hidetrans=

What I'd like to do next is a bulk export of these pages. As far as I can tell, the Export options require that one provide article titles. Furthermore, for some other infoboxes I have to page through the results. Instead, I'd like to do this programmatically.

The obvious solution would be to load Wikipedia into a local MySQL DB and then perform the queries directly. But I'm interested in a rather small subset of Wikipedia, and loading the whole thing locally seems overkill.

Is there a way I could export the articles containing Drugboxes, or do I need to install Wikipedia locally?

Thanks,

-- Rajarshi Guha
NIH Chemical Genomics Center

___
WikiEN-l mailing list
WikiEN-l@lists.wikimedia.org
To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
Re: [WikiEN-l] exporting sets of pages
On 05/01/11 10:03, Rajarshi Guha wrote:
> Is there a way I could export the articles containing Drugboxes or do I need to install Wikipedia locally?

The best way to do it would be to get the list of articles using the API:

http://www.mediawiki.org/wiki/API

If that's too hard, you could download templatelinks.sql.gz from http://download.wikimedia.org/enwiki/latest/, load it into a MySQL database, and then use that to get the list of articles. But it's a big file and it's out of date.

Either way, you should get a list of articles and then download them in small batches (say 10 articles at a time) using Special:Export. This may require a small amount of scripting.

-- Tim Starling
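For anyone wanting to script this, here is a minimal sketch of the two steps described above: list the articles that transclude the template through the API, then fetch them from Special:Export ten at a time. It assumes the API's `list=embeddedin` query (parameters `eititle`, `einamespace`, `eilimit`, plus the `continue` values the API hands back) and Special:Export's `pages` and `curonly` form fields, as I understand them from the MediaWiki documentation; please check both against the current docs. The https endpoints, the helper names and the User-Agent string are my own choices, not something from this thread.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoints; the thread itself only mentions the API and Special:Export.
API_URL = "https://en.wikipedia.org/w/api.php"
EXPORT_URL = "https://en.wikipedia.org/w/index.php?title=Special:Export"
USER_AGENT = "drugbox-export-sketch/0.1 (replace with your contact details)"


def embeddedin_params(template, limit=500, cont=None):
    """Query parameters for one page of list=embeddedin results."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "einamespace": "0",  # main (article) namespace only
        "eilimit": str(limit),
        "format": "json",
    }
    if cont:
        params.update(cont)  # continuation values returned by the previous call
    return params


def batches(titles, size=10):
    """Split titles into lists of at most `size` entries."""
    return [titles[i:i + size] for i in range(0, len(titles), size)]


def export_form(titles):
    """Form fields for Special:Export: one title per line, current revision only."""
    return {"pages": "\n".join(titles), "curonly": "1"}


def http_get_json(params):
    """GET the API with the given parameters and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_transclusions(template):
    """Follow API continuation until every embedding article has been listed."""
    titles, cont = [], None
    while True:
        data = http_get_json(embeddedin_params(template, cont=cont))
        titles += [page["title"] for page in data["query"]["embeddedin"]]
        cont = data.get("continue")
        if not cont:
            return titles


def export_batch(titles):
    """POST one batch to Special:Export and return the XML dump as bytes."""
    body = urllib.parse.urlencode(export_form(titles)).encode("utf-8")
    req = urllib.request.Request(
        EXPORT_URL, data=body, headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

To use it, call `list_transclusions("Template:Drugbox")`, pass the result through `batches(...)`, and call `export_batch` on each batch with a short pause in between. Nothing touches the network until those functions are called.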
[WikiEN-l] References bookmarklet?
http://davidgerard.co.uk/notes/2011/01/04/what-you-see-is-for-the-win/comment-page-1/#comment-13632

Someone suggested this on my blog. It's an *excellent* idea and needs a button added for it in the present Vector editor.

Jen says (Wednesday 5th January, 2011 at 10:28 pm):
> Re John Broughton’s idea for “a **single click** way of generating the standard text/code for a footnote”… Would there be any way to make a nice little bookmarklet so people could drag a URL onto the button, and it would copy a wiki-citation to the clipboard?

- d.
Re: [WikiEN-l] References bookmarklet?
On 5 January 2011 22:36, David Gerard dger...@gmail.com wrote:
> Would there be any way to make a nice little bookmarklet so people could drag a URL onto the button, and it would copy a wiki-citation to the clipboard?

Basically no. If you look at even [[Template:Cite web]], it requires stuff that you have to go hunting for (author). You could construct something for popular websites (the BBC, say) which have a standard format.

-- geni
Re: [WikiEN-l] References bookmarklet?
Related project by Mozilla and some other developers, including from Creative Commons:

https://wiki.mozilla.org/Drumbeat/Attribution_generator

Steven
Re: [WikiEN-l] References bookmarklet?
On 5 January 2011 22:40, geni geni...@gmail.com wrote:
> Basically no. If you look at even [[Template:Cite web]] it requires stuff that you have to go hunting for (author). You could construct something for popular websites (BBC say) which have a standard format.

Sounds like something we could add really quite a lot of special cases to. I wonder how many we would need to have decent coverage in practice.

Has anyone done a survey of what sources we actually use in references? The long tail will be *huge*, but does the en:wp community have any favourites?

- d.
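To make the "special cases" idea concrete, here is a rough sketch of a per-site table feeding a {{cite web}} generator. It is only an illustration: the `SITE_DEFAULTS` table, its single BBC entry and the function names are hypothetical, and the parameter names (`url`, `title`, `publisher`, `access-date`) are the ones I believe [[Template:Cite web]] currently accepts. Anything the table cannot supply, such as the author, still has to be passed in by hand, which is geni's point.

```python
from datetime import date
from urllib.parse import urlparse

# Hypothetical per-site defaults. A real table would come out of a survey of
# the sources en:wp actually cites; this single entry is only an example.
SITE_DEFAULTS = {
    "bbc.co.uk": {"publisher": "BBC News"},
}


def site_defaults(url):
    """Return the defaults for the URL's host, matching on the domain suffix."""
    host = urlparse(url).hostname or ""
    for domain, fields in SITE_DEFAULTS.items():
        if host == domain or host.endswith("." + domain):
            return dict(fields)
    return {}


def cite_web(url, title, accessed, **extra):
    """Build a {{cite web}} call; fields that nobody supplied are left out."""
    fields = {"url": url, "title": title}
    fields.update(site_defaults(url))  # special-case knowledge, if any
    fields.update(extra)               # hand-supplied fields, e.g. author=...
    fields["access-date"] = accessed.isoformat()
    body = " |".join(f"{key}={value}" for key, value in fields.items())
    return "{{cite web |" + body + "}}"
```

For example, `cite_web("http://www.bbc.co.uk/news/example", "Example headline", date(2011, 1, 5))` picks up the publisher from the table, while a URL on an unknown site gets only the URL, title and access date. Titles containing a pipe character would need escaping, which this sketch does not attempt.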
Re: [WikiEN-l] References bookmarklet?
On Wed, Jan 5, 2011 at 3:44 PM, David Gerard dger...@gmail.com wrote:
> Sounds like something we could add really quite a lot of special cases to. I wonder how many we would need to have decent coverage in practice. Has anyone done a survey of what sources we actually use in references? The long tail will be *huge*, but does the en:wp community have any favourites?

I have created a tool called WikiPapers that my lab has used for several years that does something similar to this. It is designed around scientific papers. It allows you to highlight the title of an article on any web page and then click a bookmarklet, and it will use various APIs on the web to get the associated metadata and add it to your wiki. It can optionally pass the URL to one of many URL scrapers such as Connotea and CiteULike.

I am currently refactoring the code for use in a new project called WikiScholar. The old code supports PubMed, Google Scholar, Connotea and CiteULike, whereas the new code only supports PubMed right now. The new code, however, makes it much simpler to add new importers with its class-based infrastructure.

If anyone is interested in this project and can code in Python or PHP, please let me know. I am actively developing it now. I'm interested in folks who would like to dedicate some time to writing importers for specific APIs.

Cheers,
Brian
Re: [WikiEN-l] References bookmarklet?
On Wed, Jan 5, 2011 at 3:50 PM, Brian brian.min...@colorado.edu wrote:
> I have created a tool called WikiPapers that my lab has used for several years that does something similar to this.

PS: The Google Code URL is: http://code.google.com/p/wikipapers/
Re: [WikiEN-l] References bookmarklet?
On 5 January 2011 22:51, Brian J Mingus brian.min...@colorado.edu wrote:
> On Wed, Jan 5, 2011 at 3:50 PM, Brian brian.min...@colorado.edu wrote:
>> I have created a tool called WikiPapers that my lab has used for several
> PS: The Google Code url is: http://code.google.com/p/wikipapers/

If you can get it in fit condition that you would let the wikitech-l conspiracy at it, I urge you to do so.

- d.