[WikiEN-l] exporting sets of pages

2011-01-05 Thread Rajarshi Guha
Hi, I wasn't sure whether this was the appropriate mailing list for
this question - if not, pointers to the correct one would be
appreciated.

I would like to retrieve pages that contain, say, a DrugBox. The
following URL lists all pages that contain this info box

http://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Template:Drugbox&namespace=0&limit=5000&hidetrans=

What I'd like to do is then do a bulk export of these pages. As far as
I can tell, the Export options require that one provide article
titles. Furthermore, for some other infoboxes I have to page through
the results. Instead I'd like to do this programmatically.

The obvious solution would be to load Wikipedia into a local MySQL DB
and then perform the queries directly. But I'm interested in a rather
small subset of Wikipedia and loading the whole thing locally seems
overkill.

Is there a way I could export the articles containing Drugboxes or do
I need to install Wikipedia locally?

Thanks,

-- 
Rajarshi Guha
NIH Chemical Genomics Center

___
WikiEN-l mailing list
WikiEN-l@lists.wikimedia.org
To unsubscribe from this mailing list, visit:
https://lists.wikimedia.org/mailman/listinfo/wikien-l


Re: [WikiEN-l] exporting sets of pages

2011-01-05 Thread Tim Starling
On 05/01/11 10:03, Rajarshi Guha wrote:
> Is there a way I could export the articles containing Drugboxes or do
> I need to install Wikipedia locally?

The best way to do it would be to get the list of articles using the API:

http://www.mediawiki.org/wiki/API

If that's too hard, you could download templatelinks.sql.gz from

http://download.wikimedia.org/enwiki/latest/

and load it into a MySQL database, and then use that to get the list
of articles. But it's a big file and it's out of date. Either way, you
should get a list of articles and then download them in small batches
(say 10 articles at a time) using Special:Export. This may require a
small amount of scripting.
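
A minimal sketch of the two steps described above (list the pages through
the API, then export them ten at a time), in Python. Assumptions that are
not in the original message: the `list=embeddedin` query module as the way
to list pages that transclude Template:Drugbox, the current `continue`
style of API continuation, a POST to Special:Export with `pages` and
`curonly` fields, and all function names, which are illustrative only.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
EXPORT_URL = "https://en.wikipedia.org/wiki/Special:Export"
# Assumption: a descriptive User-Agent; replace with your own contact details.
USER_AGENT = "drugbox-export-sketch/0.1 (contact: you@example.org)"


def embeddedin_params(template, cont=None):
    """Query parameters listing main-namespace pages that transclude `template`."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "einamespace": "0",
        "eilimit": "500",
        "format": "json",
    }
    if cont:
        params.update(cont)  # continuation values from the previous reply
    return params


def batches(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def export_form(titles):
    """Form fields for Special:Export: one title per line, current revision only."""
    return {"pages": "\n".join(titles), "curonly": "1"}


def fetch(url, params, post=False):
    """Perform one HTTP request and return the raw response body."""
    query = urllib.parse.urlencode(params)
    if post:
        req = urllib.request.Request(url, data=query.encode("ascii"))
    else:
        req = urllib.request.Request(url + "?" + query)
    req.add_header("User-Agent", USER_AGENT)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def list_titles(template):
    """Collect every title by following the API's continuation values."""
    titles, cont = [], None
    while True:
        reply = json.loads(fetch(API_URL, embeddedin_params(template, cont)))
        titles += [page["title"] for page in reply["query"]["embeddedin"]]
        cont = reply.get("continue")
        if not cont:
            return titles
```

To use it, call `list_titles("Template:Drugbox")`, then loop over
`batches(titles)` and call `fetch(EXPORT_URL, export_form(group), post=True)`
for each group, saving the returned XML and pausing between requests.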

-- Tim Starling




[WikiEN-l] References bookmarklet?

2011-01-05 Thread David Gerard
http://davidgerard.co.uk/notes/2011/01/04/what-you-see-is-for-the-win/comment-page-1/#comment-13632

Someone suggested this on my blog. It's an *excellent* idea and needs
a button added for it in the present Vector editor.



Jen says:
Wednesday 5th January, 2011 at 10:28 pm

Re John Broughton’s idea for “a **single click** way of generating the
standard text/code for a footnote”…

Would there be any way to make a nice little bookmarklet so people
could drag a URL onto the button, and it would copy a wiki-citation to
the clipboard?



- d.



Re: [WikiEN-l] References bookmarklet?

2011-01-05 Thread geni
On 5 January 2011 22:36, David Gerard dger...@gmail.com wrote:
> http://davidgerard.co.uk/notes/2011/01/04/what-you-see-is-for-the-win/comment-page-1/#comment-13632
>
> Someone suggested this on my blog. It's an *excellent* idea and needs
> a button added for it in the present Vector editor.
>
> Jen says:
> Wednesday 5th January, 2011 at 10:28 pm
>
> Re John Broughton’s idea for “a **single click** way of generating the
> standard text/code for a footnote”…
>
> Would there be any way to make a nice little bookmarklet so people
> could drag a URL onto the button, and it would copy a wiki-citation to
> the clipboard?


Basically no

If you look at even [[Template:Cite web]] it requires stuff that you
have to go hunting for (author).

You could construct something for popular websites (BBC say) which
have a standard format.
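
A rough Python sketch of the per-site idea, assuming a hand-maintained
table that maps hostnames to publisher names. The table, its single BBC
entry and the `cite_web` helper are hypothetical. Fields that cannot be
derived from the URL (such as the author) are simply left out unless the
caller supplies them.

```python
from urllib.parse import urlparse

# Hypothetical per-site table: hostname -> publisher name.
KNOWN_PUBLISHERS = {
    "www.bbc.co.uk": "BBC News",
}


def cite_web(url, title=None, author=None):
    """Build a {{cite web}} footnote, omitting fields that are not known."""
    publisher = KNOWN_PUBLISHERS.get(urlparse(url).hostname)
    fields = [("url", url), ("title", title),
              ("author", author), ("publisher", publisher)]
    # Keep only the fields that actually have a value.
    body = " ".join("|{}={}".format(name, value) for name, value in fields if value)
    return "<ref>{{cite web " + body + "}}</ref>"
```

For a URL on a site that is not in the table, the result contains only the
fields passed in, which is the gap described above: someone still has to
find the author and title by hand.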

-- 
geni



Re: [WikiEN-l] References bookmarklet?

2011-01-05 Thread Steven Walling
Related project by Mozilla and some other developers, including from
Creative Commons:

https://wiki.mozilla.org/Drumbeat/Attribution_generator

Steven




Re: [WikiEN-l] References bookmarklet?

2011-01-05 Thread David Gerard
On 5 January 2011 22:40, geni geni...@gmail.com wrote:

> Basically no
> If you look at even [[Template:Cite web]] it requires stuff that you
> have to go hunting for (author).
> You could construct something for popular websites (BBC say) which
> have a standard format.


Sounds like something we could add really quite a lot of special cases
to. I wonder how many we would need to have decent coverage in
practice. Has anyone done a survey of what sources we actually use in
references? The long tail will be *huge*, but does the en:wp community
have any favourites?


- d.



Re: [WikiEN-l] References bookmarklet?

2011-01-05 Thread Brian J Mingus


I have created a tool called WikiPapers that my lab has used for several
years that does something similar to this. It is designed around scientific
papers. It allows you to highlight the title of an article on any web page
and then click a bookmarklet and it will use various APIs on the web to
get the associated metadata and add it to your wiki. It can optionally pass
the URL to one of many URL scrapers such as Connotea and CiteULike. I am
currently refactoring the code for use in a new project called WikiScholar.
The old code supports PubMed, Google Scholar, Connotea and CiteULike,
whereas the new code only supports PubMed right now. The new code, however,
makes it much simpler to add new importers with its class-based
infrastructure.
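
To illustrate what a class-based importer layout can look like, here is a
small Python sketch. It is not the WikiPapers or WikiScholar code: the
class names, the registry and the placeholder `lookup` method are all
hypothetical, and no real service is queried.

```python
class Importer:
    """Base class: each subclass registers itself under its `source` name."""
    registry = {}
    source = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.source:
            Importer.registry[cls.source] = cls

    def lookup(self, title):
        """Return metadata for the highlighted title; subclasses override this."""
        raise NotImplementedError


class PubMedImporter(Importer):
    source = "pubmed"

    def lookup(self, title):
        # Placeholder: a real importer would query the service here.
        return {"source": self.source, "title": title}


def import_metadata(source, title):
    """Dispatch to whichever importer class is registered for `source`."""
    try:
        importer = Importer.registry[source]()
    except KeyError:
        raise ValueError("no importer registered for " + repr(source))
    return importer.lookup(title)
```

With this layout, supporting another source means adding one subclass with
a `source` name and a `lookup` method; the dispatch code does not change.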

If anyone is interested in this project and can code in Python or PHP please
let me know. I am actively developing it now. I'm interested in folks who
would like to dedicate some time to writing importers for specific APIs.

Cheers,
Brian


Re: [WikiEN-l] References bookmarklet?

2011-01-05 Thread Brian J Mingus

PS: The Google Code url is: http://code.google.com/p/wikipapers/


Re: [WikiEN-l] References bookmarklet?

2011-01-05 Thread David Gerard
On 5 January 2011 22:51, Brian J Mingus brian.min...@colorado.edu wrote:
> On Wed, Jan 5, 2011 at 3:50 PM, Brian brian.min...@colorado.edu wrote:
>
>> I have created a tool called WikiPapers that my lab has used for several
>
> PS: The Google Code url is: http://code.google.com/p/wikipapers/


If you can get it in fit condition that you would let the wikitech-l
conspiracy at it, I urge you to do so.


- d.
