The link Derk-Jan mentioned is a great resource indeed.

However, consuming that data (mostly HTML structures) isn't very
straightforward in most programming languages.
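
To illustrate the HTML problem, here is a rough Python sketch of reading
iiprop=extmetadata output. The response layout it assumes
(query -> pages -> imageinfo -> extmetadata -> key -> "value") is how I
understand the JSON format of the API, and SAMPLE is a hand-written stand-in,
not real API output. The helper names are made up for this example.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


class _TextOnly(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(value):
    """Drop tags from an HTML fragment and collapse whitespace."""
    parser = _TextOnly()
    parser.feed(value)
    parser.close()
    return " ".join("".join(parser.parts).split())


def extmetadata_url(title):
    """Build an imageinfo query asking only for extmetadata, as JSON."""
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
    }
    return API + "?" + urlencode(params)


def extract_fields(response):
    """Flatten the assumed response layout into {key: plain text}."""
    fields = {}
    for page in response.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            for key, entry in info.get("extmetadata", {}).items():
                fields[key] = strip_html(str(entry.get("value", "")))
    return fields


# Hand-written sample in the assumed layout (NOT real API output).
SAMPLE = json.loads("""
{"query": {"pages": {"1": {"title": "File:Example.jpg", "imageinfo": [
  {"extmetadata": {
    "Artist": {"value": "<a href=\\"//example.org/vg\\">Vincent van Gogh</a>"},
    "DateTimeOriginal": {"value": "1889"}
  }}
]}}}}
""")

FIELDS = extract_fields(SAMPLE)
```

Values such as "Artist" often arrive wrapped in links or spans, so some
kind of tag stripping is usually needed before the text is usable.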

Until the MediaWiki software itself offers a better way, a tool has been
written to expose this information:

https://commons.wikimedia.org/wiki/Commons:Commons_API

https://tools.wmflabs.org/magnus-toolserver/commonsapi.php

https://tools.wmflabs.org/magnus-toolserver/commonsapi.php?image=Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&meta
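
A small Python sketch of building that request, in case it helps. It only
relies on what the example URL above shows (an "image" parameter plus a bare
"meta" flag). The function names are invented here, and I make no assumption
about the layout of the response, so fetch_raw just returns the raw bytes.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://tools.wmflabs.org/magnus-toolserver/commonsapi.php"


def commonsapi_url(image, meta=True):
    """Build a commonsapi.php URL for a file name given without 'File:'."""
    url = BASE + "?image=" + quote(image)
    if meta:
        # 'meta' is a bare flag with no value, as in the example URL.
        url += "&meta"
    return url


def fetch_raw(image, timeout=30):
    """Fetch the tool's response as bytes; parsing is left to the caller."""
    with urlopen(commonsapi_url(image), timeout=timeout) as resp:
        return resp.read()


URL = commonsapi_url("Van Gogh - Starry Night - Google Art Project.jpg")
```

Calling fetch_raw needs network access and a reachable tool, so it is not
invoked above.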

— Krinkle

On 3 Jun 2014, at 14:11, Derk-Jan Hartman <d.j.hartman+wmf...@gmail.com> wrote:

> Actually, what you really seem to want is to make use of
> iiprop=extmetadata, which is an API that makes use of
> https://commons.wikimedia.org/wiki/Commons:Machine-readable_data
> included in the various templates. The MultimediaViewer project also
> uses this API.
> 
> https://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&iilimit=max&iiprop=extmetadata|timestamp|user|comment|url|size|mime&prop=imageinfo|revisions&rvgeneratexml=&rvprop=ids|timestamp|user|comment|content
> 
> Where this is not accurate, you might have to fix up some templates to
> make them more machine-readable. It's all pretty new, and it's
> basically a managed web scraper in itself, but it's probably better to
> have one web scraper than several.
> 
> DJ
> 
> On Tue, Jun 3, 2014 at 10:18 AM, james harvey <jamespharve...@gmail.com> 
> wrote:
>> Sorry for the email spam.  Worked through it, I think.  Not too familiar
>> with wiki internals.  :-)
>> 
>> This particular page doesn't have the content I'm looking for in it.  It
>> references a template which is used by a few other versions of the same
>> image, presumably so the data can be stored once and be given consistently.
>> Not being familiar with wiki internals, that was looking to me like it
>> wasn't returning the entire page content... But it is, so I'll have to
>> recognize this situation and pull referenced templates when the information
>> I need isn't already there.
>> 
>> 
>> On Tue, Jun 3, 2014 at 2:45 AM, james harvey <jamespharve...@gmail.com>
>> wrote:
>> 
>>> I may have stumbled upon it.  If I change the API call from
>>> "titles=File:XYZ.jpg" to "titles=Template:XYZ" (note: dropped the .jpg)
>>> then it *appears* to get me what I need.
>>> 
>>> Is this correct, or did I run across a case where it appears to work but
>>> isn't going to be the right way to go?  (Like, I'm not sure if
>>> "Template:XYZ" directly relates to the Summary information on the
>>> "File:XYZ.jpg" page, or if it's duplicated data that in this case matches.
>>> And, I'm confused why the .jpg gets dropped switching "File:" to
>>> "Template:")
>>> 
>>> And, will this always get me the full template information, or if someone
>>> just updates the "Year" portion, would it only return back that part --
>>> since the revisions seem to be returning data as much as they can based on
>>> changes from the previous revision, rather than the answer ignoring any
>>> other revisions.
>>> 
>>> On Tue, Jun 3, 2014 at 1:59 AM, james harvey <jamespharve...@gmail.com>
>>> wrote:
>>> 
>>>> Given a Wikimedia Commons description page URL - such as:
>>>> https://commons.wikimedia.org/wiki/File:Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg
>>>> 
>>>> I would like to be able to programmatically retrieve the information in
>>>> the "Summary" header.  (Values for "Artist", "Title", "Date", "Medium",
>>>> "Dimensions", "Current location", etc.)
>>>> 
>>>> I believe all this information is in "Template:Artwork".  I can't figure
>>>> out how to get the wikitext/json-looking template data.
>>>> 
>>>> If I use the API and call:
>>>> https://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&iilimit=max&iiprop=timestamp|user|comment|url|size|mime&prop=imageinfo|revisions&rvgeneratexml=&rvprop=ids|timestamp|user|comment|content
>>>> 
>>>> Then I don't get the information I'm looking for.  This shows the most
>>>> recent revision, and its changes.  Unless the most recent revision changed
>>>> this data, it doesn't show up.
>>>> 
>>>> To see all the information I'm looking for, it seems I'd have to specify
>>>> rvlimit=max and go through all the past revisions to figure out which is
>>>> most current.  For example, if I do so and I look at revid 79665032, that
>>>> includes: "{{Artwork | Artist = {{Creator:Vincent van Gogh}} | . . . | Year
>>>> = 1889 | Technique = {{Oil on canvas}} | . . ."
>>>> 
>>>> Isn't there a way to get the current version in whatever format you'd
>>>> call that - the wikitext/json looking format?
>>>> 
>>>> In my API call, I can specify rvexpandtemplates which even with only the
>>>> most recent revision gives me the information I need, but it's largely in
>>>> HTML tables/divs/etc format rather than wikitext/json/xml/etc.
>>>> 
>>> 
>>> 
>> _______________________________________________
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> 

