Actually, what you probably want is iiprop=extmetadata. That part of the
imageinfo API exposes the machine-readable data
(https://commons.wikimedia.org/wiki/Commons:Machine-readable_data)
that the various templates embed in the file description page. The
MultimediaViewer project uses this API too.

https://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&iilimit=max&iiprop=extmetadata|timestamp|user|comment|url|size|mime&prop=imageinfo|revisions&rvgeneratexml=&rvprop=ids|timestamp|user|comment|content
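
For illustration, here is a minimal Python sketch of reading that data. It assumes format=json instead of format=xml and the default JSON layout, where "pages" is keyed by page ID. The helper names and the sample response are made up for this example, and the field values are placeholders, not real API output.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_extmetadata_url(file_title):
    """Build a query URL asking only for the extmetadata of one file."""
    params = {
        "action": "query",
        "format": "json",  # assumption: JSON rather than the XML used above
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_extmetadata(response):
    """Flatten the extmetadata of every returned page into {field: value}."""
    fields = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Missing files have no "imageinfo" key, hence the defaults.
        for info in page.get("imageinfo", []):
            for name, entry in info.get("extmetadata", {}).items():
                fields[name] = entry.get("value")
    return fields


# Hypothetical, heavily trimmed response; values are placeholders.
sample_response = {
    "query": {
        "pages": {
            "12345": {
                "title": "File:Example.jpg",
                "imageinfo": [
                    {
                        "extmetadata": {
                            "ObjectName": {
                                "value": "Example title",
                                "source": "commons-desc-page",
                            },
                            "Artist": {
                                "value": "<a href=\"#\">Example artist</a>",
                                "source": "commons-desc-page",
                            },
                        }
                    }
                ],
            }
        }
    }
}

url = build_extmetadata_url("File:Example.jpg")
metadata = extract_extmetadata(sample_response)
print(url)
print(metadata["ObjectName"])
```

Note that values such as Artist can contain HTML, so you may still need to strip markup before storing them.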

Where this is not accurate, you might have to fix up some templates to
make them more machine-readable. It's all pretty new, and it's
basically a managed web scraper in itself, but it's probably better to
have one web scraper than several.

DJ

On Tue, Jun 3, 2014 at 10:18 AM, james harvey <jamespharve...@gmail.com> wrote:
> Sorry for the email spam.  Worked through it, I think.  Not too familiar
> with wiki internals.  :-)
>
> This particular page doesn't have the content I'm looking for in it.  It
> references a template which is used by a few other versions of the same
> image, presumably so the data can be stored once and be given consistently.
>  Not being familiar with wiki internals, that was looking to me like it
> wasn't returning the entire page content... But it is, so I'll have to
> recognize this situation and pull referenced templates when the information
> I need isn't already there.
>
>
> On Tue, Jun 3, 2014 at 2:45 AM, james harvey <jamespharve...@gmail.com>
> wrote:
>
>> I may have stumbled upon it.  If I change the API call from
>> "titles=File:XYZ.jpg" to "titles=Template:XYZ" (note: dropped the .jpg)
>> then it *appears* to get me what I need.
>>
>> Is this correct, or did I run across a case where it appears to work but
>> isn't going to be the right way to go?  (Like, I'm not sure if
>> "Template:XYZ" directly relates to the Summary information on the
>> "File:XYZ.jpg" page, or if it's duplicated data that in this case matches.
>>  And, I'm confused why the .jpg gets dropped switching "File:" to
>> "Template:")
>>
>> And, will this always get me the full template information, or if someone
>> just updates the "Year" portion, would it only return back that part --
>> since the revisions seem to be returning data as much as they can based on
>> changes from the previous revision, rather than the answer ignoring any
>> other revisions.
>>
>> On Tue, Jun 3, 2014 at 1:59 AM, james harvey <jamespharve...@gmail.com>
>> wrote:
>>
>>> Given a Wikimedia Commons description page URL - such as:
>>> https://commons.wikimedia.org/wiki/File:Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg
>>>
>>> I would like to be able to programmatically retrieve the information in
>>> the "Summary" header.  (Values for "Artist", "Title", "Date", "Medium",
>>> "Dimensions", "Current location", etc.)
>>>
>>> I believe all this information is in "Template:Artwork".  I can't figure
>>> out how to get the wikitext/json-looking template data.
>>>
>>> If I use the API and call:
>>> https://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&iilimit=max&iiprop=timestamp|user|comment|url|size|mime&prop=imageinfo|revisions&rvgeneratexml=&rvprop=ids|timestamp|user|comment|content
>>>
>>> Then I don't get the information I'm looking for.  This shows the most
>>> recent revision, and its changes.  Unless the most recent revision changed
>>> this data, it doesn't show up.
>>>
>>> To see all the information I'm looking for, it seems I'd have to specify
>>> rvlimit=max and go through all the past revisions to figure out which is
>>> most current.  For example, if I do so and I look at revid 79665032, that
>>> includes: "{{Artwork | Artist = {{Creator:Vincent van Gogh}} | . . . | Year
>>> = 1889 | Technique = {{Oil on canvas}} | . . ."
>>>
>>> Isn't there a way to get the current version in whatever format you'd
>>> call that - the wikitext/json looking format?
>>>
>>> In my API call, I can specify rvexpandtemplates which even with only the
>>> most recent revision gives me the information I need, but it's largely in
>>> HTML tables/divs/etc format rather than wikitext/json/xml/etc.
>>>
>>
>>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
