The link Derk-Jan mentioned is indeed a great resource. However, consuming that data (mostly HTML structures) isn't straightforward in most programming languages.
A tool has been written to expose this information until we have a better way in the MediaWiki software:

https://commons.wikimedia.org/wiki/Commons:Commons_API
https://tools.wmflabs.org/magnus-toolserver/commonsapi.php
https://tools.wmflabs.org/magnus-toolserver/commonsapi.php?image=Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&meta

-- Krinkle

On 3 Jun 2014, at 14:11, Derk-Jan Hartman <d.j.hartman+wmf...@gmail.com> wrote:

> Actually, what you really seem to want is to make use of
> iiprop=extmetadata, which is an API that makes use of
> https://commons.wikimedia.org/wiki/Commons:Machine-readable_data
> included in the various templates. The MultimediaViewer project also
> uses this API.
>
> https://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&iilimit=max&iiprop=extmetadata|timestamp|user|comment|url|size|mime&prop=imageinfo|revisions&rvgeneratexml=&rvprop=ids|timestamp|user|comment|content
>
> Where this is not accurate, you might have to fix up some templates to
> make them better machine readable. It's all pretty new, and it's
> basically a managed web scraper in itself, but it's probably better to
> have one web scraper, than multiple.
>
> DJ
>
> On Tue, Jun 3, 2014 at 10:18 AM, james harvey <jamespharve...@gmail.com>
> wrote:
>> Sorry for the email spam. Worked through it, I think. Not too familiar
>> with wiki internals. :-)
>>
>> This particular page doesn't have the content I'm looking for in it. It
>> references a template which is used by a few other versions of the same
>> image, presumably so the data can be stored once and be given consistently.
>> Not being familiar with wiki internals, that was looking to me like it
>> wasn't returning the entire page content... But it is, so I'll have to
>> recognize this situation and pull referenced templates when the information
>> I need isn't already there.
>>
>>
>> On Tue, Jun 3, 2014 at 2:45 AM, james harvey <jamespharve...@gmail.com>
>> wrote:
>>
>>> I may have stumbled upon it. If I change the API call from
>>> "titles=File:XYZ.jpg" to "titles=Template:XYZ" (note: dropped the .jpg)
>>> then it *appears* to get me what I need.
>>>
>>> Is this correct, or did I run across a case where it appears to work but
>>> isn't going to be the right way to go? (Like, I'm not sure if
>>> "Template:XYZ" directly relates to the Summary information on the
>>> "File:XYZ.jpg" page, or if it's duplicated data that in this case matches.
>>> And, I'm confused why the .jpg gets dropped switching "File:" to
>>> "Template:")
>>>
>>> And, will this always get me the full template information, or if someone
>>> just updates the "Year" portion, would it only return back that part --
>>> since the revisions seem to be returning data as much as they can based on
>>> changes from the previous revision, rather than the answer ignoring any
>>> other revisions.
>>>
>>> On Tue, Jun 3, 2014 at 1:59 AM, james harvey <jamespharve...@gmail.com>
>>> wrote:
>>>
>>>> Given a Wikimedia Commons description page URL - such as:
>>>> https://commons.wikimedia.org/wiki/File:Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg
>>>>
>>>> I would like to be able to programmatically retrieve the information in
>>>> the "Summary" header. (Values for "Artist", "Title", "Date", "Medium",
>>>> "Dimensions", "Current location", etc.)
>>>>
>>>> I believe all this information is in "Template:Artwork". I can't figure
>>>> out how to get the wikitext/json-looking template data.
>>>>
>>>> If I use the API and call:
>>>> https://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&iilimit=max&iiprop=timestamp|user|comment|url|size|mime&prop=imageinfo|revisions&rvgeneratexml=&rvprop=ids|timestamp|user|comment|content
>>>> <https://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg&iilimit=max&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Cmime&prop=imageinfo%7Crevisions&rvgeneratexml=&rvprop=ids%7Ctimestamp%7Cuser%7Ccomment%7Ccontent>
>>>>
>>>> Then I don't get the information I'm looking for. This shows the most
>>>> recent revision, and its changes. Unless the most recent revision changed
>>>> this data, it doesn't show up.
>>>>
>>>> To see all the information I'm looking for, it seems I'd have to specify
>>>> rvlimit=max and go through all the past revisions to figure out which is
>>>> most current. For example, if I do so and I look at revid 79665032, that
>>>> includes: "{{Artwork | Artist = {{Creator:Vincent van Gogh}} | . . . | Year
>>>> = 1889 | Technique = {{Oil on canvas}} | . . ."
>>>>
>>>> Isn't there a way to get the current version in whatever format you'd
>>>> call that - the wikitext/json looking format?
>>>>
>>>> In my API call, I can specify rvexpandtemplates which even with only the
>>>> most recent revision gives me the information I need, but it's largely in
>>>> HTML tables/divs/etc format rather than wikitext/json/xml/etc.
>>>>
>>>
>>>
>> _______________________________________________
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
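
For anyone wanting to try the iiprop=extmetadata route Derk-Jan describes in this thread, here is a minimal Python sketch of how such a query might be built and its result flattened. It is only an illustration under assumptions: the function names (build_query_url, extract_metadata, fetch_metadata) and the User-Agent string are made up for this example, it requests format=json rather than the format=xml used in the URLs quoted in the thread, and it assumes the usual query/pages/imageinfo/extmetadata nesting of the JSON reply. As noted in the thread, the values can still contain HTML markup, so further cleanup may be needed.

```python
import json
import urllib.parse
import urllib.request

# Commons API endpoint, as used in the URLs quoted in this thread.
API_URL = "https://commons.wikimedia.org/w/api.php"


def build_query_url(file_title):
    """Build an imageinfo query asking only for extmetadata (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",  # assumption: JSON instead of the thread's format=xml
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_metadata(response):
    """Flatten the first page's extmetadata into a {field: value} dict.

    Assumes the reply is nested as query -> pages -> <id> -> imageinfo[0]
    -> extmetadata -> <field> -> {"value": ...}. Returns {} if absent.
    """
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        for info in page.get("imageinfo", []):
            fields = info.get("extmetadata", {})
            return {name: entry.get("value") for name, entry in fields.items()}
    return {}


def fetch_metadata(file_title):
    """Fetch and flatten extmetadata for one file page (performs a network call)."""
    request = urllib.request.Request(
        build_query_url(file_title),
        headers={"User-Agent": "extmetadata-example/0.1"},  # made-up identifier
    )
    with urllib.request.urlopen(request) as reply:
        return extract_metadata(json.load(reply))
```

Calling fetch_metadata("File:Van Gogh - Starry Night - Google Art Project.jpg") should then return a plain dictionary keyed by field name. Which fields appear depends on how machine-readable the templates on the file page are.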