Re: [Wikitech-l] A metadata API module for commons

Brian Wolff Mon, 09 Sep 2013 17:27:56 -0700

On 9/6/13, Daniel Kinzler <dan...@brightbyte.de> wrote:
> The only thing I'm slightly worried about is the data model and
> representation
> of the metadata. Swapping one backend for another will only work if they are
> conceptually compatible.


The data model I was using was simple key-value pairs. Specifically, it
used the various properties defined by Exif (and the other metadata
formats that MediaWiki extracts from files) as the key names. I imagine
Wikidata would allow for much more complex types of metadata. I was
thinking this API module would serve to gather the "basic"
information, and Wikidata would have its own querying endpoints for
the complex view of its metadata.

>
> Can you give a brief overview of how you imagine the output of the API would
> be
> structured, and what information it would contain?

As an example, for the url of the license:
<LicenseUrl source="commons-desc-page"
  translatedName="URL for copyright license" hidden=""
  xml:space="preserve">http://creativecommons.org/licenses/by-sa/3.0/at/deed.en</LicenseUrl>

This contains the key name ("LicenseUrl"), the place where the data
was retrieved from ("commons-desc-page", as opposed to "file-metadata"
if it came from the CC:LicenseUrl property of XMP data embedded in the
file), the translated name of the key ("URL for copyright
license", coming from the MediaWiki:Exif-licenseurl message), whether or
not this property is hidden when displayed on the image description page
(true in the example), and the value of the property
(http://creativecommons.org/licenses/by-sa/3.0/at/deed.en).
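
As a rough sketch of how a client might read such an element, here is
a small Python example using only the standard library. The helper
name parse_property and the dict keys are hypothetical, and treating
the mere presence of the (empty) hidden attribute as "true" is an
assumption based on the example above, not a confirmed rule of the
module.

```python
import xml.etree.ElementTree as ET

# Sample element copied from the example above (line wrapping removed).
SAMPLE = (
    '<LicenseUrl source="commons-desc-page" '
    'translatedName="URL for copyright license" hidden="" '
    'xml:space="preserve">'
    'http://creativecommons.org/licenses/by-sa/3.0/at/deed.en'
    '</LicenseUrl>'
)


def parse_property(xml_text):
    """Turn one metadata element into a plain dict (hypothetical helper)."""
    elem = ET.fromstring(xml_text)
    return {
        "key": elem.tag,
        "source": elem.get("source"),
        "translated_name": elem.get("translatedName"),
        # Assumption: the attribute being present at all, even empty, means true.
        "hidden": "hidden" in elem.attrib,
        "value": elem.text,
    }


prop = parse_property(SAMPLE)
print(prop["key"], prop["source"], prop["hidden"])
```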


>
> Also, your original proposal said something about outputting HTML. That
> confuses
> me - an API module would return structured data, why would you use HTML to
> represent the metadata? That makes it a lot harder to process...

It does. Part of the reason is that I wanted something that could
be displayed to the user immediately, hence more user-friendly than
machine-friendly (for example, human-readable timestamps instead of ISO
timestamps, and human-readable flash firing values instead of the raw
constants). The second reason is the source of the data. If we look at
the description field on a Commons image page, we have things like:

"Front and western side of the house located at 912 E. First Street in
{{w|Bloomington, Indiana|Bloomington}}, {{w|Indiana}}, {{w|United
States}}.  Built in 1925, it is part of the locally-designated Elm
Heights Historic District."

This has links in it. There are a couple of options for what we can do
with that. We can give it out as is, or we could expand templates and
return:

"Front and western side of the house located at 912 E. First Street in
[[:w:Bloomington, Indiana|Bloomington]], [[:w:Indiana|Indiana]],
[[:w:United States|United States]].  Built in 1925, it is part of the
locally-designated Elm Heights Historic District."

Or we could return HTML:

Front and western side of the house located at 912 E. First Street in
<a href="//en.wikipedia.org/wiki/Bloomington,_Indiana" class="extiw"
title="w:Bloomington, Indiana">Bloomington</a>, <a
href="//en.wikipedia.org/wiki/Indiana" class="extiw"
title="w:Indiana">Indiana</a>, <a
href="//en.wikipedia.org/wiki/United_States" class="extiw"
title="w:United States">United States</a>.  Built in 1925, it is part
of the locally-designated Elm Heights Historic District.

Or we could ditch the HTML entirely:

Front and western side of the house located at 912 E. First Street in
Bloomington, Indiana, United States. Built in 1925, it is part of the
locally-designated Elm Heights Historic District.

I think returning the HTML is the option that is most faithful to the
original data, while still being easy to process. Sometimes the
formatting in the description field is more complex than just simple
links.
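
To illustrate the "easy to process" point, the plain-text variant can
be derived from the HTML one on the client side. The following is a
minimal Python sketch using the standard library's html.parser; the
names TextOnly and strip_html are hypothetical, and it simply drops
every tag and keeps the text, which may be too crude for descriptions
with more complex formatting.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes and drop all tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html_text):
    parser = TextOnly()
    parser.feed(html_text)
    parser.close()
    # Collapse runs of whitespace left over from line wrapping.
    return " ".join("".join(parser.parts).split())


# The HTML form of the description quoted above.
html_desc = (
    'Front and western side of the house located at 912 E. First Street in '
    '<a href="//en.wikipedia.org/wiki/Bloomington,_Indiana" class="extiw" '
    'title="w:Bloomington, Indiana">Bloomington</a>, '
    '<a href="//en.wikipedia.org/wiki/Indiana" class="extiw" '
    'title="w:Indiana">Indiana</a>, '
    '<a href="//en.wikipedia.org/wiki/United_States" class="extiw" '
    'title="w:United States">United States</a>.  Built in 1925, it is part '
    'of the locally-designated Elm Heights Historic District.'
)

plain = strip_html(html_desc)
print(plain)
```

Going the other way (from stripped text back to links) is not
possible, which is one argument for returning the HTML and letting
clients discard it.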

Given that showing data to the user and providing metadata that is
easy for computers to process are slightly different use cases, perhaps
it makes sense to have two different modules: one that returns HTML
(and human-formatted values for timestamps, etc.), and another that
returns more machine-oriented data (perhaps including a version of
the description field with all HTML stripped out).
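
The difference between the two views can be sketched with a timestamp.
Exif stores date/time values as "YYYY:MM:DD HH:MM:SS"; the Python
example below turns one such value into a machine-oriented ISO 8601
string and a human-oriented string. The function name timestamp_views,
the sample value, and the particular human-readable format are all
made up for illustration and are not what either module would
necessarily return.

```python
from datetime import datetime

# Exif date/time layout: "YYYY:MM:DD HH:MM:SS".
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def timestamp_views(exif_value):
    """Return machine and human renderings of one Exif timestamp
    (hypothetical helper; the human format is an arbitrary choice)."""
    dt = datetime.strptime(exif_value, EXIF_FORMAT)
    return {
        "machine": dt.isoformat(),               # ISO 8601, no time zone
        "human": dt.strftime("%d %B %Y, %H:%M"),  # month name is locale-dependent
    }


# Sample value invented for this example.
views = timestamp_views("2013:09:09 17:27:56")
print(views["machine"])
print(views["human"])
```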

--bawolff

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
