Trying my best to limit length of reply.

On Sat, Jan 16, 2010 at 23:16, Manu Sporny <mspo...@digitalbazaar.com> wrote:
> Philip Jägenstedt wrote:

>> [ed: Microdata] maps well to the
>> RDF model if you want it, but doesn't force authors to think in terms
>> of subject, predicate, object triples.
>
> Well, Microdata /almost/ maps to the RDF model. Microdata doesn't
> support RDF literal typing, which is basically a fancy way of saying
> that you can't verify that weights, volumes, speeds, the full range of
> dates in different calendars, encodings such as chemical compositions,
> and varying other typed information is expressed cleanly by the
> Wikipedia contributors.
>
> So, if you wanted to say something like this:
>
> The speed of light is 299792458 m/s.
>
> You would do this in RDFa:
>
> <div about="#light">
> The speed of light is <span property="measure:speed"
> datatype="measure:meters-per-second">299792458</span> m/s.
> </div>
>
> which would generate the following triple:
>
> <#light>
>   measure:speed
>      "299792458"^^measure:meters-per-second .
>
> AFAIK, there is no way to do the equivalent in Microdata, is there Philip?

The datatype is part of the vocabulary. If you want to validate your
data, you validate it against the vocabulary, not against what the
author claims. For example, you'll see that the vCard vocabulary defines
its own datatypes:
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#vcard

Allowing different types to be mixed (like m/s and km/h) seems risky,
but you are correct that this is one of the things in the RDF model
that can't be expressed directly using microdata.

> The above is how you would do it in RDFa. Philip, I haven't seen any
> work related to this in Microdata - have there been any recent
> developments with regard to data validation in Microdata?

There is nothing like automatic validation; your software has to
understand a certain vocabulary to be able to say whether the data
conforms to the constraints of that particular vocabulary. (I don't know
if this is any different from the RDF model, or if RDF software is able
to "automatically" learn how to validate measure:meters-per-second from
just seeing the string "measure:meters-per-second".)
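
To illustrate what "validate against the vocabulary" could look like
in practice, here is a minimal Python sketch. It is not taken from any
spec or implementation: the item type URL, the "speed" property and the
rule attached to it are all hypothetical. The point is only that the
consumer looks up the rules by item type, instead of trusting a
datatype declared in the markup.

```python
import re

# Hypothetical vocabulary table: item type -> property name -> validator.
# Neither the URL, the property name nor the rule comes from any spec.
VOCABULARIES = {
    "http://example.org/physics": {
        # The vocabulary itself fixes the unit (m/s), so the author
        # never declares a datatype in the markup.
        "speed": lambda v: re.fullmatch(r"\d+(\.\d+)?", v) is not None,
    },
}


def validate(item_type, properties):
    """Return the (name, value) pairs that violate the vocabulary."""
    vocab = VOCABULARIES.get(item_type)
    if vocab is None:
        # Software that does not know the vocabulary cannot validate it.
        raise LookupError("unknown vocabulary: " + item_type)
    errors = []
    for name, value in properties.items():
        check = vocab.get(name)
        # Unknown properties and failed checks are both reported.
        if check is None or not check(value):
            errors.append((name, value))
    return errors
```

Calling validate("http://example.org/physics", {"speed": "299792458"})
returns an empty list, while an item type missing from the table raises
LookupError, which mirrors the "your software has to understand the
vocabulary" constraint.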

>>> So, we get more-or-less the same number of data items out, but there is
>>> a problem. What does "title" mean in the semantic sense? Does it mean
>>> "job title" or does it mean "work title"? The term "title" in this case
>>> is ambiguous.
>>
>> No, as long as an item type is used (http://n.whatwg.org/work) there
>> is no ambiguity. This particular item type is defined at
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#licensing-works
>>
>> Title here "Gives the name of the work." without ambiguity.
>
> This is new! I'm glad this issue was addressed in Microdata as it was
> one of my criticisms of it when I last read the Microdata spec about six
> months ago. Looks like that section of the spec was last changed on
> October 23rd 2009? Do you know when this was put in there, Philip?

Originally microdata used item="http://n.whatwg.org/work", but even
then there was no ambiguity about what a particular property meant.

> What happens when an author forgets to include itemtype? So, if somebody
> does this:
>
> <div itemscope>
> <span itemprop="title">Emery Molyneux Terrestrial Globe</span>
> </div>
>
> There's nothing to ground the "title" property. The way I'm reading the
> spec, it becomes ambiguous at that point, right?

As Aryeh said, it's not ambiguous, it's meaningless. Microdata allows
typeless items for site-private use (much like data-*), but such data
*should not* be used by external parties and is in fact ignored by the
RDF extraction algorithm.
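
The effect of ignoring typeless items can be sketched in Python with
the standard html.parser module. This is a simplified illustration and
not the extraction algorithm from the spec: it assumes flat,
non-nested items whose property values are plain element text, and the
names ItemCollector and typed_items are made up for this example.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are kept off the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItemCollector(HTMLParser):
    """Collect flat microdata items (assumes no nested items)."""

    def __init__(self):
        super().__init__()
        self.items = []      # every item seen, typed or not
        self.open = []       # role of each open element: "item", "prop", None
        self.current = None  # item currently being filled in
        self.prop = None     # property name currently being read
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        role = None
        if "itemscope" in attrs:
            self.current = {"type": attrs.get("itemtype"), "properties": {}}
            self.items.append(self.current)
            role = "item"
        elif attrs.get("itemprop") and self.current is not None \
                and self.prop is None:
            self.prop, self.text = attrs["itemprop"], []
            role = "prop"
        self.open.append(role)

    def handle_data(self, data):
        if self.prop is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open:
            return
        role = self.open.pop()
        if role == "prop":
            self.current["properties"][self.prop] = "".join(self.text).strip()
            self.prop = None
        elif role == "item":
            self.current = None


def typed_items(html):
    """Return only the items an external consumer should look at."""
    parser = ItemCollector()
    parser.feed(html)
    parser.close()
    # Items without an itemtype are site-private and are dropped here.
    return [item for item in parser.items if item["type"]]
```

Fed the typeless example quoted above, typed_items() returns an empty
list; the same markup with itemtype="http://n.whatwg.org/work" added to
the div yields one item with a "title" property.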

> ... and with the added danger of expressing ambiguous data. This is not
> the real danger, though. While data ambiguity is really bad when it
> comes to data stores, centralized vocabulary management is even worse.

Anyone can make up a vocabulary; just point to it in itemtype. The
WHATWG maintains a few core vocabularies, but I expect that new
vocabularies will be developed independently by communities like
microformats.

> Philip, could you give us an update on what the WHATWG sees as the
> publishing process for Microdata vocabularies? For example, if Wikipedia
> wanted to start expressing royal bloodlines using a vocabulary specific
> to Wikipedia, how would they go about getting that vocabulary into the
> HTML5 Microdata specification?

No process, just do it :)

>> Finally I will note that it is very likely that the microdata DOM APIs
>> will get implemented in browsers, making the semantic data available
>> to both scrapers, to native browser interfaces and to browser
>> extensions such as user JavaScript. As an example, you might see an
>> icon in the address bar for saving events to a calendar, or the
>> license information of an image displayed in the native properties
>> dialog. I stress again that I don't make any promises on behalf of
>> Opera or any other browser vendor, these are just my predictions.
>
> Again, this is exciting news and while I don't think Microdata is the
> proper solution for the Web, for the same reasons that are outlined
> above and many more, I'm delighted to hear that Opera is taking
> in-browser semantic data expression very seriously. How far we have come
> in just 18 months! :)

I will stress again that I don't speak for Opera in these matters, but
I do think that microdata in many ways bridges the gap between the
"browsable web" and the "semantic web" (actually, there is only one
web). Browsers already do add some UI features based on the data in
documents (apart from rendering), e.g. exposing RSS feeds in the
address bar or navigating to the next page based on rel="next".
Microdata isn't really new in that regard, it just adds some new data
for browsers to expose.
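
As a rough illustration of the kind of document data browsers already
pick up, the following Python sketch scans a page for RSS feed links
and a rel="next" link using the standard html.parser module. It is an
assumption-laden toy, not how any browser implements this; the names
LinkHints and discover are invented for the example.

```python
from html.parser import HTMLParser


class LinkHints(HTMLParser):
    """Collect RSS feed URLs and the first rel="next" target."""

    def __init__(self):
        super().__init__()
        self.feeds = []
        self.next_page = None

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel is a space-separated list of case-insensitive tokens.
        rels = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        if tag == "link" and "alternate" in rels \
                and mime == "application/rss+xml":
            self.feeds.append(href)
        if "next" in rels and self.next_page is None:
            self.next_page = href


def discover(html):
    """Return (list of feed URLs, next-page URL or None)."""
    parser = LinkHints()
    parser.feed(html)
    parser.close()
    return parser.feeds, parser.next_page
```

A microdata-aware consumer would do the same sort of scan, only keyed
on itemtype and itemprop rather than on rel and type.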

-- 
Philip Jägenstedt
