[whatwg] RDFa is to structured data, like canvas is to bitmap and SVG is to vector

Shelley Powers Sat, 17 Jan 2009 08:55:17 -0800

The debate about RDFa highlights a disconnect in the decision making related to HTML5.

The purpose behind RDFa is to provide a way to embed complex information into a web document, in such a way that a machine can extract this information and combine it with other data extracted from other web pages. It is not a way to document private data, or data that is meant to be used by some JavaScript-based application. The sole purpose of the data is for external extraction and combination.


An earlier email between Martin Atkins and Ian Hickson had the following:

"On Sun, 11 Jan 2009, Martin Atkins wrote:
>
> One problem this can solve is that an agent can, given a URL that
> represents a person, extract some basic profile information such as the
> person's name along with references to other people that person knows.
> This can further be applied to allow a user who provides his own URL
> (for example, by signing in via OpenID) to bootstrap his account from
> existing published data rather than having to re-enter it.
>
> So, to distill that into a list of requirements:
>

> - Allow software agents to extract profile information for a person as often
> exposed on social networking sites from a page that "represents" that
> person.

>
> - Allow software agents to determine who a person lists as their friends
> given a page that "represents" that person.
>
> - Allow the above to be encoded without duplicating the data in both
> machine-readable and human-readable forms.
>
> Is this the sort of thing you're looking for, Ian?

Yes, the above is perfect. (I cut out the bits that weren't really "the
problem" from the quote above -- the above is what I'm looking for.)

The most critical part is "allow a user who provides his own URL to
bootstrap his account from existing published data rather than having to
re-enter it". The one thing I would add would be a scenario that one would
like to be able to play out, so that we can see if our solution would
enable that scenario.

For example:

  "I have an account on social networking site A. I go to a new social
  networking site B. I want to be able to automatically add all my
  friends from site A to site B."

There are presumably other requirements, e.g. "site B must not ask the
user for the user's credentials for site A" (since that would train people
to be susceptible to phishing attacks). Also, "site A must not publish the
data in a manner that allows unrelated users to obtain privacy-sensitive
data about the user", for example we don't want to let other users
determine relationships that the user has intentionally kept secret [1].

It's important that we have these scenarios so that we can check if the
solutions we consider are actually able to solve these problems, these
scenarios, within the constraints and requirements we have."

It would seem that Ian agrees both that a) there is a need to provide a way to document complex information in a consistent, machine-readable form and that b) the purpose of this data is external consumption, rather than internal use. Where the disconnect comes in is that he believes RDF, and its web page serialization technique, RDFa, are only one of a set of possible solutions.

Yet at the same time, he references how the MathML and SVG people provide sufficient use cases to justify the inclusion of both of these into HTML5. But what is MathML? What does it solve? A way to include mathematical formulas in a document in a formatted manner. What is SVG? A way to embed vector graphics into a web page, in such a way that the individual elements described by the graphics can become part of the overall DOM.

So, why accept that we have to use MathML in order to solve the problems of formatting mathematical formulas? Why not start from scratch, and devise a new approach?

So, why accept that we have to use SVG in order to solve the problems of vector graphics? Why not start from scratch, and devise a new approach?

Come to think of it, I think we should also question the use of the canvas element. After all, if the problem set is that we need the ability to animate graphics in a web page using a non-proprietary technology, then wouldn't something like SVG work for this purpose? Isn't the canvas element redundant? But then, perhaps we should start over from the beginning and just create a new graphics capability from scratch, and reject both canvas and SVG.

We don't reject MathML, though. Neither do we reject SVG or canvas. Or any other of a number of entities being included in HTML5, including SQL. Why? Because they have a history of use, extensive documentation as to purpose and behavior, and there are a considerable number of implementations that support the specifications. It doesn't make sense to start from scratch. It makes more sense to make use of what already works.

I have to ask, then: why do we isolate RDF and RDFa for special handling? If we can accept that SQL is a natural database query mechanism, and SVG is a natural for vector graphics, and the canvas element is the proper choice for script-enabled bitmaps, and MathML...well, you get the picture--if we can accept these mature, well documented representatives of each of their genres as the de facto implementation, enough to incorporate each into HTML5, why then do we demand that RDF and its web page serialization technique, RDFa, must "prove" themselves, when we don't demand the same from other external objects and specifications?

To do so is not consistent. To continue to do so demonstrates that perhaps other issues are at play in regards to RDF/RDFa.

Martin provided a use case that Ian acknowledges is justified. Ipso facto, we do not need to continue providing use cases for this type of requirement. We have established that the requirement/need/desire to incorporate data into a web page that is consistently machine readable, which can be consistently extracted, and consistently combined with data from other documents using automated processes is a legitimate need. RDF was designed specifically for this purpose, is a mature specification, with extensive documentation, and one can find many different implementations of its use. The use of RDF for FOAF is just one of many uses, RSS 1.0 was another, and a version of RDF embedded within photos, CC licensing--these are all based on the same model.

In other words, if we accept that SVG is the de facto implementation of vector graphics (as compared to something such as, say, VML), and we accept the same for MathML, the canvas element, SQL, and so on, then to not accept RDF as the de facto implementation for the purpose for which it was designed is to single out RDF/RDFa for "special handling" within the group. It is to demand more from it than has been demanded from any other element included in HTML5.

In particular, as has been documented elsewhere, very little is needed to support RDFa within HTML5. The requirements are much less than those for the canvas element, SVG, MathML, and even SQL. So the task, itself, is not daunting. Not as daunting as, say, the alt attribute.

This then returns us to my earlier supposition: To not support RDF/RDFa as the de facto implementation of complex, structured data is not consistent. To continue to do so demonstrates that perhaps other issues are at play in regards to RDF/RDFa. Such inconsistencies are not in the best interest when developing a new specification meant for widespread use on the web. If, as I believe, the inconsistency reflects an underlying bias against the concept behind RDF, which is that true web semantics is based on structured data, not natural language processing, or not exclusively based on natural language processing, then I believe it's important to highlight such bias, and deal with it accordingly.


Shelley
