The debate about RDFa highlights a disconnect in the decision making related to HTML5.

The purpose behind RDFa is to provide a way to embed complex information into a web document, in such a way that a machine can extract this information and combine it with other data extracted from other web pages. It is not a way to document private data, or data that is meant to be used by some JavaScript-based application. The sole purpose of the data is for external extraction and combination.
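
To make "external extraction and combination" concrete, here is a minimal sketch in Python. It assumes the triples have already been extracted from the RDFa in two separate pages; the two example URLs and the names are invented for illustration, and the helper functions are hypothetical, not part of any RDF library. Only the FOAF namespace is real.

```python
# Hypothetical triples, written by hand as if they had been extracted
# from RDFa in two different pages. URLs and names are invented.
FOAF = "http://xmlns.com/foaf/0.1/"

page_a = {
    ("http://example.org/alice#me", FOAF + "name", "Alice"),
    ("http://example.org/alice#me", FOAF + "knows", "http://example.net/bob#me"),
}
page_b = {
    ("http://example.net/bob#me", FOAF + "name", "Bob"),
}


def merge(*graphs):
    """Combine triple sets; shared URIs are what make the union meaningful."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def names_of_friends(graph, person):
    """Follow foaf:knows from person, then look up each friend's foaf:name."""
    friends = {o for s, p, o in graph if s == person and p == FOAF + "knows"}
    return sorted(o for s, p, o in graph if s in friends and p == FOAF + "name")


combined = merge(page_a, page_b)
print(names_of_friends(combined, "http://example.org/alice#me"))
```

Neither page alone can answer "what are the names of Alice's friends?"; the answer only appears once the data from both pages is combined, which is the point of using a shared model.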

An earlier email exchange between Martin Atkins and Ian Hickson included the following:

"On Sun, 11 Jan 2009, Martin Atkins wrote:
>
> One problem this can solve is that an agent can, given a URL that
> represents a person, extract some basic profile information such as the
> person's name along with references to other people that person knows.
> This can further be applied to allow a user who provides his own URL
> (for example, by signing in via OpenID) to bootstrap his account from
> existing published data rather than having to re-enter it.
>
> So, to distill that into a list of requirements:
>
> - Allow software agents to extract profile information for a person as often
> exposed on social networking sites from a page that "represents" that person.
>
> - Allow software agents to determine who a person lists as their friends
> given a page that "represents" that person.
>
> - Allow the above to be encoded without duplicating the data in both
> machine-readable and human-readable forms.
>
> Is this the sort of thing you're looking for, Ian?

Yes, the above is perfect. (I cut out the bits that weren't really "the
problem" from the quote above -- the above is what I'm looking for.)

The most critical part is "allow a user who provides his own URL to
bootstrap his account from existing published data rather than having to
re-enter it". The one thing I would add would be a scenario that one would
like to be able to play out, so that we can see if our solution would
enable that scenario.

For example:

  "I have an account on social networking site A. I go to a new social
  networking site B. I want to be able to automatically add all my
  friends from site A to site B."

There are presumably other requirements, e.g. "site B must not ask the
user for the user's credentials for site A" (since that would train people
to be susceptible to phishing attacks). Also, "site A must not publish the
data in a manner that allows unrelated users to obtain privacy-sensitive
data about the user", for example we don't want to let other users
determine relationships that the user has intentionally kept secret [1].

It's important that we have these scenarios so that we can check if the
solutions we consider are actually able to solve these problems, these
scenarios, within the constraints and requirements we have."


It would seem that Ian agrees both that a) there is a need to document complex information in a consistent, machine-readable form, and that b) the purpose of this data is external consumption rather than internal use. The disconnect comes in his belief that RDF, and its web page serialization technique, RDFa, are only one of a set of possible solutions.

Yet at the same time, he references how the MathML and SVG people provided sufficient use cases to justify the inclusion of both of these in HTML5. But what is MathML? What does it solve? It is a way to include mathematical formulas in a document in a formatted manner. What is SVG? A way to embed vector graphics in a web page, in such a way that the individual elements described by the graphics can become part of the overall DOM.

So, why accept that we have to use MathML in order to solve the problems of formatting mathematical formulas? Why not start from scratch, and devise a new approach?

So, why accept that we have to use SVG in order to solve the problems of vector graphics? Why not start from scratch, and devise a new approach?

Come to think of it, I think we should also question the use of the canvas element. After all, if the problem set is that we need the ability to animate graphics in a web page using a non-proprietary technology, then wouldn't something like SVG work for this purpose? Isn't the canvas element redundant? But then, perhaps we should start over from the beginning and just create a new graphics capability from scratch, and reject both canvas and SVG.

We don't reject MathML, though. Neither do we reject SVG or canvas. Or any other of a number of entities being included in HTML5, including SQL. Why? Because they have a history of use, extensive documentation as to purpose and behavior, and there are a considerable number of implementations that support the specifications. It doesn't make sense to start from scratch. It makes more sense to make use of what already works.

I have to ask, then: why do we isolate RDF and RDFa for special handling? If we can accept that SQL is a natural database query mechanism, that SVG is a natural for vector graphics, that the canvas element is the proper choice for script-enabled bitmaps, and MathML...well, you get the picture--if we can accept these mature, well-documented representatives of each of their genres as the de facto implementations, enough to incorporate each into HTML5, why then do we demand that RDF and its web page serialization technique, RDFa, must "prove" themselves, when we don't demand the same from other external objects and specifications?

To do so is not consistent. To continue to do so suggests that other issues are at play with regard to RDF/RDFa.

Martin provided a use case that Ian acknowledges is justified. Ipso facto, we do not need to continue providing use cases for this type of requirement. We have established that there is a legitimate need to incorporate data into a web page that is consistently machine-readable, that can be consistently extracted, and that can be consistently combined with data from other documents using automated processes. RDF was designed specifically for this purpose. It is a mature specification with extensive documentation, and one can find many different implementations of it. FOAF is just one of many uses of RDF; RSS 1.0 was another, as are the version of RDF embedded within photos and CC licensing--these are all based on the same model.

In other words, if we accept that SVG is the de facto implementation of vector graphics (as compared to something such as, say, VML), and we accept the same for MathML, the canvas element, SQL, and so on, then to not accept RDF as the de facto implementation for the purpose for which it was designed is to single out RDF/RDFa for "special handling" within the group. It is to demand more from it than has been demanded from any other element included in HTML5.

In particular, as has been documented elsewhere, very little is needed to support RDFa within HTML5. The requirements are much less than those for the canvas element, SVG, MathML, and even SQL. So the task, itself, is not daunting. Not as daunting as, say, the alt attribute.

This then returns us to my earlier supposition: To not support RDF/RDFa as the de facto implementation of complex, structured data is not consistent. To continue to do so suggests that other issues are at play with regard to RDF/RDFa. Such inconsistencies are not in anyone's best interest when developing a new specification meant for widespread use on the web. If, as I believe, the inconsistency reflects an underlying bias against the concept behind RDF, which is that true web semantics is based on structured data, not on natural language processing, or at least not exclusively on natural language processing, then I believe it's important to highlight such bias, and deal with it accordingly.

Shelley

