Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-22 Thread Toby Inkster
On Fri, 2009-05-22 at 12:26 +0200, Eduard Pascual wrote:
> Are you calling the DOM Consistency Principle a "theoretical" or
> "aesthetic" argument?

Certainly not -- DOM consistency is a great idea. But given that the
HTML5 spec defines how the DOM is built, there's a very simple solution
to that -- HTML5 could simply mandate that:

http://foo.example.com/";>

generates an identical DOM representation in both XHTML5 and HTML5.
What's the problem with that? 

In existing implementations, there are differences, sure. But for the
most part, those differences are pretty small and obscure, and don't
actually affect real-world code very much. For example, the following code
seems to work fine in Opera, Firefox and Midori (a WebKit browser):

http://buzzword.org.uk/2009/dom.html
http://buzzword.org.uk/2009/dom.xhtml

The files are byte-for-byte identical (indeed, on disk, one is just a
symlink to the other).

-- 
Toby Inkster 


Re: [whatwg] Removing the need for separate feeds

2009-05-22 Thread Toby Inkster
Eduard Pascual wrote:

> For manually authored pages and feeds things would be different; but
> are there really a significant amount of such cases out there? I
> can't say I have seen the entire web (who can?), but among what I have
> seen, I have never encountered any hand authored feed, except for code
> examples and similar "experimental" stuff.

Surely this proves the need for a way of extracting feeds from HTML?

You never see manually written feeds because people can't be bothered to
manually write feeds. So the people who manually author HTML simply
don't bother providing feeds at all.

If an HTML page can *be* a feed, this allows manually authored HTML
pages to be subscribed to in feed readers.

-- 
Toby Inkster 


Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-21 Thread Toby Inkster
ignored.

> Actually,
> there have been some complaints [1] about why HTML5 should restrain
> itself from using quite useful attribute names such as "content" or
> "resource", just because RDFa decided to use them, without giving
> non-X HTML a thought.

Attribute names are not a scarce commodity. Just using the 26 letters of
the English alphabet (I avoid calling it the "Latin alphabet" given that
three of the letters are post-Roman inventions), you can create nearly 12
million (26^5) different 5-letter attribute names. Certainly most of them are
nonsensical, but there are an awful lot of attribute names to choose
from, so it doesn't make sense to introduce potentially harmful clashes
where they could be avoided.
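A quick check of that count, assuming names drawn only from the 26 lowercase letters a to z:

```python
# Count attribute names built only from the lowercase letters a-z.
letters = 26

five_letter_names = letters ** 5
print(f"{five_letter_names:,}")  # 11,881,376

# Allowing every length from one to five letters adds a few hundred thousand more.
up_to_five = sum(letters ** n for n in range(1, 6))
print(f"{up_to_five:,}")  # 12,356,630
```

Real HTML attribute names may also contain digits and hyphens, so the available space is larger still.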

You take it for granted that the RDFa task force invented attributes
without giving HTML a thought. Certainly RDFa's XHTML 2.0 heritage is
clear, but the language employed by the RDFa syntax document appears
very carefully chosen to accommodate HTML.

The processing sequence is defined in very DOM-like terms, making it
easy to carry out on any DOM tree without having to worry about the
serialisation that the DOM tree was built from.
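To make the serialisation-independence point concrete, here is a heavily simplified Python sketch. It is not the RDFa processing sequence itself: it only collects @property values from element text and ignores subjects, @content, CURIE expansion and everything else. The same bytes are parsed once as HTML with the standard library's html.parser and once as XML with xml.etree.ElementTree; both results are converted into a hypothetical minimal Node type, and a single tree walk then gives the same answer for each.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


@dataclass
class Node:
    """Hypothetical minimal tree node shared by both parsers."""
    tag: str
    attrs: dict
    children: list = field(default_factory=list)
    text: str = ""


VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Build a Node tree from text/html using the stdlib tokenizer."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, {name: value or "" for name, value in attrs})
        self.stack[-1].children.append(node)
        if tag not in VOID:  # void elements never get an end tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text += data


def parse_html(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def from_etree(element):
    """Convert an XML (XHTML) element into a Node, dropping the namespace."""
    node = Node(element.tag.split("}")[-1], dict(element.attrib),
                text=element.text or "")
    for child in element:
        node.children.append(from_etree(child))
    return node


def property_values(node, out=None):
    """Walk the tree in document order, collecting (@property, text) pairs."""
    if out is None:
        out = []
    if "property" in node.attrs:
        out.append((node.attrs["property"], node.text.strip()))
    for child in node.children:
        property_values(child, out)
    return out


# The same bytes, read once as HTML and once as XML.
doc = ('<div xmlns="http://www.w3.org/1999/xhtml">'
       '<span property="dc:title">Foo</span>'
       '<span property="dc:creator">Bar</span></div>')

# Both print [('dc:title', 'Foo'), ('dc:creator', 'Bar')]
print(property_values(parse_html(doc)))
print(property_values(from_etree(ET.fromstring(doc))))
```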

As another example of its neutral stance, it says that language
information "can" be provided using xml:lang, but doesn't appear to rule
out other mechanisms for declaring language.

The only aspect of RDFa which doesn't sit especially well in HTML is
CURIE prefix mappings, which use xmlns:* attributes. In practice, it
doesn't seem to have proved a difficulty to those of us who have
implemented support for RDFa in HTML, but there are theoretical and
aesthetic arguments against it. But this is a small issue which is not
especially difficult to fix, and there's no reason to throw the baby out
with the bathwater. Various solutions to it are being discussed both
here and on the public-rdf-in-xhtml...@w3.org list.
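A rough sketch of how the xmlns:* prefix mappings get used, assuming a tokenizer that simply reports xmlns:foaf as an ordinary attribute name with a colon in it (which is what HTML parsers do). prefix_mappings and expand_curie are hypothetical helpers, and the error handling is simplified.

```python
def prefix_mappings(attrs, inherited=None):
    """Collect CURIE prefix mappings from xmlns:* attributes on one element.

    Mappings from ancestor elements (``inherited``) stay in scope unless
    redeclared, roughly as the xmlns:* mappings behave in RDFa.
    """
    mappings = dict(inherited or {})
    for name, value in attrs.items():
        if name.startswith("xmlns:"):
            mappings[name[len("xmlns:"):]] = value
    return mappings


def expand_curie(curie, mappings):
    """Expand 'prefix:reference' into a full URI using in-scope mappings."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in mappings:
        # This sketch raises; a real RDFa processor would typically ignore the value.
        raise ValueError(f"no prefix mapping in scope for {curie!r}")
    return mappings[prefix] + reference


# Attributes as an HTML parser would report them.
html_attrs = {"lang": "en", "xmlns:foaf": "http://xmlns.com/foaf/0.1/"}
in_scope = prefix_mappings(html_attrs)
print(expand_curie("foaf:name", in_scope))  # http://xmlns.com/foaf/0.1/name

# A fragment pasted into a page that never declared xmlns:foaf loses its meaning.
try:
    expand_curie("foaf:name", prefix_mappings({"lang": "en"}))
except ValueError as err:
    print(err)
```

The second call shows the copy-paste hazard: a CURIE is only meaningful where its prefix mapping is in scope.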

> In other words: currently, RDFa parsers should have enough to ignore
> non-X HTML content (or, more specifically, documents with no default
> xmlns in <html>, so they can also cope with the XHTML1.1+RDFa served
> as text/html aberration, which is wrong no matter how you look at it).

Personally I think it was a mistake to register a new content-type for
XHTML to begin with - it introduced an unnecessary schism between HTML
and XHTML which should have just been a natural progression.

Any XHTML-family language which doesn't use elements from non-XHTML
namespaces, and which follows a few simple backwards-compatibility rules,
seems in practice to work fine when served as text/html.

> If RDFa was taken into HTML5, then parsers should also care about
> non-X documents, which binds HTML to not use these attribute names for
> any future extension (actually, as pointed out in Ian's mail referenced
> above, @content has been used on <meta> since HTML4, so this can't
> even be fulfilled).

RDFa's use of @content is compatible with its use in HTML4. No, they are
not identical uses, but they are not inconsistent either, in the same way
that "I am a human" and "I am a mammal" are not identical statements, but
are consistent.

In HTML4 @content is used on <meta> to indicate a string that parsers
interested in a particular piece of metadata should use. In RDFa it is used
in the same way, but allowed globally instead of just on <meta>.
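The rule in that paragraph, that @content supplies the value in place of an element's text, as a minimal Python sketch. literal_value is a hypothetical helper; it ignores datatypes and the other details of full RDFa processing.

```python
def literal_value(attrs, text):
    """Value for a piece of metadata: @content, if present, overrides the text."""
    return attrs.get("content", text)


# HTML4: <meta name="author" content="..."> has no text, so @content carries the value.
print(literal_value({"name": "author", "content": "Toby Inkster"}, ""))  # Toby Inkster

# RDFa: the same rule on any element, e.g.
# <span property="dc:date" content="2009-05-21">Thursday</span>
print(literal_value({"property": "dc:date", "content": "2009-05-21"}, "Thursday"))  # 2009-05-21

# Without @content, the element's own text is used.
print(literal_value({"property": "dc:title"}, "Foo"))  # Foo
```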

-- 
Toby Inkster 


Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-20 Thread Toby Inkster
Given that one of the objections people cite with RDFa is complexity,
I'm not sure how this resolves things. It seems twice as complicated to
me. It creates fewer new attributes, true, but the number of attributes
doesn't in itself create much confusion.

For example, which is the simpler syntax:

http://foo.example.com/";
   ping="http://tracker.example.com/";>Foo

or:

http://foo.example.com/');
 secondary:url('http://tracker.example.com/');">Foo

Stuffing multiple discrete pieces of information into one attribute makes things harder for
parsing, harder for authoring tools and harder for authors. In RDFa,
each attribute performs a simple role - e.g. @rel specifies the
relationship between two resources; @rev specifies the relationship in
the reverse direction; @content allows you to override the
human-readable text of an element. Combining these into a single
attribute would not make things simpler.
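The parsing cost of packing several values into one attribute can be sketched like this. The attribute names href and ping, the declaration names primary and secondary, and the url('...') syntax are illustrative assumptions: a simplified stand-in, not the actual CRDF grammar.

```python
import re

# Separate attributes: each piece of information is one lookup.
separate = {"href": "http://foo.example.com/", "ping": "http://tracker.example.com/"}
target, tracker = separate["href"], separate["ping"]

# One combined attribute in a simplified, hypothetical CSS-like syntax needs
# a dedicated parser. This regex matches one  name:url('...')  declaration.
DECLARATION = re.compile(r"\s*([\w-]+)\s*:\s*url\(\s*'([^']*)'\s*\)\s*(?:;|$)")


def parse_combined(value):
    """Split "name:url('...'); name2:url('...')" into a dict of name -> URL."""
    result, pos = {}, 0
    while value[pos:].strip():
        match = DECLARATION.match(value, pos)
        if match is None:
            raise ValueError(f"cannot parse declaration at offset {pos}: {value[pos:]!r}")
        result[match.group(1)] = match.group(2)
        pos = match.end()
    return result


combined = ("primary:url('http://foo.example.com/'); "
            "secondary:url('http://tracker.example.com/');")
print(parse_combined(combined))
```

Even this toy parser needs a regular expression and an error path, and it still ignores double-quoted or unquoted values, escaped quotes and comments, all of which parsers and authoring tools would also have to handle.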

Looking at the comparison given in section 4.2, CRDF appears to suffer
from several disadvantages compared to RDFa:

1. It's pretty ugly.

2. It's more verbose - though only by eleven bytes by my reckoning, so
this isn't a major issue.

3. It divorces the CURIE prefix definitions from the use of CURIEs in
the markup. This makes it more vulnerable to copy-paste problems. (As I
understand  in the proposal, CURIE prefix
definitions can even be separated out into an external file. This
obscures them greatly and will certainly be a cause of copy-paste
issues!)

4. It's ugly. I'm sorry, I just can't emphasise that enough.

Apart from the fact that *sometimes* RDFa involves a bit of repetition,
I don't see what problems this proposal is actually supposed to solve.

Repetition in practice seems to be something that page authors can deal
with. We don't provide a mechanism for setting the src or alt attributes
of multiple <img> elements which need to load the same external image, or
for setting the class attribute of the third cell in every row of a table.

So again, while I can see that this proposal would "work", in what way
is it supposed to be preferable to RDFa?

-- 
Toby Inkster 


Re: [whatwg] Link rot is not dangerous

2009-05-16 Thread Toby Inkster
Philip Taylor wrote:

> The source data is the list of common RDF namespace URIs at
> http://ebiquity.umbc.edu/resource/html/id/196/Most-common-RDF-namespaces
> from three years ago. Out of those 284:
>  * 56 are 404s. (Of those, 37 end with '#', so that URI itself really
> ought to exist. In the other cases, it'd be possible that only the
> prefix+suffix URIs are meant to exist. Some of the cases are just
> typos, but I'm not sure how many.)
>  * 2 are Forbidden. (Of those, 1 looks like a typo.)
>  * 2 are Bad Gateway.
>  * 22 could not connect to the server. (Of those, 2 weren't http://
> URIs, and 1 was a typo. The others represent 13 different domains.)

While this analysis is interesting, looking at the 56 which 404, it
doesn't seem like a massive loss to me. Some of them are clearly typos
(e.g. DOAP and RSS syndication, both of which appear in their correct form
on the HTTP 200 and HTTP 3xx lists). In many cases I think you'll find that
it's not that the link has "rotted" with time, but that there was
*never* a file at the other end.
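For scale, a tally of just the failure categories quoted above (the survey's other categories, such as the HTTP 200 and 3xx lists, are not reproduced here):

```python
# Failure categories as quoted from the survey of 284 common namespace URIs.
failures = {
    "404 Not Found": 56,
    "403 Forbidden": 2,
    "502 Bad Gateway": 2,
    "could not connect": 22,
}
total = 284

failed = sum(failures.values())
print(failed, f"{failed / total:.1%}")  # 82 28.9%

# Of the 404s, 37 end with '#', so that exact URI ought to resolve.
print(f"{37 / failures['404 Not Found']:.0%}")  # 66%
```

Per the quoted notes, some of these failures are typos or non-http:// URIs, so the figure overstates genuine rot.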

Even the ones which are genuinely lost are probably only used by a
handful of people. The *really* commonly used URIs - RDF, RDFS, OWL,
FOAF, Dublin Core (1.1 and Terms), RSS (1.0, plus commonly used
modules), SKOS, SIOC, DBpedia, geo, GeoNames, vCard and iCalendar - all
seem to have been pretty stable so far.

Judging the stability of RDF URIs by looking at the 284 most common
namespace URIs is akin to judging the provision of light rail in British
cities by looking at the UK's 284 most populated areas - the results
would actually be more helpful if you restricted yourself to a smaller
sample.

Lastly, the RDF model tends to be very resilient against loss of
information anyway. Generally, data tends to be structured such that if
a collection of triples is true, any subset is also true. So if the
meaning of certain triples within a document is lost because of link
rot, the document as a whole will probably still be useful.
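A small Python illustration of that subset property, using hypothetical data (LOST stands in for a namespace whose URI has rotted). The code can only show the set-theoretic half of the argument, that discarding some triples leaves a subset of the original graph; that the subset remains true is the RDF semantics point made above.

```python
# Hypothetical namespace URIs; LOST stands in for a vocabulary whose URI has rotted.
FOAF = "http://xmlns.com/foaf/0.1/"
LOST = "http://lost.example.org/ns#"

# A graph is a set of (subject, predicate, object) triples.
graph = {
    ("#me", FOAF + "name", "Alice"),
    ("#me", FOAF + "nick", "ally"),
    ("#me", LOST + "favouriteColour", "green"),
}


def drop_namespace(triples, namespace):
    """Keep only the triples whose predicate is outside the given namespace."""
    return {t for t in triples if not t[1].startswith(namespace)}


remaining = drop_namespace(graph, LOST)
print(remaining <= graph)  # True: what is left is a subset of the original claims
print(sorted(p for _, p, _ in remaining))
# ['http://xmlns.com/foaf/0.1/name', 'http://xmlns.com/foaf/0.1/nick']
```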

-- 
Toby Inkster 


Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-13 Thread Toby Inkster
Leif Halvard Silli wrote:

> Hear hear. Let's call it "Cascading RDF Sheets".

http://buzzword.org.uk/2008/rdf-ease/spec

http://buzzword.org.uk/2008/rdf-ease/reactions

I have actually implemented it. It works. RDFa is better though.

-Toby