Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)
On Fri, 2009-05-22 at 12:26 +0200, Eduard Pascual wrote:
> Are you calling the DOM Consistency Principle a "theoretical" or
> "aesthetic" argument?

Certainly not -- DOM consistency is a great idea. But given that the HTML5 spec defines how the DOM is built, there's a very simple solution to that -- HTML5 could simply mandate that:

http://foo.example.com/";>

generates an identical DOM representation in both XHTML5 and HTML5. What's the problem with that?

In existing implementations, there are differences, sure. But for the most part, those differences are pretty small and obscure, and don't actually affect real-world code very much. e.g. the following code seems to work fine in Opera, Firefox and Midori (a WebKit browser):

http://buzzword.org.uk/2009/dom.html
http://buzzword.org.uk/2009/dom.xhtml

The files are byte-for-byte identical (indeed, on disk, one is just a symlink to the other).

--
Toby Inkster
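The claim that the same bytes can produce the same tree under both parsers can be roughly illustrated with Python's standard library. This is only a sketch under stated assumptions: the sample document, the "foo" prefix and the "foo:bar" attribute are made up for illustration (the markup in the message above did not survive archiving), and html.parser is a lenient tokenizer, not an HTML5 tree builder, so it only approximates what a browser does.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# Hypothetical sample document; the "foo" prefix and "foo:bar" attribute
# are invented for this sketch and do not come from the original message.
DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:foo="http://foo.example.com/">'
    '<body><p foo:bar="baz">Hello</p></body></html>'
)


class StartTagCollector(HTMLParser):
    """Records (tag, sorted attributes) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, sorted(attrs)))


def html_view(doc):
    """Elements and attributes as seen by the lenient HTML tokenizer."""
    collector = StartTagCollector()
    collector.feed(doc)
    collector.close()
    return collector.seen


def xml_view(doc):
    """Elements and attributes as seen by an XML parser, in document order."""
    seen = []

    def walk(node):
        if node.nodeType == node.ELEMENT_NODE:
            seen.append((node.tagName, sorted(node.attributes.items())))
            for child in node.childNodes:
                walk(child)

    walk(minidom.parseString(doc).documentElement)
    return seen


# Compare the two views of the same bytes.
print(html_view(DOC) == xml_view(DOC))
```

The comparison covers only element names and attribute name/value pairs, which is the part of the DOM that prefix declarations such as xmlns:foo live in.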
Re: [whatwg] Removing the need for separate feeds
Eduard Pascual wrote:
> For manually authored pages and feeds things would be different; but
> are there really a significant amount of such cases out there? I
> can't say I have seen the entire web (who can?), but among what I have
> seen, I have never encountered any hand authored feed, except for code
> examples and similar "experimental" stuff.

Surely this proves the need for a way of extracting feeds from HTML? You never see manually written feeds because people can't be bothered to manually write feeds. So the people who manually author HTML simply don't bother providing feeds at all. If an HTML page can *be* a feed, this allows manually authored HTML pages to be subscribed to in feed readers.

--
Toby Inkster
Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)
ignored.

> Actually,
> there have been some complaints [1] about why HTML5 should restrain
> itself from using quite useful attribute names such as "content" or
> "resource", just because RDFa decided to use them, without giving
> non-X HTML a thought.

Attribute names are not a scarce commodity. Just using the 26 letters of the English alphabet (I avoid calling it the "Latin alphabet" given that three of the letters are post-Roman inventions) you can create almost 12 million different 5-letter attribute names. Certainly most of them are nonsensical, but there are an awful lot of attribute names to choose from, so it doesn't make sense to introduce potentially harmful clashes where they could be avoided.

You beg the question of whether the RDFa task force invented attributes without giving HTML a thought. Certainly RDFa's XHTML 2.0 heritage is clear, but the language employed by the RDFa syntax document appears very carefully chosen to accommodate HTML. The processing sequence is defined in very DOM-like terms, making it easy to carry out on any DOM tree without having to worry about the serialisation that the DOM tree was built from. As another example of its neutral stance, it says that language information "can" be provided using xml:lang, but doesn't appear to rule out other mechanisms for declaring language.

The only aspect of RDFa which doesn't sit especially well in HTML is CURIE prefix mappings, which use xmlns:* attributes. In practice, it doesn't seem to have proved a difficulty to those of us who have implemented support for RDFa in HTML, but there are theoretical and aesthetic arguments against it. But this is a small issue which is not especially difficult to fix, and there's no reason to throw the baby out with the bathwater. Various solutions to it are being discussed both here and on the public-rdf-in-xhtml...@w3.org list.
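The size of the 5-letter attribute namespace mentioned above is easy to check. A minimal sketch, assuming only lowercase a-z and names of exactly five letters:

```python
import string

# Number of letters available: the 26 lowercase English letters.
letters = len(string.ascii_lowercase)

# Count of distinct names made of exactly five such letters.
five_letter_names = letters ** 5

print(f"{five_letter_names:,}")
```

This works out to a little under 12 million, the same order of magnitude as the figure quoted in the message. Allowing shorter names, digits or hyphens would only enlarge the space.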
> In other words: currently, RDFa parsers should have enough to ignore
> non-X HTML content (or, more specifically, documents with no default
> xmlns in , so they can also cope with the XHTML1.1+RDFa served
> as text/html aberration, which is wrong no matter how you look at it).

Personally I think it was a mistake to register a new content-type for XHTML to begin with - it introduced an unnecessary schism between HTML and XHTML which should have just been a natural progression. Any XHTML-family language which doesn't use elements from non-XHTML namespaces and follows a few simple rules for backwards-compatibility in practice seems to work fine served as text/html.

> If RDFa was taken into HTML5, then parsers should also care about
> non-X documents, which binds HTML to not use these attribute names for
> any future extension (actually, as pointed out in Ian's mail referenced
> above, @content is already used on <meta> since HTML4, so this can't
> even be fulfilled).

RDFa's use of @content is compatible with its use in HTML4. No, they are not identical uses, but they are not inconsistent either. Much like saying that "I am a human", and "I am a mammal" are not identical statements, but are consistent. In HTML4 @content is used on <meta> to indicate a string that parsers interested in a particular piece of metadata should use. In RDFa it is used in the same way, but allowed globally instead of just on <meta>.

--
Toby Inkster
Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)
Given that one of the objections people cite with RDFa is complexity, I'm not sure how this resolves things. It seems twice as complicated to me. It creates fewer new attributes, true, but the number of attributes in itself doesn't create much confusion. e.g. which is a simpler syntax:

http://foo.example.com/"; ping="http://tracker.example.com/";>Foo

or:

http://foo.example.com/'); secondary:url('http://tracker.example.com/');">Foo

Stuffing multiple discrete pieces of information into a single attribute makes things harder for parsing, harder for authoring tools and harder for authors. In RDFa, each attribute performs a simple role - e.g. @rel specifies the relationship between two resources; @rev specifies the relationship in the reverse direction; @content allows you to override the human-readable text of an element. Combining these into a single attribute would not make things simpler.

Looking at the comparison given in section 4.2, CRDF appears to suffer from several disadvantages compared to RDFa:

1. It's pretty ugly.
2. It's more verbose - though only by eleven bytes by my reckoning, so this isn't a major issue.
3. It divorces the CURIE prefix definitions from the use of CURIEs in the markup. This makes it more vulnerable to copy-paste problems. (As I understand the proposal, CURIE prefix definitions can even be separated out into an external file. This obscures them greatly and will certainly be a cause of copy-paste issues!)
4. It's ugly. I'm sorry, I just can't emphasise that enough.

Apart from the fact that *sometimes* RDFa involves a bit of repetition, I don't see what problems this proposal is actually supposed to solve. Repetition in practice seems to be something that page authors can deal with. We don't provide a mechanism for setting the src or alt attributes of multiple elements which need to load the same external image; or setting the class attribute of the third cell in every row of a table.
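The parsing cost of packing several values into one attribute can be sketched as follows. This is an illustration under assumptions, not the syntax of the proposal under discussion: the "primary" key is a guess (only "secondary:url(...)" survives in the archived example), and the regular expression handles just this simple single-quoted form.

```python
import re

# Separate attributes: each value is already isolated by the markup parser.
separate = {
    "href": "http://foo.example.com/",
    "ping": "http://tracker.example.com/",
}

# Combined attribute value; the "primary" key is assumed for illustration.
combined = (
    "primary:url('http://foo.example.com/'); "
    "secondary:url('http://tracker.example.com/');"
)

# A second, attribute-specific grammar is needed to pull the values apart.
DECLARATION = re.compile(r"\s*([\w-]+)\s*:\s*url\('([^']*)'\)\s*;")


def parse_combined(value):
    """Split a 'key:url(...);' list into a dict of key -> URL."""
    return {m.group(1): m.group(2) for m in DECLARATION.finditer(value)}


print(separate["ping"])                      # plain lookup, no extra parsing
print(parse_combined(combined)["secondary"])  # needs the mini-parser above
```

Even this toy parser ignores escaping, double quotes and malformed input, all of which a real consumer of the combined form would have to specify and handle.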
So again, while I can see that this proposal would "work", in what way is it supposed to be preferable to RDFa? -- Toby Inkster
Re: [whatwg] Link rot is not dangerous
Philip Taylor wrote:
> The source data is the list of common RDF namespace URIs at
> http://ebiquity.umbc.edu/resource/html/id/196/Most-common-RDF-namespaces
> from three years ago. Out of those 284:
> * 56 are 404s. (Of those, 37 end with '#', so that URI itself really
> ought to exist. In the other cases, it'd be possible that only the
> prefix+suffix URIs are meant to exist. Some of the cases are just
> typos, but I'm not sure how many.)
> * 2 are Forbidden. (Of those, 1 looks like a typo.)
> * 2 are Bad Gateway.
> * 22 could not connect to the server. (Of those, 2 weren't http://
> URIs, and 1 was a typo. The others represent 13 different domains.)

While this analysis is interesting, looking at the 56 which 404, it doesn't seem like a massive loss to me. Some of them are clearly typos (e.g. DOAP and RSS syndication, which both appear on the HTTP 200 and HTTP 3xx lists in their correct form). In many cases I think you'll find that it's not that the link has "rotted" with time, but that there was *never* a file at the other end. Even the ones which are genuinely lost are probably only used by a handful of people.

The *really* commonly used URIs - RDF, RDFS, OWL, FOAF, Dublin Core (1.1 and Terms), RSS (1.0, plus commonly used modules), SKOS, SIOC, dbpedia, geo, Geonames, vCard and iCalendar - all seem to have been pretty stable so far.

Judging the stability of RDF URIs by looking at the 284 most common namespace URIs is akin to judging the provision of light rail in British cities by looking at the UK's 284 most populated areas - the results would actually be more helpful if you restricted yourself to a smaller sample.

Lastly, the RDF model tends to be very resilient against loss of information anyway. Generally, data tends to be structured such that if a collection of triples is true, any subset is also true. So if the meaning of certain triples within a document is lost because of link rot, the document as a whole will probably still be useful.

--
Toby Inkster
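The resilience point above (any subset of a true set of triples is still true) can be sketched with plain tuples. The subject URIs, the "lost" vocabulary and its shoeSize term are hypothetical; only the FOAF namespace URI is real.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
LOST = "http://vocab.example.org/lost#"  # hypothetical namespace that now 404s

# A small graph as (subject, predicate, object) tuples; all data is made up.
triples = {
    ("http://example.org/#alice", FOAF + "name", "Alice"),
    ("http://example.org/#alice", FOAF + "knows", "http://example.org/#bob"),
    ("http://example.org/#alice", LOST + "shoeSize", "38"),
}


def still_interpretable(graph, dead_namespaces):
    """Keep only triples whose predicate is not in a dead namespace."""
    dead = tuple(dead_namespaces)
    return {t for t in graph if not t[1].startswith(dead)}


kept = still_interpretable(triples, [LOST])

# The surviving triples are a subset of the original graph, so nothing
# false has been introduced; only the uninterpretable statement is gone.
print(kept <= triples, len(kept))
```

A consumer that cannot resolve one vocabulary can discard those statements and still use the rest of the document.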
Re: [whatwg] Annotating structured data that HTML has no semantics for
Leif Halvard Silli wrote:
> Hear hear. Let's call it "Cascading RDF Sheets".

http://buzzword.org.uk/2008/rdf-ease/spec
http://buzzword.org.uk/2008/rdf-ease/reactions

I have actually implemented it. It works. RDFa is better though.

-Toby