Re: [whatwg] Trying to work out the problems solved by RDFa

Calogero Alex Baldacchino Fri, 09 Jan 2009 12:08:14 -0800

Julian Reschke wrote:

> Calogero Alex Baldacchino wrote:
>> ...
>> This is why I was thinking about something like "data-rdfa-about", "data-rdfa-property", "data-rdfa-content" and so on, so that, for the purposes of an RDFa processor working on top of HTML5 UAs (perhaps in a test phase, if needed at all, of course), an element's dataset would give access to "rdfa-about" instead of just "about", that is, using the prefix "rdfa-" as if it were a namespace prefix in XML (hence, as if there were "rdfa:about" instead of "data-rdfa-about" in the markup).
>> ...
> That clashed with the documented purpose of data-*.

Hmm, I'm not sure there is a clash, since I was suggesting a *custom* and essentially *private* mechanism to experiment with RDFa in conjunction with the HTML serialization, for the *small-scale* needs of some organizations willing to embed RDFa metadata in text/html documents and to exchange them with each other, using a convention likely to avoid name clashes with other private metadata. Since I think it's unlikely to find data-rdfa-* used with different semantics in the very same page, and in a small-scale scenario involving a few *selected* sources of RDFa-modelled information, it should be possible to know in advance that someone else is using the same conventions. Such a document might be used in conjunction with an external RDFa processor, thus avoiding any need for direct support in a browser.
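To make the proposal concrete, here is a sketch of how the barbecue markup from the W3C RDFa example (quoted further down) could look under the data-rdfa-* convention. The attribute names are only the hypothetical convention discussed in this thread, not part of any specification, and the declarations for the cal: and xsd: prefixes are left out, since how to declare CURIE prefixes under this convention is an open question.

```html
<!-- Hypothetical "data-rdfa-*" convention: each RDFa attribute NAME is
     written as data-rdfa-NAME, so the markup remains conforming HTML5
     and only an external script or plugin gives it RDFa meaning.
     Prefix declarations for cal: and xsd: are omitted. -->
<p>
  I'm holding
  <span data-rdfa-property="cal:summary">
    one last summer Barbecue
  </span>, to meet friends and have a party before the end of holidays
  on
  <span data-rdfa-property="cal:dtstart"
        data-rdfa-content="2007-09-16T16:00:00-05:00"
        data-rdfa-datatype="xsd:dateTime">
    September 16th at 4pm
  </span>.
</p>
```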

However, such a convention might be "clash-free" enough to work on a wider scale; thus it might become widespread and provide evidence that the web /needs/, or at least /has chosen/, to use RDFa as (one of) the most common ways to embed metadata in a document, and that might be enough to add native support for the whole range of RDFa attributes, possibly along with support for the earlier experimental ones (such as the "data-rdfa-*" and "rdfa:*" ones, for backward compatibility). And actually I can't see much of a problem if a feature born as a private one became the basis of a widespread and widely accepted convention. (I'm not saying the spec should name data-rdfa-* as a means to implement RDFa; rather, I think that if a general agreement on whether and how RDFa must be specified and implemented can't be found, such an experiment might be proposed to the semantic web industry while waiting for the results, given that a lack of support might prevent any interested party from using RDFa and HTML5 together.)

> *If* we want to support RDFa, why not add the attributes the way they are already named???

For instance, to experiment whether it is worth changing the "if we want" into "we do want", without requiring an early implementation and specification, and without depending on whether and how a certain browser vendor might want to experiment differently from the others (such a convention would only require support for HTML5 datasets and a script or a plugin capable of handling them as RDFa metadata). The point here is that, once data-* attributes are introduced as the means to support custom attributes, browser vendors might decide to drop support for other kinds of custom attributes in the HTML serialization (that is, attributes that are neither part of the language nor data-* ones). Therefore, if they (or any of them) decided not to support RDFa attributes until these were introduced in a specification, there might be no means to experiment with them in general, that is, cross-browser, without resorting either to data-* or to "rdfa:*" (the latter in XHTML).
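A minimal sketch of the "script or plugin" side, assuming only that pages use the data-rdfa-* naming discussed in this thread: the hypothetical collectRdfaAttributes function walks the DOM and maps each data-rdfa-NAME attribute back to the RDFa attribute NAME. It reads the attributes directly rather than through dataset, because dataset transforms the names (in the HTML standard as eventually published, data-rdfa-property is exposed as dataset.rdfaProperty). It only recovers attribute values; resolving CURIEs and generating triples would still be the job of a real RDFa processor.

```html
<script>
  // Hypothetical consumer for the "data-rdfa-*" convention.
  // Returns one entry per element that carries at least one
  // data-rdfa-NAME attribute, with NAME -> value pairs in "rdfa".
  function collectRdfaAttributes(root) {
    var prefix = "data-rdfa-";
    var results = [];
    var elements = root.querySelectorAll("*");
    for (var i = 0; i < elements.length; i++) {
      var attrs = elements[i].attributes;
      var rdfa = {};
      var found = false;
      for (var j = 0; j < attrs.length; j++) {
        var name = attrs[j].name; // lowercased by the HTML parser
        if (name.indexOf(prefix) === 0) {
          // "data-rdfa-property" -> "property"
          rdfa[name.substring(prefix.length)] = attrs[j].value;
          found = true;
        }
      }
      if (found) {
        results.push({ element: elements[i], rdfa: rdfa });
      }
    }
    return results;
  }
  // For <span data-rdfa-property="cal:summary">, the matching entry's
  // "rdfa" object is { property: "cal:summary" }.
</script>
```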

Anyway, /in general/, what should a browser do with RDFa metadata on a *wide scale*, other than classifying a portion of the open web (e.g. in its local history), possibly allowing users to select trusted sources?

Actually, I don't think this would bring enough benefits to *average* users, compared to the risk of getting a lot of spam metadata from /heterogeneous/ sources. I really don't expect average users to understand how to filter sites based on metadata reliability (just for the purpose of using a metadata-based query interface, since a site with wrong metadata might still contain useful information); instead, they might just try a query interface the same way they use a default search bar, get wrong results (once spam metadata became widespread) and decide the mechanism doesn't work (possibly complaining about it). Some sort of antispam filter might help, but I think that determining whether metadata are reliable, that is, whether they really correspond to a web page's content, is a hard problem for a bot to solve without a good degree of artificial intelligence (filtering emails by looking for suspicious patterns is far easier than implementing a filter capable of /understanding/ metadata, /understanding/ natural language and comparing /semantics/).

Likewise, I don't expect the great majority of web pages to contain "valid" metadata: most people would not care about them, and a potentially growing number might copy and paste code containing metadata from other sites as a kind of template, then edit the content and ignore the metadata, thus breaking reliability. I do think wide-scale use of metadata coming from heterogeneous sources can be more harmful than useful. *If* we agree that small-scale needs are the main context where RDFa can bring benefits, perhaps a custom mechanism and external plugins are all we need; otherwise, it should be proved that /misused/ and /abused/ metadata can be filtered out *easily* and *automatically*, without requiring average users to understand the problem and without affecting overall efficiency. IMHO.

>> ...
>> However, AIUI, the current XML serialization (XHTML5) allows the use of namespaces and prefixed attributes, so couldn't a proper namespace be introduced for RDFa attributes, so that they can be used, if needed, in XHTML5 documents? I think this might be a valuable choice, because it seems to me RDFa attributes can be used to address cases where metadata must stay as close as possible to the corresponding data, but a mistake in a piece of markup may trigger the adoption agency or foster parenting algorithms, possibly separating metadata from content and thus breaking the reliability of the gathered information. From this perspective, a parser stopping at the very first error might give quicker feedback than one rearranging misnested elements as far as is reasonably possible (which does not hurt, and instead improves, content presentation and users' "direct" experience, but may cause side effects with metadata).
>> ...
> That would make RDFa as used in XHTML 1.* and RDFa used in HTML 5 incompatible. What for?
> ...

> BR, Julian

Because I'm not sure RDFa can work well with the HTML serialization. To clarify, let me take and modify an example from the W3C Recommendation (without claiming it makes a good worst-case scenario, just to give an idea):


[...]
<p>
  I'm holding
  <span property="cal:summary">
    one last summer Barbecue
  </span>, to meet friends and have a party before the end of holidays
  on
  <span property="cal:dtstart" content="2007-09-16T16:00:00-05:00"
        datatype="xsd:dateTime">
    September 16th at 4pm
  </span>.
</p>
[...]


Now let's consider it written as:

[...]
<p>
 I'm holding
 <span property="cal:summary">
   one last summer Barbecue
<!-- now the </span> close tag is missing here -->,
 to meet friends and have a party before the end of holidays
 on
 <span property="cal:dtstart" content="2007-09-16T16:00:00-05:00"
       datatype="xsd:dateTime">
   September 16th at 4pm
 </span>.
</p>
[...]

The above would result in a parse error in an XML-serialized document, since the document isn't well-formed. Instead, as part of an HTML-serialized document, the above fragment would be processed anyway, improving users' experience (compared with a page that stops rendering at a missing close tag), but potentially binding metadata imprecisely to the data, and thus potentially harming automated data extraction (whatever its purpose). Therefore, using such metadata only inside XML-serialized pages might give quick feedback on such a problem as soon as the author checked the page's appearance (which I think would be the very first check, just as I think no one would check the _whole_ range of possible queries people might make over a document to look for errors).
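For illustration, here is roughly the tree an HTML5 parser builds from the broken fragment, shown re-serialized (whitespace approximate). The second span is not an error by itself: it simply becomes a child of the still-open cal:summary span, which the tree-construction rules close only implicitly, with a parse error, when </p> is reached.

```html
<p>
 I'm holding
 <span property="cal:summary">
   one last summer Barbecue
<!-- now the </span> close tag is missing here -->,
 to meet friends and have a party before the end of holidays
 on
 <span property="cal:dtstart" content="2007-09-16T16:00:00-05:00" datatype="xsd:dateTime">
   September 16th at 4pm
 </span>.
</span></p>
```

So an RDFa processor working on this DOM would take the cal:summary value from everything up to the end of the paragraph, rather than from "one last summer Barbecue" alone, while nothing in the rendered page would betray the mistake (assuming the spans carry no styling).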

*If* this is meaningful, supporting RDFa attributes as "rdfa:*" might ensure that the XML serialization is preferred by people who really need this kind of metadata (while leaving a chance to experiment with RDFa in the HTML serialization, because no one can be prohibited from using data-<prefix>-* for this purpose together with a proper script or plugin), whereas introducing "about", "property", "content", "datatype" and so on directly in the HTML namespace, as attributes shared by all elements, would make the choice between the two serializations indifferent, thus exposing authors to every possible side effect the HTML serialization may cause.
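For comparison, a sketch of the same markup in XHTML5 with prefixed attributes. The rdfa namespace URI is a placeholder invented for this example (no such namespace has been defined), the fragment assumes a document served as application/xhtml+xml with the XHTML namespace declared on its root element, and the cal: and xsd: URIs are the ones commonly used with the W3C calendar example. Dropping either </span> here makes the document not well-formed, so an XML parser stops with a fatal error instead of silently rearranging the tree.

```html
<!-- XHTML5 fragment (application/xhtml+xml). The rdfa namespace URI is
     a hypothetical placeholder; an RDFa processor would have to be
     taught to look for these prefixed attribute names. -->
<p xmlns:rdfa="http://example.org/hypothetical-rdfa-ns#"
   xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  I'm holding
  <span rdfa:property="cal:summary">
    one last summer Barbecue
  </span>, to meet friends and have a party before the end of holidays
  on
  <span rdfa:property="cal:dtstart"
        rdfa:content="2007-09-16T16:00:00-05:00"
        rdfa:datatype="xsd:dateTime">
    September 16th at 4pm
  </span>.
</p>
```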

As a side note, it seems that people at the W3C are considering relying on extensibility to introduce RDFa attributes into XML-serialized HTML documents, and they also have some doubts about whether to allow RDFa attributes in the HTML serialization:

"The HTML WG is encouraged to provide a mechanism to permit independently developed vocabularies such as Internationalization Tag Set (ITS), Ruby, and RDFa to be mixed into HTML documents. /Whether this occurs through the extensibility mechanism of XML, *whether it is also allowed in the classic HTML serialization*, and whether it uses the DTD and Schema modularization techniques/, is for the HTML WG to determine."

(from <http://www.w3.org/2007/03/HTML-WG-charter#deliverables>)

WBR, Alex

