Charles McCathieNevile wrote:
The results of the first set of Microformats efforts were some pretty
cool applications, like the following one demonstrating how a web
browser could forward event information from your PC to your phone via
Bluetooth:

http://www.youtube.com/watch?v=azoNnLoJi-4

It's a technically very interesting application. What has the adoption
rate been like? How does it compare to other solutions to the problem,
like CalDav, iCal, or Microsoft Exchange? Do people publish calendar
events much? There are a lot of Web-based calendar systems, like MobileMe
or WebCalendar. Do people expose data on their Web page that can be used
to import calendar data to these systems?

In some cases this data is indeed exposed on Web pages. However, anecdotal evidence (which unfortunately is all that is available when trying to study the enormous collections of data in private intranets) suggests that this is significantly more valuable when it can be done within a restricted-access website.

...
In short, RDFa addresses the problem of a lack of a standardized
semantics expression mechanism in HTML family languages.

A standardized semantics expression mechanism is a solution. The lack of a solution isn't a problem description. What's the problem that a
standardized semantics expression mechanism solves?

There are many many small problems involving encoding arbitrary data in pages - apparently at least enough to convince you that the data-* attributes are worth incorporating.

There are many cases where being able to extract that data with a simple toolkit from someone else's content, or using someone else's toolkit without having to tell them about your data model, solves a local problem. The data-* attributes, because they do not represent a formal model that can be manipulated, are insufficient to enable sharing of tools which can extract arbitrary modelled data.


That's because data-* attributes are meant to create custom models for custom use cases, not (necessarily) involving interchange or (let me say) "agnostic extraction" of data. However, data-* attributes might be used to "emulate" support for RDFa attributes, so that each RDFa attribute is mapped to, let's say, a "data-rdfa-<attribute>" one and vice versa. I don't think "data-rdfa-about" vs "about" would make a great difference, at least in a test phase, since it wouldn't be much different from "rdfa:about", which might be used to embed RDFa attributes in an XML language (e.g. "external" markup embedded in an XHTML document through the extension mechanism).
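Just to sketch the idea (the "data-rdfa-*" names are only an assumed convention of mine, not part of any spec), the mapping could be as trivial as:

// Minimal sketch of the attribute mapping idea, assuming a
// hypothetical "data-rdfa-*" naming convention.
const RDFA_ATTRIBUTES = [
  "about", "property", "typeof", "resource",
  "content", "datatype", "rel", "rev",
];

// Copy native RDFa attributes into their data-rdfa-* equivalents.
function toDataAttributes(element: Element): void {
  for (const name of RDFA_ATTRIBUTES) {
    const value = element.getAttribute(name);
    if (value !== null) {
      element.setAttribute("data-rdfa-" + name, value);
    }
  }
}

// ...and back again, for tools that only understand native RDFa markup.
function toNativeAttributes(element: Element): void {
  for (const name of RDFA_ATTRIBUTES) {
    const value = element.getAttribute("data-rdfa-" + name);
    if (value !== null) {
      element.setAttribute(name, value);
    }
  }
}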

Since there seem to be several problems which RDFa could address (beside other, more custom models) for organization-wide internal use and intranet publication, without an explicit requirement of external interchange, then wherever both HTML5-specific features and RDFa attributes are felt to be necessary it shouldn't be too difficult to create a custom parser, conforming to the RDFa spec but reading data-* attributes, to be plugged into a browser that supports HTML5 (and data-*). It could be tested internally first, then exposed to the community, so that HTML5+RDFa can be tried on a wider scale (especially once similar parsers are provided for all the main browsers). Widespread adoption would then point out an effective need to merge RDFa into the HTML5 spec (or to standardize an approach based on data-* attributes).
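As a rough illustration only (it ignores CURIE/prefix resolution, subject chaining and most of the real RDFa processing rules, and reuses the assumed data-rdfa-* names from above), such a parser might walk the DOM and emit triples like this:

// Rough sketch of a triple extractor over the assumed data-rdfa-* attributes.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

function extractTriples(root: Element): Triple[] {
  const triples: Triple[] = [];
  for (const el of Array.from(root.querySelectorAll("[data-rdfa-property]"))) {
    // The nearest ancestor (or the element itself) carrying data-rdfa-about
    // supplies the subject; fall back to the empty string (the document).
    const subjectEl = el.closest("[data-rdfa-about]");
    const subject = subjectEl?.getAttribute("data-rdfa-about") ?? "";
    const predicate = el.getAttribute("data-rdfa-property") ?? "";
    // data-rdfa-content overrides the text content, as "content" does in RDFa.
    const object = el.getAttribute("data-rdfa-content") ?? el.textContent ?? "";
    triples.push({ subject, predicate, object });
  }
  return triples;
}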

That is, since RDFa can be "emulated" somehow in HTML5 and tested without changing the current specification, perhaps there isn't a strong need for early adoption of the former; an "emulated" merger might instead be tested first, within the current timeline.

What is the cost of having different data use specialised formats?

If the data model, or a part of it, is not explicit as in RDF but is implicit in the code written to process it (as is the case when scripts process things stored in arbitrarily named data-* attributes, and also when undocumented or semi-documented XML formats are used), it requires people to understand the code as well as the data model in order to use the data. In a corporate situation where hundreds or tens of thousands of people are required to work with the same data, this makes the data model very fragile.


I'm not sure RDF(a) solves such a problem. AIUI, RDFa just binds (XML) properties and attributes (in the form of CURIEs) to RDF concepts, modelling a certain kind of relationship, while relying on external schemata to define those properties. Any undocumented or semi-documented XML format may lead to misuse and, thus, to unreliably modelled data, and it is not clear to me how merely making the relationship between properties explicit ensures that a property really represents a subject rather than a predicate or an object (in its wrongly documented schema), if the problem is the correct definition of the properties themselves. Perhaps it is enough to parse them, and perhaps it can "inspire" a better definition of the external schemata (if the RDFa "vision" of data as triples suits the actual data to be modelled), but if the problem is a correct understanding of "what represents what" because of a gap in the documentation, I think that is something RDF/RDFa can't solve.
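To make the point concrete, here is a sketch of CURIE expansion (the prefix map is only an example of mine): the mechanics happily expand whatever prefix and term they are given, and say nothing about whether the vocabulary behind it is documented correctly or used as intended.

// Sketch of CURIE expansion; a real RDFa processor has more elaborate rules,
// but the point stands: expansion is purely mechanical.
const EXAMPLE_PREFIXES: Record<string, string> = {
  dc: "http://purl.org/dc/terms/",
  foaf: "http://xmlns.com/foaf/0.1/",
};

function expandCurie(curie: string, prefixes: Record<string, string>): string {
  const [prefix, reference] = curie.split(":", 2);
  const base = prefixes[prefix];
  // Unknown prefix: return the CURIE unchanged. Whether "dc:title" is being
  // used for what the schema actually means by it is not checked anywhere.
  return base ? base + reference : curie;
}

// expandCurie("dc:title", EXAMPLE_PREFIXES)
//   -> "http://purl.org/dc/terms/title"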

I think the same applies to data-* attributes, because _they_ describe data (and data semantics) in a custom model and thus _they_ need to be documented for others to be able to manipulate them; the use of a custom script rather than a built-in parser does not change much from this point of view.


[not clear what the context was here, so citing as it was]
> I don't think more metadata is going to improve search engines. In
> practice, metadata is so highly gamed that it cannot be relied upon.
> In fact, search engines probably already "understand" pages with far
> more accuracy than most authors will ever be able to express.

You are correct, more erroneous metadata is not going to improve search
engines. More /accurate/ metadata, however, IS going to improve search
engines. Nobody is going to argue that the system could not be gamed. I
can guarantee that it will be gamed.

However, that's the reality that we have to live with when introducing
any new web-based technology. It will be mis-used, abused and corrupted.
The question is, will it do more good than harm? In the case of RDFa
/and/ Microformats, we do think it will do more good than harm.

For search engines, I am not convinced. Google's experience is that
natural language processing of the actual information seen by the actual
end user is far, far more reliable than any source of metadata. Thus from
Google's perspective, investing in RDFa seems like a poorer investment
than investing in natural language processing.

Indeed. But Google is something of an edge case, since they can afford to run a huge organisation with massive computing power and many engineers to address a problem where a "near-enough" solution brings them the users who are in turn the product they sell to advertisers. There are many other use cases where a small group of people want a way to reliably search trusted data.


I think the point with general-purpose search engines is a different one: natural language processing, while expensive, gives a far more accurate solution than RDFa and/or any other kind of metadata can bring to a problem where data must never need to be trusted (and where, instead, a data processor must be able to determine the data's level of trust without any external aid). Since there is no "direct" relationship between the semantics expressed by RDFa and the real semantics of a web page's content, relying on RDFa metadata would lead to widespread cheating, as happened when the keywords meta tag was introduced. Thus a trust chain/evaluation mechanism (such as the use of signatures) would be needed, and a general-purpose search engine relying on RDFa would end up working more like a search directory, where human beings analyse content to classify pages: more accurate results, but also a smaller and very slowly growing database of classified sites (since there will always be far more sites not caring about metadata, and/or about making their metadata trustworthy, than sites using trusted RDFa metadata).

(The same reasoning may apply to a local search made by a browser over its own history: results are only as reliable as the expressed semantics, that is, as reliable as their source is trusted, which may not be true in general. Without a trust evaluation mechanism, misuse and deliberate abuse would be the most common case, which in turn restricts the number of pages where the presence of RDF(a) metadata is really helpful.)

My concern is that any data model requiring some level of trust to achieve working interoperability may only address very small (and niche) use cases, and even if many such niche use cases could be grouped into one category consistently addressed by RDFa (perhaps alongside other models), the result might not be a significant enough use case to fit the current specification guidelines (which are somewhat hostile to (XML) extensibility, as far as I understand them) -- though those guidelines might be changed when and if really needed.

Best regards,
Alex


