Re: [whatwg] Link rot is not dangerous (was: Re: Annotating structured data that HTML has no semantics for)

2009-05-16 Thread Toby A Inkster

On 15 May 2009, at 17:20, Manu Sporny wrote:


> The argument that link rot would cause massive damage to the semantic
> web is just not true. Even if there is minor damage caused, it is fairly
> easy to recover from it, as outlined above.


I was talking about this recently somewhere (can't remember where).

The RDF model is different from {key:value} models in that it has a
third component - a subject. This means that while a description for
the full URI of the FOAF Person class (which I'll refer to as
'foaf:Person' from now on, for brevity) can be found at the URL
foaf:Person, it's also possible for descriptions of foaf:Person to be
found elsewhere.


While the description for foaf:Person at foaf:Person is clearly much  
easier to find than other descriptions for foaf:Person, under the RDF  
model, they are all afforded equal weight.
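
A minimal sketch of the point above, assuming triples are modelled as plain (subject, predicate, object) tuples. The two sources and their contents are hypothetical, not real FOAF data; the sketch only shows that merging descriptions is a set union in which no source is privileged.

```python
# Triples are (subject, predicate, object) tuples. All names and sources
# below are illustrative assumptions, not real FOAF data.
FOAF_PERSON = "foaf:Person"

# Description served at the term's own URL (hypothetical content).
from_home = {
    (FOAF_PERSON, "rdfs:label", "Person"),
    (FOAF_PERSON, "rdf:type", "rdfs:Class"),
}

# Description published by an unrelated third party (hypothetical content).
from_elsewhere = {
    (FOAF_PERSON, "rdfs:comment", "A person, alive or dead."),
    (FOAF_PERSON, "rdf:type", "rdfs:Class"),
}


def describe(subject, *graphs):
    """Merge every graph and return all triples whose subject matches."""
    merged = set().union(*graphs)
    return {triple for triple in merged if triple[0] == subject}


# Both sources contribute equally; the duplicate triple collapses.
description = describe(FOAF_PERSON, from_home, from_elsewhere)
```

If from_home became unreachable, describe(FOAF_PERSON, from_elsewhere) would still return a usable, if smaller, description.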


If foaf:Person disappeared tomorrow, and even if I couldn't find an  
alternative source for that definition, the URI would still not be  
useless. I'd still know, say, that Toby Inkster is a foaf:Person, and  
Manu Sporny is a foaf:Person and from that I'd be able to conclude  
that they're the same sort of thing in some way.


Given enough instance data like that, I might even be able to analyse  
the instance data, looking at what all the instances of foaf:Person  
had in common and rediscover the original definition of foaf:Person.
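
A rough sketch of that idea, assuming the instance data has already been collected as (subject, predicate, object) tuples. The ex: identifiers and the properties attached to each person are made up for illustration; the function names are hypothetical as well.

```python
from collections import defaultdict

# Hypothetical instance data. The people are named in this thread, but
# the properties attached to them here are invented for illustration.
triples = [
    ("ex:toby", "rdf:type", "foaf:Person"),
    ("ex:toby", "foaf:name", "Toby Inkster"),
    ("ex:toby", "foaf:homepage", "http://example.org/toby"),
    ("ex:manu", "rdf:type", "foaf:Person"),
    ("ex:manu", "foaf:name", "Manu Sporny"),
    ("ex:manu", "foaf:weblog", "http://example.org/manu/blog"),
]


def instances_of(cls, triples):
    """Return every subject typed as `cls`."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


def common_predicates(cls, triples):
    """Return the predicates used by every instance of `cls`."""
    members = instances_of(cls, triples)
    if not members:
        return set()
    used = defaultdict(set)
    for s, p, o in triples:
        if s in members and p != "rdf:type":
            used[s].add(p)
    # Keep only predicates that appear on all members of the class.
    return set.intersection(*(used[m] for m in members))
```

With the data above, the instances of foaf:Person share only foaf:name, which is a first hint at what the lost definition said. Real data would need far more instances and some tolerance for optional properties.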


The ability to dereference an RDF class or property to discover more  
about it is very useful. A data format without that ability is all  
the poorer for not having it. But, when that dereferencing fails, all  
is not lost.


So when, in use cases, RDF fans talk about it being 'essential' to be
able to follow their noses to definitions of terms, what is meant is
that it's essential that a mechanism exists to enable this technique
- it is not essential that the definitions are always found.


--
Toby A Inkster





Re: [whatwg] Link rot is not dangerous (was: Re: Annotating structured data that HTML has no semantics for)

2009-05-15 Thread Kristof Zelechovski
I understand that there are ways to recover resources that disappear from
the Web; however, the postulated advantage of RDFa "you can go see what it
means" simply does not hold.  The recovery mechanism, Web search/cache,
would be as good for CURIE URL as for domain prefixes.  Creating a redirect
is not always possible and the built-in redirect dictionary (CURIE catalog?)
smells of a central repository.  This is no better than public entity
identifiers in XML.

Serving the vocabulary from one's own domain is not always possible, e.g. in
the case of reader-contributed content, and only guarantees that the vocabulary
will be alive while it is supported by the domain owner.  (WHATWG wants HTML
documents to be readable 1000 years from now.)  It is not always practical
either as it could confuse URL-based tools that do not retrieve the
resources referenced.

All this does not imply, of course, that RDFa is no good.  It is only
intended to demonstrate that the postulated advantage of the CURIE lookup is
wishful thinking.

Best regards,
Chris



[whatwg] Link rot is not dangerous (was: Re: Annotating structured data that HTML has no semantics for)

2009-05-15 Thread Manu Sporny
Kristof Zelechovski wrote:
> Therefore, link rot is a bigger problem for CURIE
> prefixes than for links.

There have been a number of people now that have gone to great lengths
to outline how awful link rot is for CURIEs and the semantic web in
general. This is a flawed conclusion, based on the assumption that there
must be a single vocabulary document in existence, for all time, at one
location. This has also led to a false requirement that all
vocabularies should be centralized.

Here's the fear:

If a vocabulary document disappears for any reason, then the meaning of
the vocabulary is lost and all triples depending on the lost vocabulary
become useless.

That fear ignores the fact that we have a highly available document
store at our disposal (the Web). Not only that, but these vocabularies
will be cached (at Google, at Yahoo, at The Wayback Machine, etc.).

IF a vocabulary document disappears, which is highly unlikely for
popular vocabularies (imagine FOAF disappearing overnight), then there
are alternative mechanisms to extract meaning from the triples that will
be left on the web.

Here are just two of the possible solutions to the problem outlined:

- The vocabulary is restored at another URL using a cached copy of the
vocabulary. The site owner of the original vocabulary either re-uses the
vocabulary, or re-directs the vocabulary page to another domain
(somebody that will ensure the vocabulary continues to be provided -
somebody like the W3C).
- RDFa parsers can be given an override list of legacy vocabularies that
will be loaded from disk (from a cached copy). If a cached copy of the
vocabulary cannot be found, it can be re-created from scratch if necessary.
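
The second option could look roughly like the sketch below. It is not taken from any real RDFa parser: the override table, the example.org URL, the cache path and the load_vocabulary function are all hypothetical, and fetch stands in for whatever HTTP client the parser already uses.

```python
from pathlib import Path

# Hypothetical override list: vocabulary URL -> local cached copy.
LEGACY_VOCABULARIES = {
    "http://example.org/vocab/gone#": Path("cache/gone.rdf"),
}


def load_vocabulary(url, fetch, overrides=LEGACY_VOCABULARIES):
    """Return the vocabulary document for `url` as text, or None.

    `fetch` is any callable that takes a URL and returns text, raising
    OSError (or a subclass) when the document cannot be retrieved.
    """
    # Vocabularies on the override list are always served from disk.
    cached = overrides.get(url)
    if cached is not None and cached.exists():
        return cached.read_text(encoding="utf-8")
    # Everything else is dereferenced normally.
    try:
        return fetch(url)
    except OSError:
        # Dereferencing failed and no cached copy is configured.
        return None
```

A None result does not make the triples unusable; it only means no definition was found this time.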

The argument that link rot would cause massive damage to the semantic
web is just not true. Even if there is minor damage caused, it is fairly
easy to recover from it, as outlined above.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: A Collaborative Distribution Model for Music
http://blog.digitalbazaar.com/2009/04/04/collaborative-music-model/