[whatwg] Link rot is not dangerous

2009-05-15 Thread Leif Halvard Silli

Geoffrey Sneddon wrote on Fri May 15 14:27:03 PDT 2009:

> On 15 May 2009, at 18:25, Shelley Powers wrote:
>
>> One of the very first uses of RDF, in RSS 1.0, for feeds, is still
>> in existence, still viable. You don't have to take my word, check it
>> out yourselves:
>>
>> http://purl.org/rss/1.0/
>
> Who actually treats RSS 1.0 as RDF? Every major feed reader just uses
> a generic XML parser for it (quite frequently a non-namespace aware
> one) and just totally ignores any RDF-ness of it.


What does it mean to "treat as RDF"? An "RSS 1.0" feed is essentially a 
stream of "items" that has been lifted from the page(s) and placed in an 
RDF/XML feed. When I read e.g. 
http://www.w3.org/2000/08/w3c-synd/home.rss in Safari, I can sort the 
news items according to date, source, title. Which means - I think - 
that Safari sees the feed as "machine readable".  It is certainly 
possible to do more - I guess, and Safari does the same to non-RDF 
feeds, but still. And search engines should have the same opportunities 
w.r.t. creating indexes based on "RSS 1.0" as on RDFa. (Though here 
perhaps comes in between the fact that search engines prefers to help us 
locate HTML pages rather than feeds.)

--
leif halvard silli


Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Tab Atkins Jr.
2009/5/15 Laurens Holst :
> Tab Atkins Jr. schreef:
>>
>> Assume a page that uses both foaf and another vocab that subclasses
>> many foaf properties.  Given working lookups for both, the rdf parser
>> can determine that two entries with different properties are really
>> 'the same', and hopefully act on that knowledge.
>>
>> If the second vocab 404s, that information is lost.  The parser will
>> then treat any use of that second vocab completely separately from the
>> foaf, losing valuable semantic information.
>>
>
> If the subclass-vocabulary is public, then it is most likely already well
> taken care of by the owner and also archived in several places, and thus
> hard to get lost. If the subclass-vocabulary is one custom-built for a
> specific site, then it is likely already stored in the same location.
>
> But even if you had RDF data without ontology, it is still far from useless.
> In fact, I’d say most RDF consumers today do not really do any kind of
> reasoning, which is what you primarily need an ontology for, especially not
> the large consumers. Without ontology you can still determine types, query
> their properties whose names are often self-explanatory, compare resources
> for equality, etc.
>
> Knowledge of the ontology will be embedded in documentation and existing
> software that consumes the data. Let me remark that when you end up in this
> scenario, you still basically got the same as what microformats have to work
> with. And if need be, you could even manually construct a schema.
>
> But yes, if everything goes awry, then data can get lost. That is the nature
> of the web. It is like, if snap.com goes out of business, all sites using
> those annoying popups will cease to show them (hurray!). A question you
> could pose is, if ‘the web’ allowed the data to get lost, whether that data
> is really important anyway.
>
> Maybe it would ease your mind if people set up a bunch of servers which
> spider the web of data for ontology schemas, archive them and provide a
> querying mechanism? If such a thing does not exist already.
>
> Either way, I guess kind of the basic idea is that dereferenceability of RDF
> URIs is a convenient bonus, not a necessity; RDF can work completely
> offline. There is no requirement that ontologies must be retrieved from the
> ontology’s URIs or that there must be an ontology at all.

Believe me, Laurens, *I* know this.  I know that public vocabs will be
publicly known and consumable, and private vocabs don't need to be
(because the few people using them know them and can consume them).
But the automated discoverability of RDF has been touted as a major
reason why RDFa specifically has to be supported in HTML5 (certainly
not the only major reason, but it's been harped on plenty), and link
rot *does* significantly affect that, *especially* for the small
vocabs that aren't likely to be widely reproduced.  It's a common
thing that *will* happen, as Philip's data shows, and as anyone
familiar with web history is aware.  The web rots over time, no
matter what you do, and there's no way to form canonical identifiers
that will stand the test of time.

Automated discovery is a benny in RDF's favor.  It's probably not a
*downside*, after all (though there were some negative scenarios
brought up concerning this a few months ago, such as a domain falling
into new hands who maliciously modify the schema).  But I think it's a
very minor point, and the fact that few if any major consumers of RDF
actually use this ability supports this thought.  There are few to no
in-the-wild use cases for this sort of ability, which means that it is
very low priority when determining what solution will be specced.

Once you remove discovery as a strong requirement, then you remove the
need for large urls, and that removes the need for CURIEs, or any
other form of prefixing.  You still want to uniquify your identifiers
to avoid accidental clashes, but that's not that hard, nor is it
absolutely necessary.  The system can be robust and usable even with a
bit of potential ambiguity if small authors design their private
vocabs badly.  As a bonus, everything gets simpler.  Essentially it
devolves into something relatively close to Ian's microdata proposal,
perhaps with datatype added in (though I do question how necessary
that is, given a half-intelligent parser can recognize things as
numbers or dates).
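
A sketch of the kind of half-intelligent value sniffing meant here
(illustrative; a real parser would be stricter about formats):

    from datetime import date

    def sniff(value):
        """Guess a datatype for a plain-text value: date, number, or string."""
        value = value.strip()
        for parse in (date.fromisoformat, int, float):
            try:
                return parse(value)
            except ValueError:
                continue
        return value  # fall back to a plain string

    assert sniff("2009-05-15") == date(2009, 5, 15)
    assert sniff("42") == 42
    assert sniff("foaf") == "foaf"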

~TJ


Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Geoffrey Sneddon


On 15 May 2009, at 18:25, Shelley Powers wrote:

> One of the very first uses of RDF, in RSS 1.0, for feeds, is still
> in existence, still viable. You don't have to take my word, check it
> out yourselves:
>
> http://purl.org/rss/1.0/


Who actually treats RSS 1.0 as RDF? Every major feed reader just uses  
a generic XML parser for it (quite frequently a non-namespace aware  
one) and just totally ignores any RDF-ness of it.



--
Geoffrey Sneddon





Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-15 Thread Tab Atkins Jr.
On Thu, May 14, 2009 at 9:50 AM, Eduard Pascual  wrote:
> I have put online a document that describes my idea/proposal for a
> selector-based solution to metadata.
> The document can be found at http://herenvardo.googlepages.com/CRDF.pdf
> Feel free to copy and/or link the file wherever you deem appropriate.
>
> Needless to say, feedback and constructive criticism to the proposal
> is always welcome.
> (Note: if discussion about this proposal should take place somewhere
> else, please let me know.)

Ah, thanks Eduard.  Have you cleaned this up significantly since the
last time this discussion came up?  It seems to read much better now
than before, but it's possible that I was just stupider several months
ago.

As far as I can tell (I am a novice, so YMMV), it conveys everything
that RDFa does, and more specifically, matches RDF-EASE's features.  I
think it has a friendlier syntax than RDF-EASE, though, which I think is
tied too much to the exact structure of RDFa.  The author does
acknowledge that he leans directly on RDFa, but I think that's a
mistake - RDFa is designed to deal with the limitations of the
attr/value pairs that you can place on elements.  When you're
designing a new language by itself, you can employ the magic of
syntactic sugar to tighten things up and make them easier and more
expressive.

Frex, RDF-EASE uses -rdf-property to specify what property something
should be, and -rdf-content to specify whether a property should take
its value from the element's content or from an attribute.  This split
is necessary when embedding attributes in HTML, but your proposal
combines those two things into a single line, which I think is much
clearer, and makes it easier to use when specifying multiple
properties.  (Not to mention making inline specification even easier
than RDFa, as you point out.)
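
For readers who have not seen RDF-EASE, a hypothetical fragment showing
that split (the -rdf-property and -rdf-content names are from the draft;
the value syntax here is illustrative and may not match the spec exactly):

    .contact h2 {
        -rdf-property: "foaf:name";  /* what is being asserted        */
        -rdf-content: attr(title);   /* where the value is taken from */
    }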

I recommend using 'self' as the value for @|subject that corresponds
to a blank node for each matched element.

How would you write the situation where you have two vocabs applying
to content in an intertwined way, with different subjects?  I can't
think of an explicit example right now, but say you had nested markup
where an outer element and an inner element are both subjects
using different vocabs, and a descendant element has facts about both of them.  It
seems like you can handle this by specifying two separate blocks with
an identical selector but different @|subject rules.  Is this correct?

If so, it seems then that at least one of those @|subject rules would
require either a url(...) or blank(...) value, which limits one's
ability to use this technique on multiple elements on a page.
RDF-EASE uses the nearest-ancestor(selector) functional notation to
indicate these sorts of relationships.

(Ah, here we go, an example:
http://buzzword.org.uk/2008/rdf-ease/spec#ssec-properties--rdf-about
talks about mixing foaf and vcard together, with one scenario matching
what I outlined earlier.)


Your proposal doesn't seem to have a way to specify the datatype
currently.  Since several people have brought up the lack of datatypes
as a weakness in Ian's microdata proposal, this may be a weakness here too.


RDF-EASE allows you to 'reset' elements, *overriding* metadata given
by less-specific selectors rather than just augmenting it.  This does
seem like a nice ability, specifically when you need to provide a
general rule for a particular class, say, and give a slightly
different rule for one of those elements with a particular id.  On the
other hand, you can just write the general rule with :not() to avoid
the more specific element.  I'm not sure whether this is good enough,
or if it really is easier to use something like 'reset'.

~TJ


Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Tab Atkins Jr.
On Wed, May 13, 2009 at 10:04 AM, Leif Halvard Silli  wrote:
> Toby Inkster on Wed May 13 02:19:17 PDT 2009:
>>
>> Leif Halvard Silli wrote:
>>
>> > Hear hear.  Lets call it "Cascading RDF Sheets".
>>
>> http://buzzword.org.uk/2008/rdf-ease/spec
>>
>> http://buzzword.org.uk/2008/rdf-ease/reactions
>>
>> I have actually implemented it. It works.
>
> Oh! Thanks for sharing.

Indeed, RDF-EASE seems fairly nice!

>> RDFa is better though.
>
> What does 'better' mean in this context? Why and how? Because it is easier
> to process? But EASE seems more compatible with microformats, and is
> "better" in that sense.

I'd also like clarification here.  I dislike *all* of the inline
metadata proposals to some degree, for the same reasons that I dislike
inline @style and @onfoo handlers.  A Selector-based way of applying
semantics fits my theoretical needs much better.

> I read all the reactions you pointed to. Some made the claim that EASE would
> move semantics out of the HTML file, and that microformats were better as they
> keep the semantics inside the file. But I of course agree with you that
> EASE just underlines/outlines the semantics already in the file.

Yup.  The appropriate critique of separated metadata is that the
*data* is moved out of the document, where it will inevitably decay
compared to the live document.  RDF-EASE keeps all the data stored in
the live document, and merely specifies how to extract it.  The only
way you can lose data then is by changing the html structure itself,
which is much less common than just changing the content.

> From the EASE draft:
>>
>> All properties in RDF-EASE begin with the string -rdf-, as per §4.1.2.1
>> Vendor-specific extensions in [CSS21]. This allows RDF-EASE and CSS to be
>> safely mixed in one file, [...]
>
> I wonder why you think it is so important to be able to mix CSS and EASE. It
> seems better to separate the two completely.

I'm not thrilled with the mixture of CSS and metadata either.  Just
because it uses Selectors doesn't mean it needs to be specifiable
alongside CSS.  jQuery uses Selectors too, but it stays where it
belongs.  ^_^  (That being said, there's a plugin for it that allows
you to specify js in your CSS, and it gets applied to the matching
elements from the block's selector.)

~TJ


Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Kristof Zelechovski
The problem of cybersquatting of oblique domains is, I believe, described
and addressed in the tag URI scheme definition [RFC 4151], which I think is
rather similar to the constructs used for HTML microdata.  I think that
document is relevant not only to this discussion but to the whole
concept.
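
For reference, a tag URI under that scheme pairs a DNS name with a date,
yielding an identifier that is never meant to be dereferenced, which
sidesteps both link rot and later domain takeover; a made-up example:

    tag:example.com,2009-05-15:vocab/person#name
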
IMHO,
Chris




Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Shelley Powers

Philip Taylor wrote:

> On Fri, May 15, 2009 at 6:25 PM, Shelley Powers wrote:
>
>> The most important point to take from all of this, though, is that link rot
>> within the RDF world is an extremely rare and unlikely occurrence.
>
> That seems to be untrue in practice - see
> http://philip.html5.org/data/rdf-namespace-status.txt
>
> The source data is the list of common RDF namespace URIs at
> http://ebiquity.umbc.edu/resource/html/id/196/Most-common-RDF-namespaces
> from three years ago. Out of those 284:
>  * 56 are 404s. (Of those, 37 end with '#', so that URI itself really
> ought to exist. In the other cases, it'd be possible that only the
> prefix+suffix URIs are meant to exist. Some of the cases are just
> typos, but I'm not sure how many.)
>  * 2 are Forbidden. (Of those, 1 looks like a typo.)
>  * 2 are Bad Gateway.
>  * 22 could not connect to the server. (Of those, 2 weren't http://
> URIs, and 1 was a typo. The others represent 13 different domains.)
>
> (For the URIs which returned Redirect responses, I didn't check what
> happens when you request the URI it redirected to, so there may be
> more failures.)
>
> Over a quarter of the most common namespace URIs don't resolve
> successfully today, and most of those look like they should have
> resolved when they were originally used, so link rot seems to be
> common.
>
> (Major vocabularies like RSS and FOAF are likely to exist for a long
> time, but they're the easiest cases to handle - we could just
> pre-define the prefixes "rss:" and "foaf:" and have a centralised
> database mapping them onto schemas/documentation/etc. It seems to me
> that URIs are most valuable to let any tiny group make one for their
> rarely-used vocabulary, and be guaranteed no name collisions without
> needing to communicate with a centralised registry to ensure
> uniqueness; but it's those cases that are most vulnerable to link rot,
> and in practice the links appear to fail quite often.)
>
> (I'm not arguing that link rot is dangerous - just that the numbers
> indicate it's a common situation rather than an extremely rare
> exception.)

Philip, I don't think the occurrence of link rot causing problems in the 
RDF world is all that common, but thanks for looking up this data. 
Actually, I will probably quote your info in my next weblog post.


I'd like to be dropped from any additional emails in this thread. After 
all, I have it on good authority that I'm not open to rational discussion. 
So I'll leave this type of thing to you guys.


Thanks

Shelley


Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Tab Atkins Jr.
On Fri, May 15, 2009 at 1:32 PM, Manu Sporny  wrote:
> Tab Atkins Jr. wrote:
>> Reversed domains aren't *meant* to link to anything.  They shouldn't
>> be parsed at all.  They're a uniquifier so that multiple vocabularies
>> can use the same terms without clashing or ambiguity.  The Microdata
>> proposal also allows normal urls, but they are similarly nothing more
>> than a uniquifier.
>>
>> CURIEs, at least theoretically, *rely* on the prefix lookup.  After
>> all, how else can you tell that a given relation is really the same
>> as, say, foaf:name?  If the domain isn't available, the data will be
>> parsed incorrectly.  That's why link rot is an issue.
>
> Where in the CURIE spec does it state or imply that if a domain isn't
> available, the resulting parsed data will be invalid?

Assume a page that uses both foaf and another vocab that subclasses
many foaf properties.  Given working lookups for both, the rdf parser
can determine that two entries with different properties are really
'the same', and hopefully act on that knowledge.

If the second vocab 404s, that information is lost.  The parser will
then treat any use of that second vocab completely separately from the
foaf, losing valuable semantic information.

(Please correct any misunderstandings I may be operating under; I'm
not sure how competent parsers currently are, and thus how much they'd
actually use a working subclassed relation.)
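
A sketch of that inference (illustrative; uses rdflib, and the small
vocabulary's URI is made up): once the schema declares the subproperty
relation, data using either property can be merged; without the schema,
the loop below adds nothing and the custom property stays opaque.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDFS

    FOAF = Namespace("http://xmlns.com/foaf/0.1/")
    EX = Namespace("http://vocab.example.org/")  # the small vocab that may 404

    schema = Graph()  # only obtainable while the vocab URL still resolves
    schema.add((EX.fullName, RDFS.subPropertyOf, FOAF.name))

    data = Graph()
    data.add((URIRef("http://example.org/#me"), EX.fullName, Literal("Alice")))

    # Propagate one level of rdfs:subPropertyOf into the data graph.
    for s, p, o in list(data):
        for parent in schema.objects(p, RDFS.subPropertyOf):
            data.add((s, parent, o))

    assert (URIRef("http://example.org/#me"), FOAF.name, Literal("Alice")) in data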

~TJ


Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Philip Taylor
On Fri, May 15, 2009 at 6:25 PM, Shelley Powers wrote:
> The most important point to take from all of this, though, is that link rot
> within the RDF world is an extremely rare and unlikely occurrence.

That seems to be untrue in practice - see
http://philip.html5.org/data/rdf-namespace-status.txt

The source data is the list of common RDF namespace URIs at
http://ebiquity.umbc.edu/resource/html/id/196/Most-common-RDF-namespaces
from three years ago. Out of those 284:
 * 56 are 404s. (Of those, 37 end with '#', so that URI itself really
ought to exist. In the other cases, it'd be possible that only the
prefix+suffix URIs are meant to exist. Some of the cases are just
typos, but I'm not sure how many.)
 * 2 are Forbidden. (Of those, 1 looks like a typo.)
 * 2 are Bad Gateway.
 * 22 could not connect to the server. (Of those, 2 weren't http://
URIs, and 1 was a typo. The others represent 13 different domains.)

(For the URIs which returned Redirect responses, I didn't check what
happens when you request the URI it redirected to, so there may be
more failures.)

Over a quarter of the most common namespace URIs don't resolve
successfully today, and most of those look like they should have
resolved when they were originally used, so link rot seems to be
common.

(Major vocabularies like RSS and FOAF are likely to exist for a long
time, but they're the easiest cases to handle - we could just
pre-define the prefixes "rss:" and "foaf:" and have a centralised
database mapping them onto schemas/documentation/etc. It seems to me
that URIs are most valuable to let any tiny group make one for their
rarely-used vocabulary, and be guaranteed no name collisions without
needing to communicate with a centralised registry to ensure
uniqueness; but it's those cases that are most vulnerable to link rot,
and in practice the links appear to fail quite often.)

(I'm not arguing that link rot is dangerous - just that the numbers
indicate it's a common situation rather than an extremely rare
exception.)
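
A sketch of the kind of survey behind those numbers (an illustration, not
Philip's actual script): issue a HEAD request per namespace URI and
classify the outcome.

    import urllib.request
    import urllib.error

    def status(uri):
        try:
            req = urllib.request.Request(uri, method="HEAD")
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status           # 2xx; redirects already followed
        except urllib.error.HTTPError as e:
            return e.code                    # 404, 403, 502, ...
        except (urllib.error.URLError, OSError):
            return None                      # could not connect

    for uri in ["http://purl.org/rss/1.0/", "http://xmlns.com/foaf/0.1/"]:
        print(status(uri), uri)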

-- 
Philip Taylor
exc...@gmail.com


Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Shelley Powers

Kristof Zelechovski wrote:

> Classes in com.sun.* are reserved for Java implementation details and should
> not be used by the general public.  CURIE URLs are intended for general use.
>
> So, I can say "Well, it is not the same", because it is not.
>
> Cheers,
> Chris

But we're not dealing with Java anymore. We're dealing with reversed 
DNS concatenated with some kind of default URI, to create some kind of 
bastardized URL, which actually is valid, though incredibly painful to 
see, and can be taken to imply that it actually leads to a web address.


You don't have to take my word for it -- check out Philip's testing demo 
for microdata. You get triples with the following:


http://www.w3.org/1999/xhtml/custom#com.damowmow.cat

http://philip.html5.org/demos/microdata/demo.html#output_ntriples

Not only do you face problems with link rot, you also face a significant 
amount of confusion, as people look at that and go, "What the hell is 
that?"


Oh, and you can say, "Well, but we don't _mean_ anything by it" -- but 
what does that have to do with anything? People don't go running to the 
spec every time they see something. They look at this thing and think, 
"Oh, a link. I wonder where it goes." You go ahead and try it, and 
imagine for a moment the confusion when it goes absolutely nowhere. 
Except that I imagine the W3C folks are getting a little annoyed with 
the HTML WG now, for allowing this type of thing in, generating a whole 
bunch of 404 errors for the web master(s).


But hey, you've given me another idea. I think I'll create my own 
vocabulary items, with the reversed DNS 
http://www.w3.org/1999/xhtml/custom#com.sun.*. No, maybe 
http://www.w3.org/1999/xhtml/custom#com.opera.*. Nah, how about 
http://www.w3.org/1999/xhtml/custom#com.microsoft.*. Yeah, that's cool. 
And there is no mechanism in place to prevent this, because unlike 
"regular" URIs, where the domain is actually controlled by a specific 
entity, you've created the world famous W3C fudge pot. Anything goes.


I can't wait for the lawsuits on this one. You think that cybersquatting 
is an issue on the web, or facebook, or Twitter, wait until you see 
people use com.microsoft.*.


Then there's the vocabulary that was created by foobar.com, that people 
think, "Hey, cool, I'll use that...whatever it is". After all, if you 
want to play with the RDF kids, your vocabularies have to be usable by 
other people.


But Foobar takes a dive in the dot com pool, and foobar.com gets taken 
over by a porn establishment. Yeah, I can't wait for people to explain 
that one to the boss. Just because it doesn't link, won't mean it won't 
end up on Twitter as a big, huge joke.


If you want to find something to criticize, I think it's important to 
realize that hey, folks, you've just stepped over the line, and you're 
now in the Zone of Decentralization. Whatever impacts us, babes, impacts 
all of you. Because if you look at Philip's example, you're going to see 
the same set of vocabulary URIs we're using for RDF right now, as 
microdata uses our stuff, too. Including the links that are all 
trembling on the edge of self-implosion.


So the point of all of this is moot.

But it was fun. Really fun. Have a great weekend.

Shelley


Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Kristof Zelechovski
Serving the RDFa vocabulary from one's own domain is not always possible, e.g.
when a reader of a Web site is encouraged to post a comment to the page she
reads and her comment contains semantic annotations.

The probability of a URL becoming unavailable is much greater than that of
both mirrored drives wearing out at the same time.  (data mirroring does not
claim it protects from fire, water, high voltage, magnetic storms,
earthquakes and the like; it only protects you from natural wear.)  The
probability of ultimately losing data stored in one copy is 1; the
probability of a URL going down is close to 1.  So, RAID works in most
cases; CURIE URLs do not (ultimately) work in most cases.

Disappearing CSS is not a problem for HTML because CSS does not affect the
meaning of the page.

Disappearing scripts are a problem for HTML but they are not a problem for
HTML *data*.  In other words, script-generated content is not guaranteed to
survive, and there is nothing we can do about that except for a warning.
Such content cannot be HTML-validated either.  In general, scripts are best
used (and intended) for behavior, not for creating content.

External SVG files do not describe existing content, they *are* (embedded)
content.  If a HTML file disappears, it becomes unreadable as well, but that
problem obviously cannot be solved from within HTML :-)

"HTML should be readable in 1000 years from now" was an attempt to visualize
the intention of persistence.  It should not be understood as "best before",
of course.

If the author chooses to prevent link rot by creating a redirect to a
well-known vocabulary through a dependent vocabulary stored at his own site,
tools that recognize vocabulary URLs without reading the corresponding
resources will be unable to recognize the author's intent, and for the tools
that do read them, the original vocabulary will still be unavailable, so this
method causes more problems than it solves.

Cheers,
Chris




[whatwg] Link rot is not dangerous

2009-05-15 Thread Manu Sporny
Tab Atkins Jr. wrote:
> Reversed domains aren't *meant* to link to anything.  They shouldn't
> be parsed at all.  They're a uniquifier so that multiple vocabularies
> can use the same terms without clashing or ambiguity.  The Microdata
> proposal also allows normal urls, but they are similarly nothing more
> than a uniquifier.
> 
> CURIEs, at least theoretically, *rely* on the prefix lookup.  After
> all, how else can you tell that a given relation is really the same
> as, say, foaf:name?  If the domain isn't available, the data will be
> parsed incorrectly.  That's why link rot is an issue.

Where in the CURIE spec does it state or imply that if a domain isn't
available, the resulting parsed data will be invalid?

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: A Collaborative Distribution Model for Music
http://blog.digitalbazaar.com/2009/04/04/collaborative-music-model/



Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Tab Atkins Jr.
On Fri, May 15, 2009 at 9:17 AM, Eduard Pascual  wrote:
> On Fri, May 15, 2009 at 1:44 PM, Kristof Zelechovski wrote:
>> Link rot
>>        CURIE definitions can only be looked up while the CURIE server is
>> providing them; the chance of the URL becoming broken is high for
>> home-brewed vocabularies.  While the vocabularies can be moved elsewhere, it
>> will not always be possible to create a redirect.
>
> Oh, and do reversed domains help at all with this? Ok, with CURIEs
> there is a (relatively small) chance for the CURIE to not be
> resolvable at a given time; reversed domains have a 100% chance to not
> be resolvable at any time: there is always, at least, ambiguity: does
> org.example.foo map to foo.example.org, example.org/foo, or
> example.org#foo? Even better: what if, under example.org we find a
> vocabulary at example.org/foo and another at foo.example.org? (Ok,
> that'd be quite unwise, although it might be a legitimate way to keep
> "deployed" and "test" versions of a vocabulary online at a time; but
> anyway CURIEs can cope with it, while reversed domains can't).
> Wherever there are links, there is a chance for broken links: that's
> part of the nature of links, and the evolving nature of the web. But,
> just because of the chance of links being broken, would you deny the
> utility of elements such as <a> and <link>? Reversed domains don't
> face broken links because they are simply incapable of linking to
> anything.

Reversed domains aren't *meant* to link to anything.  They shouldn't
be parsed at all.  They're a uniquifier so that multiple vocabularies
can use the same terms without clashing or ambiguity.  The Microdata
proposal also allows normal urls, but they are similarly nothing more
than a uniquifier.

CURIEs, at least theoretically, *rely* on the prefix lookup.  After
all, how else can you tell that a given relation is really the same
as, say, foaf:name?  If the domain isn't available, the data will be
parsed incorrectly.  That's why link rot is an issue.
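
A sketch of that dependency (prefix names hypothetical): a CURIE only
becomes comparable to foaf:name after expansion against its prefix
binding; with no binding, the term cannot be resolved at all.

    doc_a = {"foaf": "http://xmlns.com/foaf/0.1/"}
    doc_b = {"f": "http://xmlns.com/foaf/0.1/"}  # other prefix, same vocab

    def expand(curie, prefixes):
        prefix, _, ref = curie.partition(":")
        return prefixes[prefix] + ref  # KeyError when the binding is missing

    # The same property despite different prefixes:
    assert expand("foaf:name", doc_a) == expand("f:name", doc_b)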

~TJ


Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Manu Sporny
Kristof Zelechovski wrote:
> I understand that there are ways to recover resources that disappear from
> the Web; however, the postulated advantage of RDFa "you can go see what it
> means" simply does not hold. 

This is a strawman argument; more below...

> All this does not imply, of course, that RDFa is no good.  It is only
> intended to demonstrate that the postulated advantage of the CURIE
> lookup is wishful thinking.

That train of logic seems to falsely conclude that if something does not
hold true 100% of the time, then it cannot be counted as an advantage.

Example:

Since the postulated advantage of RAID-5 is that a disk array is
unlikely to fail due to a single disk failure, and since it is possible
for more than one disk to fail before a recovery is complete, one cannot
call running a disk array in RAID-5 mode an advantage over not running
RAID at all (because failure is possible).

or

Since the postulated advantage of CURIEs is that "you can go see what it
means" and it is possible for a CURIE defined URL to be unavailable, one
cannot call it an advantage because it may fail.

There are two flaws in the premises and reasoning above, for the CURIE case:

- It is assumed that for something to be called an 'advantage' it
  must hold true 100% of the time.
- It is assumed that most proponents of RDFa believe that "you can go
  see what it means" holds at all times - one would have to be very
  deluded to believe that.

> The recovery mechanism, Web search/cache,
> would be as good for CURIE URL as for domain prefixes.  Creating a redirect
> is not always possible and the built-in redirect dictionary (CURIE catalog?)
> smells of a central repository. 

Why does having a file sitting on your local machine that lists
alternate vocabulary files for CURIEs smell of a central repository?
Perhaps you're assuming that the file would be managed by a single
entity? If so, it wouldn't need to be and that was not what I was proposing.

> Serving the vocabulary from one's own domain is not always possible, e.g. in
> case of reader-contributed content, 

This isn't clear, could you please clarify what you mean by
"reader-contributed content"?

> and only guarantees that the vocabulary
> will be alive while it is supported by the domain owner.

This case and its solution were already covered previously. Again - if
the domain owner disappears, the domain disappears, or the domain owner
doesn't want to cooperate for any reason, one could easily set up an
alternate URL and instruct the RDFa processor to re-direct any
discovered CURIEs that match the old vocabulary to the new
(referenceable) vocabulary.
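
A sketch of that override (URIs hypothetical; not the API of any existing
RDFa processor): a local table redirects the *fetch* of a dead vocabulary
to a live mirror, while the triples keep the original URIs.

    VOCAB_REMAP = {
        "http://dead.example.org/vocab#": "http://mirror.example.net/vocab#",
    }

    def vocab_location(vocab_uri):
        # Consulted before any HTTP request; the data itself is unchanged.
        return VOCAB_REMAP.get(vocab_uri, vocab_uri)

    assert (vocab_location("http://dead.example.org/vocab#")
            == "http://mirror.example.net/vocab#")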

> (WHATWG wants HTML documents to be readable 1000 years from now.)  

Is that really a requirement? What about external CSS files that
disappear? External Javascript files that disappear? External SVG files
that disappear? All those have something to do with the document's
human/machine readability. Why is HTML5 not susceptible to link rot in
the same way that RDFa is susceptible to link rot?

Also, why 1000 years, that seems a bit arbitrary? =P

> It is not always practical either as it could confuse URL-based 
> tools that do not retrieve the resources referenced.

Could you give an example of this that wouldn't be a bug in the
dereferencing application? How could a non-dereference-able URL "confuse
URL-based tools"?

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: A Collaborative Distribution Model for Music
http://blog.digitalbazaar.com/2009/04/04/collaborative-music-model/



Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Kristof Zelechovski
Classes in com.sun.* are reserved for Java implementation details and should
not be used by the general public.  CURIE URLs are intended for general use.

So, I can say "Well, it is not the same", because it is not.

Cheers,
Chris



Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Shelley Powers

Dan Brickley wrote:

> On 15/5/09 18:20, Manu Sporny wrote:
>
>> Kristof Zelechovski wrote:
>>
>>> Therefore, link rot is a bigger problem for CURIE
>>> prefixes than for links.
>>
>> There have been a number of people now who have gone to great lengths
>> to outline how awful link rot is for CURIEs and the semantic web in
>> general. This is a flawed conclusion, based on the assumption that there
>> must be a single vocabulary document in existence, for all time, at one
>> location. This has also led to a false requirement that all
>> vocabularies should be centralized.
>>
>> Here's the fear:
>>
>> If a vocabulary document disappears for any reason, then the meaning of
>> the vocabulary is lost and all triples depending on the lost vocabulary
>> become useless.
>>
>> That fear ignores the fact that we have a highly available document
>> store available to us (the Web). Not only that, but these vocabularies
>> will be cached (at Google, at Yahoo, at The Wayback Machine, etc.).
>>
>> IF a vocabulary document disappears, which is highly unlikely for
>> popular vocabularies - imagine FOAF disappearing overnight - then there
>> are alternative mechanisms to extract meaning from the triples that will
>> be left on the web.
>>
>> Here are just two of the possible solutions to the problem outlined:
>>
>> - The vocabulary is restored at another URL using a cached copy of the
>> vocabulary. The site owner of the original vocabulary either re-uses the
>> vocabulary, or re-directs the vocabulary page to another domain
>> (somebody that will ensure the vocabulary continues to be provided -
>> somebody like the W3C).
>> - RDFa parsers can be given an override list of legacy vocabularies that
>> will be loaded from disk (from a cached copy). If a cached copy of the
>> vocabulary cannot be found, it can be re-created from scratch if
>> necessary.
>>
>> The argument that link rot would cause massive damage to the semantic
>> web is just not true. Even if there is minor damage caused, it is fairly
>> easy to recover from it, as outlined above.
>
> A few other points:
>
> 1. It's for the community of vocabulary-creators to help each other
> out w.r.t. hosting/publishing these: I just nudged a friend to put
> another 5 years on the DNS rental for a popular namespace. I think we
> should put a bit more structure around these kinds of habits, so that
> popular namespaces won't drop off the Web through accident.
>
> 2. Digitally signing the schemas will become part of the story, I'm
> sure. While it's a bit fiddly, there are advantages to having other
> mechanisms beyond URI de-referencing for knowing where a schema came from.
>
> 3. Parties worried about external dependencies when using namespaces
> can always indirect through their own namespace, whose schema document
> can declare subclass/subproperty relations to other URIs.
>
> cheers
>
> Dan




The most important point to take from all of this, though, is that link 
rot within the RDF world is an extremely rare and unlikely occurrence. 
I've been working with RDF for close to a decade, and link rot has never 
been an issue.


One of the very first uses of RDF, in RSS 1.0, for feeds, is still in 
existence, still viable. You don't have to take my word, check it out 
yourselves:


http://purl.org/rss/1.0/

Even if, and I want to strongly emphasize "if", link rot does occur, both 
Manu and Dan have demonstrated multiple ways of ensuring that no meaning 
is lost, and nothing is broken. However, I hope that people are open 
enough to take away from their discussions that they are trying to 
treat this concern respectfully, and trying to demonstrate that there's 
more than one solution. Not that this forms a "proof" that "Oh my god, 
if we use RDF, we're doomed!"


Also don't lose sight that this is really no more serious an issue than, 
say, a company originating "com.sun.*" being purchased by another 
company, named "com.oracle.*".  And you can't say, "Well that's not the 
same", because it is.


The only "safe" bet is to designate some central authority and give them 
power over every possible name. Then we run the massive risk of this 
system failing (and this applies to microdata's reverse DNS as well as 
RDF's URI), or it being taken over by an entity that sees such a data 
store as a way to make a great profit. We also defeat the very principle 
on which semantic data on the web abides, and that's true whether you 
support microdata or RDF.


Shelley






Re: [whatwg] Link rot is not dangerous (was: Re: Annotating structured data that HTML has no semantics for)

2009-05-15 Thread Kristof Zelechovski
I understand that there are ways to recover resources that disappear from
the Web; however, the postulated advantage of RDFa "you can go see what it
means" simply does not hold.  The recovery mechanism, Web search/cache,
would be as good for CURIE URL as for domain prefixes.  Creating a redirect
is not always possible and the built-in redirect dictionary (CURIE catalog?)
smells of a central repository.  This is no better than public entity
identifiers in XML.

Serving the vocabulary from one's own domain is not always possible, e.g. in
case of reader-contributed content, and only guarantees that the vocabulary
will be alive while it is supported by the domain owner.  (WHATWG wants HTML
documents to be readable 1000 years from now.)  It is not always practical
either as it could confuse URL-based tools that do not retrieve the
resources referenced.

All this does not imply, of course, that RDFa is no good.  It is only
intended to demonstrate that the postulated advantage of the CURIE lookup is
wishful thinking.

Best regards,
Chris



Re: [whatwg] Link rot is not dangerous

2009-05-15 Thread Dan Brickley

On 15/5/09 18:20, Manu Sporny wrote:

> Kristof Zelechovski wrote:
>
>> Therefore, link rot is a bigger problem for CURIE
>> prefixes than for links.
>
> There have been a number of people now who have gone to great lengths
> to outline how awful link rot is for CURIEs and the semantic web in
> general. This is a flawed conclusion, based on the assumption that there
> must be a single vocabulary document in existence, for all time, at one
> location. This has also led to a false requirement that all
> vocabularies should be centralized.
>
> Here's the fear:
>
> If a vocabulary document disappears for any reason, then the meaning of
> the vocabulary is lost and all triples depending on the lost vocabulary
> become useless.
>
> That fear ignores the fact that we have a highly available document
> store available to us (the Web). Not only that, but these vocabularies
> will be cached (at Google, at Yahoo, at The Wayback Machine, etc.).
>
> IF a vocabulary document disappears, which is highly unlikely for
> popular vocabularies - imagine FOAF disappearing overnight - then there
> are alternative mechanisms to extract meaning from the triples that will
> be left on the web.
>
> Here are just two of the possible solutions to the problem outlined:
>
> - The vocabulary is restored at another URL using a cached copy of the
> vocabulary. The site owner of the original vocabulary either re-uses the
> vocabulary, or re-directs the vocabulary page to another domain
> (somebody that will ensure the vocabulary continues to be provided -
> somebody like the W3C).
> - RDFa parsers can be given an override list of legacy vocabularies that
> will be loaded from disk (from a cached copy). If a cached copy of the
> vocabulary cannot be found, it can be re-created from scratch if necessary.
>
> The argument that link rot would cause massive damage to the semantic
> web is just not true. Even if there is minor damage caused, it is fairly
> easy to recover from it, as outlined above.


A few other points:

1. It's for the community of vocabulary-creators to help each other out 
w.r.t. hosting/publishing these: I just nudged a friend to put another 5 
years on the DNS rental for a popular namespace. I think we should put a 
bit more structure around these kinds of habits, so that popular 
namespaces won't drop off the Web through accident.


2. Digitally signing the schemas will become part of the story, I'm 
sure. While it's a bit fiddly, there are advantages to having other 
mechanisms beyond URI de-referencing for knowing where a schema came from.


3. Parties worried about external dependencies when using namespaces can 
always indirect through their own namespace, whose schema document can 
declare subclass/subproperty relations to other URIs.


cheers

Dan




[whatwg] Link rot is not dangerous (was: Re: Annotating structured data that HTML has no semantics for)

2009-05-15 Thread Manu Sporny
Kristof Zelechovski wrote:
> Therefore, link rot is a bigger problem for CURIE
> prefixes than for links.

There have been a number of people now who have gone to great lengths
to outline how awful link rot is for CURIEs and the semantic web in
general. This is a flawed conclusion, based on the assumption that there
must be a single vocabulary document in existence, for all time, at one
location. This has also led to a false requirement that all
vocabularies should be centralized.

Here's the fear:

If a vocabulary document disappears for any reason, then the meaning of
the vocabulary is lost and all triples depending on the lost vocabulary
become useless.

That fear ignores the fact that we have a highly available document
store available to us (the Web). Not only that, but these vocabularies
will be cached (at Google, at Yahoo, at The Wayback Machine, etc.).

IF a vocabulary document disappears, which is highly unlikely for
popular vocabularies - imagine FOAF disappearing overnight - then there
are alternative mechanisms to extract meaning from the triples that will
be left on the web.

Here are just two of the possible solutions to the problem outlined:

- The vocabulary is restored at another URL using a cached copy of the
vocabulary. The site owner of the original vocabulary either re-uses the
vocabulary, or re-directs the vocabulary page to another domain
(somebody that will ensure the vocabulary continues to be provided -
somebody like the W3C).
- RDFa parsers can be given an override list of legacy vocabularies that
will be loaded from disk (from a cached copy). If a cached copy of the
vocabulary cannot be found, it can be re-created from scratch if necessary.

The argument that link rot would cause massive damage to the semantic
web is just not true. Even if there is minor damage caused, it is fairly
easy to recover from it, as outlined above.

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: A Collaborative Distribution Model for Music
http://blog.digitalbazaar.com/2009/04/04/collaborative-music-model/



Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Kristof Zelechovski
Links contribute to the behavior of the text contained within them and
not to its meaning, which does not depend on whether the link is broken
or not.  Moreover, whether the linked resource can be retrieved at all
depends on the URI scheme, as in href="mailto:u...@domain".
The advertised advantage of CURIE prefixes is that the metadata declaration
can be retrieved and looked up, and that can influence the meaning of the
text thus marked.  Therefore, link rot is a bigger problem for CURIE
prefixes than for links.

I think the original URL corresponding to a reversed domain prefix is
irrelevant, and attempts to reconstruct it are futile anyway.  Nonexistent
features are better than features that decay progressively, at least as far
as a specification is concerned.

Best regards,
Chris



Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Simon Pieters
On Thu, 14 May 2009 22:30:41 +0200, Shelley Powers  
wrote:

>> I'm not 100% sure microdata can really achieve this, but I think making  
>> the attempt is a positive step.
>>
> It can't, don't you see?
>
> Microdata will only work in HTML5/XHTML5.

Actually, as specified, it would work for any text/html and any XHTML content. 
It would only be valid in (X)HTML5, but it would work even if the input is not 
valid (X)HTML5 or looks like HTML4 or XHTML 1.1.


> XHTML 1.1 and yes, 2.0 will be  
> around for years, decades. In addition, XHTML5 already supports RDFa.

XHTML5 supports RDFa to the same extent that XHTML 1.1 supports microdata (in 
both cases, it would work but is not valid).

-- 
Simon Pieters
Opera Software


Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Eduard Pascual
On Fri, May 15, 2009 at 1:44 PM, Kristof Zelechovski wrote:
> I do not think anybody in WHATWG hates the CURIE tool; however, the
> following problems have been put forward:
>
> Copy-Paste
>        The CURIE mechanism is considered inconvenient because is not
> copy-paste-resilient, and the associated risk is that semantic elements
> would randomly change their meaning.
Copy-paste issues with RDFa and similar syntaxes can take two forms:
The first is orphaned prefixes: when metadata with a given prefix is
copied, but then it's pasted in a context where the prefix is not
defined. If the user who is copy-pasting this stuff really cares
about metadata, s/he would review the code and make the relevant fixes
and/or copy the prefix declarations; the same way when an author is
copy-pasting content and wants to preserve formatting s/he would copy
the CSS stuff. If the user doesn't actually care about the metadata,
then there is no harm, because properties relying on an unmapped
prefix should yield no RDF output at all.
The second form is prefix clashes: this is actually extremely rare.
For example, someone copies code with FOAF metadata, and then pastes
it on another page: what are the chances that the user will be using a
foaf: prefix for something other than FOAF? Sure, there are cases where
a clash might happen but, again, these are only likely to appear on
pages by authors who have some idea about metadata, and hence the
author is more than likely to review the code being pasted to prevent
these and other clashes (such as classes that would mean something
completely different under the new page's CSS code, element id
clashes, etc). A last possibility is that the author doesn't have any
idea about metadata at all, but is using a CMS that relies on
metadata. In such case, it would be wise on the CMS's part to
pre-process code fragments and either map the prefix to what they mean
(if it's obvious) or remove the invalid data (if the CMS can't figure
out what it should mean).
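
A hypothetical fragment of the first case (RDFa 1.0 style, where the
prefix is bound with an xmlns attribute on an ancestor): the pasted
snippet keeps its property attribute but loses the binding, so a
conforming processor emits no triple for it.

    <!-- Original page: prefix bound on an ancestor element -->
    <div xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <span property="foaf:name">Alice</span>
    </div>

    <!-- Pasted elsewhere without that ancestor: "foaf:" is unmapped,
         so the property yields no RDF output at all -->
    <span property="foaf:name">Alice</span>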

>
> Link rot
>        CURIE definitions can only be looked up while the CURIE server is
> providing them; the chance of the URL becoming broken is high for
> home-brewed vocabularies.  While the vocabularies can be moved elsewhere, it
> will not always be possible to create a redirect.

Oh, and do reversed domains help at all with this? Ok, with CURIEs
there is a (relatively small) chance for the CURIE to not be
resolvable at a given time; reversed domains have a 100% chance to not
be resolvable at any time: there is always, at least, ambiguity: does
org.example.foo map to foo.example.org, example.org/foo, or
example.org#foo? Even better: what if, under example.org we find a
vocabulary at example.org/foo and another at foo.example.org? (Ok,
that'd be quite unwise, although it might be a legitimate way to keep
"deployed" and "test" versions of a vocabulary online at a time; but
anyway CURIEs can cope with it, while reversed domains can't).
Wherever there are links, there is a chance for broken links: that's
part of the nature of links, and the evolving nature of the web. But,
just because of the chance of links being broken, would you deny the
utility of elements such as <a> and <link>? Reversed domains don't
face broken links because they are simply incapable of linking to
anything.
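
A sketch of that ambiguity: mechanically reversing org.example.foo yields
several equally plausible URLs, and nothing tells a tool which one (if
any) was intended.

    def candidates(reversed_name):
        parts = reversed_name.split(".")         # ["org", "example", "foo"]
        domain = ".".join(reversed(parts[:-1]))  # "example.org"
        leaf = parts[-1]
        return ["http://%s.%s/" % (leaf, domain),  # http://foo.example.org/
                "http://%s/%s" % (domain, leaf),   # http://example.org/foo
                "http://%s#%s" % (domain, leaf)]   # http://example.org#foo

    print(candidates("org.example.foo"))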


Now, please, I'd appreciate it if you reviewed your "arguments" before
posting them: while the copy-paste issue is a legitimate argument, and
now we can consider whether this copy-paste-resilience is worth the
costs of microdata, that link rot argument is just a waste of
everyone's time, including yours. Anyway, thanks for that first
argument: that's exactly what I was asking for in the hope of letting
this discussion advance somewhere.

So, before we start comparing benefits against costs, can someone post
any more benefits, or does the "copy-paste-resilience" point stand alone
against all the costs and possible issues?

Regards,
Eduard Pascual


Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Kristof Zelechovski
Scripts are not copy-paste resilient but scripts are advanced content; they
are expected to be created, maintained and understood by experienced Web
developers.  It is provably impossible to tell what a particular script is
doing, except by means of a thorough and exhaustive research that is not
guaranteed to succeed.  And execution of scripts is disabled in many
environments.

CSS declarations are used to convey presentation; breaking CSS should not
affect the ability to understand the content.  WHATWG considers CSS issues
out of scope, except for admitting the applicability of CSS.

OTOH, losing semantics because of prefix reassignment in RDFa can cause
serious harm.

If you regard CURIE prefixes as fixed, you need a central registry to avoid
clashes.  Avoiding a central registry is an explicit requirement AFAIK.

There is nothing wrong with pasting just a part of another page; in fact,
this is often intended.  The cases of erroneous clipping are obvious; the
cases of inconsistent prefix declarations are not.

Correct me if I am wrong, but I think it is a requirement that HTML content
be copy-paste-resilient wherever that is feasible.  Few Web
page publishers use paste, agreed; however, they often encourage the readers
to paste content nowadays, and the readers do it a lot.

Link rot has been considered an issue with RDFa; my post, if nothing else,
can be viewed as a consideration.  While I am not an RDFa VIP, my
consideration does not get any less real because of that.

Best regards,
Chris




Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Dan Brickley

On 15/5/09 14:11, Shelley Powers wrote:

> Kristof Zelechovski wrote:
>
>> I do not think anybody in WHATWG hates the CURIE tool; however, the
>> following problems have been put forward:
>>
>> Copy-Paste
>> The CURIE mechanism is considered inconvenient because it is not
>> copy-paste-resilient, and the associated risk is that semantic elements
>> would randomly change their meaning.
>
> Well, no, the elements won't randomly change their meaning. The only
> risk is copying and pasting them into a document that doesn't provide
> namespace definitions for the prefixes. Are you thinking that someone
> will be using different namespaces but the same prefix? Come on -- do
> you really think that will happen?


The most likely case is with Dublin Core, but DC data varies enough 
already that this isn't too destructive...


Dan


Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Shelley Powers

Kristof Zelechovski wrote:

> I do not think anybody in WHATWG hates the CURIE tool; however, the
> following problems have been put forward:
>
> Copy-Paste
> The CURIE mechanism is considered inconvenient because it is not
> copy-paste-resilient, and the associated risk is that semantic elements
> would randomly change their meaning.


Well, no, the elements won't randomly change their meaning. The only 
risk is copying and pasting them into a document that doesn't provide 
namespace definitions for the prefixes. Are you thinking that someone 
will be using different namespaces but the same prefix? Come on -- do 
you really think that will happen?


How big a risk is this? I would actually say it's minor. Probably no 
more of a risk than people making copies of other web page 
content and cutting off the end, or forgetting to change all the values 
once copied.


People can copy and paste JavaScript that references elements with 
certain identifiers. If those aren't used correctly, the application 
will also fail. Therefore we should not allow copying and pasting of 
script? How about CSS, then. Can't copy and paste CSS, because again 
this action is dependent on another and equal action either in a 
separate document, or elsewhere in the page.


There is no such thing as risk free copy and paste. And frankly, few 
people will be doing copying and pasting. Most metadata will probably be 
added either as part of an underlying tool, like Drupal, or using 
modules and plug-ins that come with documentation, or insert what's 
needed dynamically.


This isn't HTML 3.0 times any more.


> Link rot
> CURIE definitions can only be looked up while the CURIE server is
> providing them; the chance of the URL becoming broken is high for
> home-brewed vocabularies.  While the vocabularies can be moved elsewhere, it
> will not always be possible to create a redirect.
>
> Chris


Well, now, have you tried to look up one of the reversed DNS values yet?

I don't believe that link rot was ever really considered an issue with 
RDFa.


Shelley


Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Shelley Powers

Maciej Stachowiak wrote:


On May 14, 2009, at 1:30 PM, Shelley Powers wrote:

So, if I'm pushing for RDFa, it's not because I want to "win". It's 
because I have things I want to do now, and I would like to make sure 
have a reasonable chance of working a couple of years in the future. 
And yeah, once SVG is in HTML5, and RDFa can work with HTML5, maybe I 
wouldn't mind giving old HTML a try again. Lord knows I'd like to 
user ampersands again.


It sounds like your argument comes down to this: you have personally 
invested in RDFa, therefore having a competing technology is bad, 
regardless of the technical merits. I don't mean to parody here - I am 
somewhat sympathetic to this line of argument. Often pragmatic 
concerns mean that an incremental improvement just isn't worth the 
cost of switching (for example HTML vs. XHTML). My personally judgment 
is that we're not past the point of no return on data embedding. 
There's microformats, RDFa, and then dozens of other serializations of 
RDF (some of which you cited). This doesn't seem like a space on the 
verge of picking a single winner, and the players seem willing to 
experiment with different options.



There are not dozens of other serializations of RDF.

The point I was trying to make is, I'd rather put my time into something 
that exists now, than have to watch the wheel re-invented. I'd rather 
see semantic metadata become a reality. I'm glad that you personally 
feel that companies will be just peachy keen on having to support 
multiple parsers to get the same data.


On the HTML WG side, I will never support microdata, because no case has 
been made for its existence.




The point is, people in the real world have to use this stuff. It 
helps them if they have one, generally agreed on approach. As it 
is, folks have to contend with both RDFa and microformats, but at 
least we know these have different purposes.


From my cursory study, I think microdata could subsume many of the 
use cases of both microformats and RDFa. It seems to me that it 
avoids much of what microformats advocates find objectionable, and 
provides a good basis for new microformats; but at the same time it 
seems it can represent a full RDF data model. Thus, I think we have 
the potential to get one solution that works for everyone.


I'm not 100% sure microdata can really achieve this, but I think 
making the attempt is a positive step.



It can't, don't you see?

Microdata will only work in HTML5/XHTML5. XHTML 1.1 and yes, 2.0 will 
be around for years, decades. In addition, XHTML5 already supports RDFa.


Supporting XHTML 1.1 has about 0.001% as much value as 
supporting  text/html. XHTML 2.0 is completely irrelevant to the Web, 
and looks on track to remain so. So I don't find this point very 
persuasive.


I don't think you'll find that the world is breathlessly waiting for 
HTML5. I think you'll find that XHTML 1.1 will have wider use than HTML5 
for the next decade. If not longer. I wouldn't count out XHTML 2.0, 
either.  And in a decade, a lot can change.


Why you think something completely brand new, no vendor support, 
drummed up in a few hours or a day or so is more robust, and a better 
option than a mature spec in wide use, well frankly boggles my mind.


I haven't evaluated it enough to know for sure (as I said). I do think 
avoiding CURIEs is extremely valuable from the point of view of sane 
text/html semantics and ease of authoring; and RDF experts seem to 
think it works fine for representing RDF data models. So tentatively, 
I don't see any gaping holes. If you see a technical problem, and not 
just potential competition for the technology you've invested in, then 
you should definitely cite it.


I don't think CURIEs are that difficult, nor impossible no matter the 
arguments that Henri brings out.


I am impressed with your belief in HTML5.

But
One other detail that it seems not many people have picked up on yet 
is that microdata proposes a DOM API to extract microdata-based info 
from a live document on the client side. In my opinion this is huge 
and has the potential to greatly increase author interest in 
semantic markup.




Not really. Can do this now with RDFa in XHTML. And I don't need any 
new DOM to do it.


The power of semantic markup isn't really seen until you take that 
markup data _outside_ the document. And merge that data with data 
from other documents. Google Rich Snippets. Yahoo SearchMonkey. Heck, 
even an application that manages the data from different subsites of 
one domain.


I respectfully disagree. An API to do things client-side that doesn't 
require an external library is extremely powerful, because it lets 
content authors easily make use of the very same semantic markup that 
they are vending for third parties, so they have more incentive to use 
it and get it right.
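
To make that concrete, here is a rough sketch, assuming API names 
along the lines of the proposal (document.getItems(), the properties 
collection, and itemValue); the exact shapes may well change, and the 
vocabulary name here is made up:

    // Dump every name/value pair of every item of one vocabulary,
    // entirely client-side, with no extra library.
    var items = document.getItems("com.example.review");
    for (var i = 0; i < items.length; i++) {
      var props = items[i].properties;
      for (var j = 0; j < props.length; j++) {
        console.log(props[j].itemProp + " = " + props[j].itemValue);
      }
    }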



Sure, we'll have to disagree on this one.


Now, it may be that microdata will ultimately fail, either because 
it is outcompeted by RDFa, or because not enough people care about 
semantic markup, or whatever. But at least for now, I don't see a 
reason to strangle it in the cradle.

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Kristof Zelechovski
I do not think anybody in WHATWG hates the CURIE tool; however, the
following problems have been put forward:

Copy-Paste
The CURIE mechanism is considered inconvenient because it is not
copy-paste resilient, and the associated risk is that semantic elements
would randomly change their meaning.
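
A hypothetical fragment illustrates the failure mode:

    <div xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
      <span property="cal:summary">Lunch with Anna</span>
    </div>

Paste the inner span into a document that lacks (or redefines) the
xmlns:cal declaration, and "cal:summary" silently expands to a
different URI, or to none at all.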

Link rot
CURIE definitions can only be looked up while the CURIE server is
providing them; the chance of the URL becoming broken is high for
home-brewed vocabularies.  While the vocabularies can be moved elsewhere, it
will not always be possible to create a redirect.

Chris





Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Eduard Pascual
On Thu, May 14, 2009 at 10:17 PM, Maciej Stachowiak  wrote:
> [...]
> From my cursory study, I think microdata could subsume many of the use cases
> of both microformats and RDFa.
Maybe. But microformats and RDFa can already handle *all* of these
cases. Again, what are the benefits of creating something entirely new
to replace what already exists, when it can't even handle all the cases
of what it is replacing? Both the new syntax and the restrictions on
cases are costs: what are these costs buying? If it's not clear what we
are getting for these costs, it is impossible to evaluate whether they
are worth it.

> It seems to me that it avoids much of what microformats advocates find 
> objectionable
Could you be specific, please? Do you mean anything other than
WHATWG's almost irrational hate toward CURIEs and everything that
involves prefixes?

> but at the same time it seems it can represent a full RDF data
> model.
No, it *can't* represent a full RDF model: this has already been
shown several times in this thread.

> Thus, I think we have the potential to get one solution that works for 
> everyone.
RDFa itself doesn't work for everyone; but microdata is even more
restricted: it leaves out the cases that RDFa leaves out, but it also
leaves out some cases that RDFa was able to handle. So, where do you
see such potential?

> I'm not 100% sure microdata can really achieve this, but I think making the
> attempt is a positive step.
What do you mean by "making the attempt"? If there is something
microdata can't handle, it won't be able to handle it without changing
the spec. If you meant that evolving the microdata proposal towards
something that works for everyone is a positive step, then I agree;
but if you meant engraving this microdata approach into the spec,
setting it in stone, and then attempting to get everyone to accept it,
then I definitely disagree. So, please, could you clarify the meaning
of that statement? Thanks.

> One other detail that it seems not many people have picked up on yet is that
> microdata proposes a DOM API to extract microdata-based info from a live
> document on the client side. In my opinion this is huge and has the
> potential to greatly increase author interest in semantic markup.
All right, an API may be a benefit. Most probably it is. However, a
similar API could have been built on RDFa, or eRDF, or EASE, or any
other already existing or new solution; so it doesn't justify creating
a new syntax. I have to insist: what are the benefits of such a
built-from-the-ground, restrictive *syntax*? That's what we need to
know to evaluate it against its costs.

> Now, it may be that microdata will ultimately fail, either because it is
> outcompeted by RDFa, or because not enough people care about semantic
> markup, or whatever. But at least for now, I don't see a reason to strangle
> it in the cradle.
At least for now, I don't see a reason why it was created to begin
with. Maybe if somebody could enlighten us with this detail, this
discussion could evolve into something more useful and productive.

On Fri, May 15, 2009 at 6:53 AM, Maciej Stachowiak  wrote:
>
> On May 14, 2009, at 1:30 PM, Shelley Powers wrote:
>
>> So, if I'm pushing for RDFa, it's not because I want to "win". It's
>> because I have things I want to do now, and I would like to make sure they
>> have a reasonable chance of working a couple of years in the future. And
>> yeah, once SVG is in HTML5, and RDFa can work with HTML5, maybe I wouldn't
>> mind giving old HTML a try again. Lord knows I'd like to use ampersands
>> again.
>
> It sounds like your argument comes down to this: you have personally
> invested in RDFa, therefore having a competing technology is bad, regardless
> of the technical merits.
Pause, please. Before going on, I need to ask again: what are those
technical merits?

> I don't mean to parody here - I am somewhat sympathetic to this line of 
> argument.
I think I'm interpreting Shelley's argument slightly differently. She
didn't choose RDFa because it was better than microdata. She chose RDFa
because it was better than other options, and microdata didn't even
exist yet. Now microdata comes out, some drawbacks are highlighted in
comparison with RDFa (lack of typing; inability to represent the full
RDF model; reversed domains that are as ugly as CURIEs, except that
CURIEs at least resolve to something useful, while reversed domains
often don't resolve at all), and you ask RDFa proponents to give
microdata a chance, to not "strangle it in the cradle"; but nobody
seems willing to answer the one question: what does microdata provide
to make up for its drawbacks?
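
To make the comparison concrete, a hypothetical pair of snippets
(assuming the reversed-domain naming used in the microdata draft):

    <!-- microdata: reversed-domain property name -->
    <span itemprop="com.example.vocab.title">Some title</span>

    <!-- RDFa: a CURIE whose prefix expands to a URL you can visit -->
    <span property="ex:title"
          xmlns:ex="http://example.com/vocab#">Some title</span>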

> Often pragmatic concerns mean that an incremental improvement just isn't 
> worth the cost of switching
Wait. Are you referring to microdata as an incremental improvement over
RDFa?? IMO, it's rather a decremental enworsement.

> My personal judgment is that we're not past the point of
> no return on data embedding. There's microformats, RDFa, and then dozens of
> other serializations of RDF (some of which you cited).

Re: [whatwg] [Fwd: Re: Helping people searching for content filtered by license]

2009-05-15 Thread Smylers
Eduard Pascual writes:

> On Fri, May 15, 2009 at 8:40 AM, Smylers  wrote:
> 
> > > On Friday, 08.05.2009, 19:57 +, Ian Hickson wrote:
> > >
> > > >  * Tara runs a video sharing web site for people who want
> > > >licensing information to be included with their videos.
> > > >When Paul wants to blog about a video, he can paste a
> > > >fragment of HTML provided by Tara directly into his blog.
> > > >The video is then available inline in his blog, along
> > > >with any licensing information about the video.
> > 
> > Why does the license information need to be machine-readable in this
> > case?  (It may need to be for a different scenario, but that would be
> > dealt with separately.)
> 
> It would need to be machine-readable for tools like
> http://search.creativecommons.org/ to do their job: check the license
> against the engine's built-in knowledge of some licenses, and figure
> out if it is suitable for the usages the user has requested (like
> "search for content I can build upon" or "search for content I can use
> commercially"). Ideally, it should be enough for a search engine to
> find the video on either Tara's site *or* Paul's blog for it to be
> available to users.

Yeah, that sounds plausible.  However that's what I meant by "a
different scenario" -- adding criteria to the above, specifically about
searching.  Hixie attempted to address this case too:

> > > > Admittedly, if this scenario is taken in the context of the
> > > > first scenario, meaning that Bob wants this image to be
> > > > discoverable through search, but doesn't want to include it on a
> > > > page of its own, then extra syntax to mark this particular image
> > > > up would be useful.
> > > >
> > > > However, in my research I found very few such cases. In every
> > > > case where I found multiple media items on a single page with no
> > > > dedicated page, either every item was licensed identically and
> > > > was the main content of the page, or each item had its own
> > > > separate page, or the items were licensed under the same license
> > > > as the page. In all three of these cases, rel=license already
> > > > solves the problem today.

To which Nils responded:

> > > Relying on linked pages just to get licensing information would
> > > be, well, massive overhead. Still, you are right - most blogs
> > > using many pictures have dedicated pages.

It's perfectly valid to disagree with this being sufficient (I
personally have no view either way on the matter).  I was just
clarifying that the legend mark-up example wasn't attempting to address
this case, and wasn't proposing <legend> (or whatever) as a
machine-readable microformat.

Smylers


Re: [whatwg] [Fwd: Re: Helping people searching for content filtered by license]

2009-05-15 Thread Eduard Pascual
On Fri, May 15, 2009 at 8:40 AM, Smylers  wrote:
> Nils Dagsson Moskopp writes:
>
>> On Friday, 08.05.2009, 19:57 +, Ian Hickson wrote:
>>
>> >      * Tara runs a video sharing web site for people who want
>> >        licensing information to be included with their videos. When
>> >        Paul wants to blog about a video, he can paste a fragment of
>> >        HTML provided by Tara directly into his blog. The video is
>> >        then available inline in his blog, along with any licensing
>> >        information about the video.
> [...]

> Why does the license information need to be machine-readable in this
> case?  (It may need to be for a different scenario, but that would be
> dealt with separately.)

It would need to be machine-readable for tools like
http://search.creativecommons.org/ to do their job: check the license
against the engine's built-in knowledge of some licenses, and figure
out if it is suitable for the usages the user has requested (like
"search for content I can build upon" or "search for content I can use
commercially"). Ideally, it should be enough for a search engine to
find the video on either Tara's site *or* Paul's blog for it to be
available to users.
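
For instance, a minimal, hypothetical fragment that such an engine
could act on today:

    <video src="http://tara.example/videos/42.ogv" controls></video>
    <a rel="license"
       href="http://creativecommons.org/licenses/by/3.0/">
      Licensed under CC Attribution 3.0
    </a>

The rel="license" value is what lets the engine match the href
against its built-in list of known licenses.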

Just my two cents.