Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-07-27 Thread Eduard Pascual
I have put a new version of the CRDF document up [1]. Here is a
summary of the most significant changes:

* Location: with the migration from "Google Pages" to "Google Sites",
the PDF document can no longer be hosted at its former location. I
wanted to keep this proposal independent from my own website; but,
needing a reliable location for the document, I have made room for it
on my site's host.
To avoid having to keep track of two online copies of the document, or
leaving an outdated version online, I have removed the document from
the old location.
My apologies for any inconvenience this might cause.

* Inline content: full "sheets" are now accepted inside the inline
crdf attribute, whatever it ends up being called; so something like  should be
doable, mimicking RDFa's ability, in XML-based languages, to declare
namespaces inline with code like http://example.com/foo#";>. In addition, a
pseudo-algorithm has been defined that determines whether the
content of the attribute is a full sheet or just a set of
declarations.

* Inline vs. linked metadata: this brief new section attempts to
explain when each approach is more suitable, and why both need to be
supported by CRDF.

* Conformance requirements: this new section describes what a document
must do to be "conformant", and what tools would have to do to be
"conformant". It should be taken as an informative summary rather than
as a normative definition (especially the part about tools), and is
mostly intended to give a glimpse of what should be expected from a
hypothetical "CRDF-aware browser".

* Microformats compatibility: after some research and lots of "trial
and error", it has been found that it is not possible to match the
microformat concept of "singular properties" with CSS3 Selectors. The
document now suggests an extension (just a pseudo-class named
":singular") to handle this. This is a very new addition and feedback
on it would be highly valuable.
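
On the inline-content item above: the pseudo-algorithm itself is not reproduced in this message, so the Python sketch below is only a guess at the kind of test involved, not the document's actual procedure. It assumes that a full sheet can be recognised by a leading at-rule (such as "@namespace") or by a "{" that opens a rule block outside quoted strings, and that anything else is a bare declaration list. The name `looks_like_sheet` is invented for illustration.

```python
import re

# Quoted strings are removed first so that a "{" inside a string value
# is not mistaken for the start of a rule block (assumption: CRDF
# strings follow CSS quoting).
_STRING = re.compile(r'"[^"]*"|\'[^\']*\'')


def looks_like_sheet(attr_value: str) -> bool:
    """Guess whether inline content is a full sheet or bare declarations.

    Hypothetical heuristic, not taken from the CRDF document.
    """
    text = _STRING.sub("", attr_value).strip()
    return text.startswith("@") or "{" in text
```

Under these assumptions, `foo|MainHeading: contents;` is treated as a declaration list, while content that starts with an "@namespace" rule or contains a `selector { ... }` block is treated as a sheet.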

[1] http://crdf.dragon-tech.org/crdf.pdf

Regards,
Eduard Pascual


Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-07-09 Thread Eduard Pascual
On Thu, Jul 9, 2009 at 12:06 AM, Ian Hickson wrote:
> On Wed, 10 Jun 2009, Eduard Pascual wrote:
>> >
>> > I think this is a level of indirection too far -- when something is a
>> > heading, it should _be_ a heading, it shouldn't be labeled opaquely
>> > with a transformation sheet elsewhere defining that is maps to the
>> > heading semantic.
>>
>> That doesn't make much sense. When something is a heading, it *is* a
>> heading. What do you mean by "should be a heading?".
>
> I mean that a conforming implementation should intrinsically know that the
> content is a heading, without having to do further processing to discover
> this.
>
> For example, with this CSS and HTML:
>
>   h1 { color: blue; }
>
>   <h1>Introduction</h1>
>
> ...the HTML processor knows, regardless of what else is going on, that the
> word "Introduction" is part of a heading. It only knows that the word
> should be blue after applying processing rules for CSS.
Now I think I got your point. However, I don't think it is really an
issue. Let's take a variant of your example:

CSS:
h1 { font-size: large; }

CRDF:
h1 { foo|MainHeading: contents; }

HTML:
<h1>Introduction</h1>

If we took the HTML alone (for example, if the CSS and CRDF are in
external files and fail to download), the browser will find an H1
element and it will know that it is a first-level heading. It will
also render it large by default (maybe depending on context; a voice
browser won't render anything as "large"). Now, if the CSS and CRDF
get processed, the browser will *also* know that it has to render it
large (now it's not just falling back to some default, it knows that
the author wanted the heading to render as large), and that it is
whatever the "foo" (or the namespace mapped by the "foo" prefix, to be
more specific) namespace defines as a "MainHeading", which will
probably be something quite similar to the browser's own concept of
"first-level heading".

The point here is: the CSS is stating that the <h1> should display
large, even though the browser would display it large in most cases.
Similarly, the CRDF is defining the <h1> as a MainHeading, even though
the browser already knows it is a heading. Both the CSS and the CRDF
provide redundant information. Of course, someone could attempt to
describe semantics through CRDF that conflict with HTML's, but then
one could also make headings smaller, hide s and enlarge
s with CSS.

No matter what CRDF says, a compliant HTML browser will always know
that <h1> is a heading (and similarly, will know what other HTML
elements mean). But if what CRDF says is consistent with what the HTML
says (the main point of metadata is stating things that are true;
false data is almost useless), then RDF tools that are completely
unaware of HTML itself can still know that something is a heading. In
the same way, when CSS is consistent with HTML's semantics (for example
making headings large, s bold, or s italicized), a user
viewing the page can perceive that something is a heading, important,
or emphasized, respectively.
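
To make the h1 example concrete, here is a minimal Python sketch of what an HTML-unaware consumer could end up with once the CRDF rule `h1 { foo|MainHeading: contents; }` has been applied. It is not a CRDF implementation: the selector is hard-coded to h1, the namespace URI bound to "foo" and the "_:doc" subject are placeholders assumed for illustration, and "contents" is read as the element's text.

```python
from html.parser import HTMLParser

# Hypothetical namespace URI for the "foo" prefix used in the example.
FOO_NS = "http://example.com/foo#"


class HeadingExtractor(HTMLParser):
    """Emit one (subject, predicate, object) triple per <h1> element."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self._buffer = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_h1:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            text = "".join(self._buffer).strip()
            # "_:doc" is a placeholder subject; how CRDF picks the
            # subject is not modelled here.
            self.triples.append(("_:doc", FOO_NS + "MainHeading", text))


def extract(html_text):
    parser = HeadingExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.triples
```

A consumer that receives only the resulting triples never needs to know what an h1 is, which is the redundancy argued for above.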

> I think by and large the same should hold for more elaborate semantics.
>
>
> (I didn't really agree with your other responses regarding my criticisms
> of your proposal either, but I don't have anything except my opinions to
> go on as far as those go, so I can't argue my case usefully there.)
Most of those responses were based on what is brewing for the next
version of the document, rather than the version actually available,
so I don't think it's worth going further on those points until the
update is ready and up.

>> > I think CRDF has a bright future in doing the kind of thing GRDDL does,
>>
>> I'm not sure about what GRDDL does: I just took a look through the spec,
>> and it seems to me that it's just an overcomplication of what XSLT can
>> already do; so I'm not sure if I should take that statement as a good or
>> a bad thing.
>
> A good thing.
>
> GRDDL is a way to take an HTML page and infer RDF information from that
> page despite the page, e.g. by "implementing" Microformats using XSLT. So
> for example, GRDDL can be used to extract hCard data from an HTML page and
> turn it into RDF data.
Ok. Making metadata available from documents that were not authored
with metadata in mind, and without altering the document itself (at
most adding a  to the header) is one of the use-cases CRDF aims
to handle; so it's good news to hear from someone that it's on the
right track to achieve it ^^-

>> > It's an interesting way of converting, say, Microformats to RDF.
>>
>> The ability to convert Microformats to RDF was intended (although not
>> fully achieved: some "bad" content would be treated differently between
>> CRDF and Microformats); and in the same way CRDF also provides the
>> ability to define de-centralized Microformats.org-like vocabularies (I'm
>> not sure if referring to these as "microformats" would still be
>> appropiate).
>
> I think this is a particularly useful feature; I would encourage you to
> continue to develop this idea as a separate languag

Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-07-08 Thread Ian Hickson
On Wed, 10 Jun 2009, Eduard Pascual wrote:
> >
> > I think this is a level of indirection too far -- when something is a 
> > heading, it should _be_ a heading, it shouldn't be labeled opaquely 
> > with a transformation sheet elsewhere defining that is maps to the 
> > heading semantic.
>
> That doesn't make much sense. When something is a heading, it *is* a 
> heading. What do you mean by "should be a heading?".

I mean that a conforming implementation should intrinsically know that the 
content is a heading, without having to do further processing to discover 
this.

For example, with this CSS and HTML:

   h1 { color: blue; }

   <h1>Introduction</h1>

...the HTML processor knows, regardless of what else is going on, that the 
word "Introduction" is part of a heading. It only knows that the word 
should be blue after applying processing rules for CSS.

I think by and large the same should hold for more elaborate semantics.


(I didn't really agree with your other responses regarding my criticisms 
of your proposal either, but I don't have anything except my opinions to 
go on as far as those go, so I can't argue my case usefully there.)


> > I think CRDF has a bright future in doing the kind of thing GRDDL does,
>
> I'm not sure about what GRDDL does: I just took a look through the spec, 
> and it seems to me that it's just an overcomplication of what XSLT can 
> already do; so I'm not sure if I should take that statement as a good or 
> a bad thing.

A good thing.

GRDDL is a way to take an HTML page and infer RDF information from that 
page despite the page, e.g. by "implementing" Microformats using XSLT. So 
for example, GRDDL can be used to extract hCard data from an HTML page and 
turn it into RDF data.


> > It's an interesting way of converting, say, Microformats to RDF.
>
> The ability to convert Microformats to RDF was intended (although not 
> fully achieved: some "bad" content would be treated differently between 
> CRDF and Microformats); and in the same way CRDF also provides the 
> ability to define de-centralized Microformats.org-like vocabularies (I'm 
> not sure if referring to these as "microformats" would still be 
> appropiate).

I think this is a particularly useful feature; I would encourage you to 
continue to develop this idea as a separate language, and see if there is 
a market for it.


Cheers,
-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-06-10 Thread Eduard Pascual
First of all, Ian, thank you for your reply. I appreciate any opinions on
this subject.

On Wed, Jun 10, 2009 at 1:29 AM, Ian Hickson wrote:
> This proposal is very similar to RDF EASE.
Indeed, they are both CSS-based, and they fulfill similar purposes.
Let me, however, highlight some differences:
1st, EASE is tightly bound to RDFa. However, RDFa is meant for
embedding metadata, and was built with that purpose in mind; while
EASE is meant for linked metadata, so building it on top of RDFa's
embedding constructs is quite unnatural. In contrast, CRDF is built
from CSS's syntax and RDF's (not RDFa's) concepts: it only shares with
RDFa what they both inherit from RDF: the concepts and data model.
2nd, EASE is meant to be complementary to RDFa: they address (or
attempt to address) different use cases / needs (embedding vs.
linking). On the other hand (more on this below), CRDF attempts to
address both cases, plus the case where a hybrid approach is
appropriate (inlining some metadata, and linking other).

> While I sympathise with the
> goal of making semantic extraction easier, I feel this approach has
> several fundamental problems which make it inappropriate for the specific
> use cases that were brought up and which resulted in the microdata
> proposal:
>
>  * It separates (by design) the semantics from the data with those
>   semantics.
That's not accurate. CRDF *allows* separating the semantics, but
doesn't require it. Everything could be inlined, and the
possibility of separation is just for when it is needed.

>   I think this is a level of indirection too far -- when
>   something is a heading, it should _be_ a heading, it shouldn't be
>   labeled opaquely with a transformation sheet elsewhere defining that is
>   maps to the heading semantic.
That doesn't make much sense. When something is a heading, it *is* a
heading. What do you mean by "should be a heading?". CRDF (as well as
many other syntaxes for RDF) allows parsers that don't know the
specific semantics of the markup language to find out that something
is actually a heading anyway; and allows expressing semantics that the
markup language has no direct support for (for example, is it a
site-section heading? a news heading? an iguana's name (used as the
main title for each iguana's page on the iguana collection example)?
something else?).

>  * It is even more brittle in the face of copy-and-paste and regular
>   maintenance than, say, namespace prefixes. It is very easy to forget to
>   copy the semantic transformation rules. It is very easy to edit the
>   document such that the selectors no longer match what they used to
>   match. It's not at all obvious from looking at the page that there are
>   semantics there.
I think the whole copy-paste thing should be broken into two separate
scenarios:

Copy-pasting source code: with the next version of the document (which
I'm already cleaning up, and which will allow "@namespace" rules inside
the inlining attribute), this will be as brittle (and as resilient) as
prefixes are: when a fragment that includes the "@namespace"s or
prefixes it needs is copy-pasted, it will work as expected; OTOH, if a
rule relies on a namespace that is not available (declared outside of
the copy-pasted fragment), the rule will just be ignored. The risk of
the copied code clashing with declarations in its new location is
lower than it may seem: an author who is already adding CRDF code to
his pages is quite likely to review the code he's copying for the
semantics that may be there; and authoring tools that automatically
add semantic code should review whether things make sense or not when
code is pasted into them (for example, invalid/redundant properties
could/should be flagged to the author).

Copy-pasting content: currently, browser support for copy-pasting
CSS-styled content is mediocre and inconsistent (some browsers do it
right, some don't, some don't even try), but this is already more than
what is supported for RDFa, Microdata, or other semantic formats. With
a bit of luck, pressure for browsers to include CRDF properties when
copying content could help to get decent support for CSS properties as
well (since most of the code for these tasks would be shared).
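
The source-code scenario above (a rule that relies on a namespace not declared in the pasted fragment is just ignored) can be sketched in Python as follows. The tuple representation of rules and the name `resolve_rules` are assumptions made for this illustration; they are not part of CRDF.

```python
def resolve_rules(namespaces, rules):
    """Expand prefixed properties; drop rules whose prefix is undeclared.

    namespaces: dict mapping prefix -> namespace URI, as collected from
                the "@namespace" rules inside the pasted fragment.
    rules:      list of (selector, "prefix|Name", value) tuples
                (a hypothetical in-memory form of CRDF rules).
    """
    resolved = []
    for selector, prop, value in rules:
        prefix, _, local = prop.partition("|")
        uri = namespaces.get(prefix)
        if uri is None:
            # The prefix was declared outside the copied fragment,
            # so the rule is silently ignored.
            continue
        resolved.append((selector, uri + local, value))
    return resolved
```

With only "foo" declared in the fragment, a rule using `foo|MainHeading` survives the paste while a rule using an undeclared `bar|...` property is dropped, which is the same failure mode as a missing prefix declaration.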

>  * It relies on selectors to do something subtle. Authors have a great
>   deal of trouble understanding selectors -- if you watch a typical Web
>   authors writing CSS, he will either use just class selectors, or he
>   will write selectors by trial and error until he gets the style he
>   wants. This isn't fatal for CSS because you can see the results right
>   there; for something as subtle as semantic data mining, it is extremely
>   likely that authors will make mistakes that turn their data into
>   garbage, which would make the feature impractical for large-scale use.
It relies on selectors to do what they do: select things. Nobody is
*asking* authors to make use of over-complicated selectors for each
piece of metadata they want to add; but CRDF tries to *allow*

Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-06-09 Thread Ian Hickson
On Thu, 14 May 2009, Eduard Pascual wrote:
>
> I have put online a document that describes my idea/proposal for a 
> selector-based solution to metadata. The document can be found at 
> http://herenvardo.googlepages.com/CRDF.pdf Feel free to copy and/or link 
> the file wherever you deem appropriate.
> 
> Needless to say, feedback and constructive criticism to the proposal is 
> always welcome. (Note: if discussion about this proposal should take 
> place somewhere else, please let me know.)

This proposal is very similar to RDF EASE. While I sympathise with the 
goal of making semantic extraction easier, I feel this approach has 
several fundamental problems which make it inappropriate for the specific 
use cases that were brought up and which resulted in the microdata 
proposal:

 * It separates (by design) the semantics from the data with those 
   semantics. I think this is a level of indirection too far -- when 
   something is a heading, it should _be_ a heading, it shouldn't be 
   labeled opaquely with a transformation sheet elsewhere defining that it 
   maps to the heading semantic.

 * It is even more brittle in the face of copy-and-paste and regular 
   maintenance than, say, namespace prefixes. It is very easy to forget to 
   copy the semantic transformation rules. It is very easy to edit the 
   document such that the selectors no longer match what they used to 
   match. It's not at all obvious from looking at the page that there are 
   semantics there.

 * It relies on selectors to do something subtle. Authors have a great 
   deal of trouble understanding selectors -- if you watch a typical Web 
   author writing CSS, he will either use just class selectors, or he 
   will write selectors by trial and error until he gets the style he 
   wants. This isn't fatal for CSS because you can see the results right 
   there; for something as subtle as semantic data mining, it is extremely 
   likely that authors will make mistakes that turn their data into 
   garbage, which would make the feature impractical for large-scale use.

I say this despite really wanting Selectors to succeed (disclosure: I'm 
one of the editors of the Selectors specification and spent years working 
on its test suite).

I think CRDF has a bright future in doing the kind of thing GRDDL does, 
and in extracting data from pages that were written by authors who did not 
want to provide semantic data (i.e. screen scraping). It's an interesting 
way of converting, say, Microformats to RDF.


Having said that, I do agree that the repetition that microdata requires 
in common scenarios with blocks of repeated data is unfortunate. It is 
worse than the repetition one has just from the basic HTML markup.

e.g. this:

   

  Hedral   Black

  Pillar   White
   

...becomes this:

   

  Hedral   Black

  Pillar   White
   

...or even:

   

  Hedral   
Black

  Pillar   
White
   

...which is far more verbose than ideal.

I considered special casing tables (using  to set 
itemprop="" for all cells in a column) but it would require quite a lot of 
complexity in processors since they'd additionally have to implement the 
table model, and having seen the quality of some of the implementations of 
metadata extractors used on Web content, I fear that that will be far too 
much complexity. (I fear even subject="" might already be too much.) The 
simpler we make it the more reliable it will be.

It also wouldn't solve the problem with other patterns, e.g.  (which 
approaches like CRDF's handle fine).


I don't have a good answer for the repetition problem.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-23 Thread Tab Atkins Jr.
On Fri, May 22, 2009 at 5:26 AM, Eduard Pascual  wrote:
> On Thu, May 21, 2009 at 5:19 PM, Toby Inkster  wrote:
>> [... some stuff about how will English change in a thousand years ...]
>>
>> A great help in clarifying your usage of terms is the inclusion of a
>> glossary. For example, I could write:
>>
>> 
>>  name
>>  
>>    A name is a label for a noun, (human or animal,
>>    thing, place, product [as in a brand name] and even an
>>    idea or concept), normally used to distinguish one from
>>    another.
>>    (http://en.wikipedia.org/wiki/Name";>source)
>>  
>> 
>>
>> With RDFa, the idea of a glossary can be used to reduce our reliance on
>> external vocabularies:
>>
>> http://xmlns.com/foaf/0.1/";
>>    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#";>
>>  name
>>  
>>    A name is a label for a noun, (human or animal,
>>    thing, place, product [as in a brand name] and even an
>>    idea or concept), normally used to distinguish one from
>>    another.
>>    (>    href="http://en.wikipedia.org/wiki/Name";>source)
>>  
>> 
>>
>> This doesn't completely eliminate the risk, but goes a long way to
>> mitigating it.
> Agreed. But CRDF would also allow that kind of glossary. What's your
> point with it?

To be more specific, this *sounds* like you're just generally
advocating for a referenceable external vocabulary.  CRDF serializes
out to normal RDF without any magic, same as RDFa, and it uses
prefixes in essentially the same manner (though in a way that I
believe is slightly more compatible with the concerns raised by
Anne/Henri/others).

This is perhaps an argument against Microdata, but not CRDF.

> Again, let me insist that external file CRDF is only one of its
> possible usages. Actually, it only makes sense when it holds rules
> that apply to multiple documents (otherwise, 

Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-22 Thread Henri Sivonen

On May 22, 2009, at 17:44, Toby Inkster wrote:


> But given that the
> HTML5 spec defines how the DOM is built, there's a very simple solution
> to that -- HTML5 could simply mandate that:
>
> http://foo.example.com/";>
>
> generates an identical DOM representation in both XHTML5 and HTML5.
> What's the problem with that?



 1) It's a difference from how browsers behave now. It's a flaw in
RDFa that it opens the question whether text/html parsing needs to
change.
 2) Finding out whether the change to parsing is harmless for
existing content requires shipping a mass-market browser with the
parsing change.
 3) It would require the HTML parser to look inside the attribute
name buffer instead of treating it as an opaque string, which would
add code complexity.
 4) Even if you changed this in text/html parsing, CURIEs would still
be Selector-unfriendly... CURIEs aren't a good match for the platform.



> But for the
> most part, those differences are pretty small and obscure, and don't
> actually affect real world code very much. e.g. the following code seems
> to work fine in Opera, Firefox and Midori (a Webkit browser):
>
> http://buzzword.org.uk/2009/dom.html
> http://buzzword.org.uk/2009/dom.xhtml


You are using a Namespace-unaware API.

The internal APIs of Gecko and WebKit as well as various non-browser  
XML frameworks are Namespace-aware.
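
The distinction can be illustrated outside a browser with a small Python sketch. Python's `html.parser` and `xml.dom.minidom` stand in here for a text/html parser and a Namespace-aware XML parser; they are not the HTML5 parsing algorithm or any browser's internals, and the `div` element name is an assumption, since the element in the quoted example did not survive.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# The element name "div" is assumed; only the attribute comes from the thread.
MARKUP = '<div xmlns:foo="http://foo.example.com/"></div>'


class AttrCollector(HTMLParser):
    """Record start-tag attributes exactly as the HTML tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


def html_view(markup):
    # The attribute name is an opaque string containing a colon.
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.attrs


def xml_view(markup):
    # The same attribute, seen as (namespace URI, local name, value).
    element = minidom.parseString(markup).documentElement
    attr = element.getAttributeNode("xmlns:foo")
    return (attr.namespaceURI, attr.localName, attr.value)
```

Looking the attribute up by its qualified name works in both views, which is why a test written against a Namespace-unaware API does not reveal the difference; code that asks for the namespace and local name only gets a meaningful answer from the XML side.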


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/




Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-22 Thread Toby Inkster
On Fri, 2009-05-22 at 12:26 +0200, Eduard Pascual wrote:
> Are you calling the DOM Consistency Principle a "theoretical" or
> "aesthetic" argument?

Certainly not -- DOM consistency is a great idea. But given that the
HTML5 spec defines how the DOM is built, there's a very simple solution
to that -- HTML5 could simply mandate that:

http://foo.example.com/";>

generates an identical DOM representation in both XHTML5 and HTML5.
What's the problem with that? 

In existing implementations, there are differences, sure. But for the
most part, those differences are pretty small and obscure, and don't
actually affect real world code very much. e.g. the following code seems
to work fine in Opera, Firefox and Midori (a Webkit browser):

http://buzzword.org.uk/2009/dom.html
http://buzzword.org.uk/2009/dom.xhtml

The files are byte-for-byte identical (indeed, on disk, one is just a
symlink to the other).

-- 
Toby Inkster 


Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-22 Thread Eduard Pascual
On Thu, May 21, 2009 at 5:19 PM, Toby Inkster  wrote:
> On Thu, 2009-05-21 at 13:26 +0200, Eduard Pascual wrote:
> [... lots ...]
I won't go point by point through your reply either, but there are
some points worth answering.

> CSS was invented as a way to separate out content from styling. Or to
> put it another way, to separate out data and presentation, which allows
> the same data to be re-presented (or indeed represented) in many
> different ways. The unobtrusive scripting "movement" (for want of a
> better word) aims to separate out behaviour from data, which I think is
> also a worthy ideal. But I consider the information which RDFa carries
> to be very strongly part of the document's *data*, so not especially
> suitable for separating out.
The way you describe CSS really makes it look very different from CRDF
and similar approaches. But I see it somewhat differently: as much as
CSS describes how content should be conveyed to humans, CRDF describes
how it should be conveyed to machines. With this description, they
suddenly look quite parallel; so I'll stay on neutral ground and take
these as just different points of view.
It's important to state that CRDF is *not* intended to take *all* the
semantics *out* of the document. In the most extreme cases, it would
be intended to take *some* *descriptions* of those semantics somewhere
more centralized (an external file if it's to be shared by several
documents, the document header if it's to be widely used across the
document, etc).

> (This consideration very much effected the design of RDF-EASE. You'll
> note that the -rdf-about and -rdf-content properties which it defines do
> not allow the author to hard code data into the RDF-EASE file -- they
> only allow the author to specify an attribute from the (X)HTML file
> where the data can be found.)
This makes a lot of sense. Actually, RDF-EASE is meant to always be
placed in an external file, so it's reasonable to disallow stuff that
just shouldn't go in an external file.
CRDF, on the other hand, is designed to work either as an external
file, an embedded piece of code (a.k.a. a 

Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-21 Thread Toby Inkster
On Thu, 2009-05-21 at 13:26 +0200, Eduard Pascual wrote:

> [... lots ...]

Eduard, thanks for your long and informative reply. I won't go into
every point mentioned in detail, but in summary I'd like to say that
your message reassured me on a few points and perhaps CRDF is not as bad
as I initially thought.

That said, I do think that externalising of the semantics of a document
is a mistake. As the author of RDF-EASE, I don't say this without having
thought the matter through. 

CSS was invented as a way to separate out content from styling. Or to
put it another way, to separate out data and presentation, which allows
the same data to be re-presented (or indeed represented) in many
different ways. The unobtrusive scripting "movement" (for want of a
better word) aims to separate out behaviour from data, which I think is
also a worthy ideal. But I consider the information which RDFa carries
to be very strongly part of the document's *data*, so not especially
suitable for separating out.

(This consideration very much affected the design of RDF-EASE. You'll
note that the -rdf-about and -rdf-content properties which it defines do
not allow the author to hard code data into the RDF-EASE file -- they
only allow the author to specify an attribute from the (X)HTML file
where the data can be found.)

That's very much an ideological argument, and I appreciate that not
everyone shares my ideology. But for those who don't, there is also the
more practical argument that separating out an aspect of the document's
meaning from the bulk of the markup increases the fragility of its
meaning. If the external file is lost, then part of the document's
meaning is lost.

Some people might argue that RDF already does this by relying on
external vocabularies, but this is only partly so.

By simply using http://xmlns.com/foaf/0.1/"; property="foaf:name">...
then I am, to a certain extent, relying on the FOAF project's definition
of "name" to be stable.

(Bear with me here, as this is about to start to seem very abstract,
but I'll bring it back to the more practical eventually.)

Even without RDFa though, I am relying on the usual English definition
of "name" being stable. It might seem unlikely that the standard English
definition of words is going to change especially much, but remember
that some of HTML5's proponents have lofty ambitions that HTML5
documents should still be readable in 1000 years. 

Think not of 1000 years, but consider how, just in our own lifetimes,
the words 'Web', 'surf' and 'browser' have picked up new meanings which
probably surpass their original meanings in terms of day-to-day usage.

Look back at how English was spoken 1000 years ago and you'll appreciate
how much it's changed. Many people have difficulty reading Shakespeare,
who wrote his work a mere ~400 years ago. Chaucer's "The Canterbury
Tales" which was written only 200 years earlier is virtually
indecipherable these days. Go back any further and you are effectively
looking at another language.

Some believe that the future will bring an even faster rate of change to
the English language, with new technologies giving us new concepts to
think about and label, and the ever wider spread of English as a second
language leading to an increase in loan words.

A great help in clarifying your usage of terms is the inclusion of a
glossary. For example, I could write:


  name
  
A name is a label for a noun, (human or animal,
thing, place, product [as in a brand name] and even an
idea or concept), normally used to distinguish one from
another.
(http://en.wikipedia.org/wiki/Name";>source)
  


With RDFa, the idea of a glossary can be used to reduce our reliance on
external vocabularies:

http://xmlns.com/foaf/0.1/";
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#";>
  name
  
A name is a label for a noun, (human or animal,
thing, place, product [as in a brand name] and even an
idea or concept), normally used to distinguish one from
another.
(http://en.wikipedia.org/wiki/Name";>source)
  


This doesn't completely eliminate the risk, but goes a long way to
mitigating it.

Anyway, that's enough on internal/external data. A few more specific
points...

> The reduced number of attributes in CRDF is not aimed to deal with
> complexity; but with a separate issue: it is easier for a host
> language to add a rel value for s and an extra attribute with no
> predefined name, than the bunch of attributes RDFa defines.

Not just an extra rel value for , but in some languages it would
involve introducing the  element to begin with. The cost of
introducing a new element is significantly higher than new attributes,
given that in most implementations of XML-like languages, unknown
attributes are generally ignored.

> Actually,
> there have been some complains [1] about why should HTML5 restraint
> itself from using quite useful attribute names such as "content" or
> "resource", just because RDFa decided to use them, without g

Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-21 Thread Eduard Pascual
On Wed, May 20, 2009 at 6:56 PM, Toby Inkster  wrote:
> Given that one of the objections people cite with RDFa is complexity,
> I'm not sure how this resolves things. It seems twice as complicated to
> me. It creates fewer new attributes, true, but number of attributes
> themselves don't create much confusion.
The reduced number of attributes in CRDF is not aimed at dealing with
complexity, but at a separate issue: it is easier for a host
language to add a rel value for s and an extra attribute with no
predefined name, than the bunch of attributes RDFa defines. Actually,
there have been some complaints [1] about why HTML5 should restrain
itself from using quite useful attribute names such as "content" or
"resource", just because RDFa decided to use them, without giving
non-X HTML a thought.
In other words: currently, it should be enough for RDFa parsers to
ignore non-X HTML content (or, more specifically, documents with no
default xmlns in , so they can also cope with the XHTML1.1+RDFa served
as text/html aberration, which is wrong no matter how you look at it).
If RDFa were taken into HTML5, then parsers would also have to care
about non-X documents, which binds HTML to not use these attribute
names for any future extension (actually, as pointed out in Ian's mail
referenced above, @content is already used on  since HTML4, so this
can't even be fulfilled).
CRDF takes a *less intrusive* approach: it minimizes the number of
attributes, and even lets the host language choose the name for
them; the only requirement is that they are defined in the spec as
"CRDF inline stuff" (the document suggests one wording for this, but
explicitly allows for any equivalent wording).
The goal of the fewer attributes, hence, is not to be *simpler*, but
to be *less intrusive*; and the referenced mail is the main reason to
want to be so.

On the simplicity/complexity debate, I'll point out that there isn't a
goal to make things much simpler with CRDF (although there is a
goal to not make them more complex than needed).
Also, keep in mind that the document is at quite an early stage: many
things are still too vaguely defined, and will become clearer once it
matures. Again, let me insist that the goal of these early versions is
to describe the idea and concept, and to draw feedback about it: it is
*not* a spec, and there are many details that are just left implicit
or even undefined yet. I'll make sure to clearly state the design
goals of CRDF on the next iteration of the document, to avoid
confusion.

> e.g. which is a simpler syntax:
>
> <a href="http://foo.example.com/"
>   ping="http://tracker.example.com/">Foo</a>
>
> or:
>
> http://foo.example.com/');
>         secondary:url('http://tracker.example.com/');">Foo
Like Tab, I think this example is completely unrelated. Fortunately, I
could understand your point without it ;-)

> Stuffing multiple discrete pieces of information makes things harder for
> parsing, harder for authoring tools and harder for authors.
Parsing a @crdf (or whatever it gets named) attribute shouldn't be
much harder than parsing a @style attribute. Furthermore, a good deal
of CSS parsing code can be reused to build CRDF parsers. Similarly,
authoring tools that already handle CSS styling may reuse a good deal
of the code to enable handling of CRDF metadata; and authors may apply
most concepts from CSS (such as Selectors, or the property:value
syntax) to CRDF.
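
To make the reuse argument a bit more concrete, here is a minimal Python sketch of splitting an inline declaration list the same way a @style value is split. It is not taken from the CRDF document: the function name parse_declarations, the sample property names, and the way the shorthand is written (a bare property name that defaults to the "contents" keyword) are assumptions for illustration, and the naive splitting ignores semicolons inside quoted strings.

```python
def parse_declarations(text, default_value="contents"):
    """Split a CSS-like declaration list into (property, value) pairs.

    Hypothetical helper: a declaration without an explicit value is
    treated as shorthand and receives default_value.
    """
    pairs = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty pieces such as a trailing ';'
        # Split on the first ':' only, so values like url('http://...') survive.
        name, sep, value = chunk.partition(":")
        value = value.strip() if sep else ""
        pairs.append((name.strip(), value or default_value))
    return pairs


print(parse_declarations("foaf|name; dc|creator: attr(title)"))
# [('foaf|name', 'contents'), ('dc|creator', 'attr(title)')]
```

A real parser would tokenize the value properly, as CSS parsers do, rather than split on punctuation.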
> In RDFa, each attribute performs a simple role - e.g. @rel specifies the
> relationship between two resources; @rev specifies the relationship in
> the reverse direction; @content allows you to override the
> human-readable text of an element. Combining these into a single
> attribute would not make things simpler.
Nope, it doesn't. It doesn't have to make things much more complex
either. But it makes the format easier to integrate into the host
language, since it requires fewer changes and those changes are more
flexible.

Now, let's speak about simplicity: CRDF gives you two ways to define
the values of properties (or, actually, two variants of the same way):
the CSSish property:value syntax and the short-hand syntax that just
omits the value and defaults it to the "contents" keyword. RDFa may
define a value with href, or with content, or leave it implicit;
which one applies depends on whether the rel/rev or property attributes
are used, and whether the content attribute is present or not. That
makes three ways of defining property values, which depend on up to
four different attributes, against CRDF's two ways, which only differ
in whether the value is given or not.
Next, RDFa may define the property itself in @property, in @rel,
or in @rev; while CRDF always defines it the same way.
For subjects, OMG, both @about and @src may define them, or they
may be "inherited" from parent elements; while in CRDF they are always
defined via @|subject or inherited through the well-defined CSS
cascading rules (actually, they may also be re-defined for "reversed"
properties, but that part is bei

Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-20 Thread Toby A Inkster

On 20 May 2009, at 23:10, Tab Atkins Jr. wrote:

> > Stuffing multiple discrete pieces of information makes things harder for
> > parsing, harder for authoring tools and harder for authors. In RDFa,
> > each attribute performs a simple role - e.g. @rel specifies the
> > relationship between two resources; @rev specifies the relationship in
> > the reverse direction; @content allows you to override the
> > human-readable text of an element. Combining these into a single
> > attribute would not make things simpler.


> You're leaving out @about, @property, @resource, @datatype, @typeof,


All of which have similarly simple usages:

@about = sets the URI for the thing we're talking about
@property = specifies what property the element's text represents
@resource = provides a link which is the object of @rel / subject of @rev

@datatype = specifies the type of data for an element with @property
@typeof = specifies the type for a new resource


> and numerous implicit uses of @href or @src,


@href == @resource (but at a lower priority, so latter can override)
@src == @about (but at a lower priority, so latter can override)
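
The two fallback rules above can be sketched in Python as follows. This is a simplified illustration and not the full RDFa processing rules; the function names and the dict-of-attributes representation of an element are assumptions.

```python
def effective_object(attrs):
    """Object of @rel (or subject of @rev): @resource overrides @href."""
    if "resource" in attrs:
        return attrs["resource"]
    return attrs.get("href")


def effective_subject(attrs, inherited=None):
    """Subject: @about overrides @src; otherwise use the inherited subject."""
    if "about" in attrs:
        return attrs["about"]
    return attrs.get("src", inherited)


# @href alone supplies the object; adding @resource overrides it.
link = {"href": "http://example.com/a"}
print(effective_object(link))
print(effective_object({**link, "resource": "http://example.com/b"}))
```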


> along with implicit
> chaining with contained nodes.  Please don't misrepresent the
> simplicity of RDFa - it's a generic metadata extraction method, and is
> rather complex.  So is CRDF, of course, but that's not disputed.


Each attribute is rather simple and has a simple syntax. Chaining  
them together becomes more complicated, I don't dispute that - but  
chaining together anything tends to increase complexity significantly  
(consider the implications of nested elements on onclick handling in  
Javascript - the result is event bubbling, which is hardly an easy  
concept for newcomers to Javascript).


But as each individual attribute is simple, and we can get some small  
gains without complex chaining, then basic uses of RDFa become pretty  
easy.


e.g.

http://example.com/license.html";>

Something that anyone can do easily. Becoming familiar with simple
cases will help them get to grips with how the attributes work, so
they're more familiar if they feel the need to mark up more complex
data.



> (Also, the argument against @rev is still going strong - in the RDFa
> in XHTML document, section 6.3.2.2, the foaf:img relation is misused
> in @rev, causing the RDF to state that Mark is an image of the
> resource!  @rev really is too confusing for standard use - just add
> inverted @rel values when necessary.)


Both usages of foaf:img in the RDFa in XHTML document seem to be  
correct. I think you may be thinking of Mark's draft RDFa tutorial.  
He explained on the RDFa task force that this was due to his  
misunderstanding foaf:img rather than misunderstanding @rel.


Indeed, FOAF has three different terms (img, depiction, depicts) for  
connecting an image to the thing depicted in the image, so it's not  
hard to get them mixed up. This is precisely why @rev is needed - to  
prevent having to define separate depicts/depiction, maker/made,  
primaryTopic/isPrimaryTopicOf terms. Having just one term to describe  
the relationship, and reversing the direction by moving it from @rel  
to @rev, makes vocabularies smaller and simpler.
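
As a rough illustration of the point about direction, the following Python sketch shows how the same predicate yields a triple in either direction depending on whether it appears in @rel or @rev. The function name link_triples, the tuple representation of triples, and the sample identifiers are assumptions made for this example only.

```python
def link_triples(subject, obj, rel=(), rev=()):
    """Return (subject, predicate, object) triples for one link.

    Predicates in rel point from subject to obj; predicates in rev
    point the other way, from obj back to subject.
    """
    triples = [(subject, p, obj) for p in rel]
    triples += [(obj, p, subject) for p in rev]
    return triples


person, photo = "#mark", "mark.jpg"

# With only foaf:depiction in the vocabulary, @rev covers the inverse
# direction without needing a separate foaf:depicts term.
print(link_triples(person, photo, rel=["foaf:depiction"]))
print(link_triples(photo, person, rev=["foaf:depiction"]))
```

Both calls yield the same triple, which is the sense in which one term plus @rev can stand in for a pair of inverse terms.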



> We are going to have to massively disagree on this point.  ^_^  I love
> CSS syntax.


So do I, but CRDF as defined is no more like CSS in terms of syntax  
than C or Perl are - they share the curly braces and semicolons, but  
not much else.



> It is rarely, if ever, necessary to set multiple <img> elements to the
> same @src or @alt.


I'm thinking of things like a table which has a check-mark column  
with a green tick image repeated all the way down, or a traffic-light  
indicator column with red, green and perhaps amber images indicating  
different statuses. I quite often see such things in web applications.


--
Toby A Inkster







Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-20 Thread Tab Atkins Jr.
On Wed, May 20, 2009 at 11:56 AM, Toby Inkster  wrote:
> Given that one of the objections people cite with RDFa is complexity,
> I'm not sure how this resolves things. It seems twice as complicated to
> me. It creates fewer new attributes, true, but number of attributes
> themselves don't create much confusion.
>
> e.g. which is a simpler syntax:
>
> <a href="http://foo.example.com/"
>   ping="http://tracker.example.com/">Foo</a>
>
> or:
>
> http://foo.example.com/');
>         secondary:url('http://tracker.example.com/');">Foo

I'm not sure how this example is relevant.  Links do one thing and do
it visibly; they benefit from a simple, straightforward syntax and a
proliferation of attributes that have direct meaning.  Any metadata
proposal, on the other hand, has attributes which acquire meaning only
through their values and the vocab being used, and there is a
necessary degree of indirection which makes things more difficult.

It would have actually been useful had the comparison been between
pseudo-CRDF and pseudo-RDFa, or better yet, actual CRDF and RDFa.
That way we have two things which can actually be compared.

> Stuffing multiple discrete pieces of information makes things harder for
> parsing, harder for authoring tools and harder for authors. In RDFa,
> each attribute performs a simple role - e.g. @rel specifies the
> relationship between two resources; @rev specifies the relationship in
> the reverse direction; @content allows you to override the
> human-readable text of an element. Combining these into a single
> attribute would not make things simpler.

You're leaving out @about, @property, @resource, @datatype, @typeof,
and numerous implicit uses of @href or @src, along with implicit
chaining with contained nodes.  Please don't misrepresent the
simplicity of RDFa - it's a generic metadata extraction method, and is
rather complex.  So is CRDF, of course, but that's not disputed.

(Also, the argument against @rev is still going strong - in the RDFa
in XHTML document, section 6.3.2.2, the foaf:img relation is misused
in @rev, causing the RDF to state that Mark is an image of the 
resource!  @rev really is too confusing for standard use - just add
inverted @rel values when necessary.)

> Looking at the comparison given in section 4.2, CRDF appears to suffer
> from several disadvantages compared to RDFa:
>
> 1. It's pretty ugly.

We are going to have to massively disagree on this point.  ^_^  I love
CSS syntax.  It's small, elegant, and simple.  CRDF benefits from all
of this.  Inline CRDF isn't ideal, but it benefits from being
identical to standard CRDF syntax, as well as resembling inline CSS in
@style.

RDFa (and Microdata, to a lesser extent), on the other hand, look like
you invented a half-dozen versions of @style which all do something
different but all have to be used together to style your document.  To
me it looks like the unedited HTML that Microsoft products will spew
out if you let them.

So, I guess beauty is in the eye of the beholder.  ^_^

> 2. It's more verbose - though only by eleven bytes by my reckoning, so
> this isn't a major issue.

When used inline, it may be.  It's not *intended* to be used inline,
though - that's just there for the occasional case when you absolutely
need to do so, just as @style is available but discouraged in favor of
external CSS.

When used as intended, as a separate CRDF file, you see immediate
savings as soon as you have two things with the same data structure.
I think I'm reasonable in assuming that most users of any metadata
solution will be doing so in medium-to-large quantities, not
individual isolated instances with unique structure.  They can deploy
a single CRDF file across their entire site, automatically allowing
metadata extraction from their content with no further effort.  At
worst, they have to add a few classes, perhaps some s.

> 3. It divorces the CURIE prefix definitions from the use of CURIEs in
> the markup. This makes it more vulnerable to copy-paste problems. (As I
> understand  in the proposal, CURIE prefix
> definitions can even be separated out into an external file. This
> obscures them greatly and will certainly be a cause of copy-paste
> issues!)

If you're using inline CRDF, then yeah, the prefix definitions may be
far from the content.  The prefixes are defined globally for the
document, and may appear anywhere.  In practice, inline CRDF should be
rare, and the prefixes should appear at the top of the .crdf file
where they can be easily seen.

> Apart from the fact that *sometimes* RDFa involves a bit of repetition,
> I don't see what problems this proposal is actually supposed to solve.

You're being disingenuous.  RDFa *always* requires *large* amounts of
verbose repetition whenever you're indicating the same metadata
structure multiple times.  I expect that this type of use will be by
far the most common if metadata embedding takes off as hoped for.
One-shot uses like on bios will be relatively rare (I expect most
meta

Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-20 Thread Toby Inkster
Given that one of the objections people cite with RDFa is complexity,
I'm not sure how this resolves things. It seems twice as complicated to
me. It creates fewer new attributes, true, but number of attributes
themselves don't create much confusion.

e.g. which is a simpler syntax:

<a href="http://foo.example.com/"
   ping="http://tracker.example.com/">Foo</a>

or:

http://foo.example.com/');
 secondary:url('http://tracker.example.com/');">Foo

Stuffing multiple discrete pieces of information makes things harder for
parsing, harder for authoring tools and harder for authors. In RDFa,
each attribute performs a simple role - e.g. @rel specifies the
relationship between two resources; @rev specifies the relationship in
the reverse direction; @content allows you to override the
human-readable text of an element. Combining these into a single
attribute would not make things simpler.

Looking at the comparison given in section 4.2, CRDF appears to suffer
from several disadvantages compared to RDFa:

1. It's pretty ugly.

2. It's more verbose - though only by eleven bytes by my reckoning, so
this isn't a major issue.

3. It divorces the CURIE prefix definitions from the use of CURIEs in
the markup. This makes it more vulnerable to copy-paste problems. (As I
understand  in the proposal, CURIE prefix
definitions can even be separated out into an external file. This
obscures them greatly and will certainly be a cause of copy-paste
issues!)

4. It's ugly. I'm sorry, I just can't emphasise that enough.

Apart from the fact that *sometimes* RDFa involves a bit of repetition,
I don't see what problems this proposal is actually supposed to solve.

Repetition in practice seems to be something that page authors can deal
with. We don't provide a mechanism for setting the src or alt attributes
of multiple <img> elements which need to load the same external image; or
setting the class attribute of the third cell in every row of a table.

So again, while I can see that this proposal would "work", in what way
is it supposed to be preferable to RDFa?

-- 
Toby Inkster 


Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-20 Thread Eduard Pascual
Note: I wrote this yesterday. My internet connection wasn't working as
desired, but GMail told me it had been sent and I believed it. Now I
have just noticed that it hadn't; and at least one person has been
confused by the changes in the document. Sorry for this issue; I
hope this time GMail does send it. What follows is the message as it
should have been sent yesterday:

Update: I have just put up a new version of the CRDF document. The
main changes are:
Section 0. Rationale: several corrections on the claimed limitations
of RDFa, which have been shown to be just limitations of my knowledge
about RDFa.
Section 2. Syntax: the syntax is now more formally defined (although
it still refers to CSS3's Syntax, Values, and Namespace modules for
some stuff). The content model for property values is now fully
defined: resource and "reversed" support has been added, and explicit
typing capabilities are now more prominent in the document. For
subject definitions, the "none" keyword has been redefined; "blank()"
now handles what "none" previously did, and a syntax has been added to
mimic EASE's nearest-ancestor construct. Finally, a subsection has
been added describing how to handle scenarios where a tool might have
to extract an "XML literal" from source in a non-XML language.
Section 3. The host language: expanded 3.3 (embedding inline CRDF) to
allow multiple brace-delimited blocks within the attribute value, to
enable stating properties for different subjects while reusing the
same element.
Section 4. The first examples don't make sense anymore after the
changes in section 0. They have been removed, waiting for further
feedback on that section before redoing them.



I'd like to reiterate what I said in the opening message: if someone
can suggest a better place to discuss this document, please let me
know.


Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-17 Thread Eduard Pascual
First of all, thanks for the time taken to review the document and to
post your feedback. I truly appreciate it.

On Sat, May 16, 2009 at 2:12 PM, Toby A Inkster  wrote:
> In part 0.1 you include some HTML and some RDF triples that you'd like to
> mark up in the HTML and conclude that RDFa is incapable of doing that
> without adding extra wrapper elements.
>
> While adding redundant wrapper elements and empty elements is occasionally
> needed in RDFa (and from what I can tell, the microdata approach is even
> worse at this), the example you give doesn't require any.
I think I already stated this somewhere, but it never hurts to state
it again: like any human, I can make mistakes; and my knowledge about
RDF, RDFa, and even CSS, is definitely far from perfect. So, thanks
for your post, which has actually improved it a little with the
"revelation" that @property can take multiple values. My apologies for
that wrong example, then; I'll try to fix that part ASAP. Trying to
think about which cases would then require wrappers in RDFa, the only
situation I've come up with is when the value should be reused for
properties about different subjects. And, to my surprise, I just
realized that CRDF in embedded form didn't handle those cases either!
So, my most sincere thanks for highlighting this, since you have
revealed a serious issue in CRDF that will get fixed in the next
iteration of the document (hopefully due late Tuesday or early
Wednesday).

> Part 0.3 of your document claims that RDFa is designed for XHTML
> "exclusively". This is not the case - the designers of RDFa went out of
> their way to make its use feasible in *any* XML or XML-like language. SVG
> Tiny 1.2 includes the RDFa attributes, so RDFa can be used in SVG.
My apologies here for such bad wording, although your reply confirms
the idea behind the wording: RDFa was part of the "the future is XML"
dream, and thus did not take non-X HTML into proper account. Not to say
that it was RDFa's fault, since that was quite a widespread belief
(I shared it myself for a long while). But RDFa's XMLish approach is
the root of many issues for tag-soup HTML, perfectly illustrated by
the amount of controversy generated on these lists by the
"xmlns:prefix" syntax.
I'll make sure to change that wording to better describe the idea
behind it; and I'd like to thank you for highlighting the issue.

> Part 0.3 also states that "both Microformats and RDFa require the
> human-readable values to be reused as the machine-
> readable ones.". Actually, RDFa provides @content and @resource which,
> respectively, over-ride human-readable text and human-intended link targets.
Again, my limited knowledge of RDFa has betrayed me. This, added to
Microformats' misuse of abbr as a workaround, means that the issue
itself doesn't exist, at least not as initially perceived. I'm not
sure whether I'll remove that one entirely, or just briefly mention it
in the "Issues with Microformats" section, due to the accessibility
issues with the abbr approach.

> Lastly, and most seriously, CRDF doesn't seem to distinguish between
> literals and resources.
This is definitely an important issue, which Tab already made me aware
of. Fortunately, it's easy to fix; and Tab himself provided a possible
solution, which is very likely to be part of the next version of the
document.

Until I add the fixes to the document, all that is left is to reiterate
my thanks for your feedback.

Regards,
Eduard Pascual


Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-16 Thread Tab Atkins Jr.
(Could you try to be a little more careful about changing mail titles?
 These threads have splintered into half a dozen separate things in my
mail reader due to "Re:"s appearing in subjects.  It took me a while
to discover just what mail you were trying to respond to here.)

On Sat, May 16, 2009 at 7:12 AM, Toby A Inkster  wrote:
[snip a lot of more editorial comments]
> Lastly, and most seriously, CRDF doesn't seem to distinguish between
> literals and resources. For example, with CRDF, I can do:
>
> http://example.net/"; />
> 
> @namespace ex "http://example.com/";
> a.foo {
>  ex|property1: attr(title);
>  ex|property2: attr(href);
> }
> 
> http://example.org/"; title="Quux">...
>
> And I'd expect it to generate the following RDF/XML:
>
> http://example.net/";>
>  Quux
>  http://example.org/"; />
> 
>
> But it is not clear why a parser should generate the above, and not:
>
> http://example.net/";>
>  http://example.net/Quux"; />
>  http://example.org/
> 
>
> And there is a big difference in what these two pieces of RDF/XML mean.

Actually, I believe it would generate:
http://example.net/";>
 Quux
 http://example.org/";


In other words, it completely ignores the resource part of RDF.  This
is easy to fix, though.  Frex, change the example CRDF to:


@namespace ex "http://example.com/";
a.foo {
 ex|property1: attr(title);
 ex|property2: attr(href) resource;
}


And it could then generate the first triple you posted.
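
The effect of such a "resource" marker can be sketched in Python as below. This is only an illustration of the literal-versus-resource distinction under discussion, not CRDF's defined behaviour: the RULES table, the helper names expand and extract, and the tagged-tuple representation of objects are all assumptions.

```python
NAMESPACES = {"ex": "http://example.com/"}

# Each rule: (property, source attribute, emit the value as a resource?)
RULES = [
    ("ex|property1", "title", False),  # plain literal
    ("ex|property2", "href", True),    # marked with 'resource'
]


def expand(name):
    """Expand a prefix|local name using the declared namespaces."""
    prefix, local = name.split("|", 1)
    return NAMESPACES[prefix] + local


def extract(subject, attrs, rules=RULES):
    """Build triples whose objects are tagged as literal or resource."""
    triples = []
    for prop, attr, as_resource in rules:
        kind = "resource" if as_resource else "literal"
        triples.append((subject, expand(prop), (kind, attrs[attr])))
    return triples


element = {"href": "http://example.org/", "title": "Quux"}
for triple in extract("http://example.net/", element):
    print(triple)
```

Without the flag on the second rule, both objects would come out as literals, which is the behaviour described just before the fix.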

~TJ


Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-16 Thread Toby A Inkster
In part 0.1 you include some HTML and some RDF triples that you'd  
like to mark up in the HTML and conclude that RDFa is incapable of  
doing that without adding extra wrapper elements.


While adding redundant wrapper elements and empty elements is  
occasionally needed in RDFa (and from what I can tell, the microdata  
approach is even worse at this), the example you give doesn't require  
any.


Thusly:


  
My homepage
  
  http://purl.org/dc/elements/1.1/";
xmlns:cc="http://creativecommons.org/ns#";>
Eduard Pascual's homepage
Someday I will put some content here!
This page, by Eduard Pascual,
is licensed under a http://creativecommons.org/licenses/by/3.0/";
>CC Attribution license.
  


Part 0.3 of your document claims that RDFa is designed for XHTML  
"exclusively". This is not the case - the designers of RDFa went out  
of their way to make its use feasible in *any* XML or XML-like  
language. SVG Tiny 1.2 includes the RDFa attributes, so RDFa can be  
used in SVG.


Part 0.3 also states that "both Microformats and RDFa require the  
human-readable values to be reused as the machine-
readable ones.". Actually, RDFa provides @content and @resource  
which, respectively, over-ride human-readable text and human-intended  
link targets.


e.g.

http://xmlns.com/foaf/0.1/";
   typeof="foaf:Person">
  Ian Hickson's
  nickname is H to the I to the X to the I to the E (as a
  Gangsta rapper might put it.


Lastly, and most seriously, CRDF doesn't seem to distinguish between  
literals and resources. For example, with CRDF, I can do:


http://example.net/"; />

@namespace ex "http://example.com/";
a.foo {
  ex|property1: attr(title);
  ex|property2: attr(href);
}

http://example.org/"; title="Quux">...

And I'd expect it to generate the following RDF/XML:

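<rdf:Description rdf:about="http://example.net/">
  <ex:property1>Quux</ex:property1>
  <ex:property2 rdf:resource="http://example.org/" />
</rdf:Description>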

But it is not clear why a parser should generate the above, and not:

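<rdf:Description rdf:about="http://example.net/">
  <ex:property1 rdf:resource="http://example.net/Quux" />
  <ex:property2>http://example.org/</ex:property2>
</rdf:Description>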

And there is a big difference in what these two pieces of RDF/XML mean.

--
Toby A Inkster





Re: [whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-15 Thread Tab Atkins Jr.
On Thu, May 14, 2009 at 9:50 AM, Eduard Pascual  wrote:
> I have put online a document that describes my idea/proposal for a
> selector-based solution to metadata.
> The document can be found at http://herenvardo.googlepages.com/CRDF.pdf
> Feel free to copy and/or link the file wherever you deem appropriate.
>
> Needless to say, feedback and constructive criticism to the proposal
> is always welcome.
> (Note: if discussion about this proposal should take place somewhere
> else, please let me know.)

Ah, thanks Eduard.  Have you cleaned this up significantly since the
last time this discussion came up?  It seems to read much better now
than before, but it's possible that I was just stupider several months
ago.

As far as I can tell (I am a novice, so YMMV), it conveys everything
that RDFa does, and more specifically, matches RDF-EASE's features.  I
think it has a friendlier syntax than RDF-EASE, though, which I think is
tied too much to the exact structure of RDFa.  The author does
acknowledge that he leans directly on RDFa, but I think that's a
mistake - RDFa is designed to deal with the limitations of the
attr/value pairs that you can place on elements.  When you're
designing a new language by itself, you can employ the magic of
syntactic sugar to tighten things up and make them easier and more
expressive.

Frex, RDF-EASE uses -rdf-property to specify what property something
should be, and -rdf-content to specify whether a property should take
its value from the element's content or from an attribute.  This split
is necessary when embedding attributes in HTML, but your proposal
combines those two things into a single line, which I think is much
clearer, and makes it easier to use when specifying multiple
properties.  (Not to mention making inline specification even easier
than RDFa, as you point out.)

I recommend using 'self' as the value for @|subject that corresponds
to a blank node for each matched element.

How would you write the situation where you have two vocabs applying
to content in an intertwined way, with different subjects?  I can't
think of an explicit example right now, but say you had content like
, where  and  are both subjects
using different vocabs, and  has facts about both of them.  It
seems like you can handle this by specifying two separate blocks with
an identical selector but different @|subject rules.  Is this correct?

If so, it seems then that at least one of those @|subject rules would
require either a url(...) or blank(...) value, which limits one's
ability to use this technique on multiple elements on a page.
RDF-EASE uses the nearest-ancestor(selector) functional notation to
indicate these sorts of relationships.

(Ah, here we go, an example:
http://buzzword.org.uk/2008/rdf-ease/spec#ssec-properties--rdf-about
talks about mixing foaf and vcard together, with one scenario matching
what I outlined earlier.)


Your proposal doesn't seem to have a way to specify the datatype
currently.  Since several people have brought up the lack of datatype
as a weakness in Ian's microdata proposal, this may be a weakness.


RDF-EASE allows you to 'reset' elements, *overriding* metadata given
by less-specific selectors rather than just augmenting it.  This does
seem like a nice ability, specifically when you need to provide a
general rule for a particular class, say, and give a slightly
different rule for one of those elements with a particular id.  On the
other hand, you can just write the general rule with :not() to avoid
the more specific element.  I'm not sure whether this is good enough,
or if it really is easier to use something like 'reset'.

~TJ


[whatwg] A Selector-based metadata proposal (was: Annotating structured data that HTML has no semantics for)

2009-05-14 Thread Eduard Pascual
I have put online a document that describes my idea/proposal for a
selector-based solution to metadata.
The document can be found at http://herenvardo.googlepages.com/CRDF.pdf
Feel free to copy and/or link the file wherever you deem appropriate.

Needless to say, feedback and constructive criticism to the proposal
is always welcome.
(Note: if discussion about this proposal should take place somewhere
else, please let me know.)

Regards,
Eduard Pascual