Re: [whatwg] Annotating structured data that HTML has no semantics for
On Tue, 9 Jun 2009, Frank Hellenkamp wrote:
>> I agree entirely. I actually tried to find a workable solution to address this but unfortunately the only general solutions I could come up with that would allow this were selector-based, and in practice authors are still having trouble understanding how to use Selectors even with CSS.
>
> At least simple selectors are well understood and a well established technique on the web.

Sure, but with tables, you can't use simple selectors. Simple selectors (e.g. a class attribute on each cell) wouldn't be any better than repeating itemprop= everywhere.

> There is widespread use for it in CSS (so it is very simple to test, if your selector works for the correct set of elements).

It'd actually be quite hard to test a selector layer for microdata, relative at least to the testing that (say) CSS gets. The thing is, with CSS, if there's a mistake then the worst that will happen is that the rule will be ignored, or will apply in some way you didn't realise, but with the end result being what you want. With microdata, if you get the rules wrong, you won't really know, until someone tries to apply the data in some way you didn't expect, and then it'll fail in ways you won't know about.

> And with a selector-based approach it is far easier to add metadata information to existing content than with the metadata proposal. So for authors it would be much easier, I think. It would work like a decentralized microformats approach (btw. it would be easy to map the existing microformats to such a CSS-based metadata format), with the benefit that you can simply map your own classes and ids to global ones like foaf, dc or hcard. And you could easily use such profiles from other pages, e.g.: someone could mark up the songs on his page in the way last.fm does and then simply use a copy of their metadata profile (basically in the same way we use microformats now).

A selectors-based approach would be similar to GRDDL in this respect. I don't think this really needs support within HTML; I would encourage you to work on this as a stand-alone technology.

>> There's also the problem with separating the data from the rules that say how to interpret the data, which would likely lead to more problems than the typos one would get from repeating the itemprop=s.
>
> The only real problem I see is the unfortunate fact that it is harder for browser implementors to write good copy-paste code which preserves all metadata from one source to another.

That's one symptom of the above, yes.

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Tue, 9 Jun 2009, Jonas Sicking wrote:
>>> Some of the improvement suggestions that I have heard that sound interesting, though possibly for the next version of microdata.
>>>
>>> * Support for specifying a machine-readable value, such as for dates, colors, numbers, etc.
>>
>> I expect we will add support for these based on demand, the same way we added <time> in the first place.
>
> Using dedicated elements for each data type seems like it will eventually bloat the language.

Only if people don't show restraint in extending the language.

> For example what use would a <color> element or a <number> element do?

It would allow conformance checkers to do type checking for the most commonly used types, and might allow (for <number>, anyway) localised formatting.

> If instead machine-readable values could be added using a generic method, such as an 'itemvalue' or 'propvalue' attribute, each microdata format can define how to interpret the values, be they numbers, dates, body parts, or chemical formulas.

You can do that now with <meta itemprop=... content=...>.

>>> I even wonder whether it would allow replacing the <time> element with a standardized microformat, such as:
>>>
>>>    Christmas is going down on <span item=w3c.time itemvalue=12-25-2009>The 25th day of December</span>!
>>
>> I don't really understand how that would be better than dedicated elements.
>
> The idea would be to reduce the size of the language. I.e. if a feature isn't heavily used, it might be better expressed as a microdata format.

Well, you can do it today as:

   Christmas is going down on <meta itemprop=w3c.time content=12-25-2009>The 25th day of December!

...which (assuming that in your example you meant itemprop and not item, and assuming that you didn't mean the contents of the span to have any effect on the microdata processing model) would result in exactly the same name/value pair being generated into the relevant item.

On the other hand, if you really meant item=, which I guess you might have meant... you could do that today as:

   Christmas is going down on <span item=w3c.time><meta itemprop=value content=12-25-2009>The 25th day of December</span>!

...or some such (it doesn't matter what the textual contents of the span are in this example). However, this is going to result in much more painful structures, and you'd still need to link the item with a parent item (assuming there is one), as in:

   <div item=com.example.somethingorother>
    Christmas is going down on <span itemprop=com.example.startdate item=w3c.time><meta itemprop=value content=12-25-2009>The 25th day of December</span>!
   </div>

...which is really getting complicated compared to just:

   <div item=com.example.somethingorother>
    Christmas is going down on <meta itemprop=w3c.time content=12-25-2009>The 25th day of December!
   </div>

...or (preferred today):

   <div item=com.example.somethingorother>
    Christmas is going down on <time itemprop=w3c.time datetime=12-25-2009>The 25th day of December</time>!
   </div>

> For example, why didn't you add elements for bibtex or vCard, but instead used microdata?

New elements didn't really fit the use cases as well.

> Another reason is as a test of the microdata feature itself. Microdata is a sort of extension mechanism to HTML 5. In software development, it is common to test your extension system by developing parts of the product using the extension system. This way you can both keep the core code small, and you get a good test bed for your extension system.

Indeed.

> You have already done this with the predefined vocabularies

Right.

> and apparently the lack of ability to define a machine-readable value separate from the human-readable one was not a problem. However it would seem that the same does not hold true for <time>.

Right, that's why I adapted <time> into the microdata model.

>>> * Support for tabular data.
>>
>> This would be nice if we can find a way to do it that doesn't put undue burdens on simple implementations. (e.g. I would imagine that while a microdata implementation today can be a few hundred lines total, adding support for the table model could easily double that.)
>
> Quite possibly. In both these cases I'm perfectly happy to wait with adding more features to microdata for now and see if what we have is successful, before we start over-engineering it to cover every imaginable case.

Agreed.

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
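Editorial illustration: the claim above that the <meta itemprop=... content=...> form and the <time itemprop=... datetime=...> form produce the same name/value pair can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the microdata processing algorithm: it uses the draft attribute names from this thread, looks at <meta> and <time> only, and ignores nested items and text-content values. The names ItemPropCollector and extract_pairs are hypothetical.

```python
from html.parser import HTMLParser


class ItemPropCollector(HTMLParser):
    """Collect (name, value) pairs from elements that carry itemprop=.

    Simplification (an assumption of this sketch): <meta> contributes its
    content= attribute and <time> its datetime= attribute. Other elements,
    nested items and text-content values are ignored.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("itemprop")
        if name is None:
            return
        if tag == "meta":
            self.pairs.append((name, attributes.get("content")))
        elif tag == "time":
            self.pairs.append((name, attributes.get("datetime")))


def extract_pairs(markup):
    """Return the name/value pairs found in a fragment of markup."""
    collector = ItemPropCollector()
    collector.feed(markup)
    collector.close()
    return collector.pairs


# The two forms from the message above.
META_FORM = (
    "<div item=com.example.somethingorother>Christmas is going down on "
    "<meta itemprop=w3c.time content=12-25-2009>The 25th day of December!</div>"
)
TIME_FORM = (
    "<div item=com.example.somethingorother>Christmas is going down on "
    "<time itemprop=w3c.time datetime=12-25-2009>The 25th day of December</time>!</div>"
)

print(extract_pairs(META_FORM))
print(extract_pairs(TIME_FORM))
```

Both calls yield the same single pair, which is the sense in which the two spellings are interchangeable for a consumer that only wants names and values.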
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Mon, 11 May 2009, Simon Pieters wrote:
> On Sun, 10 May 2009 12:32:34 +0200, Ian Hickson i...@hixie.ch wrote:
>> Page 3:
>>
>>    <h2>My Cats</h2>
>>    <dl>
>>     <dt>Schr&ouml;dinger
>>     <dd item=com.damowmow.cat>
>>      <meta property=com.damowmow.name content="Schr&ouml;dinger">
>>      <meta property=com.damowmow.age content=9>
>>      <p property=com.damowmow.desc>Orange male.
>>     <dt>Erwin
>>     <dd item=com.damowmow.cat>
>>      <meta property=com.damowmow.name content="Lord Erwin">
>>      <meta property=com.damowmow.age content=3>
>>      <p property=com.damowmow.desc>Siamese color-point.
>>      <img property=com.damowmow.img alt="" src="/images/erwin.jpeg">
>>    </dl>
>
> Given the microdata solution and this example, there is now a reason other than styling to introduce <di>, since here you duplicate the <dt> information in <meta>.
>
>    <dl>
>     <di item=com.damowmow.cat>
>      <dt property=com.damowmow.name>Schr&ouml;dinger
>      <dd>
>       <meta property=com.damowmow.age content=9>
>       <p property=com.damowmow.desc>Orange male.
>     </di>
>     ...
>
> The styling problem is discussed at http://forums.whatwg.org/viewtopic.php?t=47

Yeah, I noticed that. I agree that if it turns out that this is a common authoring pattern (and assuming we can work around the difficulties in adjusting the parser to handle this), we should probably introduce <di> after all. I intend to wait and see what happens first though.

On Mon, 11 May 2009, Giovanni Gentili wrote:
> Ian Hickson:
>> USE CASE: Annotate structured data that HTML has no semantics for, and which nobody has annotated before, and may never again, for private use or use in a small self-contained community.
>> (..)
>> SCENARIOS:
>
> Among the scenarios, this case should also be considered:
>
> * a user (or group of users) wants to annotate items present on a generic web page with additional properties in a certain vocabulary. For example, Joe wants to gather in a blog a series of personal annotations to movies (or other types of items) present on imdb.com.

This isn't really a use case, it's a solution. What is the end-user scenario that the author is trying to address? For example, what kind of software will collect this information? What problem are we solving?

> a) In the case of properties specified on an element that has no ancestor with an item attribute, should the corresponding item be the document (a body element with an implicit item attribute)?

We already have mechanisms for providing name-value pairs for a document; namely, <meta name> and <link rel>.

> b) Do we need to require UAs to offer a standard way to visualize (at least as an option left to the user) the structured information carried in microdata?

Not as far as I can tell; what use case would this be for?

> And copy-paste?

The spec already requires user agents to include microdata in copy and paste.

On Tue, 12 May 2009, Tim Tepaße wrote:
>> (Note the <meta>s in the last example -- since sometimes the information isn't visible, rather than requiring that people put it in and hide it with display:none, which has a rather poor accessibility story, I figured we could just allow <meta> anywhere, if it has a property= attribute.)
>
> That seems to be a solution optimised for extremely invisible metadata but not for metadata which differs from the human-visible data.

It handles both -- instead of:

   <span itemprop=x>y</span>

...you can do:

   <span><meta itemprop=x content=y>z</span>

> Imagine as an example the simple act of marking up a number (and ignoring what the number denotes). For human consumption a thousands separator is often used; the type of separator differs by language, locale and context. Just in my little world I see on a regular basis the point, the comma, the space, the thin space and sometimes the apostrophe. Parsing different representations of numbers would be a chore. The value of textContent of the element
>
>    <span itemprop=com.example.price>€&nbsp;1&thinsp;000&thinsp;000,&mdash;</span>
>
> is clearly unusable, demanding an additional invisible <meta property=com.example.price content=100>.

Right.
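Editorial illustration: the point about locale-dependent number formatting can be made concrete with a short Python sketch. The helper to_number and the sample strings are hypothetical and chosen only for illustration; the machine-readable value stands in for whatever a content= attribute would carry.

```python
def to_number(text, thousands, decimal):
    """Convert a displayed number to float under ONE explicit convention."""
    normalized = text.replace(thousands, "").replace(decimal, ".")
    return float(normalized)


# The same visible text means different things under different conventions,
# so a consumer cannot recover the value from the text alone.
german_reading = to_number("1.000", thousands=".", decimal=",")
english_reading = to_number("1.000", thousands=",", decimal=".")
print(german_reading, english_reading)

# Thin-space grouping defeats a plain float() call entirely.
displayed = "1\u2009000\u2009000"
try:
    float(displayed)
    plain_parse_ok = True
except ValueError:
    plain_parse_ok = False
print(plain_parse_ok)

# A separate machine-readable value needs no guessing (hypothetical value).
machine_value = "1000000"
print(float(machine_value))
```

The sketch only shows why text content is an unreliable source for the value; it takes no position on whether that value should live on the same element or on a separate <meta>.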
> My irritation lies in the element proliferation, requiring one element/attribute combination for machines and one element/text content combination for humans. Of course, any sane author would arrange both elements in a close relation, as parent/child or siblings, but there would still be two different elements to maintain, leading to a higher cognitive load. Not just for authors but also for programmers: a fluctuating price has to be updated on two different elements; tree-walking DOM scripts have to take <meta> elements into account.
>
> Furthermore it clashes with the familiar behaviour of other elements in HTML. A hyperlink is one element with a machine-readable attribute and human-readable text content. A citation is one element with a machine-readable reference and human-readable text content. The same model is used in meter, progress, time, abbr ... but not in user-defined
Re: [whatwg] Annotating structured data that HTML has no semantics for
>> Some of the improvement suggestions that I have heard that sound interesting, though possibly for the next version of microdata.
>>
>> * Support for specifying a machine-readable value, such as for dates, colors, numbers, etc.
>
> I expect we will add support for these based on demand, the same way we added <time> in the first place.

Using dedicated elements for each data type seems like it will eventually bloat the language. For example what use would a <color> element or a <number> element do? If instead machine-readable values could be added using a generic method, such as an 'itemvalue' or 'propvalue' attribute, each microdata format can define how to interpret the values, be they numbers, dates, body parts, or chemical formulas.

>> I even wonder whether it would allow replacing the <time> element with a standardized microformat, such as:
>>
>>    Christmas is going down on <span item=w3c.time itemvalue=12-25-2009>The 25th day of December</span>!
>
> I don't really understand how that would be better than dedicated elements.

The idea would be to reduce the size of the language. I.e. if a feature isn't heavily used, it might be better expressed as a microdata format. For example, why didn't you add elements for bibtex or vCard, but instead used microdata? However, it's quite possible that <time> is going to be commonly used enough that it's worth using an element rather than a microdata format.

Another reason is as a test of the microdata feature itself. Microdata is a sort of extension mechanism to HTML 5. In software development, it is common to test your extension system by developing parts of the product using the extension system. This way you can both keep the core code small, and you get a good test bed for your extension system. You have already done this with the predefined vocabularies, and apparently the lack of ability to define a machine-readable value separate from the human-readable one was not a problem. However it would seem that the same does not hold true for <time>.

>> * Support for tabular data.
>
> This would be nice if we can find a way to do it that doesn't put undue burdens on simple implementations. (e.g. I would imagine that while a microdata implementation today can be a few hundred lines total, adding support for the table model could easily double that.)

Quite possibly. In both these cases I'm perfectly happy to wait with adding more features to microdata for now and see if what we have is successful, before we start over-engineering it to cover every imaginable case.

/ Jonas
Re: [whatwg] Annotating structured data that HTML has no semantics for
The problem of W3C DTD DDoS does not apply to CURIE because software processing RDF does not need to retrieve the resources referenced on a regular basis. Even in the case of DTD, the problem is that some software does not cache, not that some software tries to access it. IMHO, Chris
Re: [whatwg] Annotating structured data that HTML has no semantics for
Ian Hickson wrote:
> I agree entirely. I actually tried to find a workable solution to address this but unfortunately the only general solutions I could come up with that would allow this were selector-based, and in practice authors are still having trouble understanding how to use Selectors even with CSS.
>
> There's also the problem with separating the data from the rules that say how to interpret the data, which would likely lead to more problems than the typos one would get from repeating the itemprop=s.

I am sorry, but I cannot agree on this one.

At least simple selectors are well understood and a well established technique on the web. There is widespread use for it in CSS (so it is very simple to test, if your selector works for the correct set of elements). And the fact that jquery is *so* successful is based on jquery's capability to work with selectors in such an easy way, not the other way around.

And with a selector-based approach it is far easier to add metadata information to existing content than with the metadata proposal. So for authors it would be much easier, I think.

It would work like a decentralized microformats approach (btw. it would be easy to map the existing microformats to such a CSS-based metadata format), with the benefit that you can simply map your own classes and ids to global ones like foaf, dc or hcard.

And you could easily use such profiles from other pages, e.g.: someone could mark up the songs on his page in the way last.fm does and then simply use a copy of their metadata profile (basically in the same way we use microformats now).

The only real problem I see is the unfortunate fact that it is harder for browser implementors to write good copy-paste code which preserves all metadata from one source to another.

Best regards
Frank

--
frank hellenkamp | interface designer
solmsstraße 7 | 10961 berlin
+49.30.49 78 20 70 | tel
+49.173.70 55 781 | mbl
+49.3212.100 35 22 | fax
jo...@depagecms.net
http://www.depagecms.net
http://immerdasgleiche.de
http://everydayisexactlythesame.net/
Re: [whatwg] Annotating structured data that HTML has no semantics for
On May 14, 2009, at 23:52, Eduard Pascual wrote:
> On Thu, May 14, 2009 at 3:54 PM, Philip Taylor excors+wha...@gmail.com wrote:
>
> It doesn't matter one syntax or another. But if a syntax already exists (RDFa), building a new syntax should be properly justified.

It was at the start of this thread: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html

> As of now, the only supposed benefit I have heard of for this syntax is that it avoids CURIEs... yet it replaces them with reversed domains?? Is that a benefit?

There's no indirection. A decade of Namespaces in XML shows that both authors and implementors have trouble getting prefix-based indirection right. (If we were limited to reasoning about something that we don't have experience with yet, I might believe that people can't be too inept to use prefix-based indirection. However, a decade of actual evidence shows that actual behavior defies reasoning here and prefix-based indirection is something that both authors and implementors get wrong over and over again.)

> I have been a Java programmer for some years, and still find that convention absurd, horrible, and annoying. I'll agree that CURIEs are ugly, and maybe hard to understand, but reversed domains are equally ugly and hard to understand.

Problems shared by CURIEs, URIs and reverse DNS names:
 * Long.
 * Identifiers outlive organization charts.

Problems that reverse DNS names don't have but CURIEs and URIs do have:
 * "http://"; 7 characters of extra length.
 * Affordance of dereferenceability when mere identifier semantics are meant.

Problems that reverse DNS names and URIs don't have but CURIEs have:
 * Prefix-based indirection.
 * Violation of the DOM Consistency Design Principle if xmlns:foo is used.

>> (I understand that if the microdata syntax offered no advantages over RDFa, then it would be a wasted effort to diverge.
>
> Which are the advantages it offers?

The syntax is simpler for the use cases it was designed for. It uses a simpler conceptual model (trees as opposed to graphs). It allows short token identifiers. It doesn't use prefix-based indirection. It doesn't violate the DOM Consistency Design Principle.

On May 15, 2009, at 14:11, Eduard Pascual wrote:
> On Thu, May 14, 2009 at 10:17 PM, Maciej Stachowiak m...@apple.com wrote:
>> [...] From my cursory study, I think microdata could subsume many of the use cases of both microformats and RDFa.
>
> Maybe. But microformats and RDFa can handle *all* of these cases. Again, which are the benefits of creating something entirely new to replace what already exists while it can't even handle all the cases of what it is replacing?

Compared to microformats, microdata defines the processing model and conformance criteria. The microformats community has failed to provide a processing model and conformance criteria at a similar level of detail. The processing model side is perceived to be such a serious issue that the lack of a unified microformats parsing spec is cited as a motivation to use RDFa instead of microformats.

>> It seems to me that it avoids much of what microformats advocates find objectionable
>
> Could you specify, please? Do you mean anything else than WHATWG's almost irrational hate toward CURIEs and everything that involves prefixes?

RDFa uses a data model that is overkill for the use cases.

>> but at the same time it seems it can represent a full RDF data model.
>
> No, it *can't* represent a full RDF model: it has already been shown several times on this thread.

That's a feature.

> Wait. Are you referring to microdata as an incremental improvement over RDFa?? IMO, it's rather a decremental enworsement.

That depends on the point of view. I'm sensing two major points of view:

1) Graphs are more general than trees. Hence, being able to serialize graphs is better.

2) Graphs are more general than trees. Hence, graphs are harder to design UIs for, harder to traverse and harder for authors to grasp. Hence, if trees are enough to address use cases, we should only enable trees to be serialized.

I subscribe to view #2, and it seems that trees are indeed enough for the use cases (that were stipulated by the pro-graph people!).

> - Microdata can't represent the full RDF data model (while RDFa can): some complex structures are just not expressible with microdata.

That's not a use case. That's theoretical purity.

> - Microdata relies on reversed domains. While some people argue these to be better than CURIEs, they are equally horrendous for the average user, and have the additional disadvantage that they don't map to anything useful (if they map to something at all), while CURIEs map to the descriptions and/or definitions of what they represent.

I consider it an advantage that reverse domains don't suggest that you should try dereferencing identifiers as if they were addresses.

--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
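Editorial illustration: one concrete reading of "graphs are harder to traverse than trees" is that a consumer of nested items can recurse blindly, while a consumer of a graph must track visited nodes to survive cycles. The Python sketch below is only that, a sketch; walk_tree, walk_graph and the sample data (apart from the com.damowmow names taken from this thread) are hypothetical.

```python
def walk_tree(item, depth=0):
    """List every property of a nested item.

    Plain recursion is enough: nesting is finite, so this always terminates.
    """
    lines = []
    for name, value in item.items():
        if isinstance(value, dict):
            lines.append("  " * depth + name + ":")
            lines.extend(walk_tree(value, depth + 1))
        else:
            lines.append("  " * depth + f"{name} = {value}")
    return lines


def walk_graph(graph, start):
    """List every node reachable from start.

    A visited set is required: without it, a cycle would loop forever.
    """
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        stack.extend(reversed(graph.get(node, [])))
    return order


# A tree-shaped item: one cat with a nested (hypothetical) owner item.
cat = {
    "com.damowmow.name": "Lord Erwin",
    "com.damowmow.age": "3",
    "com.example.owner": {"com.example.name": "Alice"},
}

# A graph with a cycle: two people who each reference the other.
knows = {"alice": ["bob"], "bob": ["alice"]}

print("\n".join(walk_tree(cat)))
print(walk_graph(knows, "alice"))
```

Neither function is hard to write; the sketch only shows where the extra bookkeeping comes from, not how much that cost should weigh in the design.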
Re: [whatwg] Annotating structured data that HTML has no semantics for
Henri Sivonen wrote:
> There's no indirection. A decade of Namespaces in XML shows that both authors and implementors have trouble getting prefix-based indirection right.

It's true that people get this wrong again and again. But it's also true that lots of developers understand it once and for all, and then consistently get it right.

The interesting question here is whether there's a better system.

>> I have been a Java programmer for some years, and still find that convention absurd, horrible, and annoying. I'll agree that CURIEs are ugly, and maybe hard to understand, but reversed domains are equally ugly and hard to understand.
>
> Problems shared by CURIEs, URIs and reverse DNS names:
>  * Long.
>  * Identifiers outlive organization charts.

That depends on the choice of the URI scheme.

> Problems that reverse DNS names don't have but CURIEs and URIs do have:
>  * "http://"; 7 characters of extra length.
>  * Affordance of dereferenceability when mere identifier semantics are meant.

Again, that depends on the URI scheme.

> Problems that reverse DNS names and URIs don't have but CURIEs have:
>  * Prefix-based indirection.

HTML developers regularly have to deal with a much more complicated indirection mechanism (CSS).

>  * Violation of the DOM Consistency Design Principle if xmlns:foo is used.

I think there is consensus that this is a drawback, but not about how significant this is.

> The syntax is simpler for the use cases it was designed for. It uses a simpler conceptual model (trees as opposed to graphs). It allows short token identifiers. It doesn't use prefix-based indirection. It doesn't violate the DOM Consistency Design Principle.

(devil's advocate argument) - so how does the syntax behave for those use cases it *hasn't* been designed for?

> Compared to microformats, microdata defines the processing model and conformance criteria. The microformats community has failed to provide a processing model and conformance criteria at a similar level of detail.

Indeed.

> The processing model side is perceived to be such a serious issue that the lack of a unified microformats parsing spec is cited as a motivation to use RDFa instead of microformats.

Indeed.

> RDFa uses a data model that is overkill for the use cases.

It would be interesting to understand which use cases that RDFa can do are not supported by microdata (I don't understand enough about the subject to try myself), and whether the potential advantage of having a simpler model outweighs the disadvantage of not using network effects and creating a competing syntax.

...

BR, Julian
Re: [whatwg] Annotating structured data that HTML has no semantics for
On May 18, 2009, at 12:18, Julian Reschke wrote:
> Henri Sivonen wrote:
>> There's no indirection. A decade of Namespaces in XML shows that both authors and implementors have trouble getting prefix-based indirection right.
>
> It's true that people get this wrong again and again. But it's also true that lots of developers understand it once and for all, and then consistently get it right.
>
> The interesting question here is whether there's a better system.

1) Centralized allocation of short names.

2) Prefixing a short name by (an abbreviation of) the name of the vocabulary, which makes the probability of collision negligible once the designer has googled to check the probable absence of public collisions at minting time (e.g. openid.delegate).

>>> I have been a Java programmer for some years, and still find that convention absurd, horrible, and annoying. I'll agree that CURIEs are ugly, and maybe hard to understand, but reversed domains are equally ugly and hard to understand.
>>
>> Problems shared by CURIEs, URIs and reverse DNS names:
>>  * Long.
>>  * Identifiers outlive organization charts.
>
> That depends on the choice of the URI scheme.

I guess one could use e.g. data:,foo URIs as a namespace URI, but why not just use foo?

>> Problems that reverse DNS names and URIs don't have but CURIEs have:
>>  * Prefix-based indirection.
>
> HTML developers regularly have to deal with a much more complicated indirection mechanism (CSS).

This would be a persuasive argument if we were reasoning about a feature we don't have experience with yet. However, experience shows prefix-based indirection is too hard. If at the same time CSS isn't too hard, I just have to accept the evidence from the real world even if it defies reasoning.

>> The syntax is simpler for the use cases it was designed for. It uses a simpler conceptual model (trees as opposed to graphs). It allows short token identifiers. It doesn't use prefix-based indirection. It doesn't violate the DOM Consistency Design Principle.
>
> (devil's advocate argument) - so how does the syntax behave for those use cases it *hasn't* been designed for?

That's hard to test, because the use case search has been exhausted for the moment. It seems we'd need to wait to see new use cases pop up.

>> RDFa uses a data model that is overkill for the use cases.
>
> It would be interesting to understand which use cases that RDFa can do are not supported by microdata (I don't understand enough about the subject to try myself), and whether the potential advantage of having a simpler model outweighs the disadvantage of not using network effects and creating a competing syntax.

Are there use cases of RDFa that are currently known but that the call for use cases didn't turn up?

Either @prefix or RDFa-profiles would break the network effects of the deployment of outside-of-REC RDFa-in-XHTML-as-text/html, so if breaking network effects is on the table in the form of @prefix and RDFa-profiles, I don't see why microdata wouldn't be on the table as far as network effects go.

--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: [whatwg] Annotating structured data that HTML has no semantics for
Henri Sivonen wrote:
>> The interesting question here is whether there's a better system.
>
> 1) Centralized allocation of short names.

Sounds like urn: to me. Registry is defined in RFC 3406.

> 2) Prefixing a short name by (an abbreviation of) the name of the vocabulary, which makes the probability of collision negligible once the designer has googled to check the probable absence of public collisions at minting time (e.g. openid.delegate).

Too fragile for disambiguation for my taste.

>> That depends on the choice of the URI scheme.
>
> I guess one could use e.g. data:,foo URIs as a namespace URI, but why not just use foo?

URIs give you the choice of having something easily referenceable (if you want), or not.

>>> Problems that reverse DNS names and URIs don't have but CURIEs have:
>>>  * Prefix-based indirection.
>>
>> HTML developers regularly have to deal with a much more complicated indirection mechanism (CSS).
>
> This would be a persuasive argument if we were reasoning about a feature we don't have experience with yet. However, experience shows prefix-based indirection is too hard. If at the same time CSS isn't too hard, I just have to accept the evidence from the real world even if it defies reasoning.

No, I don't think we have evidence that prefix-based indirection is too hard. There are way too many people getting it right.

...

> Either @prefix or RDFa-profiles would break the network effects of the deployment of outside-of-REC RDFa-in-XHTML-as-text/html, so if breaking network effects is on the table in the form of @prefix and RDFa-profiles, I don't see why microdata wouldn't be on the table as far as network effects go.

Introducing @prefix will be much simpler to deploy than introducing a completely different system.

That being said, I do agree that the current situation is a mess, and that the RDFa-in-XHTML spec has created it. Given the current situation, the simplest possible solution probably is to live with it, and use xmlns declarations in HTML for the purpose of RDFa as well.
BR, Julian
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Mon, May 18, 2009 at 10:38 AM, Henri Sivonen hsivo...@iki.fi wrote:
> On May 14, 2009, at 23:52, Eduard Pascual wrote:
>> On Thu, May 14, 2009 at 3:54 PM, Philip Taylor excors+wha...@gmail.com wrote:
>>
>> It doesn't matter one syntax or another. But if a syntax already exists (RDFa), building a new syntax should be properly justified.
>
> It was at the start of this thread: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html

Ian's initial message goes step by step through the creation of this new syntax, but does *not* mention at all *why* it was being created in the first place. The insight into the choices taken is indeed a good thing, and I thank Ian for it; but he omitted to provide insight into the first choice taken: discarding the multiple options already available (not only Microformats and RDFa, but also other less discussed ones such as eRDF, EASE, etc). Sure, there has been a lot of discussion on this topic, and it's possible that the choice was taken as part of such discussions. In any case, I think Ian should have clearly stated the reasons to build a brand new solution when many others have been out for a while and users have been able to try and test them. Please keep in mind that I'm not criticizing the choice itself (at least, not now), but the lack of information and reasoning behind that choice.

>> As of now, the only supposed benefit I have heard of for this syntax is that it avoids CURIEs... yet it replaces them with reversed domains?? Is that a benefit?
>
> There's no indirection. A decade of Namespaces in XML shows that both authors and implementors have trouble getting prefix-based indirection right.

Really? I haven't seen any hint about that. Sure, there will be some people who have trouble understanding namespaces, just like there are some people who have trouble understanding why something like <tr><td>foo</td><td>bar</tr></td> is wrong. Please, could you quote a source for that claim? I could also claim something like "fifteen years of Java show that reversed domains are error-prone and harmful", and even argue about it; but this kind of argument, without a serious analysis or study to back it, is completely meaningless and definitely subjective.

> (If we were limited to reasoning about something that we don't have experience with yet, I might believe that people can't be too inept to use prefix-based indirection. However, a decade of actual evidence shows that actual behavior defies reasoning here and prefix-based indirection is something that both authors and implementors get wrong over and over again.)

Curious: you refer to a decade of actual evidence, but you fail to refer to any actual evidence. I'm eager to see that evidence; could you share it with us? Thank you.

>> I have been a Java programmer for some years, and still find that convention absurd, horrible, and annoying. I'll agree that CURIEs are ugly, and maybe hard to understand, but reversed domains are equally ugly and hard to understand.
>
> Problems shared by CURIEs, URIs and reverse DNS names:
>  * Long.
>  * Identifiers outlive organization charts.

Ehm. CURIEs ain't really long: the main point of prefixes is to make them as short as reasonably possible. Good identifiers outlive bad organization charts. Good organization outlives bad identifiers. Good organization and good identifiers tend to outlive the context they are used in.

> Problems that reverse DNS names don't have but CURIEs and URIs do have:
>  * "http://"; 7 characters of extra length.
>  * Affordance of dereferenceability when mere identifier semantics are meant.

A CURIE (at least as typed by an author) doesn't have the "http://": it is a prefix, a colon, and whatever goes after it. Once resolved (i.e. after replacing the prefix and colon by what the prefix represents) what you get is no longer a CURIE, but a URI like the ones you'd type in your browser or inside a link's href attribute.
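Editorial illustration: the resolution step described above (replace the prefix and colon with what the prefix stands for) can be sketched in Python as follows. This is a minimal sketch, not the CURIE specification's algorithm; expand_curie and the prefix maps are hypothetical, and the namespace URI is the one used in the RDFa example quoted elsewhere in this thread.

```python
def expand_curie(curie, prefix_map):
    """Expand 'prefix:reference' using the prefix declarations in scope."""
    prefix, _, reference = curie.partition(":")
    if prefix not in prefix_map:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefix_map[prefix] + reference


# Two documents that bind different prefixes to the same vocabulary.
doc_a = {"d": "http://damowmow.com/"}
doc_b = {"cat": "http://damowmow.com/"}

# After resolution both spellings name the same property.
print(expand_curie("d:name", doc_a))
print(expand_curie("d:name", doc_a) == expand_curie("cat:name", doc_b))

# The indirection cuts both ways: 'd:name' pasted into a document that
# never declared 'd' can no longer be resolved.
try:
    expand_curie("d:name", doc_b)
    resolved_after_paste = True
except KeyError:
    resolved_after_paste = False
print(resolved_after_paste)

# A reverse-DNS name carries no context and is compared as a plain string.
print("com.damowmow.name" == "com.damowmow.name")
```

The sketch shows both sides of the disagreement: short author-facing names on the one hand, and dependence on declarations that may not travel with the markup on the other.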
Dereferenceability is not a problem in itself: having more than what is strictly needed can be either irrelevant or an advantage, not a problem. Of course, it *may* be the cause of some actual problem, but in that case you should rather describe the problem itself, so it can be evaluated.

Problems that reverse DNS names and URIs don't have but CURIEs have: * Prefix-based indirection.

Indirection can't be taken as a problem when most currently used RDFa tools don't use it at all (which proves that they can work without relying on it). Sure, it's not as big an advantage as some may claim it to be. But the ability to follow the indirection, even if not 100% guaranteed to work, is an actual advantage. As a real-world example, I have been able to learn about vocabularies I didn't know by following the links on prefix declarations in documents using them.

* Violation of the DOM Consistency Design Principle if xmlns:foo used.

*if* xmlns:foo is used. Very strong emphasis on the conditional, and on the multiple possibilities that have already been proposed to deal
Re: [whatwg] Annotating structured data that HTML has no semantics for
On May 18, 2009, at 6:05 AM, Eduard Pascual wrote: On Mon, May 18, 2009 at 10:38 AM, Henri Sivonen hsivo...@iki.fi wrote: On May 14, 2009, at 23:52, Eduard Pascual wrote: On Thu, May 14, 2009 at 3:54 PM, Philip Taylor excors+wha...@gmail.com wrote:

It doesn't matter one syntax or another. But if a syntax already exists (RDFa), building a new syntax should be properly justified.

It was at the start of this thread: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html

Ian's initial message goes step by step through the creation of this new syntax; but does *not* mention at all *why* it was being created in the first place. The insight into the choices taken is indeed a good thing, and I thank Ian for it; but he omitted to provide insight into the first choice taken: discarding the multiple options already available (not only Microformats and RDFa, but also other less discussed ones such as eRDF, EASE, etc).

I think Ian did explain why he discarded RDFa as an option. In the email linked above, Ian Hickson wrote:

Another solution we could consider is RDFa:

  <section typeof="d:cat" xmlns:d="http://damowmow.com/">
   <h1 property="d:name">Hedral</h1>
   <p property="d:desc">Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly.</p>
   <img src="hedral.jpeg" alt="" title="Hedral, age 18 months" class="photo" rel="d:img">
  </section>

This unfortunately also has a number of problems.

- it uses prefixes, which most authors simply do not understand, and which many implementors end up getting wrong (e.g. SearchMonkey hard-coded certain prefixes in its first implementation, Google's handling of RDF blocks for license declarations is all done with regular expressions instead of actually parsing the namespaces, etc). Even if implemented right, namespaces still lead to flaky copy-and-paste behaviour.
- it sometimes uses rel= and sometimes uses property= and it's hard to know when to use one or the other.
- it introduces much more power than is necessary to solve this problem.

I believe Microformats were discarded as a solution because the proposed use case was as follows:

USE CASE: Annotate structured data that HTML has no semantics for, and which nobody has annotated before, and may never again, for private use or use in a small self-contained community.

But Microformats are only intended for widely used and generally agreed upon public vocabularies. The Microformats process is not applicable to private-use/small-community vocabularies. And Microformats define specific vocabularies, not a general way to add new kinds of semantic markup. I expect Microformats experts would agree with this assessment.

So I think it is clear why neither Microformats nor RDFa were seen as suitable solutions to the use case, even if the matter was addressed somewhat briefly.

Regards, Maciej
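To make the "flaky copy-and-paste" point concrete, here is a minimal Python sketch of prefix-based expansion. It is not taken from any RDFa implementation: the function name `expand_curie` and the second prefix binding (`http://example.org/other#`) are assumptions made up for illustration. Only `d:name` and `http://damowmow.com/` come from the quoted example.

```python
def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' into a full URI using the prefixes in scope."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefixes:
        # Pasted markup whose xmlns: declaration was left behind ends up here.
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + reference


# Prefix as declared on the <section> element in the quoted example.
original_scope = {"d": "http://damowmow.com/"}
print(expand_curie("d:name", original_scope))  # http://damowmow.com/name

# The same attribute value pasted into a page that binds "d" differently
# (hypothetical URI): the identifier silently changes meaning.
pasted_scope = {"d": "http://example.org/other#"}
print(expand_curie("d:name", pasted_scope))  # http://example.org/other#name
```

The same string `d:name` yields two different identifiers depending on the surrounding declarations, and yields nothing at all when the declaration is missing. That is the behaviour the prefix objection refers to, whatever one concludes about how serious it is.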
Re: [whatwg] Annotating structured data that HTML has no semantics for
On May 18, 2009, at 16:05, Eduard Pascual wrote: On Mon, May 18, 2009 at 10:38 AM, Henri Sivonen hsivo...@iki.fi wrote: (If we were limited to reasoning about something that we don't have experience with yet, I might believe that people can't be too inept to use prefix-based indirection. However, a decade of actual evidence shows that actual behavior defies reasoning here and prefix-based indirection is something that both authors and implementors get wrong over and over again.) Curious: you refer to a decade of actual evidence, but you fail to refer to any actual evidence. I'm eager to see that evidence; could you share it with us? Thank you. I thought everyone had seen the confusion. There are pointers at http://wiki.whatwg.org/wiki/Namespace_confusion The wiki page is less than a decade old, so it's length isn't quite that impressive. I have been a Java programmer for some years, and still find that convention absurd, horrible, and annoying. I'll agree that CURIEs are ugly, and maybe hard to understand, but reversed domains are equally ugly and hard to understand. Problems shared by CURIEs, URIs and reverse DNS names: * Long. * Identifiers outlive organization charts. Ehm. CURIEs ain't really long: the main point of prefixes is to make them as short as reasonably possible. You need to consider the length of the prefix declarations, too. Problems that reverse DNS names and URIs don't have but CURIEs have: * Prefix-based indirection. Indirection can't be taken as a problem when most currently used RDFa tools don't use it at all (which proves that they can work without relying on it). What do you mean? Current RDFa tools don't use prefixes? (I understand that if the microdata syntax offered no advantages over RDFa, then it would be a wasted effort to diverge. Which are the advantages it offers? The syntax is simpler for the use cases it was designed for. It uses a simpler conceptual model (trees as opposed to graphs). It allows short token identifiers. 
It doesn't use prefix-based indirection. It doesn't violate the DOM Consistency Design Principle. Ok, the syntax is simpler for a subset of the use cases; but it leaves entirely out the rest of use cases. What are the rest of the use cases? Why weren't they put forward when Hixie asked for use cases? The DOM Consistency again is not an advantage of the microdata syntax because this could have been fulfilled with other syntaxes as well. It's an advantage over RDFa-in-XHTML-served-as-text/html. It's not an advantage over microformats or may not be an advantage over a speculative yet undefined variation of RDFa. It seems to me that it avoids much of what microformats advocates find objectionable Could you specify, please? Do you mean anything else than WHATWG's almost irrational hate toward CURIEs and everything that involves prefixes? RDFa uses a data model that is an overkill for the use cases. Which use cases? http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-April/019374.html No, it *can't* represent a full RDF model: it has already been shown several times on this thread. That's a feature. What?? Being unable to deal with all the use cases is a feature?? Being simpler while addressing all the use cases is a feature. Wait. Are you refering to microdata as an incremental improvement over RDFa?? IMO, it's rather a decremental enworsement. That depends on the point of view. I'm sensing two major points of view: 1) Graphs are more general than trees. Hence, being able to serialize graphs is better. 2) Graphs are more general than trees. Hence, graphs are harder to design UIs for, harder to traverse and harder for authors to grasp. Hence, if trees are enough to address use cases, we should only enable trees to be serialized. ¬¬ Again, what's your basis to decide that trees are enough to address use cases?? 
Of course, they are enough to solve some use cases, but the convenience of dealing with just trees is not worth sacrificing the needs of those use cases you are arbirarily deciding to ignore. I don't see anything on http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-April/019374.html that doesn't boil down to trees or simple key-value pairs attached to an item. I subscribe to view #2, and it seems that trees are indeed enough for the use cases (that were stipulated by the pro-graph people!). - Microdata can't represent the full RDF data model (while RDFa can): some complex structures are just not expressable with microdata. That's not a use case. That's theoretical purity. It's not theoretical purity, it's something simpler: *extensibility*. And, with over two decades between versions of the specs, this is a strong requirement: if a problem is noticed after HTML5 becomes the standard, it's essential to be able to solve it without waiting 10 or 20 years for HTML6 to come out. Well, you have to commit to some bounds on
Re: [whatwg] Annotating structured data that HTML has no semantics
On Sat, May 16, 2009 at 10:02 AM, Leif Halvard Silli l...@malform.no wrote: [...] But maybe, after all, it ain't so bad. It is good to have the opportunity. :-)

This is exactly the point (at least, IMO): RDFa may be quite good at embedding inline metadata, but it can't deal at all with describing the semantics that are inherent to the structure. OTOH, EASE handles the latter quite well, but can't handle the former at all. That's why I was advocating for a solution that allows either approach, and even mixing both when appropriate.

On a side note, about the idea of mixing CSS+EASE or CSS+CRDF or CSS+whatever: my PoV is that these *should* not be mixed; but any CSS-like semantic description would benefit from some foolproofing, ensuring that if an author puts CRDF where CSS is expected, it gets ignored by CSS parsers (and vice versa). In addition, CSS's error-handling rules make this kind of shielding relatively easy. OTOH, adding the semantic code as part of the CSS styling, or trying to consider this as part (or even as an extension) of the CSS language, is wrong by definition: semantics is not styling, and we should try to make authors aware enough of the difference.

Regards, Eduard Pascual
Re: [whatwg] Annotating structured data that HTML has no semantics
Tab Atkins Jr. On 09-05-15 22.15: On Wed, May 13, 2009 at 10:04 AM, Leif Halvard Silli Toby Inkster on Wed May 13 02:19:17 PDT 2009: Hear hear. Lets call it Cascading RDF Sheets. http://buzzword.org.uk/2008/rdf-ease/spec http://buzzword.org.uk/2008/rdf-ease/reactions RDFa is better though. What does 'better' mean in this context? Why and how? Because it is easier to process? But EASE seems more compatible with microformats, and is better in that sense. I'd also like clarification here. I dislike *all* of the inline metadata proposals to some degree, for the same reasons that I dislike inline @style and @onfoo handlers. A Selector-based way of applying semantics fits my theoretical needs much better. A possibly 10 year old use case where I think EASE - or GRDDL as such - should fit in: Shelley and Geoffrey reminded us that RSS 1.0 stands for RDF Site Summary 1.0. The W3 also uses RSS 1.0. for its feed[1]. The feed is generated via a profile transformation [2] that happens with XSLT. The profile defines the div class=item as news items (note the combination of element and class - as in EASE)- But the profile also implements particular rules for particular elements without looking at the @class. (E.g. each div class=item must contain h2 or h3, for example.) All in all, it sounds very similar to what the newer technology GRDDL does, since it is all happening based on a profile and some class names and specific element structures. And, this is possible to test with the W3 GRDDL service, which produces a feed that in fact, when you look with the right eyes, is the same as the published homepage feed[3]. If the microdata becomes part of the final version of HTML 5, then GRDDL (with or without EASE) will probably prosper, since it probably doesn't matter to GRDDL whether it looks into @class or @item, as long as the thing is part of the profile and the profiletransformation. 
(But it would be interesting if someone in the know could test if the triples would be the same, etc ...) And if so, then the introduction of microdata increases the need for @profile in HTML 5. [1] http://www.w3.org/2000/08/w3c-synd/home.rss [2] http://www.w3.org/2000/08/w3c-synd/ [3] http://www.w3.org/2007/08/grddl/?docAddr=http%3A%2F%2Fwww.w3.org%2Foutput=rdfxml I read all the reactions you pointed to. Some made the claim that EASE would move semantics out of the HTML file, and that microformats was better as it keeps the semantics inside the file. But I of course agree with you that EASE just underline/outline the semantics already in the file. Yup. The appropriate critique of separated metadata is that the *data* is moved out of the document, where it will inevitably decay compared to the live document. RDF-EASE keeps all the data stored in the live document, and merely specifies how to extract it. The only way you can lose data then is by changing the html structure itself, which is much less common than just changing the content. That the structure changes seldom, /could/ be a reason for using RDFa to store the meta info in the very element instead of using EASE (or even Dublin Core in meta elements in the head). OTOH, that the structure changes little, could also be something that /permits/ the use of GRDDL ... So it depends on how you see it. From the EASE draft: All properties in RDF-EASE begin with the string -rdf-, as per §4.1.2.1 Vendor-specific extensions in [CSS21]. This allows RDF-EASE and CSS to be safely mixed in one file, [...] I wonder why you think it is so important to be able to mix CSS and EASE. It seems better to separate the two completely. I'm not thrilled with the mixture of CSS and metadata either. Just because it uses Selectors doesn't mean it needs to be specifiable alongside CSS. jQuery uses Selectors too, but it stays where it belongs. 
^_^ (That being said, there's a plugin for it that allows you to specify JS in your CSS, and it gets applied to the matching elements from the block's selector.)

But maybe, after all, it ain't so bad. It is good to have the opportunity. :-) (Since you, as I perceived it, disagreed with yourself above, I continue the tradition.) :-)

-- leif halvard silli
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Thu, May 14, 2009 at 10:17 PM, Maciej Stachowiak m...@apple.com wrote: [...] From my cursory study, I think microdata could subsume many of the use cases of both microformats and RDFa. Maybe. But microformats and RDFa can handle *all* of these cases. Again, which are the benefits of creating something entirely new to replace what already exists while it can't even handle all the cases of what it is replacing? Both the new syntax, and the cases restrictions, are costs: what are these costs buying? If it's not clear what we are getting for these costs, it is impossible to evaluate whether the costs are worth it or not. It seems to me that it avoids much of what microformats advocates find objectionable Could you specify, please? Do you mean anything else than WHATWG's almost irrational hate toward CURIEs and everything that involves prefixes? but at the same time it seems it can represent a full RDF data model. No, it *can't* represent a full RDF model: it has already been shown several times on this thread. Thus, I think we have the potential to get one solution that works for everyone. RDFa itself doesn't work for everyone; but microdata is even more restricted: it leaves out the cases that RDFa leaves out, but it also leaves out some cases that RDFa was able to handle. So, where do you see such potential? I'm not 100% sure microdata can really achieve this, but I think making the attempt is a positive step. What do you mean by making the attempt? If there is something microdata can't handle, it won't be able to handle it without changing the spec. If you meant that evolving that microdata proposal towards something that works for everyone is a positive step, then I agree; but if you meant that engraving this microdata approach into the spec and set it into stone, then attempt for everyone to accept it, then I definitelly disagree. So, please, could you clarify the meaning of that statement? Thanks. 
One other detail that it seems not many people have picked up on yet is that microdata proposes a DOM API to extract microdata-based info from a live document on the client side. In my opinion this is huge and has the potential to greatly increase author interest in semantic markup. Allright, an API may be a benefit. Most probably it is. However, a similar API could have been built from RDFa, or eRDF, or EASE, or any other already existing or new solution; so it doesn't justify creating a new syntax. I have to insist: which are the benefits from such built-from-the-ground, restrictive *syntax*? That's what we need to know to evaluate it against its costs. Now, it may be that microdata will ultimately fail, either because it is outcompeted by RDFa, or because not enough people care about semantic markup, or whatever. But at least for now, I don't see a reason to strangle it in the cradle. At least for now, I don't see a reason why it was created to begin with. Maybe if somebody could enlighten us with this detail, this discussion could evolve into something more useful and productive. On Fri, May 15, 2009 at 6:53 AM, Maciej Stachowiak m...@apple.com wrote: On May 14, 2009, at 1:30 PM, Shelley Powers wrote: So, if I'm pushing for RDFa, it's not because I want to win. It's because I have things I want to do now, and I would like to make sure have a reasonable chance of working a couple of years in the future. And yeah, once SVG is in HTML5, and RDFa can work with HTML5, maybe I wouldn't mind giving old HTML a try again. Lord knows I'd like to user ampersands again. It sounds like your argument comes down to this: you have personally invested in RDFa, therefore having a competing technology is bad, regardless of the technical merits. Pause, please. Before going on, I need to ask again: which are those technical merits?? I don't mean to parody here - I am somewhat sympathetic to this line of argument. I think I'm interpreting Shelley's argument slightly differently. 
She didn't choose RDFa because it was better than microdata. She chose RDFa because it was better than other options, and microdata didn't even exist yet. Now microdata comes out, some drawbacks are highlighted in comparison with RDFa (lack of typing, inability to depict the full RDF model, reversed domains that are as ugly as CURIEs (but at least CURIEs resolve to something useful, while reversed domains often don't resolve at all)), and you ask RDFa proponents to give microdata a chance, to not strangle it in the cradle; but nobody seems willing to answer the one question: what does microdata provide to make up for its drawbacks?

Often pragmatic concerns mean that an incremental improvement just isn't worth the cost of switching

Wait. Are you referring to microdata as an incremental improvement over RDFa?? IMO, it's rather a decremental enworsement.

My personal judgment is that we're not past the point of no return on data embedding. There's microformats, RDFa, and then dozens of other serializations of
Re: [whatwg] Annotating structured data that HTML has no semantics for
Maciej Stachowiak wrote: On May 14, 2009, at 1:30 PM, Shelley Powers wrote: So, if I'm pushing for RDFa, it's not because I want to win. It's because I have things I want to do now, and I would like to make sure have a reasonable chance of working a couple of years in the future. And yeah, once SVG is in HTML5, and RDFa can work with HTML5, maybe I wouldn't mind giving old HTML a try again. Lord knows I'd like to user ampersands again. It sounds like your argument comes down to this: you have personally invested in RDFa, therefore having a competing technology is bad, regardless of the technical merits. I don't mean to parody here - I am somewhat sympathetic to this line of argument. Often pragmatic concerns mean that an incremental improvement just isn't worth the cost of switching (for example HTML vs. XHTML). My personally judgment is that we're not past the point of no return on data embedding. There's microformats, RDFa, and then dozens of other serializations of RDF (some of which you cited). This doesn't seem like a space on the verge of picking a single winner, and the players seem willing to experiment with different options. There are not dozens of other serializations of RDF. The point I was trying to make is, I'd rather put my time into something that exists now, than have to watch the wheel re-invented. I'd rather see semantic metadata become a reality. I'm glad that you personally feel that companies will be just peachy keen on having to support multiple parsers to get the same data. On the HTML WG side, I will never support microdata, because no case has been made for its existence. The point is, people in the real world have to use this stuff. It helps them if they have one, generally agreed on approach. As it is, folks have to contend with both RDFa and microformats, but at least we know these have different purposes. From my cursory study, I think microdata could subsume many of the use cases of both microformats and RDFa. 
It seems to me that it avoids much of what microformats advocates find objectionable, and provides a good basis for new microformats; but at the same time it seems it can represent a full RDF data model. Thus, I think we have the potential to get one solution that works for everyone. I'm not 100% sure microdata can really achieve this, but I think making the attempt is a positive step. It can't, don't you see? Microdata will only work in HTML5/XHTML5. XHTML 1.1 and yes, 2.0 will be around for years, decades. In addition, XHTML5 already supports RDFa. Supporting XHTML 1.1 has about 0.001% as much value as supporting text/html. XHTML 2.0 is completely irrelevant to the Web, and looks on track to remain so. So I don't find this point very persuasive. I don't think you'll find that the world is breathlessly waiting for HTML5. I think you'll find that XHTML 1.1 will have wider use than HTML5 for the next decade. If not longer. I wouldn't count out XHTML 2.0, either. And in a decade, a lot can change. Why you think something completely brand new, no vendor support, drummed up in a few hours or a day or so is more robust, and a better option than a mature spec in wide use, well frankly boggles my mind. I haven't evaluated it enough to know for sure (as I said). I do think avoiding CURIEs is extremely valuable from the point of view of sane text/html semantics and ease of authoring; and RDF experts seem to think it works fine for representing RDF data models. So tentatively, I don't see any gaping holes. If you see a technical problem, and not just potential competition for the technology you've invested in, then you should definitely cite it. I don't think CURIEs are that difficult, nor impossible no matter the arguments that Henri brings out. I am impressed with your belief in HTML5. But One other detail that it seems not many people have picked up on yet is that microdata proposes a DOM API to extract microdata-based info from a live document on the client side. 
In my opinion this is huge and has the potential to greatly increase author interest in semantic markup. Not really. Can do this now with RDFa in XHTML. And I don't need any new DOM to do it. The power of semantic markup isn't really seen until you take that markup data _outside_ the document. And merge that data with data from other documents. Google rich snippets. Yahoo searchmonkey. Heck, even an application that manages the data from different subsites of one domain. I respectfully disagree. An API to do things client-side that doesn't require an external library is extremely powerful, because it lets content authors easily make use of the very same semantic markup that they are vending for third parties, so they have more incentive to use it and get it right. Sure, we'll have to disagree on this one. Now, it may be that microdata will ultimately fail, either because it is outcompeted by RDFa,
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Thu, 14 May 2009 22:30:41 +0200, Shelley Powers shell...@burningbird.net wrote: I'm not 100% sure microdata can really achieve this, but I think making the attempt is a positive step. It can't, don't you see? Microdata will only work in HTML5/XHTML5. Actually, as specified, it would work for any text/html and any XHTML content. It would just be valid in (X)HTML5, but it would work even if the input is not valid (X)HTML5 or looks like HTML4 or XHTML 1.1. XHTML 1.1 and yes, 2.0 will be around for years, decades. In addition, XHTML5 already supports RDFa. XHTML5 supports RDFa to the same extent that XHTML 1.1 supports microdata (in both cases, it would work but is not valid). -- Simon Pieters Opera Software
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Wed, May 13, 2009 at 10:04 AM, Leif Halvard Silli l...@malform.no wrote: Toby Inkster on Wed May 13 02:19:17 PDT 2009: Leif Halvard Silli wrote: Hear hear. Lets call it Cascading RDF Sheets. http://buzzword.org.uk/2008/rdf-ease/spec http://buzzword.org.uk/2008/rdf-ease/reactions I have actually implemented it. It works. Oh! Thanks for sharing. Indeed, RDF-EASE seems fairly nice! RDFa is better though. What does 'better' mean in this context? Why and how? Because it is easier to process? But EASE seems more compatible with microformats, and is better in that sense. I'd also like clarification here. I dislike *all* of the inline metadata proposals to some degree, for the same reasons that I dislike inline @style and @onfoo handlers. A Selector-based way of applying semantics fits my theoretical needs much better. I read all the reactions you pointed to. Some made the claim that EASE would move semantics out of the HTML file, and that microformats was better as it keeps the semantics inside the file. But I of course agree with you that EASE just underline/outline the semantics already in the file. Yup. The appropriate critique of separated metadata is that the *data* is moved out of the document, where it will inevitably decay compared to the live document. RDF-EASE keeps all the data stored in the live document, and merely specifies how to extract it. The only way you can lose data then is by changing the html structure itself, which is much less common than just changing the content. From the EASE draft: All properties in RDF-EASE begin with the string -rdf-, as per §4.1.2.1 Vendor-specific extensions in [CSS21]. This allows RDF-EASE and CSS to be safely mixed in one file, [...] I wonder why you think it is so important to be able to mix CSS and EASE. It seems better to separate the two completely. I'm not thrilled with the mixture of CSS and metadata either. Just because it uses Selectors doesn't mean it needs to be specifiable alongside CSS. 
jQuery uses Selectors too, but it stays where it belongs. ^_^ (That being said, there's a plugin for it that allows you to specify js in your CSS, and it gets applied to the matching elements from the block's selector.) ~TJ
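As a rough illustration of the "data stays in the document, the rules only say how to extract it" idea, here is a small Python sketch. It is not RDF-EASE syntax and not a real Selectors engine: it matches only on element name plus one class, and the rule table, the property names (`name`, `desc`) and the sample markup are all made up for the example.

```python
from html.parser import HTMLParser


class RuleExtractor(HTMLParser):
    """Collect the text of elements matched by (tag, class) -> property rules."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules   # hypothetical rule table: {(tag, class): property}
        self.open = []       # open elements as (tag, property or None, text parts)
        self.found = []      # extracted (property, text) pairs, in closing order

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((self.rules[(tag, c)] for c in classes
                     if (tag, c) in self.rules), None)
        self.open.append((tag, prop, []))

    def handle_data(self, data):
        # Text belongs to every open element that a rule matched.
        for _, prop, parts in self.open:
            if prop is not None:
                parts.append(data)

    def handle_endtag(self, tag):
        if not any(t == tag for t, _, _ in self.open):
            return  # stray end tag: ignore it
        while self.open:
            t, prop, parts = self.open.pop()
            if prop is not None:
                self.found.append((prop, " ".join("".join(parts).split())))
            if t == tag:
                break


def extract(html, rules):
    """Run the rules over a fragment; elements never closed are not reported."""
    parser = RuleExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.found


rules = {("h2", "title"): "name", ("p", "desc"): "desc"}
page = '<div class="item"><h2 class="title">Hedral</h2><p class="desc">A male cat.</p></div>'
print(extract(page, rules))  # [('name', 'Hedral'), ('desc', 'A male cat.')]
```

Editing the text inside the `h2` changes the extracted value with no change to the rules, whereas renaming the class or restructuring the markup silently drops the property. That is the trade-off described above.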
Re: [whatwg] Annotating structured data that HTML has no semantics for
jgra...@opera.com wrote: Quoting Philip Taylor excors+wha...@gmail.com: On Sun, May 10, 2009 at 11:32 AM, Ian Hickson i...@hixie.ch wrote: One of the more elaborate use cases I collected from the e-mails sent in over the past few months was the following: USE CASE: Annotate structured data that HTML has no semantics for, and which nobody has annotated before, and may never again, for private use or use in a small self-contained community. [...] To address this use case and its scenarios, I've added to HTML5 a simple syntax (three new attributes) based on RDFa. There's a quickly-hacked-together demo at http://philip.html5.org/demos/microdata/demo.html (works in at least Firefox and Opera), which attempts to show you the JSON serialisation of the embedded data, which might help in examining the proposal. I have a *totally unfinished* demo that does something rather similar at [1]. It is highly likely to break and/or give incorrect results**. If you use it for anything important you are insane :) I have now added extremely preliminary RDF support with output as N3 and RDF/XML courtesy of rdflib. It is certain to be buggy.
Re: [whatwg] Annotating structured data that HTML has no semantics for
James Graham wrote: [...] I have now added extremely preliminary RDF support with output as N3 and RDF/XML courtesy of rdflib. It is certain to be buggy. So much concern about generating RDF, makes one wonder why we didn't just implement RDFa... Shelley
Re: [whatwg] Annotating structured data that HTML has no semantics for
On 14/5/09 14:18, Shelley Powers wrote: James Graham wrote: [...] I have now added extremely preliminary RDF support with output as N3 and RDF/XML courtesy of rdflib. It is certain to be buggy. So much concern about generating RDF, makes one wonder why we didn't just implement RDFa... Having HTML5-microdata -to- RDF parsers is pretty critical to having test cases that help us all understand where RDFa-Classic and HTML5 diverge. I'm very happy to see this work being done and that there are multiple implementations. As far as I can see, the main point of divergence is around URI abbreviation mechanisms. But also HTML5 might not have a notion equivalent to RDF/RDFa's bNodes construct. The sooner we have these parsers the sooner we'll know for sure. Dan
Re: [whatwg] Annotating structured data that HTML has no semantics for
Dan Brickley wrote: On 14/5/09 14:18, Shelley Powers wrote: [...] So much concern about generating RDF, makes one wonder why we didn't just implement RDFa... Having HTML5-microdata -to- RDF parsers is pretty critical to having test cases that help us all understand where RDFa-Classic and HTML5 diverge. I'm very happy to see this work being done and that there are multiple implementations. As far as I can see, the main point of divergence is around URI abbreviation mechanisms. But also HTML5 might not have a notion equivalent to RDF/RDFa's bNodes construct. The sooner we have these parsers the sooner we'll know for sure. Dan Actually, I believe there are other differences, as others have pointed out. 
http://www.jenitennison.com/blog/node/103 http://realtech.burningbird.net/semantic-web/semantic-web-issues-and-practices/holding-on-html5 Some of the differences have resulted in more modifications to the underlying HTML5 spec, which is curious, because Ian has stated in comments that support for RDF is only a side interest and not the main purpose behind the microdata section. With the statement that support for RDF isn't a particular goal of microdata, Dan, I think you're being optimistic about the good this effort will generate for RDFa. But, more power to you. Shelley
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Thu, May 14, 2009 at 1:25 PM, Dan Brickley dan...@danbri.org wrote:

Having HTML5-microdata -to- RDF parsers is pretty critical to having test cases that help us all understand where RDFa-Classic and HTML5 diverge. I'm very happy to see this work being done and that there are multiple implementations. As far as I can see, the main point of divergence is around URI abbreviation mechanisms. But also HTML5 might not have a notion equivalent to RDF/RDFa's bNodes construct. The sooner we have these parsers the sooner we'll know for sure.

If I understand RDF correctly, the idea is that everything can be URIs, subjects and objects can instead be blank nodes, and objects can instead be literals. If we restrict literals to strings (optionally with languages), then I think all triples must follow one of these eight patterns:

    <urn:subject> <urn:predicate> <urn:object> .
    <urn:subject> <urn:predicate> "object" .
    <urn:subject> <urn:predicate> "object"@lang .
    <urn:subject> <urn:predicate> _:X .
    _:X <urn:predicate> <urn:object> .
    _:X <urn:predicate> "object" .
    _:X <urn:predicate> "object"@lang .
    _:X <urn:predicate> _:Y .

These cases can be trivially mapped into HTML5 microdata as:

    <div item> <link itemprop=about href=urn:subject> <link itemprop=urn:predicate href=urn:object> </div>
    <div item> <link itemprop=about href=urn:subject> <meta itemprop=urn:predicate content=object> </div>
    <div item> <link itemprop=about href=urn:subject> <meta itemprop=urn:predicate content=object lang=lang> </div>
    <div item> <link itemprop=about href=urn:subject> <meta itemprop=urn:predicate item id=X> </div>
    <link subject=X itemprop=urn:predicate href=urn:object>
    <meta subject=X itemprop=urn:predicate content=object>
    <meta subject=X itemprop=urn:predicate content=object lang=lang>
    <meta subject=X itemprop=urn:predicate item id=Y>

(There's the caveat about <link> and <meta> being moved into <head> in some browsers; you can replace them with <a> and <span> instead.)
These aren't the most elegant ways of expressing complex structures (because they don't make much use of nesting), but hopefully they demonstrate that it's possible to express any RDF graph (that only uses string literals) by decomposing into triples and then writing as HTML with these patterns. (If all the triples using a blank node have the same subject, then you don't need to use 'id' and 'subject' because you can just nest the markup instead, I think.)

With my parser (in Firefox 3.0), the output triples (sorted into a clearer order) are:

    <> <http://www.w3.org/1999/xhtml/vocab#item> <urn:subject> .
    <> <http://www.w3.org/1999/xhtml/vocab#item> <urn:subject> .
    <> <http://www.w3.org/1999/xhtml/vocab#item> <urn:subject> .
    <> <http://www.w3.org/1999/xhtml/vocab#item> <urn:subject> .
    <urn:subject> <urn:predicate> <urn:object> .
    <urn:subject> <urn:predicate> "object" .
    <urn:subject> <urn:predicate> "object"@lang .
    <urn:subject> <urn:predicate> _:n0 .
    _:n0 <urn:predicate> <urn:object> .
    _:n0 <urn:predicate> "object" .
    _:n0 <urn:predicate> "object"@lang .
    _:n0 <urn:predicate> _:n1 .

which corresponds to what was desired. So, I can't see any limits on expressivity other than that literals must be strings. (But I'm not at all an expert on RDF, and I may have missed something in the microdata spec, so please let me know if I'm wrong!)

-- Philip Taylor exc...@gmail.com
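[Editor's sketch] The eight-pattern mapping in the message above can be written down mechanically. The following Python is only an illustration of that decomposition, not Philip Taylor's parser: the names Term, object_markup and triple_to_microdata are hypothetical, and the attributes (item, itemprop, subject) are those of the May 2009 microdata draft as used in the message.

```python
from html import escape
from typing import NamedTuple, Optional


class Term(NamedTuple):
    """Hypothetical RDF term: kind is 'uri', 'bnode' or 'literal'."""
    kind: str
    value: str
    lang: Optional[str] = None


def object_markup(pred: str, obj: Term, subject_attr: str = "") -> str:
    """Render the predicate and object of one triple as a <link> or <meta>."""
    prop = escape(pred, quote=True)
    value = escape(obj.value, quote=True)
    if obj.kind == "uri":
        return f'<link{subject_attr} itemprop="{prop}" href="{value}">'
    if obj.kind == "bnode":
        # A blank-node object becomes a nested item labelled with an id.
        return f'<meta{subject_attr} itemprop="{prop}" item id="{value}">'
    lang = f' lang="{escape(obj.lang, quote=True)}"' if obj.lang else ""
    return f'<meta{subject_attr} itemprop="{prop}" content="{value}"{lang}>'


def triple_to_microdata(subj: Term, pred: str, obj: Term) -> str:
    """Map one triple to markup following the eight patterns in the message."""
    if subj.kind == "uri":
        about = escape(subj.value, quote=True)
        return (f'<div item><link itemprop="about" href="{about}">'
                f'{object_markup(pred, obj)}</div>')
    if subj.kind == "bnode":
        # Blank-node subjects point back at an item id via subject=.
        attr = f' subject="{escape(subj.value, quote=True)}"'
        return object_markup(pred, obj, attr)
    raise ValueError("a subject must be a URI or a blank node")
```

For example, triple_to_microdata(Term("bnode", "X"), "urn:predicate", Term("bnode", "Y")) yields the last of the eight patterns. The sketch emits one element per triple and so, like the patterns themselves, makes no use of nesting.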
Re: [whatwg] Annotating structured data that HTML has no semantics for
On May 14, 2009, at 5:18 AM, Shelley Powers wrote: So much concern about generating RDF, makes one wonder why we didn't just implement RDFa... If it's possible to produce RDF triples from microdata, and if RDF triples of interest can be expressed with microdata, why does it matter if the concrete syntax is the same as RDFa? Isn't the important thing about RDF the data model, not the surface syntax? (I understand that if the microdata syntax offered no advantages over RDFa, then it would be a wasted effort to diverge. But my impression is that you'd object to anything that isn't exactly identical to RDFa, even if it can easily be used in the same way.) Regards, Maciej
Re: [whatwg] Annotating structured data that HTML has no semantics for
Maciej Stachowiak wrote: On May 14, 2009, at 5:18 AM, Shelley Powers wrote: So much concern about generating RDF, makes one wonder why we didn't just implement RDFa... If it's possible to produce RDF triples from microdata, and if RDF triples of interest can be expressed with microdata, why does it matter if the concrete syntax is the same as RDFa? Isn't the important thing about RDF the data model, not the surface syntax? (I understand that if the microdata syntax offered no advantages over RDFa, then it would be a wasted effort to diverge. But my impression is that you'd object to anything that isn't exactly identical to RDFa, even if it can easily be used in the same way.) Regards, Maciej

Because one would assume that one way to accomplish a task would be more attractive to web developers, designers, parser developers, browsers, et al. In addition, one would also assume that one way to accomplish a task would be more attractive with regard to testing, maintaining and moving on in the future. Notice how there is only VHS and not Betamax? Notice the same about Blu-Ray and HD-TV? People won't buy into something while there are competitive specs, and these are competitive in that it makes little sense to use both in a document, though you can now.

The point is, people in the real world have to use this stuff. It helps them if they have one, generally agreed-on approach. As it is, folks have to contend with both RDFa and microformats, but at least we know these have different purposes.

Shelley
Re: [whatwg] Annotating structured data that HTML has no semantics for
On May 14, 2009, at 1:04 PM, Shelley Powers wrote: Maciej Stachowiak wrote: On May 14, 2009, at 5:18 AM, Shelley Powers wrote: So much concern about generating RDF, makes one wonder why we didn't just implement RDFa... If it's possible to produce RDF triples from microdata, and if RDF triples of interest can be expressed with microdata, why does it matter if the concrete syntax is the same as RDFa? Isn't the important thing about RDF the data model, not the surface syntax? (I understand that if the microdata syntax offered no advantages over RDFa, then it would be a wasted effort to diverge. But my impression is that you'd object to anything that isn't exactly identical to RDFa, even if it can easily be used in the same way.) Regards, Maciej Because one would assume that one way to accomplish a task would be more attractive to web developers, designers, parser developers, browsers, et al. In addition, one would also assume that one way to accomplish a task would be more attractive in regards to testing, maintaining and moving on in the future. Notice how there is only VHS and not Betamax? Notice the same about Blu-Ray and HD-TV? People won't buy into something while there are competitive specs, and these are competitive in that it makes little since to use both in a document, though you can now. Physical media do tend to converge due to network effects. I think the effect is less strong for digital file formats. For example, MP3 and AAC are both fairly successful; similarly, MPEG-4, Windows Media and Ogg are all getting some degree of traction. But you may be right that ultimately there will be only one winner. The point is, people in the real world have to use this stuff. It helps them if they have one, generally agreed on approach. As it is, folks have to contend with both RDFa and microformats, but at least we know these have different purposes. From my cursory study, I think microdata could subsume many of the use cases of both microformats and RDFa. 
It seems to me that it avoids much of what microformats advocates find objectionable, and provides a good basis for new microformats; but at the same time it seems it can represent a full RDF data model. Thus, I think we have the potential to get one solution that works for everyone. I'm not 100% sure microdata can really achieve this, but I think making the attempt is a positive step. One other detail that it seems not many people have picked up on yet is that microdata proposes a DOM API to extract microdata-based info from a live document on the client side. In my opinion this is huge and has the potential to greatly increase author interest in semantic markup. Now, it may be that microdata will ultimately fail, either because it is outcompeted by RDFa, or because not enough people care about semantic markup, or whatever. But at least for now, I don't see a reason to strangle it in the cradle. Regards, Maciej
Re: [whatwg] Annotating structured data that HTML has no semantics for
Maciej Stachowiak wrote: On May 14, 2009, at 1:04 PM, Shelley Powers wrote: Maciej Stachowiak wrote: On May 14, 2009, at 5:18 AM, Shelley Powers wrote: So much concern about generating RDF, makes one wonder why we didn't just implement RDFa... If it's possible to produce RDF triples from microdata, and if RDF triples of interest can be expressed with microdata, why does it matter if the concrete syntax is the same as RDFa? Isn't the important thing about RDF the data model, not the surface syntax? (I understand that if the microdata syntax offered no advantages over RDFa, then it would be a wasted effort to diverge. But my impression is that you'd object to anything that isn't exactly identical to RDFa, even if it can easily be used in the same way.) Regards, Maciej Because one would assume that one way to accomplish a task would be more attractive to web developers, designers, parser developers, browsers, et al. In addition, one would also assume that one way to accomplish a task would be more attractive in regards to testing, maintaining and moving on in the future. Notice how there is only VHS and not Betamax? Notice the same about Blu-Ray and HD-TV? People won't buy into something while there are competitive specs, and these are competitive in that it makes little since to use both in a document, though you can now. Physical media do tend to converge due to network effects. I think the effect is less strong for digital file formats. For example, MP3 and AAC are both fairly successful; similarly, MPEG-4, Windows Media and Ogg are all getting some degree of traction. But you may be right that ultimately there will be only one winner. Now, that's the problem with all of this effort...winners and losers. I don't support a spec because it gives me grins and giggles. I have certain tasks I want to do, and I look for what is the technology that has the most support in order to do them. I've long been an adherent to RDF, which isn't really up for debate. 
Originally, I was an RDF/XML person, until the RDF-in-XHTML folks changed my mind. What I see of RDFa is a specification that has been through a very long period of time, testing, commenting, being implemented by major players. I also have tools, right now, that I can use to process the RDFa, as well as support by two major search engine companies. As Dan pointed out earlier, microdata seems to support most of RDF. Well, I know that RDFa does. It makes little sense to me to start from scratch when a mature specification with multi-vendor support already exists. Especially when Drupal 7 rolls out with RDFa baked in. That's 1.7 million sites supporting the spec. Then there's the new Google snippet thing -- who knows how many additional sites we'll now find supporting RDFa. So, if I'm pushing for RDFa, it's not because I want to win. It's because I have things I want to do now, and I would like to make sure have a reasonable chance of working a couple of years in the future. And yeah, once SVG is in HTML5, and RDFa can work with HTML5, maybe I wouldn't mind giving old HTML a try again. Lord knows I'd like to user ampersands again. The point is, people in the real world have to use this stuff. It helps them if they have one, generally agreed on approach. As it is, folks have to contend with both RDFa and microformats, but at least we know these have different purposes. From my cursory study, I think microdata could subsume many of the use cases of both microformats and RDFa. It seems to me that it avoids much of what microformats advocates find objectionable, and provides a good basis for new microformats; but at the same time it seems it can represent a full RDF data model. Thus, I think we have the potential to get one solution that works for everyone. I'm not 100% sure microdata can really achieve this, but I think making the attempt is a positive step. It can't, don't you see? Microdata will only work in HTML5/XHTML5. 
XHTML 1.1 and yes, 2.0 will be around for years, decades. In addition, XHTML5 already supports RDFa. Why you think something completely brand new, no vendor support, drummed up in a few hours or a day or so is more robust, and a better option than a mature spec in wide use, well frankly boggles my mind. I am impressed with your belief in HTML5. But One other detail that it seems not many people have picked up on yet is that microdata proposes a DOM API to extract microdata-based info from a live document on the client side. In my opinion this is huge and has the potential to greatly increase author interest in semantic markup. Not really. Can do this now with RDFa in XHTML. And I don't need any new DOM to do it. The power of semantic markup isn't really seen until you take that markup data _outside_ the document. And merge that data with data from other documents. Google rich snippets. Yahoo searchmonkey. Heck, even an application
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Thu, May 14, 2009 at 3:54 PM, Philip Taylor excors+wha...@gmail.com wrote: [...] If we restrict literals to strings [...] But *why* restrict literals to strings?? Being unable to state that 2009-05-14 is a date makes that value completely useless: it would only be useful in contexts where a date is expected (basically, because it is a date), but it can't be used in such contexts because the tool retrieving the value has no hint about it being a date. The same is true for integers, prices (a.k.a. decimals plus a currency symbol), geographic coordinates, iguana descriptions, and so on. On Thu, May 14, 2009 at 8:25 PM, Maciej Stachowiak m...@apple.com wrote: On May 14, 2009, at 5:18 AM, Shelley Powers wrote: So much concern about generating RDF, makes one wonder why we didn't just implement RDFa... If it's possible to produce RDF triples from microdata, and if RDF triples of interest can be expressed with microdata, why does it matter if the concrete syntax is the same as RDFa? Isn't the important thing about RDF the data model, not the surface syntax? It doesn't matter whether it is one syntax or another. But if a syntax already exists (RDFa), building a new syntax should be properly justified. As of now, the only supposed benefit I have heard of for this syntax is that it avoids CURIEs... yet it replaces them with reversed domains?? Is that a benefit? I have been a Java programmer for some years, and still find that convention absurd, horrible, and annoying. I'll agree that CURIEs are ugly, and maybe hard to understand, but reversed domains are equally ugly and hard to understand. (I understand that if the microdata syntax offered no advantages over RDFa, then it would be a wasted effort to diverge. Which are the advantages it offers? 
I asked about them yesterday, and no one has answered, so I'm asking again: please, enlighten me on this because if I see no advantages myself and nobody else tells me about any advantage, then the only conclusion a rational mind can draw is that there are no advantages. So, that's the position I'm in. I can easily change my mind if anyone points out some advantage that might actually help me more than RDFa when trying to add semantics and metadata to my pages. But my impression is that you'd object to anything that isn't exactly identical to RDFa, even if it can easily be used in the same way.) Actually, I do object to RDFa itself. Since the very first moment I saw discussions about it on these lists, I have been trying to highlight its flaws and to suggest ideas for alternatives. Now, would you really expect me not to object to what, at least from my current PoV, is simply worse than RDFa? IMHO, RDFa is just *passable*, and microdata is too *mediocre* to get a pass. I don't know about any solution that would be perfect, but I really think that this community is definitely capable of producing something that is, at least, *good*. Of course, these are just my opinions, but I have also explained what they are based on. I'm eager to change my mind if there is a basis for it. Regards, Eduard Pascual
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Thu, May 14, 2009 at 5:00 PM, Jonas Sicking jo...@sicking.cc wrote: * Support for specifying a machine-readable value, such as for dates, colors, numbers, etc. * Support for tabular data. Especially the former is very interesting to me. I even wonder whether it would allow replacing the time element with a standardized microformat, such as:

    Christmas is going down on <span item=w3c.time itemvalue=12-25-2009>The 25th day of December</span>!

(Though I'd probably avoid prefixes for 'standardized' item names).

Hmm.. I guess the syntax would be

    <span item itemprop=w3c.time propvalue=12-25-2009>

Not very nice I admit.

/ Jonas
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Thu, May 14, 2009 at 2:54 PM, Philip Taylor excors+wha...@gmail.com wrote: [...] <urn:subject> <urn:predicate> _:X . [...] <div item> <link itemprop=about href=urn:subject> <meta itemprop=urn:predicate item id=X> </div> [...] So, I can't see any limits on expressivity other than that literals must be strings.

Hmm, I think I'm wrong here. 'id' has to be unique, which means this pattern won't work if _:X is the object for triples with two different subjects. Additionally, there must be a chain from every blank node back to <> via <http://www.w3.org/1999/xhtml/vocab#item>, else it won't get serialised (since serialisation starts from top-level items and recurses down the correspondence chains). As a consequence of this and the previous point, it is impossible to express cycles (e.g. _:X <urn:predicate> _:X, or any longer cycles) unless the cycle contains <>. So there are these two restrictions on the shapes of expressible RDF graphs. (I can't think of any other restrictions, though...)

-- Philip Taylor exc...@gmail.com
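[Editor's sketch] The two restrictions described in the message above lend themselves to a small check over a list of triples. This Python is an illustration under stated assumptions, not part of any microdata tool: triples are plain (subject, predicate, object) string tuples, blank nodes are the strings starting with "_:", every URI subject is treated as a top-level item, and the function name expressibility_problems is hypothetical.

```python
from collections import defaultdict


def is_bnode(term: str) -> bool:
    # Assumption: blank nodes are written "_:name"; everything else is a
    # URI or a literal.
    return term.startswith("_:")


def expressibility_problems(triples):
    """Return a list of reasons why the graph breaks the two restrictions."""
    triples = list(triples)
    subjects_of = defaultdict(set)  # blank-node object -> subjects pointing at it
    children = defaultdict(set)     # subject -> blank-node objects it points at
    for s, _, o in triples:
        if is_bnode(o):
            subjects_of[o].add(s)
            children[s].add(o)

    problems = []
    # Restriction 1: 'id' is unique, so a blank node can hang off one subject only.
    for node, subjects in sorted(subjects_of.items()):
        if len(subjects) > 1:
            problems.append(
                f"{node} is the object of triples with {len(subjects)} different subjects")

    # Restriction 2: every blank node must be reachable from a top-level item.
    reachable = set()
    stack = [s for s, _, _ in triples if not is_bnode(s)]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in reachable:
                reachable.add(child)
                stack.append(child)
    all_bnodes = {t for s, _, o in triples for t in (s, o) if is_bnode(t)}
    for node in sorted(all_bnodes - reachable):
        problems.append(f"{node} is not reachable from a top-level item")
    return problems
```

A self-referencing blank node such as ("_:X", "urn:predicate", "_:X") is reported as unreachable, which matches the observation that cycles made only of blank nodes cannot be expressed.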
Re: [whatwg] Annotating structured data that HTML has no semantics for
On May 14, 2009, at 1:30 PM, Shelley Powers wrote: So, if I'm pushing for RDFa, it's not because I want to win. It's because I have things I want to do now, and I would like to make sure have a reasonable chance of working a couple of years in the future. And yeah, once SVG is in HTML5, and RDFa can work with HTML5, maybe I wouldn't mind giving old HTML a try again. Lord knows I'd like to user ampersands again. It sounds like your argument comes down to this: you have personally invested in RDFa, therefore having a competing technology is bad, regardless of the technical merits. I don't mean to parody here - I am somewhat sympathetic to this line of argument. Often pragmatic concerns mean that an incremental improvement just isn't worth the cost of switching (for example HTML vs. XHTML). My personally judgment is that we're not past the point of no return on data embedding. There's microformats, RDFa, and then dozens of other serializations of RDF (some of which you cited). This doesn't seem like a space on the verge of picking a single winner, and the players seem willing to experiment with different options. The point is, people in the real world have to use this stuff. It helps them if they have one, generally agreed on approach. As it is, folks have to contend with both RDFa and microformats, but at least we know these have different purposes. From my cursory study, I think microdata could subsume many of the use cases of both microformats and RDFa. It seems to me that it avoids much of what microformats advocates find objectionable, and provides a good basis for new microformats; but at the same time it seems it can represent a full RDF data model. Thus, I think we have the potential to get one solution that works for everyone. I'm not 100% sure microdata can really achieve this, but I think making the attempt is a positive step. It can't, don't you see? Microdata will only work in HTML5/XHTML5. XHTML 1.1 and yes, 2.0 will be around for years, decades. 
In addition, XHTML5 already supports RDFa. Supporting XHTML 1.1 has about 0.001% as much value as supporting text/html. XHTML 2.0 is completely irrelevant to the Web, and looks on track to remain so. So I don't find this point very persuasive. Why you think something completely brand new, no vendor support, drummed up in a few hours or a day or so is more robust, and a better option than a mature spec in wide use, well frankly boggles my mind. I haven't evaluated it enough to know for sure (as I said). I do think avoiding CURIEs is extremely valuable from the point of view of sane text/html semantics and ease of authoring; and RDF experts seem to think it works fine for representing RDF data models. So tentatively, I don't see any gaping holes. If you see a technical problem, and not just potential competition for the technology you've invested in, then you should definitely cite it. I am impressed with your belief in HTML5. But One other detail that it seems not many people have picked up on yet is that microdata proposes a DOM API to extract microdata-based info from a live document on the client side. In my opinion this is huge and has the potential to greatly increase author interest in semantic markup. Not really. Can do this now with RDFa in XHTML. And I don't need any new DOM to do it. The power of semantic markup isn't really seen until you take that markup data _outside_ the document. And merge that data with data from other documents. Google rich snippets. Yahoo searchmonkey. Heck, even an application that manages the data from different subsites of one domain. I respectfully disagree. An API to do things client-side that doesn't require an external library is extremely powerful, because it lets content authors easily make use of the very same semantic markup that they are vending for third parties, so they have more incentive to use it and get it right. 
Now, it may be that microdata will ultimately fail, either because it is outcompeted by RDFa, or because not enough people care about semantic markup, or whatever. But at least for now, I don't see a reason to strangle it in the cradle. Outcompeted...wow, what a way to think of it. Sorry, but competition has no place in spec work. With due respect, you're the one who brought competition into this discussion by saying there can only be one winner. I don't really think that's true, in this case. Regards, Maciej
Re: [whatwg] Annotating structured data that HTML has no semantics for
Leif Halvard Silli wrote: Hear hear. Lets call it Cascading RDF Sheets. http://buzzword.org.uk/2008/rdf-ease/spec http://buzzword.org.uk/2008/rdf-ease/reactions I have actually implemented it. It works. RDFa is better though. -Toby
[whatwg] Annotating structured data that HTML has no semantics for
Toby Inkster on Wed May 13 02:19:17 PDT 2009: Leif Halvard Silli wrote: Hear hear. Lets call it Cascading RDF Sheets. http://buzzword.org.uk/2008/rdf-ease/spec http://buzzword.org.uk/2008/rdf-ease/reactions I have actually implemented it. It works. Oh! Thanks for sharing. RDFa is better though. What does 'better' mean in this context? Why and how? Because it is easier to process? But EASE seems more compatible with microformats, and is better in that sense. I read all the reactions you pointed to. Some made the claim that EASE would move semantics out of the HTML file, and that microformats were better as they keep the semantics inside the file. But I of course agree with you that EASE just underlines/outlines the semantics already in the file. The thing that probably is most different from (most) microformats (and RDFa?) is that EASE can apply semantics even to bare naked elements without any @class, @id or other attributes. However, EASE does not /require/ one to use it like that. One may choose to create an entirely class-based EASE document. It would even be possible to use EASE together with Ian's microdata, don't you think? From the EASE draft: All properties in RDF-EASE begin with the string -rdf-, as per §4.1.2.1 Vendor-specific extensions in [CSS21]. This allows RDF-EASE and CSS to be safely mixed in one file, [...] I wonder why you think it is so important to be able to mix CSS and EASE. It seems better to separate the two completely. From the EASE draft: The algorithm assumes that the document is held in a DOM-compatible representation, Side note: meta is proposed as part of microdata. But both Firefox and Safari will in the DOM render meta as part of head, regardless. -- leif halvard silli
Re: [whatwg] Annotating structured data that HTML has no semantics for
In terms of prefixes, I find that 'com.foaf-project.name' is a lot more difficult to write than 'foaf:name'. Reverse domain names are non-intuitive for non-programmer types (or non-Java programmers).

If we can come up with a way of using the string foaf:name without having to declare foaf in each document, I'm totally in agreement. I've considered maybe registering the foaf URL scheme, or using some other punctuation character and having people register prefixes, but I don't know what punctuation character to use (':' and '.' are both taken).

Put some predefined prefixes for @itemprop into HTML5:

    dc = http://purl.org/dc/terms/
    foaf = http://xmlns.com/foaf/0.1/
    vcard = http://www.w3.org/2001/vcard-rdf/3.0#
    owl = http://www.w3.org/2002/07/owl#
    rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
    rdfs = http://www.w3.org/2000/01/rdf-schema#
    sioc = http://rdfs.org/sioc/ns#
    skos = http://www.w3.org/2004/02/skos/core#
    xsd = http://www.w3.org/2001/XMLSchema#

Also, instead of @item, @itemprop and @subject, it would be better to use @item, @prop and @subj, or @rdf-typeof, @rdf-property and @rdf-about (and @rdf-rel).

-- Giovanni Gentili
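[Editor's sketch] A consumer could apply a predefined prefix table like the one proposed above with a few lines of code. The Python below is a hedged illustration only: the table copies the URIs listed in the message, while the names PREDEFINED_PREFIXES and expand_itemprop are hypothetical and nothing here is taken from the HTML5 draft.

```python
# Prefix table as listed in the message; not part of any specification.
PREDEFINED_PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_itemprop(name: str, prefixes=PREDEFINED_PREFIXES) -> str:
    """Expand 'prefix:local' to a full URI when the prefix is predefined.

    Names with no colon, or with an unknown prefix (including real URL
    schemes such as 'http' or 'urn'), are returned unchanged.
    """
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name
```

With this table, expand_itemprop("foaf:name") gives http://xmlns.com/foaf/0.1/name, while "urn:predicate" and "about" pass through untouched. The sketch also shows the ambiguity raised in the thread: a predefined prefix that later collides with a registered URL scheme would silently change meaning.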
Re: [whatwg] Annotating structured data that HTML has no semantics for
Let me start with some apologies: On Tue, May 12, 2009 at 12:55 PM, Eduard Pascual herenva...@gmail.com wrote: [...] Seeing that solutions are already being discussed here, I'm trying to put the ideas into a human-readable document that I plan to submit to this list either late today or early tomorrow for your review and consideration. Oops, I'm already late with that. I had some unexpected commitments and had no time to finish that doc. I still hope, however, to publish it today. On Tue, May 12, 2009 at 12:55 PM, Eduard Pascual herenva...@gmail.com wrote: [...] Third issue: also a flaw inherited from RDFa, it can be summarized as completely ignoring the requirement I submitted to this list on April 28th, in reply to Ian asking us to review the use cases [1]. [...] [1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019487.html On Tue, May 12, 2009 at 7:30 PM, Tab Atkins Jr. jackalm...@gmail.com wrote: Well, he didn't quite *ignore* it - he did explicitly call out that requirement to say that his solution didn't solve it at all. I missed that part of Ian's post, sorry. I really read it from top to bottom, but it was quite long. I guess I should have re-read it. Now, after some re-reading, I have noticed a point I should reply to: On Sun, May 10, 2009 at 12:32 PM, Ian Hickson i...@hixie.ch wrote: [...] * Any additional markup or data used to allow the machine to understand the actual information shouldn't be redundantly repeated (e.g. on each cell of a table, when setting it on the column is possible). This isn't met at all with the current proposal. Unfortunately the only general solutions I could come up with that would allow this were selector-based, and in practice authors are still having trouble understanding how to use Selectors even with CSS. First, I'd like to ask for a clarification from Ian: what do you mean by "authors are still having trouble understanding how to use Selectors"? 
If you mean that they have trouble when trying to select something like the second cell of the first row that has a 'foo' attribute different from 'bar' within tables that have four or more rows or even more obscure stuff, then I should agree: most authors will definitely have trouble dealing with such complex cases, and I bet many will always have such trouble. However, if you mean that authors can't deal with simple class, id, and/or children/descendant selectors, then I think you are seriously underestimating authors. On a side note, I'd like to mention in advance that my idea, despite being Selector-based (actually, I should say CSS-based: it reuses quite a bit more than selectors), wouldn't require authors to use selectors at all, at least for the cases that can currently be solved by RDFa (or, FWIW, with the current Microdata approach in the spec); the same way a page can be completely styled with CSS without using selectors, via the style attribute. On Tue, May 12, 2009 at 1:59 PM, Philip Taylor excors+wha...@gmail.com wrote: On Tue, May 12, 2009 at 11:55 AM, Eduard Pascual herenva...@gmail.com wrote: [...] (at least for now: many RDFa-aware agents vs. zero HTML5's microdata -aware agents) HTML5 microdata parsers seem pretty trivial to write - http://philip.html5.org/demos/microdata/demo.html is only about two hundred lines to read all the data and to produce JSON and N3-serialised RDF. It shouldn't take more than a few hours to produce a similar library for other languages, including the time taken to read the spec, so the implementation cost for generic parser libraries doesn't seem like a significant problem. Actually, I was thinking about the cost of deploying implementations, rather than writing them, since RDFa consumers are already out there and working. This, however, strays a bit from the original idea: it's not really a matter of how big the cost is on its own, but of what you get for that cost. 
This is probably my own fault, but I still fail to see what Ian's suggestion offers that RDFa doesn't; so my impression is that these costs, even if they are small, are buying nothing, so they are not worth it. If someone is willing to highlight what makes this proposal worth the costs (i.e. what makes it better than RDFa), I'm willing to listen. On Tue, May 12, 2009 at 2:30 PM, Shelley Powers shell...@burningbird.net wrote: [...] Eduard, looking forward to seeing your own interpretation of the best metadata annotation. Hey, who said my proposal will be, or try to be, the best one? I definitely didn't. Actually, the reason to submit it here will be to have other people look at it and figure out ways to improve it (and I'm quite sure it can be improved, I'm human after all). Please, let me explicitly state that I don't claim that idea to be the best solution. Since neither RDFa, nor Microformats, nor Ian's proposal could solve my needs, my goal was to build a solution that solves both my needs, and those solved by other approaches, as a proof that
Re: [whatwg] Annotating structured data that HTML has no semantics for
I don't really like to be harsh, but I have some criticism of this, and it's going to be quite harsh. However, my goal in pointing out what I consider such big mistakes is to help HTML5 become as good as it could be. First issue: it solves a (major) subset of what RDFa would solve. However, it has been taken as a requirement to avoid clashes/incompatibilities with RDFa. In other words, as things stand, authors will face two options: either use RDFa in HTML5, which would forsake validation but actually work; or take a less powerful, less supported (at least for now: many RDFa-aware agents vs. zero HTML5's microdata -aware agents) option that validates but provides no pragmatic advantages. IMO, an approach that forces authors to choose between validity/conformance which doesn't *yet* work vs. invalid solutions that actually work is a horrible idea: it encourages authors to forsake validity if they want things to work. Wouldn't the RDFa + @prefix solution suggested many times work better and require less effort (for spec writers, for implementors, and for content authors)? Keep in mind that I don't think RDFa + @prefix is the solution we need; I'm just trying to point out that the current approach is even worse than that. Second issue: as the decaffeinated RDFa it is, the HTML5 Microdata approach tends to fail where RDFa itself fails. It's nice that, thanks to the time element, the problem with trying to reuse human-readable dates as machine-readable is dodged; but there are other cases where separate values might be needed: for example using a street address for the human-readable representation of a location and the exact geographic coordinates as the machine-readable (since not all micro-data parsers can rely on Google Maps's database to resolve street addresses, you know); or using a colored name (such as lime green displayed in lime green color) as the human-readable representation of a color, and the hexcode (like #00FF00) as the machine-readable representation. 
These are just the cases off the top of my head, and this can't be considered in any way a complete list. While *favoring* the reuse of human-readable values for the machine-readable ones is appropriate, because it's by far the most common case, *forcing* that reuse is quite a bad idea, because it is *not* the *only* case.

Third issue: also a flaw inherited from RDFa, it can be summarized as completely ignoring the requirement I submitted to this list on April 28th, in reply to Ian asking us to review the use cases [1]. I'll try to illustrate it with an example, inspired by the original use case: Let's say someone's marking up a collection of iguanas (or cats, or even CDs, it doesn't really make a difference when illustrating this issue), making a page for each iguana (or whatever) with all the details for it; and then making an index page listing the maybe 20 iguanas with their name, picture, and link to the corresponding page. Adding micro-data to that index, either with RDFa or with Ian's microdata proposal, would involve stating 20 times in the markup something like "this is the iguana's picture; this is the iguana's name; and this is the iguana's URL". It would be preferable to be able to state something like "each (row) <tr> in the table describes an iguana: the <img>s are each iguana's picture, the contents of the <a>s are the names, and the @href of the <a>s are the URLs to their main pages" just once. If I only need to state the table headings once for the users to understand this concept, why should a micro-data consumer require me to state it 20 times, once for each row?

Please note how such a page would be quite painful to maintain: any mistake in the micro-data mark-up would generate invalid data and require a manual harvest of the data on the page, thus killing the whole purpose of micro-data. And repeating something 20 (or more) times brings a lot of chances to put a typo in, or to miss an attribute, or to make any minor but devastating mistake like these.
Last, but not least, I'm not sure if it was wise to start defining a solution while some of the requirements seem to be still under discussion. Actually, I had a possible solution in mind, but I was holding it while reviewing it against the requirements being discussed, so I could adapt it to any requirements I might have initially missed. Seeing that solutions are already being discussed here, I'm trying to put the ideas into a human-readable document that I plan to submit to this list either late today or early tomorrow for your review and consideration.

Regards,
Eduard Pascual

[1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019487.html
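The "state it once" idea from the iguana index example can be sketched in a few lines. This is only an illustration under assumptions, not the syntax of any proposal in this thread: the rule table ROW_RULES, the class RowExtractor and the function extract_rows are all hypothetical names, and the sketch relies only on Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical rule set, stated once for the whole table:
# property name -> (tag, attribute). An attribute of None means
# "use the element's text content".
ROW_RULES = {
    "picture": ("img", "src"),
    "name": ("a", None),
    "url": ("a", "href"),
}


class RowExtractor(HTMLParser):
    """Apply the same rules to every <tr>, producing one dict per row."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.items = []        # finished rows
        self.current = None    # dict for the row being read, if any
        self.text_props = []   # properties currently collecting text

    def _flush(self):
        # Keep the row only if at least one rule matched
        # (a header row of <th> cells matches nothing).
        if self.current:
            self.items.append(self.current)
        self.current = None
        self.text_props = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush()
            self.current = {}
            return
        if self.current is None:
            return
        attrs = dict(attrs)
        for prop, (rule_tag, rule_attr) in self.rules.items():
            if tag != rule_tag:
                continue
            if rule_attr is None:
                self.current[prop] = ""
                self.text_props.append(prop)
            elif rule_attr in attrs:
                self.current[prop] = attrs[rule_attr]

    def handle_data(self, data):
        if self.current is not None:
            for prop in self.text_props:
                self.current[prop] += data

    def handle_endtag(self, tag):
        if self.current is None:
            return
        # Stop collecting text for rules bound to the element being closed.
        self.text_props = [p for p in self.text_props
                           if self.rules[p][0] != tag]
        if tag == "tr":
            self._flush()


def extract_rows(html, rules=ROW_RULES):
    parser = RowExtractor(rules)
    parser.feed(html)
    parser.close()
    parser._flush()  # a final row without an explicit </tr>
    return parser.items
```

With rules like these, adding a 21st iguana means adding one plain table row; nothing about the vocabulary is repeated in the markup. The obvious cost, raised elsewhere in the thread, is that a mistake in the single rule silently affects every row.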
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Tue, May 12, 2009 at 11:55 AM, Eduard Pascual herenva...@gmail.com wrote:
> [...] (at least for now: many RDFa-aware agents vs. zero HTML5's microdata-aware agents)

HTML5 microdata parsers seem pretty trivial to write - http://philip.html5.org/demos/microdata/demo.html is only about two hundred lines to read all the data and to produce JSON and N3-serialised RDF. It shouldn't take more than a few hours to produce a similar library for other languages, including the time taken to read the spec, so the implementation cost for generic parser libraries doesn't seem like a significant problem.

The cost of integration with backend RDF-based systems seems more significant - hopefully you could simply replace the frontend RDFa parser with a microdata parser and generate the same RDF triples and it would all work fine, but I don't know whether that's true in practice (because maybe the microdata syntax is too restrictive to represent the vocabularies people want to use, and so they'd have to go to lots of extra effort to create a new vocabulary).

> [...] there are other cases where separate values might be needed: for example using a street address for the human-readable representation of a location and the exact geographic coordinates as the machine-readable (since not all micro-data parsers can rely on Google Maps's database to resolve street addresses, you know); or using a colored name (such as lime green displayed on lime green color) as the human-readable representation of a color, and the hexcode (like #00FF00) as the machine-readable representation.

You could replace

  <span itemprop="color">lime green</span>
  <span itemprop="location">1 High Street</span>

with

  <meta itemprop="color" content="#00FF00"><span>lime green</span>
  <meta itemprop="location.lat" content="56.78"><meta itemprop="location.long" content="-12.34"><span>1 High Street</span>

to get the desired output. (Not particularly elegant syntax, though.)

--
Philip Taylor
exc...@gmail.com
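To make the difference between the two markup variants concrete, here is a minimal sketch of a consumer that reads itemprop values from a flat fragment. It is neither the demo linked above nor the extraction algorithm from the spec; the names ItempropExtractor and extract are hypothetical, and the sketch assumes that itemprop-bearing elements are not nested inside each other. It uses only Python's standard html.parser module.

```python
from html.parser import HTMLParser


class ItempropExtractor(HTMLParser):
    """Collect itemprop name/value pairs from a flat HTML fragment."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self.open_prop = None   # (tag, name) whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if tag == "meta":
            # Machine-readable value carried in the content attribute.
            self.props[name] = attrs.get("content", "")
        else:
            # Otherwise fall back to the human-readable text content.
            self.props[name] = ""
            self.open_prop = (tag, name)

    def handle_data(self, data):
        if self.open_prop is not None:
            self.props[self.open_prop[1]] += data

    def handle_endtag(self, tag):
        if self.open_prop is not None and self.open_prop[0] == tag:
            self.open_prop = None


def extract(html):
    parser = ItempropExtractor()
    parser.feed(html)
    parser.close()
    return parser.props
```

Fed the span-only variant, this yields the human-readable strings ("lime green", "1 High Street"); fed the meta-based variant, it yields "#00FF00" and the two coordinates, while the visible spans contribute nothing to the extracted data.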
Re: [whatwg] Annotating structured data that HTML has no semantics for
Philip Taylor wrote:
> On Tue, May 12, 2009 at 11:55 AM, Eduard Pascual herenva...@gmail.com wrote:
>> [...] (at least for now: many RDFa-aware agents vs. zero HTML5's microdata-aware agents)
>
> HTML5 microdata parsers seem pretty trivial to write - http://philip.html5.org/demos/microdata/demo.html is only about two hundred lines to read all the data and to produce JSON and N3-serialised RDF. It shouldn't take more than a few hours to produce a similar library for other languages, including the time taken to read the spec, so the implementation cost for generic parser libraries doesn't seem like a significant problem.

Writing something that will produce triples may be easy, but what's important is that you're producing an RDF model. Philip, I've been looking at your application, and you're not producing the same model for Ian's microdata proposal that is produced using either eRDF or RDFa. I'll have more on this later.

> The cost of integration with backend RDF-based systems seems more significant - hopefully you could simply replace the frontend RDFa parser with a microdata parser and generate the same RDF triples and it would all work fine, but I don't know whether that's true in practice (because maybe the microdata syntax is too restrictive to represent the vocabularies people want to use, and so they'd have to go to lots of extra effort to create a new vocabulary).
>
>> [...] there are other cases where separate values might be needed: for example using a street address for the human-readable representation of a location and the exact geographic coordinates as the machine-readable (since not all micro-data parsers can rely on Google Maps's database to resolve street addresses, you know); or using a colored name (such as lime green displayed on lime green color) as the human-readable representation of a color, and the hexcode (like #00FF00) as the machine-readable representation.
>
> You could replace
>
>   <span itemprop="color">lime green</span>
>   <span itemprop="location">1 High Street</span>
>
> with
>
>   <meta itemprop="color" content="#00FF00"><span>lime green</span>
>   <meta itemprop="location.lat" content="56.78"><meta itemprop="location.long" content="-12.34"><span>1 High Street</span>
>
> to get the desired output. (Not particularly elegant syntax, though.)

It's funny, but oddly enough, this discussion reminds me of when I started at Boeing, right after college. I started just when the great debate between SQL and QUEL was ending, in SQL's favor. Most folks still feel that QUEL was the superior option, but SQL won out in the end because it had widespread use, and was supported by more of the (powerful) database companies, and hence the companies using the databases.

The same could be said of Betamax versus VHS, and even the recent HDTV and Blu-Ray debates: we can get caught up in issues of superiority and argue the fine points of (mostly) obscure markup until the cows come home, but at some point in time, you have to pick a standard to get behind, or no one will have any confidence in _any_ of the options being proposed, and the concept underlying the competing technologies (or standards) is hindered, perhaps for years.

Sorry, I digress. Eduard, looking forward to seeing your own interpretation of the best metadata annotation.

Shelley
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Tue, 12 May 2009, Peter Mika wrote:
> Just a quick comment on:
>> it uses prefixes, which most authors simply do not understand, and which many implementors end up getting wrong (e.g. SearchMonkey hard-coded certain prefixes in its first implementation, Google's handling of RDF blocks for license declarations is all done with
>
> Actually, the problem we see is not so much the prefixes themselves but rather the cumbersome way of specifying namespace prefix definitions using xmlns. So I think it would make sense to have some mechanism for referencing bundles of namespace prefixes ('profiles') or namespace registries, in order to ease authoring.
>
> In terms of prefixes, I find that 'com.foaf-project.name' is a lot more difficult to write than 'foaf:name'. Reverse domain names are non-intuitive for non-programmer types (or non-Java programmers).

If we can come up with a way of using the string "foaf:name" without having to declare "foaf" in each document, I'm totally in agreement.

I've considered maybe registering the "foaf" URL scheme, or using some other punctuation character and having people register prefixes, but I don't know what punctuation character to use (':' and '.' are both taken).

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
Re: [whatwg] Annotating structured data that HTML has no semantics for
Ian Hickson wrote:
> On Tue, 12 May 2009, Peter Mika wrote:
>> Just a quick comment on:
>>> it uses prefixes, which most authors simply do not understand, and which many implementors end up getting wrong (e.g. SearchMonkey hard-coded certain prefixes in its first implementation, Google's handling of RDF blocks for license declarations is all done with
>>
>> Actually, the problem we see is not so much the prefixes themselves but rather the cumbersome way of specifying namespace prefix definitions using xmlns. So I think it would make sense to have some mechanism for referencing bundles of namespace prefixes ('profiles') or namespace registries, in order to ease authoring.
>>
>> In terms of prefixes, I find that 'com.foaf-project.name' is a lot more difficult to write than 'foaf:name'. Reverse domain names are non-intuitive for non-programmer types (or non-Java programmers).
>
> If we can come up with a way of using the string "foaf:name" without having to declare "foaf" in each document, I'm totally in agreement.
>
> I've considered maybe registering the "foaf" URL scheme, or using some other punctuation character and having people register prefixes, but I don't know what punctuation character to use (':' and '.' are both taken).

But then we would lose the extensibility, which is the power behind all of this.

If I remember correctly, Henri had an issue with the DOM when it came to support of namespaces in XHTML, and not in HTML, which was the reason that @prefix or something along those lines was proposed. There was quite positive progress in this regard, too. I don't know what happened to that progress.

But regardless, the majority of people will include metadata markup by installing a plug-in or module, and making a couple of choices. And if you put together a good ten-minute tutorial for the average developer, they'll have no problem with foaf:name. Training and clarity of communication are much more important than form; they always have been with technology.

The examples you come up with just don't justify discarding consideration of a capability that just started getting incorporated into Google search.

I would say if your fellow Google developers could understand how this all works, there is hope for others.

Shelley
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Tue, May 12, 2009 at 4:34 PM, Shelley Powers shell...@burningbird.net wrote:
> I would say if your fellow Google developers could understand how this all works, there is hope for others.

if

http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0064.html

> Shelley

- Sam Ruby
Re: [whatwg] Annotating structured data that HTML has no semantics for
Sam Ruby wrote:
> On Tue, May 12, 2009 at 4:34 PM, Shelley Powers shell...@burningbird.net wrote:
>> I would say if your fellow Google developers could understand how this all works, there is hope for others.
>
> if
>
> http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0064.html
>
> - Sam Ruby

Ah heck, I've made mistakes with vocabularies too. That's why you ask for feedback. Unfortunately, asking for feedback isn't an option when you're creating secret stuff.

I could have wished Google used FOAF or DC, too, but it's a start.

Shelley
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Tue, May 12, 2009 at 10:21 PM, Sam Ruby ru...@intertwingly.net wrote:
> On Tue, May 12, 2009 at 4:34 PM, Shelley Powers shell...@burningbird.net wrote:
>> I would say if your fellow Google developers could understand how this all works, there is hope for others.
>
> if
>
> http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0064.html

Also: The instructions at http://google.com/support/webmasters/bin/answer.py?answer=146898 (and related pages) alternate between xmlns:v="http://rdf.data-vocabulary.org" and xmlns:v="http://rdf.data-vocabulary.org/" seemingly at random. (The first means that property="v:name" abbreviates the bogus URI "http://rdf.data-vocabulary.orgname", if I understand correctly. The second means it's "http://rdf.data-vocabulary.org/name", which is a 404. Perhaps they meant xmlns:v="http://rdf.data-vocabulary.org/#", which would point at the relevant bit of the vocabulary RDF file? Hopefully people won't actually deploy content using the inconsistent namespaces before the documentation is fixed...)

(They've also got a <span><strong property="v:name"> and <span><span property="v:locality"> and some unclosed <a>s, so it seems the documentation writers are having difficulty even writing plain HTML.)

--
Philip Taylor
exc...@gmail.com
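The trailing-slash problem described above follows directly from how a prefixed name is expanded: the namespace URI and the local part are simply concatenated. A minimal sketch, assuming plain concatenation and using a hypothetical helper named expand:

```python
def expand(prefixes, curie):
    """Expand 'prefix:local' by plain string concatenation."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local


# The three namespace declarations discussed in the message above.
for ns in ("http://rdf.data-vocabulary.org",
           "http://rdf.data-vocabulary.org/",
           "http://rdf.data-vocabulary.org/#"):
    print(expand({"v": ns}, "v:name"))

# prints:
# http://rdf.data-vocabulary.orgname
# http://rdf.data-vocabulary.org/name
# http://rdf.data-vocabulary.org/#name
```

Nothing in the expansion step can tell these apart or flag the first as a mistake, so a page that mixes declarations ends up describing the same property with different URIs.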
Re: [whatwg] Annotating structured data that HTML has no semantics for
Tab Atkins Jr. on Tue, 12 May 2009 12:30:27 -0500:
> On Tue, May 12, 2009 at 5:55 AM, Eduard Pascual:
>> [...] It would be preferable to be able to state something like "each (row) <tr> in the table describes an iguana: the <img>s are each iguana's picture, the contents of the <a>s are the names, and the @href of the <a>s are the URLs to their main pages" just once.

Indeed.

>> If I only need to state the table headings once for the users to understand this concept, why should a micro-data consumer require me to state it 20 times, once for each row? Please note how such a page would be quite painful to maintain: any mistake in the micro-data mark-up would generate invalid data and require a manual harvest of the data on the page, thus killing the whole purpose of micro-data.

Indeed. (But of course, for copy-paste safety, the format has to be wordy and repetitive.)

>> And repeating something 20 (or more) times brings a lot of chances to put a typo in, or to miss an attribute, or any minor but devastating mistake like these.
>
> Well, he didn't quite *ignore* it - he did explicitly call out that requirement to say that his solution didn't solve it at all. He also laid down the reason why - it's unlikely that any reasonable simple in-place metadata solution would allow you to do that. You either need significant complexity, some reliance on language semantics (like tables can rely on their headers), or moving to out-of-band specification, likely through a Selectors-based model.

Indeed. And Ian's argument against a selector-based model (the claim that authors have problems understanding selectors) was one of the least convincing arguments he made, I think. CSS and selectors appear to be among the best understood technologies of the web.

> The last is likely the best solution for that, and is even easier to implement within Ian's simplified proposal. I don't see a good reason why that can't advance on a separate track, as (being out-of-band) it doesn't require changes to HTML to be usable.
> I floated a basic proposal for Cascading RDF[1] several months ago, and someone else (I think Eduard? I'd have to check my archives) did something very similar.
>
> [1]: http://www.xanthir.com/rdfa-vs-crdf.php

Hear, hear. Let's call it Cascading RDF Sheets. It could be used for the following purposes:

1. The IRI of the Cascading RDF Sheet could serve the role of profile URI;
2. The Cascading RDF Sheet itself could serve the role of a profile document; (Finally we could get some kind of registered profile format.)
3. Just as CSS sheets today, a cRDFsheet could be used as authoring help, when authoring with a microformat. HTML editing programs could offer the elements + classes in the Cascading RDF Sheet to authors, the same way that some editors today use the selectors in stylesheets as a vocabulary repository for the current file or project. CSS selectors are already a well-known format. (One may then, of course, already use a CSS style sheet for this, kind of. But this soon becomes clumsy. Better to separate styling from semantics and structure.)

In fact, I myself began looking into creating something along these lines ... Though rather than a Cascading RDF Sheet, I looked into creating a Profile Style Sheet which could be used to define a machine-readable microformat profile. My motivation for doing this was the authoring side of things, as I have been using a text editor which more or less uses CSS selectors the same way. (Instead of only offering me to pick <p>, it also offers me to pick <p class="a"> etc.)

Ian's proposal does not give much thought to the authoring side, I feel, except for the more casual author. For authors, it is helpful to have a recipe document and to avoid repetition and data rot, as you mentioned in another message. The inner logic of Ian's microdata format is easy to grasp - that is a good side of the proposal, and it could help it get used.
But when it comes to the ability of authors and author groups to define their own, decentralised semantics etc., then a decent profile format, which could be easily and simply integrated with authoring tools, seems like just as important an issue as a super simple microdata format. The microformats.org community does not really have a machine-parsable profile format. If there were such a format, I believe we would see more decentralized microformats.

--
leif halvard silli
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Sun, 10 May 2009 12:32:34 +0200, Ian Hickson i...@hixie.ch wrote:
> Page 3:
>
>   <h2>My Cats</h2>
>   <dl>
>    <dt>Schr&ouml;dinger
>    <dd item="com.damowmow.cat">
>     <meta property="com.damowmow.name" content="Schr&ouml;dinger">
>     <meta property="com.damowmow.age" content="9">
>     <p property="com.damowmow.desc">Orange male.
>    <dt>Erwin
>    <dd item="com.damowmow.cat">
>     <meta property="com.damowmow.name" content="Lord Erwin">
>     <meta property="com.damowmow.age" content="3">
>     <p property="com.damowmow.desc">Siamese color-point.
>     <img property="com.damowmow.img" alt="" src="/images/erwin.jpeg">
>   </dl>

Given the microdata solution and this example, there is now a reason other than styling to introduce <di>, since here you duplicate the <dt> information in <meta>.

  <dl>
   <di item="com.damowmow.cat">
    <dt property="com.damowmow.name">Schr&ouml;dinger
    <dd>
     <meta property="com.damowmow.age" content="9">
     <p property="com.damowmow.desc">Orange male.
   </di>
   ...

The styling problem is discussed at http://forums.whatwg.org/viewtopic.php?t=47

--
Simon Pieters
Opera Software
Re: [whatwg] Annotating structured data that HTML has no semantics for
Ian Hickson:
> USE CASE: Annotate structured data that HTML has no semantics for, and which nobody has annotated before, and may never again, for private use or use in a small self-contained community.
> (..)
> SCENARIOS:

Among the scenarios, this case should also be considered:

* a user (or group of users) wants to annotate items present on a generic web page with additional properties in a certain vocabulary. For example, Joe wants to gather in a blog a series of personal annotations on movies (or other types of items) present on imdb.com.

Other examples of external annotation could be derived from this document [1].

This option requires that @subject accept:
1) the ID of an element with an item attribute, in the same Document, or
2) a valid URL of an element with an item attribute elsewhere on the web, or
3) a valid URL (the item is the referred document or fragment)

This raises two other questions:

a) For properties specified on an element that has no ancestor with an item attribute, should the corresponding item be the document (element body with an implicit item attribute)?

b) Do we need to require UAs to offer a standard way to visualize (at least as an option left to the user) the structured information carried in microdata? And copy-paste? See also this email [2].

[1] http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r01
[2] http://lists.w3.org/Archives/Public/public-html/2009Jan/0082.html

--
Giovanni Gentili
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Mon, May 11, 2009 at 6:15 PM, Giovanni Gentili giovanni.gent...@gmail.com wrote:
> * a user (or groups of users) wants to annotate items present on a generic web page with additional properties in a certain vocabulary. for example Joe wants to gather in a blog a series of personal annotation to movies (or other type of items) present in imdb.com.
> [...]
> this option require that @subject accept:
> 1) ID of an element with an item attribute, in the same Document or
> 2) valid URL of an element with an item attribute elsewhere in the web or
> 3) a valid URL (the item is the referred document or fragment)

For the RDF output, you can use <link property="about" href="http://subject/"> to create triples whose subject is a URL. (I believe in general you can also do:

  <meta item id="n0">
  <link subject="n0" property="about" href="http://subject/">
  <link subject="n0" property="http://predicate1/" href="http://object1/">
  <meta subject="n0" property="http://predicate2/" content="object2">

to represent arbitrary RDF triples.)

I don't think it would make sense for @subject to be a URL when generating JSON output, because there wouldn't be anywhere to represent that URL in the output structure. But there could be a convention that properties called "about" indicate the URLs that the item applies to, and then it would work with exactly the same markup as the RDF case.

--
Philip Taylor
exc...@gmail.com
Re: [whatwg] Annotating structured data that HTML has no semantics for
A cursory glance at the new section 5 raises two questions on indirection:

> (Note the <meta>s in the last example -- since sometimes the information isn't visible, rather than requiring that people put it in and hide it with display:none, which has a rather poor accessibility story, I figured we could just allow <meta> anywhere, if it has a property= attribute.)

That seems to be a solution optimised for entirely invisible metadata but not for metadata which differs from the human-visible data. Imagine as an example the simple act of marking up a number (and ignoring what the number denotes). For human consumption a thousands separator is often used; the type of separator differs by language, locale and context. Just in my little world I regularly see the point, the comma, the space, the thin space and sometimes the apostrophe. Parsing different representations of numbers would be a chore. The value of textContent of the element

  <span itemprop="com.example.price">€&nbsp;1&thinsp;000&thinsp;000,&mdash;</span>

is clearly unusable, demanding an additional invisible <meta property="com.example.price" content="100">.

My irritation lies in the element proliferation, requiring one element/attribute combination for machines and one element/text content combination for humans. Of course, any sane author would arrange both elements in a close relation, as parent/child or siblings, but there would still be two different elements to maintain, leading to a higher cognitive load. Not just for authors but also for programmers: a fluctuating price has to be updated on two different elements; tree-walking DOM scripts have to take <meta> elements into account.

Furthermore it clashes with the familiar habit of other elements in HTML. A hyperlink is one element with a machine-readable attribute and human-readable text content. A citation is one element with a machine-readable reference and human-readable text content. The same model is used in meter, progress, time, abbr ... but not in user-defined objects.

I'd prefer an additional @content-like attribute which supersedes the text content and maybe even the default values of the other value-bearing elements, reducing two different elements to maintain or change to just one.

> Instead, let us try using the regular IDREF functionality that HTML uses in a variety of other places, like <label for="">. For this we'll need a new attribute, but unfortunately we can't use about="" (which would be the obvious name to use), because that would conflict with RDFa, so instead we'll use subject="":

I'm slightly irritated by the implied change from an active, possessive formulation (“The cat has the name Hedral.”) to something more passive-y (“Hedral is a name owned by that cat.“). My mental model for property relationships is oriented more towards the former wording; link relationships are similar in that regard. @about/@subject are like @rev; a @resource alias @rel would feel more natural.

There are practical limitations due to the missing @resource, I think. Imagine a document describing a household, and a household vocabulary which allows triples of humans which are in an owner relationship to a cat. Given a household of two humans and one cat, how does one mark up the assumption that the cat has two owners?
[whatwg] Annotating structured data that HTML has no semantics for
One of the more elaborate use cases I collected from the e-mails sent in over the past few months was the following:

USE CASE: Annotate structured data that HTML has no semantics for, and which nobody has annotated before, and may never again, for private use or use in a small self-contained community.

SCENARIOS:
* A group of users want to mark up their iguana collections so that they can write a script that collates all their collections and presents them in a uniform fashion.
* A scholar and teacher wants other scholars (and potentially students) to be able to easily extract information about what he teaches to add it to their custom applications.
* The list of specifications produced by W3C, for example, and various lists of translations, are produced by scraping source pages and outputting the result. This is brittle. It would be easier if the data was unambiguously obtainable from the source pages. This is a custom set of properties, specific to this community.
* Chaals wants to make a list of the people who have translated W3C specifications or other documents, and then use this to search for people who are familiar with a given technology at least at some level, and happen to speak one or more languages of interest.
* Chaals wants to have a reputation manager that can determine which of the many emails sent to the WHATWG list might be more than usually valuable, and would like to seed this reputation manager from information gathered from the same source as the scraper that generates the W3C's TR/ page.
* A user wants to write a script that finds the price of a book from an Amazon page.
* Todd sells an HTML-based content management system, where all documents are processed and edited as HTML, sent from one editor to another, and eventually published and indexed. He would like to build up the editorial metadata used by the system within the HTML documents themselves, so that it is easier to manage and less likely to be lost.
* Tim wants to make a knowledge base seeded from statements made in Spanish and English, e.g. from people writing down their thoughts about George W. Bush and George H.W. Bush, and has either convinced the people making the statements that they should use a common language-neutral machine-readable vocabulary to describe their thoughts, or has convinced some other people to come in after them and process the thoughts manually to get them into a computer-readable form.

REQUIREMENTS:
* Vocabularies can be developed in a manner that won't clash with future more widely-used vocabularies, so that those future vocabularies can later be used in a page making use of private vocabularies without making the earlier annotations ambiguous.
* Using the data should not involve learning a plethora of new APIs, formats, or vocabularies (today it is possible, e.g., to get the price of an Amazon product, but it requires learning a new API; similarly it's possible to get information from sites consistently using 'class' values in a documented way, but doing so requires learning a new vocabulary).
* Shouldn't require the consumer to write XSLT or server-side code to process the annotated data.
* Machine-readable annotations shouldn't be on a separate page than human-readable annotations.
* The information should be convertible into a dedicated form (RDF, JSON, XML) in a consistent manner, so that tools that use this information separate from the pages on which it is found have a standard way of conveying the information.
* Should be possible for different parts of an item's data to be given in different parts of the page, for example two items described in the same paragraph. ("The two lamps are A and B. The first is $20, the second $30. The first is 5W, the second 7W.")
* It should be possible to define globally-unique names, but the syntax should be optimised for a set of predefined vocabularies.
* Adding this data to a page should be easy.
* The syntax for adding this data should encourage the data to remain accurate when the page is changed.
* The syntax should be resilient to intentional copy-and-paste authoring: people copying data into the page from a page that already has data should not have to know about any declarations far from the data.
* The syntax should be resilient to unintentional copy-and-paste authoring: people copying markup from the page who do not know about these features should not inadvertently mark up their page with inapplicable data.
* Any additional markup or data used to allow the machine to understand
Re: [whatwg] Annotating structured data that HTML has no semantics for
On Sun, May 10, 2009 at 11:32 AM, Ian Hickson i...@hixie.ch wrote:
> One of the more elaborate use cases I collected from the e-mails sent in over the past few months was the following:
>
> USE CASE: Annotate structured data that HTML has no semantics for, and which nobody has annotated before, and may never again, for private use or use in a small self-contained community.
>
> [...]
>
> To address this use case and its scenarios, I've added to HTML5 a simple syntax (three new attributes) based on RDFa.

There's a quickly-hacked-together demo at http://philip.html5.org/demos/microdata/demo.html (works in at least Firefox and Opera), which attempts to show you the JSON serialisation of the embedded data, which might help in examining the proposal.

--
Philip Taylor
exc...@gmail.com
Re: [whatwg] Annotating structured data that HTML has no semantics for
Quoting Philip Taylor excors+wha...@gmail.com:
> On Sun, May 10, 2009 at 11:32 AM, Ian Hickson i...@hixie.ch wrote:
>> One of the more elaborate use cases I collected from the e-mails sent in over the past few months was the following:
>>
>> USE CASE: Annotate structured data that HTML has no semantics for, and which nobody has annotated before, and may never again, for private use or use in a small self-contained community.
>>
>> [...]
>>
>> To address this use case and its scenarios, I've added to HTML5 a simple syntax (three new attributes) based on RDFa.
>
> There's a quickly-hacked-together demo at http://philip.html5.org/demos/microdata/demo.html (works in at least Firefox and Opera), which attempts to show you the JSON serialisation of the embedded data, which might help in examining the proposal.

I have a *totally unfinished* demo that does something rather similar at [1]. It is highly likely to break and/or give incorrect results**. If you use it for anything important you are insane :)

My general impression from writing the tool is that this proposal is, at least, easy to write consumers for. I get the feeling that the production side will also be within the grasp of most authors, although it is hard to say for sure since I haven't really tried authoring anything.

[1] http://james.html5.org/microdata/

** Known bugs include: incorrect lowercasing of non-ASCII characters, lack of support for resolving URIs, lack of RDF output, some others that I forget