Re: [whatwg] Semantic styling languages in the guise of HTML attributes.

James Graham Wed, 27 Dec 2006 16:56:50 -0800

Mike Schinkel wrote:

Matthew Paul Thomas wrote:
On Dec 22, 2006, at 3:23 AM, Benjamin Hawkes-Lewis wrote:
Henri Sivonen wrote:
...
Also, it seems to me that the usefulness of non-heuristic machine consumption of semantic roles of things like dialogs, names of vessels, biological taxonomical names, quotations, etc. has been vastly exaggerated.
I'm not entirely sure what "non-heuristic machine consumption" is,
An example of non-heuristic machine consumption is where Google Glossary thinks: "In an HTML 3.2 or earlier document containing the code '<dl><dt>foo</dt> <dd>bar</dd></dl>', 'bar' is a definition of 'foo'". (It probably thinks the same about HTML 4 documents, too, which is applying a small "ignore that nonsense about dialogues" heuristic.)
An example of heuristic machine consumption is where Google Glossary
thinks: "In an HTML document containing the code 'foo: bar', 'bar' is probably a definition of 'foo', especially if the page has several consecutive paragraphs with that structure and different bold text."
Non-heuristic machine consumption fails when semantic elements are abused, and becomes impractical when elements have multiple popular meanings (examples of the latter include <dl> in HTML 4, and in HTML 5). Heuristic machine consumption fails occasionally by the very nature of heuristics (examples currently include <http://www.google.com/search?q=define:author> and
<http://www.google.com/search?q=define:editor>.)
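
The two kinds of consumption described above can be sketched roughly as follows. This is only an illustration, not how Google Glossary actually works: the names DlExtractor, definitions_from_dl and guess_definitions, the "term: definition" pattern and the min_run threshold are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser


class DlExtractor(HTMLParser):
    """Non-heuristic: trust <dt>/<dd> markup to mean term/definition."""

    def __init__(self):
        super().__init__()
        self.pairs = []    # collected (term, definition) tuples
        self._tag = None   # "dt" or "dd" while inside one, else None
        self._term = ""
        self._text = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag
            self._text = ""

    def handle_data(self, data):
        if self._tag:
            self._text += data

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        if tag == "dt":
            self._term = self._text.strip()
        else:  # a <dd> closed: pair it with the most recent term
            self.pairs.append((self._term, self._text.strip()))
        self._tag = None


def definitions_from_dl(html):
    parser = DlExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical pattern: a short term, a colon, then the definition.
TERM_PATTERN = re.compile(r"^([^:]{1,40}): (.+)$")


def guess_definitions(paragraphs, min_run=3):
    """Heuristic: accept 'term: definition' paragraphs only when at
    least min_run of them appear consecutively (an assumed threshold)."""
    found, run = [], []
    for text in list(paragraphs) + [""]:  # empty sentinel flushes the last run
        match = TERM_PATTERN.match(text)
        if match:
            run.append((match.group(1), match.group(2)))
            continue
        if len(run) >= min_run:
            found.extend(run)
        run = []
    return found
```

The first function is only as reliable as authors' use of <dl>. The second never depends on markup, but it will sometimes accept a run of paragraphs that merely look like definitions, which is the occasional failure described above.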
The origin of this thread was my request for adding attributes to all
elements to support microformat-like semantic markup. Based on the context
of your reply, it seems you are agreeing with Matthew Raymond in his
assertion that using microformat-like semantic markup is A Bad Thing(tm). Am
I understanding your position correctly? (If I'm not, please forgive me.)

Actually, IMHO mpt's point is far broader and consequentially more important than the confines of the original thread. The point, as I understand it, is that machine analysis of "semantic" markup fails if the markup construct is (ab)used in so many different ways that the interpretation of any particular fragment is no longer unambiguous. This is a sort of "heat[1] death" of the original semantics; as the use of an element becomes increasingly disordered (i.e. higher entropy), it becomes impossible to extract any useful information from the use of that element. This is critical in the proper design of semantic markup languages because one wishes to stave off the heat death as long as possible so that, as far as possible, UAs can perform useful functions based on the information in the markup (e.g. render it to a medium for which the content was not explicitly designed). Obviously I don't know how to achieve this but there are a few things to consider:

* Have enough elements. If there are obvious holes that people can't fill with existing elements used properly, they will reuse existing elements in new ways so increasing their entropy.

* Don't have too many elements: If there are too many elements people won't understand them all and will reuse existing elements in the "wrong" way, so increasing their entropy.

* Make the semantics of elements well defined: Start the elements in a "low entropy" i.e. highly ordered state. Make it obvious how the element is intended to be used (and restrict the valid uses to ones that can be discriminated by machine) so that fewer people accidentally abuse it.

* Have some "high entropy" elements. This is the counterintuitive one. The goal, remember, is to extract as much information as possible from the semantically well-defined elements. However, in many situations there will not be a relevant element to use, the publishing setup will not be optimized for selecting the correct semantic element (think WYSIWYG editors), or the author will not be sufficiently familiar with the language semantics to make a well-informed choice about the right element to use. In this case providing (and encouraging the use of!) a set of high entropy "bit-bucket" elements that are semantically meaningless is very beneficial because they prevent the entropy increase associated with the abuse of the semantic elements. The increasing misuse of as a "more semantic" is an example of what happens when this policy is not followed.

* Allow easy extensions. Having an extension mechanism for those who need more functionality is one way to stop the abuse of existing elements. This has to be sufficiently easy to use that it can be widely adopted but powerful enough that it can replicate all the semantic features of the host language.

This post was brought to you by the society for dodgy physical analogies concocted in the middle of the night.

[1] Or, if you like, "Entropy death". Of course, this has nothing to do with real physical entropy but a lot to do with the common association between the second law of thermodynamics and the concept of disorder.
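
One way to put a rough number on the "entropy" of an element, offered purely as a sketch: classify each observed use of the element by what the author apparently meant, then take the Shannon entropy of that distribution. The function name usage_entropy and the usage labels below are made up for illustration, and Shannon entropy is a choice made for this example rather than anything proposed in the thread.

```python
import math
from collections import Counter


def usage_entropy(observed_uses):
    """Shannon entropy, in bits, of how an element is used in practice.

    observed_uses holds one label per occurrence of the element, naming
    the meaning the author apparently intended. 0 bits means every use
    agrees; higher values mean the markup tells a machine less.
    """
    counts = Counter(observed_uses)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical survey data: labels and counts are invented.
well_defined = ["definition list"] * 8
abused = ["definition list", "dialogue", "indentation", "key-value data"] * 2

print(usage_entropy(well_defined))
print(usage_entropy(abused))
```

An element whose uses all agree scores 0 bits, while one split evenly across four unrelated meanings scores 2 bits, at which point seeing the element tells a UA very little about the content.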
