Re: [whatwg] Semantics in HTML

2006-11-04 Thread Henri Sivonen

On Nov 2, 2006, at 00:17, Anne van Kesteren wrote:

On Wed, 01 Nov 2006 20:55:58 +0100, James Graham [EMAIL PROTECTED]  
wrote:
To take a slight detour into the (hopefully not too) abstract,  
what do people think the fundamental point of semantics in HTML is?


I think the fundamental point is allowing programmatic processing of
documents in ways that are *useful* and that semantic markup makes
*practical* but that would be considerably less practical with
presentation-based heuristics, *and* enabling that processing without
those who want to do it having to negotiate with the author (or
enabling the author to get off-the-shelf software for processing his/her
own documents).


Rendering for media different from the author's primary target is  
such processing done in software controlled by others.


Indexing documents and taking extracts for display in search results  
is such processing done in software controlled by others.


Generating a table of contents could be a case of the author wanting  
to get off-the-shelf software that works with his/her own documents.
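To make the table-of-contents case concrete: if headings are marked up as h1 to h6 elements, generic software can build an outline without asking the author anything. The following is a minimal sketch using Python's standard html.parser. OutlineCollector and build_toc are hypothetical names chosen for illustration, and the sketch assumes heading rank alone determines nesting (it ignores any more elaborate outline algorithm).

```python
from html.parser import HTMLParser


class OutlineCollector(HTMLParser):
    """Collects (level, text) pairs for h1..h6 elements, in document order."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.entries = []   # list of (heading level, heading text)
        self._level = None  # level of the heading currently open, if any
        self._buf = []      # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names already lowercased.
        if tag in self.HEADINGS:
            self._level = self.HEADINGS[tag]
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            # Collapse internal whitespace so wrapped headings read cleanly.
            text = " ".join("".join(self._buf).split())
            self.entries.append((self._level, text))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)


def build_toc(html: str) -> str:
    """Return an indented plain-text table of contents for an HTML string."""
    parser = OutlineCollector()
    parser.feed(html)
    parser.close()
    return "\n".join("  " * (level - 1) + text for level, text in parser.entries)


doc = "<h1>Semantics</h1><p>...</p><h2>Use cases</h2><h2>Costs</h2><h1>Summary</h1>"
print(build_toc(doc))
```

Run on the sample string, this prints the four headings with the two h2 entries indented one level. It works only because the author used heading elements rather than, say, bold paragraphs; recovering the same outline from presentation alone would need heuristics.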


So I think the merit of semantic elements in HTML should not be
judged in terms of the willingness of semanticists to express stuff;
instead, it should be judged by the willingness of software developers
to write software that consumes the expression for a useful purpose
and by whether authors in general are incentivized to support such
processing (either knowingly or as a side effect of accomplishing
other goals).



Those elements should then not have any presentational aspect


Why not?

To serve media-independent presentation, having reasonable  
presentations for different media is more useful than having a  
semantic definition.


(What kinds of different media there can be is limited by how you can  
deliver data into a human. In the absence of direct-to-brain  
transfers, you are in practice limited to visual, aural and tactile  
media.)



We probably don't want things like:

  <sci-fi-serie-title>Stargate Atlantis</sci-fi-serie-title>


Although I suppose that at some point you do want to be able to
express the latter.


I think we should not care if someone wants to *express* it unless
there is notable practical interest in *consuming* the expression.
(Not "would be cool" interest but "would write software" interest.)


Henri has been talking about the possibility of making HTML5 more
semantically lax, and here Anne is interested in where it is not
semantically pure, presumably with a desire to fix it.


My point is that if the semantics for a given element are not precise
enough, or authors aren't incentivized to use them properly, so that
non-presentation use of the semantics becomes impossible or
prohibitively impractical, what is left is use for media-independent
rendering. At that point it is enough to define the element in terms
of its default presentation or, if the element doesn't have a
distinguishing default presentation, not to include the element at all.


Example with existing markup:
<dl> has a well-understood default presentation (at least for visual
media), but on the real Web, it doesn't have precise enough semantics
to allow heuristic-free reasoning such as compiling a search database
of definitions for words by scraping the Web. Yet, <dl> is useful for
achieving a particular kind of organization of pieces of text (list
of items where the items have an inline label and a block of text) in
a backwards-compatible way that works even in unstyled HTML.
Therefore, it is useful to have <dl> around as a media-independent
grouping device that doesn't have profound semantics.


Example against introducing new markup:
In discussions where <i> is assumed to be axiomatically evil and
semantic alternatives are sought, it often comes up that in text
discussing biology the taxonomical Latin names of organisms are
italicized. Should HTML have an element for marking up a piece of
text as a biological taxonomical name? I say no. For data mining
(including search engines) it is easier to compile a list of known
taxonomical names and compare strings against that list than to
badger every biologist to use the semantic element. As for
presentation, <i> works just fine. The effects of <i> on aural or
tactile media probably won't be so bad that most authors would be
willing to take special steps. As for authors themselves getting
off-the-shelf software that does useful things, the case is probably
too specific and lacks processing use cases to create a market. What
authors might want to do is use the taxonomical names as terms in a
printed index. However, for that use case to cover different kinds
of text with index terms, you'd want something more generic than
markup for biological taxonomical names. (An index is not needed for
interactive screen media, because you can search for any string
anyway.)
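The data-mining argument above, that matching strings against a list of known names is easier than getting every author to use a dedicated element, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anyone's actual indexer: KNOWN_TAXA is a tiny hypothetical sample (a real list would come from a taxonomic database), find_taxon_names is a hypothetical helper, and the input is assumed to be plain text already extracted from the page. Abbreviated forms such as "E. coli" are deliberately not handled.

```python
import re

# Hypothetical sample list; a real data miner would load thousands of names
# from a taxonomic database instead of hard-coding them.
KNOWN_TAXA = ("Homo sapiens", "Escherichia coli", "Drosophila melanogaster")


def find_taxon_names(text: str, known: tuple[str, ...] = KNOWN_TAXA) -> list[str]:
    """Return occurrences of known taxonomic names in `text`, in document order.

    No markup is consulted: matching is pure string comparison, so it works
    whether or not the author used <i> or any other element.
    """
    normalized = " ".join(text.split())  # collapse line breaks and runs of spaces
    hits = []
    for name in known:
        # \b keeps "Homo sapiens" from matching inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, normalized):
            hits.append((match.start(), name))
    hits.sort()  # order by position in the text
    return [name for _, name in hits]


sample = "Unlike Drosophila\nmelanogaster, Homo sapiens rarely hosts Escherichia coli in the lab."
print(find_taxon_names(sample))
```

Because the lookup ignores markup entirely, it finds the names whether the author wrapped them in <i>, in some dedicated element, or in nothing at all.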



Re: [whatwg] Semantics in HTML

2006-11-01 Thread James Graham

Anne van Kesteren wrote:
[...] I also don't know which view best fits my position because I 
don't really understand what people are trying to achieve with (the 
markup in) HTML -- I think there are things I would change in the 
current draft, but there seems little point talking about which markup 
elements should or shouldn't exist without having some overall 
framework against which the merit of various proposals can be measured.


How would such a framework be defined?


I think "measured" might have been a bit of a strong word, implying some
sort of quantitative assessment. What I really mean is just qualitative:
an idea of the class of use cases that an element or attribute should
satisfy to be included in the spec, and an idea of how tightly
constrained the use of various elements should be. So, for example, is
the point of the <em> element to allow authors to mark text as
emphasized, with no regard for whether UAs can plausibly use this
information, or is it that emphasized text benefits from a default
presentation different to standard paragraph text in a variety of media?
Obviously there are views between these extremes of idealism and
pragmatism, as well as other issues, like UAs that aggregate information
(e.g. search bots), which complicate the situation, so I am interested to
see what, exactly, people perceive as the problem that semantic markup
solves [1].


I think I tend toward the pragmatic end, i.e. we should be looking to do
things that solve definite problems for users of general-purpose UAs and
not worry so much about things that provide metadata for the sake of
metadata, but I'm prepared to be convinced otherwise about this.


[1] This is distinct from the problem solved by not using presentational
markup; one could use <div> and <span>, a bunch of class names and CSS to
get a document that renders nicely in multiple media, has complete
separation of structure and content and is entirely free of semantic markup.


--
The universe doesn't care what you believe. The wonderful thing about 
science is that it doesn't ask for your faith, it just asks for your 
eyes --- http://xkcd.com/c154.html