On Nov 2, 2006, at 00:17, Anne van Kesteren wrote:

On Wed, 01 Nov 2006 20:55:58 +0100, James Graham <[EMAIL PROTECTED]> wrote:
To take a slight detour into the (hopefully not too) abstract, what do people think the fundamental point of semantics in HTML is?

I think the fundamental point is to allow programmatic processing of documents in ways that are *useful*, that semantic markup makes *practical*, and that would be considerably less practical with presentation-based heuristics. It should *also* enable that processing without those who want to do it having to negotiate with the author (or enable the author to get off-the-shelf software for processing his/her own documents).

Rendering for media different from the author's primary target is such processing done in software controlled by others.

Indexing documents and taking extracts for display in search results is such processing done in software controlled by others.

Generating a table of contents could be a case of the author wanting to get off-the-shelf software that works with his/her own documents.
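As a rough illustration of that kind of off-the-shelf processing, here is a minimal Python sketch that builds a table of contents from heading elements. It assumes the author marks headings up as <h1> to <h6>; the names HeadingCollector and table_of_contents are made up for this example and do not come from any existing tool.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Hypothetical helper: collects (level, text) pairs for h1..h6."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.toc = []
        self._level = None  # heading level while inside a heading
        self._parts = []    # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.toc.append((self._level, "".join(self._parts).strip()))
            self._level = None


def table_of_contents(html):
    """Return the document's headings as a list of (level, text)."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.toc
```

The sketch only works because headings are marked up as headings. If the same document used bold paragraphs with a large font size instead, the tool would have to fall back on presentation-based heuristics.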

So I think the merit of semantic elements in HTML should not be judged by the willingness of semanticists to express stuff. Instead, it should be judged by the willingness of software developers to write software that consumes the expression for a useful purpose, and by whether authors in general are incentivized to support such processing (either knowingly or as a side effect of accomplishing other goals).

Those elements should then not have any presentational aspect

Why not?

To serve media-independent presentation, having reasonable presentations for different media is more useful than having a semantic definition.

(What kinds of different media there can be is limited by how you can deliver data into a human. In the absence of direct-to-brain transfers, you are in practice limited to visual, aural and tactile media.)

We probably don't want things like:

  <sci-fi-serie-title>Stargate Atlantis</sci-fi-serie-title>


Although I suppose that at some point you do want to be able to express the latter.

I think we should not care if someone wants to *express* it unless there is notable practical interest in *consuming* the expression. (Not "would be cool" interest but "would write software" interest.)

Henri has been talking about the possibility of making HTML5 more "semantically lax", and here Anne is interested in where it is not "semantically pure", presumably with a desire to fix it.

My point is this: if the semantics of a given element are not precise enough, or authors aren't incentivized to use them properly, so that non-presentational use of the semantics becomes impossible or prohibitively impractical, then what is left is use for media-independent rendering. At that point it is enough to define the element in terms of its default presentation or, if the element doesn't have a distinguishing default presentation, not to include the element.

Example with existing markup:
<dl> has a well-understood default presentation (at least for visual media), but on the real Web, it doesn't have precise enough semantics to allow heuristic-free reasoning such as compiling a search database of definitions for words by scraping the Web. Yet, <dl> is useful for achieving a particular kind of organization of pieces of text (list of items where the items have an inline label and a block of text) in a backwards-compatible way that works even in unstyled HTML. Therefore, it is useful to have <dl> around as a media-independent grouping device that doesn't have profound semantics.
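To make the scraping problem concrete, here is a minimal Python sketch of a naive scraper that treats every <dt>/<dd> pair as a term and its definition. The names DlScraper and scrape are hypothetical, and the sample markup in the note below is invented for illustration.

```python
from html.parser import HTMLParser


class DlScraper(HTMLParser):
    """Hypothetical scraper: takes each <dt>/<dd> pair as (term, definition)."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # "dt" or "dd" while inside one of them
        self._term = ""
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._current = tag
            self._text = []

    def handle_data(self, data):
        if self._current:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = "".join(self._text).strip()
        if tag == "dt":
            self._term = text
        else:
            self.pairs.append((self._term, text))
        self._current = None


def scrape(html):
    """Return every (dt text, dd text) pair found in the markup."""
    scraper = DlScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.pairs
```

Fed `<dl><dt>Alice</dt><dd>Hello!</dd></dl>`, a dialogue marked up with <dl>, the scraper returns ("Alice", "Hello!") as if it were a dictionary entry. Nothing in the markup lets it tell a definition from a line of dialogue without extra heuristics.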

Example against introducing new markup:
In discussions where <i> is assumed to be axiomatically evil and semantic alternatives are sought, it often comes up that in text discussing biology the taxonomical Latin names of organisms are italicized. Should HTML have an element for marking up a piece of text as a biological taxonomical name? I say no. For data mining (including search engines) it is easier to compile a list of known taxonomical names and compare strings against that list than to badger every biologist to use the semantic element. As for presentation, <i> works just fine. The effects of <i> on aural or tactile media probably won't be so bad that most authors would be willing to take special steps. As for authors getting off-the-shelf software that does useful things, the case is probably too specific, and has too few processing use cases, to create a market. What authors might want to do, however, is use the taxonomical names as terms in a print index. But for that use case to cover different kinds of text with index terms, you'd want something more generic than markup for biological taxonomical names. (An index is not needed for interactive screen media, because you can search for any string anyway.)
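The list-comparison approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: KNOWN_TAXA is a tiny hypothetical stand-in for a real compiled list of names, and find_taxa is a name made up for this example.

```python
import re

# Hypothetical, tiny stand-in for a real compiled list of taxonomical names.
KNOWN_TAXA = {"Homo sapiens", "Escherichia coli", "Canis lupus"}


def find_taxa(text, known=KNOWN_TAXA):
    """Return (offset, name) pairs for every known name found in text."""
    hits = []
    for name in known:
        # Word boundaries keep a name from matching inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((match.start(), name))
    return sorted(hits)
```

The matcher does not look at markup at all, so it finds the names whether the author wrapped them in <i>, in some dedicated element, or in nothing.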

[...] I also don't know which view best fits my position because I don't really understand what people are trying to achieve with (the markup in) HTML -- I think there are things I would change in the current draft, but there seems little point talking about which markup elements should or shouldn't exist without having some overall framework against which the merit of various proposals can be measured.

+1.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/

