Interesting thread (including various sub-ravels thereof).

Suppose that in a semantically charged but markup-impoverished medium such as textual narrative (which constitutes the majority of the content of the web as we know it), we seek to build a word processor that generates not only the surface structure (the sentences and paragraphs) but the semantic structure as well. How do we minimize the author's effort? Authors will not want to write both their utterances and the translation of those utterances into semantic tags -- it is simply too labor-intensive (unless we care more about form than substance and choose to purge ill-formed ideas from the human corpus).

Rather, we may seek a word processor that deduces semantics from authors' expressions. Granted, without the existence of full-blown AI (which has been a while in coming now), 40% (or so) of such deductions will be incorrect. But suppose that following the creation of a sentence, or a paragraph, or a larger chunk of text, semantically enabled software were to pose to the author a finite (and hopefully small) number of "deductive disambiguators"?

"Dear author, did you mean to imply that AI will indeed, be arriving soon?"

"Dear author, who exactly does 'we' refer to in the above paragraph -- I'm sorry, but I see no people mentioned before?"

Using a relatively simple inference engine (SFOL + set theory + predicate calculus + arithmetic + time + causality + modal logic) coupled with thesauri and parsers (all available client-side these days), and (most importantly) the author's expert intervention, I rather suspect that the 40% (incorrect deductions) could be brought down to 8%, at an additional cost of 20% in authorial time. With the software most folks currently use, and requiring authors to generate their own semantics, I think we might expect to achieve 5% spurious deduction with a 400% additional investment of authors' time. The cost-benefit ratio is just too high with current desktop tools.
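Put as back-of-the-envelope arithmetic (the percentages are, of course, just my guesses, and the figure of merit below is one plausible choice, not the only one):

def time_per_point(error_before, error_after, extra_time):
    # extra authorial time paid per percentage point of spurious deduction removed
    return extra_time / (error_before - error_after)

BASELINE_ERROR = 40.0   # % of deductions wrong with no author intervention

assisted = time_per_point(BASELINE_ERROR, 8.0, 20.0)    # disambiguating word processor
manual   = time_per_point(BASELINE_ERROR, 5.0, 400.0)   # author hand-writes the semantics

print(f"assisted: {assisted:.2f}% extra time per point of error removed")  # ~0.62
print(f"manual:   {manual:.2f}% extra time per point of error removed")    # ~11.43

On those guesses the assisted approach is more than an order of magnitude cheaper per unit of improvement.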

In environments that are semantically impoverished (not in the evocative space they engender, but in the surface expression of their utterances) yet markup-rich, such as SVG, the generation of a parallel semantic substrate is going to be a lot more difficult -- but maybe that's why we have things like sXBL: to allow semantics to be imported from other disciplines.

That's one approach. Another is to build a semantic expression system for which we abandon our native languages and agree to write in a semantic shorthand (with lots of parentheses, by the way). For even one language, the task of finding a minimal set of semantic primitives (from its monolingual dictionary) is NP-complete, but if we seek such a shorthand to span the space of human semantics, it may take longer to bring into existence than AI itself. The different language families I have looked at probably share a core semantics of only about 20% of the expressive space of any one language by itself. The nice thing about such languages is that people from different linguistic backgrounds can all read the same text; the hassle is that it's hard to translate ordinary expressions into such languages.
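Purely as an invented illustration of what such a parenthesized shorthand might look like (the predicate names are mine, and choosing a genuinely minimal primitive set is exactly the hard part), the first disambiguator question above might come out something like this:

# An invented parenthesized rendering of "Did you mean to imply that AI will,
# indeed, be arriving soon?" -- nested (predicate, argument, ...) tuples.
# The primitives (ask, imply, arrive, ...) are illustrative, not a proposed standard.

utterance = (
    "ask",
    ("agent", "software"),
    ("addressee", "author"),
    ("topic",
        ("imply",
            ("agent", "author"),
            ("content",
                ("arrive",
                    ("theme", "AI"),
                    ("time", "soon"),
                    ("modality", "certain"))))))

Readable across linguistic backgrounds, perhaps, but nobody would mistake it for prose.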

cheers,
David Dailey

----- Original Message ----- From: "Elliotte Harold" <[EMAIL PROTECTED]>
To: "Ian Hickson" <[EMAIL PROTECTED]>
Cc: <whatwg@lists.whatwg.org>; "Vlad Alexander (xhtml.com)" <[EMAIL PROTECTED]>
Sent: Wednesday, February 21, 2007 4:34 PM
Subject: Re: [whatwg] several messages about HTML5


Ian Hickson wrote:

The original reason I got involved in this work is that I realised that the human race has written literally billions of electronic documents, but without ever actually saying how they should be processed.

That's a feature, not a bug.

If, in a thousand years, someone found a trove of HTML documents and decided they would write an HTML browser to view them, they couldn't do it! Even with the existing HTML specs -- HTML4, SGML, DOM2 HTML, etc. -- a perfect implementation couldn't render the vast majority of documents as they were originally intended.


Authorial intent is a myth. Documents don't have to be rendered like the author intended, nor should we expect them to be. We don't read Homer like Homer intended, but we still read him, well more than a thousand years later. (For one thing, Homer actually intended that people listen to the poems, not read them.)

This is not to say that I don't think it's useful to define a standard tree structure for documents. It is useful. However the benefit of this exercise is not in maintaining authorial intent. That's tilting at windmills, and will never succeed no matter what we do.

--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/



