Interesting thread (including various sub-ravels thereof).

Suppose that in a semantically charged but markup-impoverished medium such as textual narrative (which constitutes the majority of the content of the web as we know it), we seek to build a word processor that generates not only the surface structure (the sentences and paragraphs) but the semantic structure as well. How do we minimize the author's effort? Authors will not want to write both their utterances and the translation of those utterances into semantic tags -- it is simply too labor-intensive (unless we care more about form than substance and choose to purge ill-formed ideas from the human corpus).

Rather, we may seek a word processor that deduces semantics from authors' expressions. Granted, without the existence of full-blown AI (which has been a while in coming now), 40% (or so) of such deductions will be incorrect. But suppose that following the creation of a sentence, or a paragraph, or a larger chunk of text, semantically enabled software were to pose to the author a finite (and hopefully small) number of "deductive disambiguators"?

"Dear author, did you mean to imply that AI will indeed, be arriving soon?"

"Dear author, who exactly does 'we' refer to in the above paragraph -- I'm sorry, but I see no people mentioned before?"

Using a relatively simple inference engine (SFOL + set theory + predicate calculus + arithmetic + time + causality + modal logic) coupled with thesauri and parsers (all available client-side these days), and (most importantly) the author's expert intervention, I rather suspect that the 40% (incorrect deductions) could be brought down to 8%, at an additional cost of 20% in authorial time. With the software most folks currently use, and requiring authors to generate their own semantics, I think we might expect to achieve 5% spurious deduction with a 400% additional investment of authors' time. The cost-benefit ratio is just too high with current desktop tools.
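Put as back-of-the-envelope arithmetic (the percentages are, of course, just my guesses, and the figure of merit below is one plausible choice, not the only one):

def time_per_point(error_before, error_after, extra_time):
    # extra authorial time paid per percentage point of spurious deduction removed
    return extra_time / (error_before - error_after)

BASELINE_ERROR = 40.0   # % of deductions wrong with no author intervention

assisted = time_per_point(BASELINE_ERROR, 8.0, 20.0)    # disambiguating word processor
manual   = time_per_point(BASELINE_ERROR, 5.0, 400.0)   # author hand-writes the semantics

print(f"assisted: {assisted:.2f}% extra time per point of error removed")  # ~0.62
print(f"manual:   {manual:.2f}% extra time per point of error removed")    # ~11.43

On those guesses the assisted approach is more than an order of magnitude cheaper per unit of improvement.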

In environments that are semantically impoverished (not in the evocative space they engender, but in the surface expression of their utterances) yet markup-rich, such as SVG, the generation of a parallel semantic substrate is going to be a lot more difficult -- but maybe that's why we have things like sXBL: to allow semantics to be imported from other disciplines.

That's one approach. Another is to build a semantic expression system for which we abandon our native languages and agree to write in a semantic shorthand (with lots of parentheses, by the way). For even one language, the task of finding a minimal set of semantic primitives (from its monolingual dictionary) is NP-complete, but if we seek such a shorthand to span the space of human semantics, it may take longer to bring into existence than AI itself. The different language families I have looked at probably share a core semantics of only about 20% of the expressive space of any one language by itself. The nice thing about such languages is that people from different linguistic backgrounds can all read the same text; the hassle is that it's hard to translate ordinary expressions into such languages.
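Purely as an invented illustration of what such a parenthesized shorthand might look like (the predicate names are mine, and choosing a genuinely minimal primitive set is exactly the hard part), the first disambiguator question above might come out something like this:

# An invented parenthesized rendering of "Did you mean to imply that AI will,
# indeed, be arriving soon?" -- nested (predicate, argument, ...) tuples.
# The primitives (ask, imply, arrive, ...) are illustrative, not a proposed standard.

utterance = (
    "ask",
    ("agent", "software"),
    ("addressee", "author"),
    ("topic",
        ("imply",
            ("agent", "author"),
            ("content",
                ("arrive",
                    ("theme", "AI"),
                    ("time", "soon"),
                    ("modality", "certain"))))))

Readable across linguistic backgrounds, perhaps, but nobody would mistake it for prose.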

cheers,
David Dailey

----- Original Message ----- From: "Elliotte Harold" <[EMAIL PROTECTED]>
To: "Ian Hickson" <[EMAIL PROTECTED]>
Cc: <whatwg@lists.whatwg.org>; "Vlad Alexander (xhtml.com)" <[EMAIL PROTECTED]>
Sent: Wednesday, February 21, 2007 4:34 PM
Subject: Re: [whatwg] several messages about HTML5


Ian Hickson wrote:

The original reason I got involved in this work is that I realised that the human race has written literally billions of electronic documents, but without ever actually saying how they should be processed.

That's a feature, not a bug.

If, in a thousand years, someone found a trove of HTML documents and decided they would write an HTML browser to view them, they couldn't do it! Even with the existing HTML specs -- HTML4, SGML, DOM2 HTML, etc. -- a perfect implementation couldn't render the vast majority of documents as they were originally intended.


Authorial intent is a myth. Documents don't have to be rendered like the author intended, nor should we expect them to be. We don't read Homer like Homer intended, but we still read him, well more than a thousand years later. (For one thing, Homer actually intended that people listen to the poems, not read them.)

This is not to say that I don't think it's useful to define a standard tree structure for documents. It is useful. However the benefit of this exercise is not in maintaining authorial intent. That's tilting at windmills, and will never succeed no matter what we do.

--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/



