On Dec 22, 2006, at 3:23 AM, Benjamin Hawkes-Lewis wrote:

Henri Sivonen wrote:
...
Also, it seems to me that the usefulness of non-heuristic machine consumption of semantic roles of things like dialogs, names of vessels, biological taxonomical names, quotations, etc. has been vastly exaggerated.

I'm not entirely sure what "non-heuristic machine consumption" is,

An example of non-heuristic machine consumption is where Google Glossary thinks: "In an HTML 3.2 or earlier document containing the code '<dl><dt>foo</dt> <dd>bar</dd></dl>', 'bar' is a definition of 'foo'". (It probably thinks the same about HTML 4 documents, too, which is applying a small "ignore that nonsense about dialogues" heuristic.)

An example of heuristic machine consumption is where Google Glossary thinks: "In an HTML document containing the code '<p><b>foo:</b> bar</p>', 'bar' is probably a definition of 'foo', especially if the page has several consecutive paragraphs with that structure and different bold text."
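
To make the distinction concrete, here is a rough Python sketch of the two approaches. It is only an illustration, not a description of how Google Glossary actually works: the names DefinitionListParser, guess_definitions and min_matches are made up for this example, the threshold of two paragraphs is an arbitrary assumption, and the regular expression only handles the simplest form of the bold-paragraph pattern (it does not check that the paragraphs are consecutive).

```python
import re
from html.parser import HTMLParser


class DefinitionListParser(HTMLParser):
    """Non-heuristic: trust the markup and pair each <dt> with the <dd> after it."""

    def __init__(self):
        super().__init__()
        self.pairs = []    # collected (term, definition) tuples
        self._tag = None   # "dt" or "dd" while inside one, otherwise None
        self._term = ""    # most recent non-empty <dt> text
        self._text = []    # text fragments of the element being read

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._flush()  # end tags are optional, so a new start closes the old one
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd", "dl"):
            self._flush()

    def handle_data(self, data):
        if self._tag:
            self._text.append(data)

    def _flush(self):
        text = "".join(self._text).strip()
        if self._tag == "dt" and text:
            self._term = text
        elif self._tag == "dd" and self._term:
            self.pairs.append((self._term, text))
        self._tag = None
        self._text = []


# Hypothetical pattern for '<p><b>term:</b> definition</p>'.
BOLD_PARAGRAPH = re.compile(
    r"<p>\s*<b>([^<:]+):</b>\s*([^<]+?)\s*</p>", re.IGNORECASE
)


def guess_definitions(html, min_matches=2):
    """Heuristic: only believe the pattern if it repeats with different bold text."""
    matches = BOLD_PARAGRAPH.findall(html)
    terms = {term.strip().lower() for term, _ in matches}
    if len(matches) < min_matches or len(terms) < len(matches):
        return []  # too little evidence, or the same bold text repeated
    return [(term.strip(), definition) for term, definition in matches]


parser = DefinitionListParser()
parser.feed("<dl><dt>foo</dt> <dd>bar</dd></dl>")
print(parser.pairs)

print(guess_definitions("<p><b>foo:</b> bar</p><p><b>baz:</b> qux</p>"))
```

The first half never has to guess, but it is only as good as the author's use of <dl>. The second half works on markup that carries no definition semantics at all, at the cost of a threshold that will sometimes be wrong in either direction.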

Non-heuristic machine consumption fails when semantic elements are abused, and becomes impractical when elements have multiple popular meanings (examples of the latter include <dl> in HTML 4, and <p> in HTML 5). Heuristic machine consumption fails occasionally by the very nature of heuristics (examples currently include
<http://www.google.com/search?q=define:author> and
<http://www.google.com/search?q=define:editor>).

--
Matthew Paul Thomas
http://mpt.net.nz/
