Re: [whatwg] Semantic styling languages in the guise of HTML attributes.

2006-12-27 Thread Mike Schinkel
James Graham wrote:
   
> Actually, IMHO mpt's point is far broader and consequentially 
> more important than the confines of the original thread. The 
> point, as I understand it, is that machine analysis of 
> "semantic" markup fails if the markup construct is (ab)used 
> in so many different ways that the interpretation of any 
> particular fragment is no longer unambiguous. This is a sort 
> of "heat[1] death" of the original semantics...

It's ironic that you use the term "entropy" here.[1]  Anyway, although in
general I agree with you, you speak in generalities so it is hard to either
concur or disprove your assertions.  

> as the use of 
> an element becomes increasingly disordered (i.e. higher 
> entropy), it becomes impossible to extract any useful 
> information from the use of that element. 

So I'd like to see some specific examples of how you would see things evolve
to the "inevitable" impossibility.

That said, one of my biggest qualms about "microformats" per se is how they
have defined their community process. I believe their process is likely to
generate more "entropy death" than less.  I proposed alternatives, but they
claimed those alternatives were counter to their vision.  Thus I plan to use
"microformat-like semantic markup" even though it wouldn't be microformats
proper. But that's an entirely different discussion that I'm almost but not
quite prepared to discuss.

So I think the real question is this: is it possible or impossible to define
a process for "microformat-like semantic markup" that can minimize the
chance of "entropy death?" To answer the question one should understand that
1.) even prior to the emergence of "microformat-like semantic markup" we've
had lots and lots of disorder anyway, and 2.) seeing the train speeding to
the end of its tracks doesn't mean we can stop the train if we want to. On
point #2, I still assert it's more pragmatic and hence better to work to
minimize the damage than to scold the train for "stupidly" speeding up when
approaching the end of its tracks.

> * Have enough elements. If there are obvious holes that 
> people can't fill with existing elements used properly, they 
> will reuse existing elements in new ways so increasing their entropy.

Agreed.  That's what we get for pursuing pie-in-the-sky semantic web
exclusively while ignoring the evolution of HTML, for how long?  Also it's
what we get now for trying to put everything into HTML5 instead of planning
to rapidly release 5, 6, 7, etc.

> * Don't have too many elements: If there are too many 
> elements people won't understand them all and will reuse 
> existing elements in the "wrong" way, so increasing their entropy.

 or @attributes?  Anyway, I doubt there will be misuse if the
/@attributes have clear semantics other than possibly people not
using them when they could have. Of course elements with names like 
and  (what were they thinking when they named those?!?) are the type I
believe you are referring to.

> * Make the semantics of elements well defined: Start the 
> elements in a "low entropy" i.e. highly ordered state. Make 
> it obvious how the element is intended to be used (and 
> restrict the valid uses to ones that can be discriminated by 
> machine) so that fewer people accidentally abuse it.

Interestingly, Dion Hinchcliffe had a great article[2] that argued the best
way to get a good outcome is to minimize structure at the beginning until
the patterns emerge, then layer structure on top of those patterns.  Think
of the wiki. At the beginning, it was "the simplest thing that would work."
Had someone architected it in advance of use, they would have ended up with
Lotus Notes!  :-) And although Notes was sold to lots of corporations,
MediaWiki is far more usable for average people than Notes; the latter
takes a salesman to convince IT and then an IT staff to deliver edicts that
"thou shalt use."  

While his article focused on enterprise intranets, one could argue that
microformats simply might be the way of letting the world do the design for
the needs of future HTML, assuming the next version of HTML empowers people
enough to do so, and that we don't have to wait another decade before HTML6.

> * Have some "high entropy" elements. This is the 
> counterintuitive one. 
> The goal, remember, is to extract as much information as 
> possible from the semantically well-defined elements. 
> However, in many situations there will not be a relevant 
> element to use, the publishing setup will not be optimized 
> for selecting the correct semantic element (think WYSIWYG 
> editors), or the author will not be sufficiently familiar 
> with the language semantics to make a well-informed choice 
> about the right element to use. In this case providing (and 
> encouraging the use of!) a set of high entropy "bit-bucket" 
> elements that are semantically meaningless is very 
> beneficial because they prevent the entropy increase 
> associated with the abuse of the semantic elements. The 

[whatwg] [WebApps] canvas transform()/setTransform()

2006-12-27 Thread 黒澤剛志(KUROSAWA, Takeshi)

Dear WHATWG,

Web Applications 1.0 adds the transform() and setTransform() methods to the
canvas 2D context.
The conversion of these methods' arguments to matrices is described in
section 3.14.6.1.2.


The transform(m11, m12, m21, m22, dx, dy) method must multiply the current 
transformation matrix with the matrix described by:
m11  m12  dx
m21  m22  dy
0 0 1

The setTransform(m11, m12, m21, m22, dx, dy) method...

- http://www.whatwg.org/specs/web-apps/current-work/#transform

However, this layout is counterintuitive, and it isn't compatible with many
graphics systems. So the matrix should be

m11  m21  dx
m12  m22  dy ... (b)
0 0 1

In addition, rhino-canvas implements both methods, and it uses matrix (b).
http://rhino-canvas.sourceforge.net/
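
To see that the two layouts really give different results, here is a small
Python sketch. It assumes points are treated as column vectors (x, y, 1)
multiplied on the right of the matrix; that convention and the helper names
are assumptions made for illustration, not something taken from the spec text.

```python
def apply(matrix, x, y):
    """Apply a 3x3 matrix (list of rows) to the column vector (x, y, 1)."""
    (a, b, c), (d, e, f), _ = matrix
    return (a * x + b * y + c, d * x + e * y + f)


def spec_matrix(m11, m12, m21, m22, dx, dy):
    # Layout as quoted from the draft: m11 m12 on the first row.
    return [[m11, m12, dx], [m21, m22, dy], [0, 0, 1]]


def proposed_matrix(m11, m12, m21, m22, dx, dy):
    # Layout (b): m11 m12 run down the first column instead.
    return [[m11, m21, dx], [m12, m22, dy], [0, 0, 1]]


args = (1, 2, 3, 4, 10, 20)  # arbitrary values with m12 != m21
print(apply(spec_matrix(*args), 1, 0))      # (11, 23)
print(apply(proposed_matrix(*args), 1, 0))  # (11, 22)
```

The two layouts agree only when m12 equals m21, so any rotation or skew
passed to transform() would come out differently under each reading.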

regards.
--
KUROSAWA, Takeshi - http://taken.s101.xrea.com/


Re: [whatwg] Semantic styling languages in the guise of HTML attributes.

2006-12-27 Thread James Graham

Mike Schinkel wrote:

Matthew Paul Thomas wrote:
  

On Dec 22, 2006, at 3:23 AM, Benjamin Hawkes-Lewis wrote:


Henri Sivonen wrote:
...
  
Also, it seems to me that the usefulness of non-heuristic machine 
consumption of semantic roles of things like dialogs, names of 
vessels, biological taxonomical names, quotations, etc. has been 
vastly exaggerated.


I'm not entirely sure what "non-heuristic machine consumption" is,
  
An example of non-heuristic machine consumption is where 
Google Glossary thinks: "In an HTML 3.2 or earlier document 
containing the code 'foo bar', 
'bar' is a definition of 'foo'". (It probably thinks the same 
about HTML 4 documents, too, which is applying a small 
"ignore that nonsense about dialogues" heuristic.)


An example of heuristic machine consumption is where Google Glossary
thinks: "In an HTML document containing the code 
'foo: bar', 'bar' is probably a definition of 
'foo', especially if the page has several consecutive 
paragraphs with that structure and different bold text."


Non-heuristic machine consumption fails when semantic 
elements are abused, and becomes practical when elements have 
multiple popular meanings (examples of the latter include 
 in HTML 4, and  in HTML 5). Heuristic machine 
consumption fails occasionally by the very nature of 
heuristics (examples currently include 
 and

.)



The origin of this thread was my request for adding attributes to all
elements to support microformat-like semantic markup. Based on the context
of your reply, it seems you are agreeing with Matthew Raymond in his
assertion that using microformat-like semantic markup is A Bad Thing(tm). Am
I understanding your position correctly? (If I'm not, please forgive me.)
  
Actually, IMHO mpt's point is far broader and consequentially more 
important than the confines of the original thread. The point, as I 
understand it, is that machine analysis of "semantic" markup fails if 
the markup construct is (ab)used in so many different ways that the 
interpretation of any particular fragment is no longer unambiguous. This 
is a sort of "heat[1] death" of the original semantics; as the use of an 
element becomes increasingly disordered (i.e. higher entropy), it 
becomes impossible to extract any useful information from the use of 
that element. This is critical in the proper design of semantic markup 
languages because one wishes to stave off the heat death as long as 
possible so that, as far as possible, UAs can perform useful functions 
based on the information in the markup (e.g. render it to a medium for 
which the content was not explicitly designed). Obviously I don't know 
how to achieve this but there are a few things to consider:


* Have enough elements. If there are obvious holes that people can't 
fill with existing elements used properly, they will reuse existing 
elements in new ways so increasing their entropy.


* Don't have too many elements: If there are too many elements people 
won't understand them all and will reuse existing elements in the 
"wrong" way, so increasing their entropy.


* Make the semantics of elements well defined: Start the elements in a 
"low entropy" i.e. highly ordered state. Make it obvious how the element 
is intended to be used (and restrict the valid uses to ones that can be 
discriminated by machine) so that fewer people accidentally abuse it.


* Have some "high entropy" elements. This is the counterintuitive one. 
The goal, remember, is to extract as much information as possible from 
the semantically well-defined elements. However, in many situations 
there will not be a relevant element to use, the publishing setup will 
not be optimized for selecting the correct semantic element (think 
WYSIWYG editors), or the author will not be sufficiently familiar with 
the language semantics to make a well-informed choice about the right 
element to use. In this case providing (and encouraging the use of!) a 
set of high entropy "bit-bucket" elements that are semantically 
meaningless is very beneficial because they prevent the entropy 
increase associated with the abuse of the semantic elements. The 
increasing misuse of  as a "more semantic"  is an example of what 
happens when this policy is not followed.


* Allow easy extensions. Having an extension mechanism for those who 
need more functionality is one way to stop the abuse of existing 
elements. This has to be sufficiently easy to use that it can be 
widely adopted but powerful enough that it can replicate all the 
semantic features of the host language.
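
One way to make the entropy analogy in the points above concrete is the
Shannon entropy of how authors actually use an element: it is zero when every
occurrence means the same thing and grows as the uses spread over more
meanings. A minimal Python sketch follows; the usage labels and counts are
invented purely for illustration and are not measurements of any real corpus.

```python
import math


def usage_entropy(counts):
    """Shannon entropy, in bits, of a mapping from meaning to count."""
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        if n:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts of what authors meant by one element.
well_defined = {"definition list": 100}
abused = {"definition list": 25, "dialogue": 25, "layout": 25, "metadata": 25}

print(usage_entropy(well_defined))  # 0.0
print(usage_entropy(abused))        # 2.0
```

In this picture a consumer that sees the "abused" element needs two extra
bits of information from somewhere else before it knows what the markup means.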


This post was brought to you by the society for dodgy physical analogies 
concocted in the middle of the night.


[1] Or, if you like, "Entropy death". Of course, this has nothing to do 
with real physical entropy but a lot to do with the common association 
between the second law of thermodynamics and the concept 

Re: [whatwg] Semantic styling languages in the guise of HTML attributes.

2006-12-27 Thread Mike Schinkel
Matthew Paul Thomas wrote:
> On Dec 22, 2006, at 3:23 AM, Benjamin Hawkes-Lewis wrote:
> >
> > Henri Sivonen wrote:
> > ...
> >> Also, it seems to me that the usefulness of non-heuristic machine 
> >> consumption of semantic roles of things like dialogs, names of 
> >> vessels, biological taxonomical names, quotations, etc. has been 
> >> vastly exaggerated.
> >
> > I'm not entirely sure what "non-heuristic machine consumption" is,
> 
> An example of non-heuristic machine consumption is where 
> Google Glossary thinks: "In an HTML 3.2 or earlier document 
> containing the code 'foo bar', 
> 'bar' is a definition of 'foo'". (It probably thinks the same 
> about HTML 4 documents, too, which is applying a small 
> "ignore that nonsense about dialogues" heuristic.)
> 
> An example of heuristic machine consumption is where Google Glossary
> thinks: "In an HTML document containing the code 
> 'foo: bar', 'bar' is probably a definition of 
> 'foo', especially if the page has several consecutive 
> paragraphs with that structure and different bold text."
> 
> Non-heuristic machine consumption fails when semantic 
> elements are abused, and becomes practical when elements have 
> multiple popular meanings (examples of the latter include 
>  in HTML 4, and  in HTML 5). Heuristic machine 
> consumption fails occasionally by the very nature of 
> heuristics (examples currently include 
>  and
> .)

The origin of this thread was my request for adding attributes to all
elements to support microformat-like semantic markup. Based on the context
of your reply, it seems you are agreeing with Matthew Raymond in his
assertion that using microformat-like semantic markup is A Bad Thing(tm). Am
I understanding your position correctly? (If I'm not, please forgive me.)

Let me ask this: Why is it not preferable to use the non-heuristic machine
consumption that microformat-like semantic markup would allow, as opposed to
limiting the web to heuristic machine consumption in so many contexts where
there are not enough semantics to know for sure? It seems to me
microformat-like semantic markup can improve on the situation as opposed to
leaving things at the status quo.

And as for the status quo, the "microformat movement" is gaining momentum
because it makes possible things people need but didn't previously realize
were possible. And unlike the professionals who read and attempt to follow
standards (i.e. some browser vendors, etc.), most web content authors don't
read standards, and many are not even interested in following standards if
they get in the way of accomplishing their client's goals. So the likelihood
of stopping microformat-like semantic markup is slim to nil; why not embrace
it and provide support to help those web authors achieve their goals in as
interoperable a manner as possible?

-- 
-Mike Schinkel
http://www.mikeschinkel.com/blogs/
http://www.welldesignedurls.org/




[whatwg] Ampersands not followed by ASCII letters or #

2006-12-27 Thread Henri Sivonen
I noticed that the Web Apps spec itself contains script samples with  
unescaped JavaScript && operators in  blocks.


Considering that this is not an error in HTML 4.01 as SGML and  
considering that it is harmless in browsers, I think the top-level  
"Anything else" case under "8.2.3.1. Tokenising entities" should be  
split in two so that there is also an error-free case for the ASCII  
characters that aren't '#', aren't ASCII letters and that weren't in  
error in SGML-based HTML. I don't have The Handbook at my disposal  
right now, but the error-free case should cover at least '&', '<' and  
space characters.
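
The proposed split can be sketched as a three-way classification of the
character that follows an ampersand. This Python sketch is only an
illustration: the function name, the return labels and the exact set of space
characters are assumptions, and the real list of error-free characters would
have to come from the SGML rules mentioned above.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)

# Characters named in the message as the minimum error-free set.
# Assumption: "space characters" means space, tab, LF, FF and CR.
ERROR_FREE = {"&", "<", " ", "\t", "\n", "\f", "\r"}


def classify_after_ampersand(next_char):
    """Say how a tokeniser might treat '&' given the next character."""
    if next_char == "#" or next_char in ASCII_LETTERS:
        return "entity"       # try to consume an entity
    if next_char in ERROR_FREE:
        return "text"         # plain '&', no parse error
    return "parse error"      # the remaining "anything else" case


# The two ampersands of a JavaScript '&&' followed by a space.
print(classify_after_ampersand("&"))  # text
print(classify_after_ampersand(" "))  # text
```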


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




[whatwg] Dual mode for client?

2006-12-27 Thread Mike Schinkel

I'm wondering if you collectively would consider adding the following to the
spec: a recommendation that clients offer two "modes." The first mode would
be for users, where the spec works as currently envisioned. The second mode
would be for web developers and would generate errors for invalid markup as
opposed to generating no errors. (Ian had said it was preferable not to
generate errors for invalid markup, to ensure users were willing to use
browsers based on the newer spec; a dual mode would give the best of both
worlds.)

If all clients had such a dual mode, I think it would be much more likely
that web developers would create valid markup.  I just hope you guys can
envision that being something mentioned and marked as "SHOULD" in the spec.

-- 
-Mike Schinkel
http://www.mikeschinkel.com/blogs/
http://www.welldesignedurls.org/




Re: [whatwg] Semantic styling languages in the guise of HTML attributes.

2006-12-27 Thread Matthew Paul Thomas

On Dec 26, 2006, at 1:50 AM, Matthew Paul Thomas wrote:

...
Non-heuristic machine consumption fails when semantic elements are 
abused, and becomes practical when elements have multiple popular 
meanings (examples of the latter include  in HTML 4, and  in 
HTML 5).


That should have been "becomes IMpractical when elements have multiple 
popular meanings". Sorry for any confusion.


--
Matthew Paul Thomas
http://mpt.net.nz/



Re: [whatwg] Semantic styling languages in the guise of HTML attributes.

2006-12-27 Thread Matthew Paul Thomas

On Dec 22, 2006, at 3:23 AM, Benjamin Hawkes-Lewis wrote:


Henri Sivonen wrote:
...
Also, it seems to me that the usefulness of non-heuristic machine 
consumption of semantic roles of things like dialogs, names of 
vessels, biological taxonomical names, quotations, etc. has been 
vastly exaggerated.


I'm not entirely sure what "non-heuristic machine consumption" is,


An example of non-heuristic machine consumption is where Google 
Glossary thinks: "In an HTML 3.2 or earlier document containing the 
code 'foo bar', 'bar' is a definition of 
'foo'". (It probably thinks the same about HTML 4 documents, too, which 
is applying a small "ignore that nonsense about dialogues" heuristic.)


An example of heuristic machine consumption is where Google Glossary 
thinks: "In an HTML document containing the code 'foo: 
bar', 'bar' is probably a definition of 'foo', especially if the 
page has several consecutive paragraphs with that structure and 
different bold text."


Non-heuristic machine consumption fails when semantic elements are 
abused, and becomes practical when elements have multiple popular 
meanings (examples of the latter include  in HTML 4, and  in 
HTML 5). Heuristic machine consumption fails occasionally by the very 
nature of heuristics (examples currently include

 and
.)
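
The difference between the two kinds of consumption can be sketched in
Python. The example markup was lost from this archive copy, so the sketch
assumes definition-list markup (dt/dd) for the non-heuristic case and
paragraphs opening with bold text ending in a colon for the heuristic case;
both choices, and every name in the code, are assumptions for illustration.

```python
from html.parser import HTMLParser


class GlossaryParser(HTMLParser):
    """Collects certain definitions (dt/dd) and guessed ones (<p><b>x:</b> y)."""

    def __init__(self):
        super().__init__()
        self.certain = []      # non-heuristic: taken from the markup itself
        self.guesses = []      # heuristic: inferred from presentation
        self._buf = ""         # text collected since the last reset
        self._term = None      # text of the most recent dt
        self._lead = None      # bold lead-in of the current paragraph
        self._in_p = False
        self._b_leads = False  # True while inside a <b> that opens a <p>

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._buf = ""
        elif tag == "p":
            self._in_p, self._lead, self._buf = True, None, ""
        elif tag == "b" and self._in_p:
            # Only bold text at the very start of the paragraph counts.
            self._b_leads = self._buf.strip() == "" and self._lead is None

    def handle_data(self, data):
        self._buf += data

    def handle_endtag(self, tag):
        text = self._buf.strip()
        if tag == "dt":
            self._term = text
        elif tag == "dd" and self._term is not None:
            self.certain.append((self._term, text))
            self._term = None
        elif tag == "b" and self._b_leads:
            if text.endswith(":"):
                self._lead = text[:-1].strip()
                self._buf = ""
            self._b_leads = False
        elif tag == "p":
            if self._lead and text:
                self.guesses.append((self._lead, text))
            self._in_p = False


parser = GlossaryParser()
parser.feed("<dl><dt>foo</dt><dd>bar</dd></dl><p><b>baz:</b> qux</p>")
print(parser.certain)  # [('foo', 'bar')]
print(parser.guesses)  # [('baz', 'qux')]
```

The sketch does not model the "several consecutive paragraphs" signal, which
a real heuristic consumer would use to raise or lower its confidence.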

--
Matthew Paul Thomas
http://mpt.net.nz/



Re: [whatwg] [Imps] reconstruct the active formatting elements

2006-12-27 Thread Anne van Kesteren

E-mailing this to the WHATWG list instead as Implementors keeps bouncing.


On Sat, 23 Dec 2006 11:06:33 +0100, Anne van Kesteren <[EMAIL PROTECTED]>
wrote:
So we pass "X" but not  X" it  
seems.


Found the problem. The "in body" insertion mode says to "reconstruct the
active formatting elements" for any character token. That should probably
only happen for non-space characters. When I made that change I passed the
testcase given and didn't regress any other testcase.

I should note though that Firefox handles a newline differently from a
space. See:

http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0A%3Cp%3E%3Cu%3E%3C/p%3E%0A%3Cp%3EX

http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0A%3Cp%3E%3Cu%3E%3C/p%3E%20%3Cp%3EX

In case of a space it seems the active formatting elements are
reconstructed.
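
The change described above amounts to a guard in the character-token handler.
The following Python sketch is a toy, not parser code: the handler name, the
callback parameters and the exact set of space characters are assumptions.

```python
# Assumption: space, tab, LF, FF and CR count as space characters.
SPACE_CHARACTERS = {" ", "\t", "\n", "\f", "\r"}


def handle_character_in_body(char, reconstruct, insert):
    """Toy "in body" handler for a single character token."""
    if char not in SPACE_CHARACTERS:
        # Reconstruct the active formatting elements only for non-space text.
        reconstruct()
    insert(char)


calls = []
for ch in "\n X":
    handle_character_in_body(
        ch,
        lambda: calls.append("reconstruct"),
        lambda c: calls.append(c),
    )
print(calls)  # ['\n', ' ', 'reconstruct', 'X']
```

Matching the Firefox behaviour noted above would mean treating the space and
the newline differently, for example by removing " " from the set.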


--
Anne van Kesteren




Re: [whatwg] Semantic styling languages in the guise of HTML attributes.

2006-12-27 Thread Mike Schinkel
Matthew Raymond wrote:
> Mike Schinkel wrote:
>> Why should attributes (only?) specify the details of semantics that
>> elements already possess?
> 
>Global attributes aren't necessarily wrong if their

By "global" do you simply mean attributes for HTML elements, i.e. a "type"
attribute for a  element, for example?
 
> purpose is orthogonal to the purpose of the elements they're
> being added to. That's why |id| and |class| are so useful.
> They don't alter the semantics of the element. Rather, they
> act as targets for styling and scripting.
>
>However, global attributes like |role|, |src| and |href|
> directly compete with the semantics of HTML elements in many
> ways. We already see this with |role| versus "HTML5". Many
> roles have semantics that overlap with elements like 
> (navigation),  (secondary), 
> (note) and  (contentinfo).

You reference altering the semantics as if that was A Bad Thing. I believe I
am to understand that you believe it is A Bad Thing, but my current view is
that it is not a bad thing and AFAICT you've not given any evidence that it
is A Bad Thing.  Now I'm not saying that I won't ultimately realize that it
is A Bad Thing, but right now I just don't see it.

>> Is there an axiom or W3C finding that we can reference for this?
> 
>Of course not. That's the problem. You see the power of
> markup being shifted from elements to attributes to attribute
> values. 

I'm having to read between the lines here in order to understand your point.
Are you saying that you see it as a big problem, but nobody else has seen it
as a big problem, or at least not enough people to author guidance
against doing so?

> The |role| attribute itself is equivalent to having
> an infinite number of boolean attributes.

I still need to see why this is bad.

>>> Generally, though, this is just math. For every attribute or role
>>> you have that can apply to ALL elements, you have the semantics of
>>> all those  elements to interact with, plus you have interactions
>>> between an indefinite number of global attributes that may be
>>> defined on that element.
>> 
>> Can you provide some concrete examples where that might cause a
>> problem? 
> 
> 
> 
> 
> 
> 
> 
> 
>   type="file" role="wairole:checkboxtristate">  type="hidden" href="http://whatwg.org";>  src="http://whatwg.org/images/logo";>
>