date:20090518


Hello,

I don't want to go too far off topic here, but I'll respond to the 
points as I do think it illustrates one of the uses of entities 
(localization)--which would apply to some degree in XHTML (at least for 
entities) as well as in XML.


Kristof Zelechovski wrote:


Using entities in XSL to share code was my mistake once too; it is 
similar to using data members not wrapped in properties in data types. 
 XSL itself provides a better structured approach for code reuse.


Unless you're talking about variables, I guess I'd need elaboration, but 
I don't want to go too far off track on list here...


Being able to use localized programming language constructs is at the 
same time trivial (replace this with that),


I think that depends on how familiar the script and language are to you 
(cognates help many non-English Europeans, whereas the same does not 
apply elsewhere). To take some of my wife's younger cousins, for 
example, who are not particularly educated yet who use computers as many 
Chinese do, they found it much easier to get a grasp of this "Chinese 
XHTML" than the English one, even though they had had some previous 
English instruction. I think actual research would need to be done on 
this, since it is quite possible that only programmer types make it past 
the barrier to entry, and then, they may be even more inclined to 
dismiss the benefits for others less skilled; i.e., "I did it, so others 
should", or they want to get away from their linguistic background 
distinctiveness, or have perhaps irrational fears that this would lead 
to their people being satisfied with lower standards, etc. (just as many 
oppose bilingual education even though it may help transition 
students to the mainstream language).


expensive (you have to translate the documentation)

Not sure what you mean by cost of translating the documentation. Cost 
for whom? If your code is intended for that audience--e.g., Chinese 
code at a Chinese website--who needs to translate anything? On the 
contrary, they avoid the need to translate...


and not that useful (you freeze the language and cut the programmers 
off from the recent developments in the language).


I don't think it would be that hard to update the translating 
template--it's not that difficult. But I'm definitely not talking about 
relying on this anyways. There are big advantages to having a common 
language as far as the ability to learn from others' code from people 
around the world, etc. But just as I replied to someone on another list 
who said this was not "semantic", this is very much semantic to those 
for whom it is their native language--perhaps even more in the spirit of 
pure XML (though Babelizing semantics even further, no doubt, if people 
actually started using this on a large scale, as search engines would 
have to be aware of either the post-transformation result or the 
localized XML, etc.).


Languages tend to use English keywords regardless of the culture of 
their designer because:


1. no matter how deep you go, there is always a place where you have 
to switch to English in order to refer to some precedent technology,


Yes, like in my use of  (though no doubt browsers 
could be fairly trivially programmed to recognize localized processing 
instructions, as well). Anyways, again, I'm in favor of a common 
language, and would even hope very much that countries around the world 
could democratically agree on an official standard (including possibly 
English, which if its use is as widespread and popular as its proponents 
believe, should have little problem obtaining a democratic majority) so 
that children will everywhere begin earlier to have access to such a 
common language. Nevertheless, if you're a beginner, having to deal with 
one line of English is a lot easier than having to deal with a whole 
syntax in English, if that's not your native language. I think the fact 
that a number of open source projects I've encountered still have not 
only comments but also even variables in the programmer's original 
language is evidence that there is some desire for convenient 
localization. If you have tools that translate it before serving the 
code, it is still available anyways.


2. the English words/roots used in the language design often have a 
slightly different meaning from the English source,


Maybe, but it is much easier to learn a few exceptions which are 
probably at least related in meaning, than to have to learn something 
completely foreign. Would you like to learn an Arabic-script XHTML, even 
if there was a one-to-one mapping from your keyboard already? Of course 
you could, but you have to admit it would take a little time out for 
you, especially if you were not already inclined to do coding/markup. 
It's not only a vocabulary issue here, but a script issue too--moreover, 
using that script may force you to switch between your keyboard layouts 
each time you want to make a document.


3. they are suffic

[whatwg] Reserving "id" attribute values?

In order to comply with XML ID requirements in XML, and facilitate 
future transitions to XML, can HTML 5 explicitly encourage id attribute 
values to follow this pattern (e.g., disallowing numbers for the 
starting character)?
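
As an illustration of the kind of restriction being asked for, here is a minimal sketch of a check for id values that would also be usable as XML IDs. It is only an ASCII approximation: the real XML Name production admits many non-ASCII characters, and the function name is_xml_compatible_id is hypothetical.

```python
import re

# ASCII-only approximation of an XML name without a colon: it must start
# with a letter or underscore and may continue with letters, digits,
# '.', '-', or '_'. The full XML grammar allows many more characters.
_ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def is_xml_compatible_id(value: str) -> bool:
    """Return True if value fits the simplified XML-style ID pattern."""
    return _ASCII_NAME.match(value) is not None
```

Under this sketch an id such as "section1" passes, while "1section" (leading digit) and "a b" (embedded space) do not.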


Also, there is this minor erratum: 
http://www.whatwg.org/specs/web-apps/current-work/#refsCSS21 is broken 
(in section 3.2)


Brett

Re: [whatwg] longdesc [was: A new attribute for <video> and low-power devices]

2009-05-18 Thread Aryeh Gregor

On Mon, May 18, 2009 at 8:08 PM, Jim Jewett  wrote:
> The 99% misused is at best debatable.  I'm pretty sure that using a
> longer human-readable description instead of an URL was once
> (admittedly long ago) recommended.

In HTML 3.2, longdesc didn't exist:

http://www.w3.org/TR/REC-html32#img

In HTML 4, it's required to be a URI:

http://www.w3.org/TR/html4/struct/objects.html#adef-longdesc-IMG

Your other points seem reasonable, though.  I'll also note, for the
record, that since r25335 in August 2007, MediaWiki doesn't generate
bogus longdescs all over the place, so that might improve the
statistics if they were run again today.

[whatwg] longdesc [was: A new attribute for <video> and low-power devices]

2009-05-18 Thread Jim Jewett

> In the ~0.1% of images where
> longdesc= is used, it's misused literally over 99% of the time:

> http://blog.whatwg.org/the-longdesc-lottery

Responding for the archive; that blog post keeps getting cited, but it
isn't up to Mark's usual standards.  longdesc is not a success story,
but neither is it the miserable failure suggested by those numbers.

The 99.9% unused is (or at least was) probably close to correct, and
is a good thing.  I just checked the front page of CNN, where there
are 137 images, of which at most one would benefit from a longdesc --
and even that one is pretty questionable.

The 99% misused is at best debatable.  I'm pretty sure that using a
longer human-readable description instead of an URL was once
(admittedly long ago) recommended.  It worked at least as well with
the browsers I tested with at the time.  Blanks should be treated the
same way as blank alts -- an explicit statement that this image does
not need a long description.  URLs which are redundant to something
else in the area are actually a good thing, since that "something"
isn't standardized. (aria-describedby should offer a better solution
going forward.)
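
To make the categories above concrete, here is a rough sketch of how longdesc values might be bucketed. This is only an assumed heuristic for illustration, not the method used in the cited blog post or wiki page, and the function name classify_longdesc is hypothetical.

```python
def classify_longdesc(value: str) -> str:
    """Bucket a longdesc attribute value into a coarse category.

    Assumed heuristic: a URL cannot contain raw whitespace, so any value
    with internal whitespace is treated as human-readable prose.
    """
    stripped = value.strip()
    if not stripped:
        return "blank"      # comparable to an explicitly empty alt
    if any(ch.isspace() for ch in stripped):
        return "prose"      # a description typed directly into the attribute
    return "url-like"       # may or may not point at a useful description
```

A survey that counts only the last bucket as potentially correct will report far more misuse than one that also treats blank and prose values as meaningful.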

http://wiki.whatwg.org/wiki/Longdesc_usage makes it clear that useful
(if not pedantically correct) usage is much greater than 1% of the
actual usage.  Not as high as it should be, certainly, but still
better than, say, the percentage of tables which represent data rather
than layout.

-jJ

Re: [whatwg] A new attribute for <video> and low-power devices

2009-05-18 Thread Aryeh Gregor

On Mon, May 18, 2009 at 5:28 PM, Benjamin M. Schwartz
 wrote:
> Authors who are only testing on modern desktops will, as you say, likely
> ignore this issue.  I therefore fully expect that they will never set this
> attribute.

Isn't that like saying that authors who are only testing on normal
browsers will likely ignore the longdesc= attribute?  It seems like
most authors do just ignore it, but the ones who don't get it wrong
far more often than they get it right.  In the ~0.1% of images where
longdesc= is used, it's misused literally over 99% of the time:

http://blog.whatwg.org/the-longdesc-lottery

It thus ends up being so useless for users that even if you do provide
a good longdesc, no one will actually use it.  There's so little
signal and so much noise that screenreader users just don't bother
checking it, if they even know that it exists.

It thus seems like it would be prudent to wait on implementation
experience to see if a new attribute is actually needed here.  Adding
attributes that don't affect most users is a recipe for widespread
misuse.  In the worst case, browsers might very well refuse to support
the attribute because it's come into wide misuse before any browser
actually supports it, so supporting it breaks sites.  (I'm pretty sure
there are examples of this happening, although I can't think of any
offhand.)

Re: [whatwg] A new attribute for <video> and low-power devices

Simon Pieters wrote:
> On Mon, 18 May 2009 18:59:01 +0200, Benjamin M. Schwartz
>  wrote:
> 
>> Simon Pieters wrote:
>>> If there is a controls attribute or if scripting is disabled, show
>>> controls, else use author-provided scripted button (if any) to play the
>>> video.
>>
>> Consider a webpage in which a side-effect of clicking on some scripted
>> button is to trigger a small animation (using <video>) elsewhere on the
>> page.  If your browser is configured to show <video> full-screen, this
>> webpage will become nearly unusable, because the small animation will
>> take over the screen every time you click on a button.
> 
> I'm not convinced that this will be a problem in practice.
> 
> 
>> I am proposing an additional attribute for <video> so that the browser
>> will know not to do that.
> 
> I'm not convinced that an additional attribute would solve the problem:
> it is likely that some authors would use the attribute incorrectly,
> because it doesn't have any effect in their primary testing environment.
> If an author sets the attribute where it shouldn't be set, it
> effectively makes the video unavailable to users whose UA acts upon the
> attribute, which seems bad.

Then I will attempt to convince you.  Suppose the additional attribute is
a boolean called "decorative", defaulting to "false" if not present.
Authors who are only testing on modern desktops will, as you say, likely
ignore this issue.  I therefore fully expect that they will never set this
attribute.  If the attribute is not set, then most browsers should assume
that the video may be of some significance, and ensure that the user can
play it.

I think the risk of authors accidentally setting "decorative" on critical
videos is small.  I also think that if a popular mobile browsing platform
were to respect this flag, major websites would use it correctly and user
experience would be improved.

> I think a more effective solution is to give
> a non-modal message to the user saying "This page is trying to play a
> video. Press the Foo key to play.", or similar.

Are you going to pop up a message of this kind for every <video> tag on
every page?  A page decorated with many small <video> tags in place of
animated GIFs is going to be quite difficult to use in a mobile browser
where each one is associated with a different approval dialog, and
approving causes them to take over the 4-inch screen.

--Ben


Re: [whatwg] A new attribute for <video> and low-power devices


On Mon, 18 May 2009 18:59:01 +0200, Benjamin M. Schwartz 
 wrote:


Simon Pieters wrote:

If there is a controls attribute or if scripting is disabled, show
controls, else use author-provided scripted button (if any) to play the
video.


Consider a webpage in which a side-effect of clicking on some scripted
button is to trigger a small animation (using <video>) elsewhere on the
page.  If your browser is configured to show <video> full-screen, this
webpage will become nearly unusable, because the small animation will
take over the screen every time you click on a button.


I'm not convinced that this will be a problem in practice.



I am proposing an additional attribute for <video> so that the browser
will know not to do that.


I'm not convinced that an additional attribute would solve the problem: it is likely that 
some authors would use the attribute incorrectly, because it doesn't have any effect in 
their primary testing environment. If an author sets the attribute where it shouldn't be 
set, it effectively makes the video unavailable to users whose UA acts upon the 
attribute, which seems bad. I think a more effective solution is to give a non-modal 
message to the user saying "This page is trying to play a video. Press the Foo key 
to play.", or similar.

--
Simon Pieters
Opera Software

Re: [whatwg] DOMTokenList is unordered but yet requires sorting

2009-05-18 Thread Kristof Zelechovski

DOMTokenList, as an object, is semantically unordered, therefore an
arbitrary ordering can be used for enumeration.  The item method of
DOMTokenList provides an enumerator and imposes such an ordering.
Since no other enumerator is available to counter the claim, it may be
tempting to say, as a simplification, that DOMTokenList itself is ordered.
I would rather discourage you from putting it that way though because it
precludes inventing another enumerator in the future or as an extension.
HTH,
Chris

Re: [whatwg] A new attribute for <video> and low-power devices

Simon Pieters wrote:
> If there is a controls attribute or if scripting is disabled, show
> controls, else use author-provided scripted button (if any) to play the
> video.

Consider a webpage in which a side-effect of clicking on some scripted
button is to trigger a small animation (using <video>) elsewhere on the
page.  If your browser is configured to show <video> full-screen, this
webpage will become nearly unusable, because the small animation will take
over the screen every time you click on a button.

I am proposing an additional attribute for <video> so that the browser
will know not to do that.

--Ben


Re: [whatwg] DOMTokenList is unordered but yet requires sorting

2009-05-18 Thread Erik Arvidsson

On Mon, May 18, 2009 at 00:18, Simon Pieters  wrote:
> Imagine if it is specified that the order is not relevant and
> implementations can use any order (so long as it's stable). So one UA uses
> one order and another uses another. Then one of those UAs becomes very
> popular. Web pages start to depend on the order of the popular UA (e.g. they
> use the first item and expect it to be the "right" one). Now those pages
> don't work in the less popular UA and that UA vendor has to reverse engineer
> the popular UA and implement the same order.
>
> The above has happened with the DOM Core .attributes attribute, IIRC.

It also happened to for-in order for JS objects.

Simon, I think you have convinced me at least. I therefore think that
a better wording in the spec is to say that DOMTokenList acts as a
sorted set.
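
A minimal sketch of what "acts as a sorted set" could mean in practice: duplicates collapse, and enumeration always follows one deterministic order regardless of how the attribute was written. The class name SortedTokenSet and the choice of code-point ordering are assumptions for illustration, not spec text.

```python
from typing import Optional


class SortedTokenSet:
    """Toy model of a token list that behaves as a sorted set."""

    def __init__(self, value: str = "") -> None:
        # Split on whitespace; the set discards duplicate tokens.
        self._tokens = set(value.split())

    def add(self, token: str) -> None:
        self._tokens.add(token)

    def __len__(self) -> int:
        return len(self._tokens)

    def item(self, index: int) -> Optional[str]:
        # Sorting on every access keeps the sketch simple; the order
        # depends only on the tokens, never on insertion history.
        ordered = sorted(self._tokens)
        return ordered[index] if 0 <= index < len(ordered) else None
```

With this model, SortedTokenSet("b a b c") and SortedTokenSet("c b a") enumerate identically, so a page that relies on item(0) sees the same token in every implementation.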

-- 
erik

Re: [whatwg] A new attribute for <video> and low-power devices


On Mon, 18 May 2009 16:59:03 +0200, Benjamin M. Schwartz 
 wrote:


Simon Pieters wrote:

Is there a problem with always falling back to the poster image and just
play the video (full-screen or on-top) when the user indicates he wants
to see the video?


If every menu button has a <video> tag associated with it to show a little
3D animation, then (a) how do you indicate to the user that it is a video
without disrupting the page layout?,


You just show the poster image.



and (b) how do you allow the user to
request playback without interfering with the function of the button?


If there is a controls attribute or if scripting is disabled, show controls, 
else use author-provided scripted button (if any) to play the video. If the 
device has a way to show context menus, then you could start playback from the 
context menu. You could also have separate view that lists all media on the 
page, similar to Firefox's Media tab in View Page Info.

--
Simon Pieters
Opera Software

Re: [whatwg] A new attribute for <video> and low-power devices

Simon Pieters wrote:
> Is there a problem with always falling back to the poster image and just
> play the video (full-screen or on-top) when the user indicates he wants
> to see the video?

If every menu button has a <video> tag associated with it to show a little
3D animation, then (a) how do you indicate to the user that it is a video
without disrupting the page layout?, and (b) how do you allow the user to
request playback without interfering with the function of the button?

--Ben


Re: [whatwg] Annotating structured data that HTML has no semantics for

On May 18, 2009, at 16:05, Eduard Pascual wrote:

On Mon, May 18, 2009 at 10:38 AM, Henri Sivonen
wrote:

(If we were limited to reasoning about something that we don't have
experience with yet, I might believe that people can't be too inept
to use prefix-based indirection. However, a decade of actual evidence
shows that actual behavior defies reasoning here and prefix-based
indirection is something that both authors and implementors get wrong
over and over again.)

Curious: you refer to "a decade of actual evidence", but you fail to
refer to any actual evidence. I'm eager to see that evidence; could
you share it with us? Thank you.

I thought everyone had seen the confusion. There are pointers at
http://wiki.whatwg.org/wiki/Namespace_confusion
The wiki page is less than a decade old, so its length isn't quite
that impressive.

I have been a Java programmer for some years, and
still find that convention absurd, horrible, and annoying. I'll
agree that CURIEs are ugly, and maybe hard to understand, but reversed
domains are equally ugly and hard to understand.

Problems shared by CURIEs, URIs and reverse DNS names:
* Long.
* Identifiers outlive organization charts.

Ehm. CURIEs ain't really long: the main point of prefixes is to make
them as short as reasonably possible.

You need to consider the length of the prefix declarations, too.

Problems that reverse DNS names and URIs don't have but CURIEs have:
* Prefix-based indirection.

Indirection can't be taken as a problem when most currently used RDFa
tools don't use it at all (which proves that they can work without
relying on it).

What do you mean? Current RDFa tools don't use prefixes?

(I understand that if the microdata syntax offered no advantages
over RDFa,
then it would be a wasted effort to diverge.

Which are the advantages it offers?

The syntax is simpler for the use cases it was designed for. It
uses a simpler conceptual model (trees as opposed to graphs). It allows
short token identifiers. It doesn't use prefix-based indirection. It
doesn't violate the DOM Consistency Design Principle.

Ok, the syntax is simpler for a subset of the use cases; but it leaves
entirely out the rest of use cases.

What are the rest of the use cases? Why weren't they put forward when
Hixie asked for use cases?

The DOM Consistency again is not an advantage of the microdata syntax
because this could have been fulfilled with other syntaxes as well.

It's an advantage over RDFa-in-XHTML-served-as-text/html. It's not an
advantage over microformats or may not be an advantage over a
speculative yet undefined variation of RDFa.

It seems to me that it avoids much of what microformats advocates
find objectionable

Could you specify, please? Do you mean anything else than WHATWG's
almost irrational hate toward CURIEs and everything that involves
prefixes?

RDFa uses a data model that is an overkill for the use cases.

Which use cases?

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-April/019374.html

No, it *can't* represent a full RDF model: it has already been shown
several times on this thread.

That's a feature.

What?? Being unable to deal with all the use cases is a feature??

Being simpler while addressing all the use cases is a feature.

Wait. Are you referring to microdata as an incremental improvement
over RDFa?? IMO, it's rather a decremental enworsement.

That depends on the point of view. I'm sensing two major points of
view:

1) Graphs are more general than trees. Hence, being able to
serialize graphs is better.

2) Graphs are more general than trees. Hence, graphs are harder to
design UIs for, harder to traverse and harder for authors to grasp.
Hence, if trees are enough to address use cases, we should only enable
trees to be serialized.

¬¬ Again, what's your basis to decide that "trees are enough to
address use cases"?? Of course, they are enough to solve some use
cases, but the convenience of dealing with just trees is not worth
sacrificing the needs of those use cases you are arbitrarily deciding
to ignore.

I don't see anything on http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-April/019374.html
that doesn't boil down to trees or simple key-value pairs attached
to an item.

I subscribe to view #2, and it seems that trees are indeed enough
for the use cases (that were stipulated by the pro-graph people!).

- Microdata can't represent the full RDF data model (while RDFa
can): some complex structures are just not expressible with microdata.

That's not a use case. That's "theoretical purity".

It's not "theoretical purity", it's something simpler:
*extensibility*. And, with over two decades between versions of the
specs, this is a strong requirement: if a problem is noticed after
HTML5 becomes "the standard", it's essential to be able to solve it
without waiting 10 or 20 years for HTML6 to come out.

Well, you have to commit to some bounds on extensibility

Re: [whatwg] A new attribute for <video> and low-power devices


On Mon, 18 May 2009 16:22:29 +0200, Benjamin M. Schwartz 
 wrote:

As I have mentioned earlier, there are some devices that will be unable
to render <video> faithfully inline, due to the limitations of hardware
video accelerators.  However, it occurs to me that there are two
essentially different uses for <video>:

1. Important content for the webpage.  An example would be the central
video on a web page whose purpose is to allow users to view that video.
This is currently done principally using Adobe Flash and (to a lesser
extent)  tags.

2. Incidental animations.  Examples include decorative elements in a web
page's interface, animated sidebar advertisements, and other small page
elements of this kind.  This was historically a popular use for
animated-GIF, though Flash has largely overtaken it here as well.

In case 1, a browser on a low-powered device may show the video
"full-screen or in an independent resizable window" (to quote the spec).
The browser might also show the video at the specified size, but on top
of the page, rather than at its "correct" location in the middle of the
rendering stack.

However, for case 2, showing the video full-screen or moving it to the
top of the rendering stack would clearly be a bad idea, as the video does
not contain the content of interest to the user.  In this case, if
browsers cannot display the video as specified, they should probably fall
back to the "poster" image.

With the current tag definition, browsers will have to grow ugly
heuristics for this case, based on video's size, aspect ratio, "loop",
and "controls".  To avoid this heuristic hack, I suggest that <video>
gain an additional attribute to indicate which behavior is preferable.
A boolean attribute like "decorative", "incidental", or "significant"
would greatly assist browsers in determining the correct behavior.


Is there a problem with always falling back to the poster image and just play 
the video (full-screen or on-top) when the user indicates he wants to see the 
video?

--
Simon Pieters
Opera Software

[whatwg] A new attribute for <video> and low-power devices

As I have mentioned earlier, there are some devices that will be unable to
render <video> faithfully inline, due to the limitations of hardware video
accelerators.  However, it occurs to me that there are two essentially
different uses for <video>:

1. Important content for the webpage.  An example would be the central
video on a web page whose purpose is to allow users to view that video.
This is currently done principally using Adobe Flash and (to a lesser
extent)  tags.

2. Incidental animations.  Examples include decorative elements in a web
page's interface, animated sidebar advertisements, and other small page
elements of this kind.  This was historically a popular use for
animated-GIF, though Flash has largely overtaken it here as well.

In case 1, a browser on a low-powered device may show the video
"full-screen or in an independent resizable window" (to quote the spec).
The browser might also show the video at the specified size, but on top of
the page, rather than at its "correct" location in the middle of the
rendering stack.

However, for case 2, showing the video full-screen or moving it to the top
of the rendering stack would clearly be a bad idea, as the video does not
contain the content of interest to the user.  In this case, if browsers
cannot display the video as specified, they should probably fall back to
the "poster" image.

With the current tag definition, browsers will have to grow ugly
heuristics for this case, based on video's size, aspect ratio, "loop", and
"controls".  To avoid this heuristic hack, I suggest that <video> gain an
additional attribute to indicate which behavior is preferable.  A boolean
attribute like "decorative", "incidental", or "significant" would greatly
assist browsers in determining the correct behavior.
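
The two cases above could be sketched as a small decision function. Everything here is hypothetical: the "decorative" flag is only the proposed attribute (assumed to default to false when absent), and the names VideoElement, Presentation, and choose_presentation are invented for illustration, not taken from any browser.

```python
from dataclasses import dataclass
from enum import Enum


class Presentation(Enum):
    INLINE = "inline"            # render where the page layout says
    FULL_SCREEN = "full-screen"  # or an independent window / on top
    POSTER = "poster"            # fall back to the poster image


@dataclass
class VideoElement:
    # Hypothetical proposed attribute; an absent attribute means False.
    decorative: bool = False


def choose_presentation(video: VideoElement, can_render_inline: bool) -> Presentation:
    """Pick a presentation for a device that may lack inline video support."""
    if can_render_inline:
        return Presentation.INLINE
    if video.decorative:
        # Case 2: incidental animation, not worth taking over the screen.
        return Presentation.POSTER
    # Case 1: possibly important content, so keep it playable.
    return Presentation.FULL_SCREEN
```

Without such a flag, the last two branches would have to be chosen by guessing from size, aspect ratio, "loop", and "controls".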

--Ben




Re: [whatwg] External document subset support

2009-05-18 Thread Kristof Zelechovski

Using entities in XSL to share code was my mistake once too; it is similar
to using data members not wrapped in properties in data types.  XSL itself
provides a better structured approach for code reuse.

Being able to use localized programming language constructs is at the same
time trivial (replace this with that), expensive (you have to translate the
documentation) and not that useful (you freeze the language and cut the
programmers off from the recent developments in the language).  Languages
tend to use English keywords regardless of the culture of their designer
because:

1.   no matter how deep you go, there is always a place where you have
to switch to English in order to refer to some precedent technology,

2.   the English words/roots used in the language design often have a
slightly different meaning from the English source,

3.   they are sufficiently few to be learned easily; it may be harder to
grasp what they actually mean in the particular context.

(Toy languages for children make an exception, of course; however, even
children tend to mock them nowadays.)

Best regards,

Chris

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-18 Thread Kristof Zelechovski

Being unable to deal with all use cases is sometimes a feature.  For
example, regular expressions are unable to recognize all recursive
languages; that is a feature.  In compensation for that loss, they do not
suffer from the halting problem.
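
A small sketch of the trade-off: a regular language can be recognized by a finite automaton that reads each input symbol exactly once, so the matcher below always terminates. The transition table and the names dfa_accepts and AB_STAR are made up for illustration.

```python
from typing import Dict, Set, Tuple


def dfa_accepts(
    transitions: Dict[Tuple[str, str], str],
    start: str,
    accepting: Set[str],
    text: str,
) -> bool:
    """Run a deterministic finite automaton over text.

    The loop executes once per input symbol, so it halts on every input.
    """
    state = start
    for symbol in text:
        key = (state, symbol)
        if key not in transitions:
            return False  # no transition defined: reject
        state = transitions[key]
    return state in accepting


# Automaton for the regular language (ab)*.
AB_STAR = {("even", "a"): "odd", ("odd", "b"): "even"}
```

For example, dfa_accepts(AB_STAR, "even", {"even"}, "abab") is true and the same call with "aba" is false. No fixed table of this kind can recognize arbitrarily deep balanced parentheses; that is the expressive power given up in exchange for guaranteed termination.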

HTH,
Chris

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-18 Thread Maciej Stachowiak



On May 18, 2009, at 6:05 AM, Eduard Pascual wrote:

On Mon, May 18, 2009 at 10:38 AM, Henri Sivonen wrote:

On May 14, 2009, at 23:52, Eduard Pascual wrote:

On Thu, May 14, 2009 at 3:54 PM, Philip Taylor wrote:
It doesn't matter one syntax or another. But if a syntax already
exists (RDFa), building a new syntax should be properly justified.


It was at the start of this thread:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html

Ian's initial message goes step by step through the creation of this
new syntax; but does *not* mention at all *why* it was being created
in the first place. The insight into the choices taken is indeed a
good thing, and I thank Ian for it; but he omitted to provide insight
into the first choice taken: discarding the multiple options already
available (not only Microformats and RDFa, but also other less
discussed ones such as eRDF, EASE, etc).



I think Ian did explain why he discarded RDFa as an option.

In the email linked above, Ian Hickson wrote:

Another solution we could consider is RDFa:

 http://damowmow.com/";>
  Hedral
  Hedral is a male american domestic shorthair,
  with a fluffy black fur with white paws and belly.
  
 

This unfortunately also has a number of problems.

 - it uses prefixes, which most authors simply do not understand, and
   which many implementors end up getting wrong (e.g. SearchMonkey
   hard-coded certain prefixes in its first implementation, Google's
   handling of RDF blocks for license declarations is all done with
   regular expressions instead of actually parsing the namespaces, etc).
   Even if implemented right, namespaces still lead to flaky
   copy-and-paste behaviour.

 - it sometimes uses rel="" and sometimes uses property="" and it's hard
   to know when to use one or the other.

 - it introduces much more power than is necessary to solve this problem.
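
The copy-and-paste problem with prefixes can be sketched as follows. This is a simplified model of prefix expansion, not a conforming CURIE or RDFa processor, and the function name expand_curie is hypothetical.

```python
from typing import Mapping


def expand_curie(curie: str, prefixes: Mapping[str, str]) -> str:
    """Expand 'prefix:reference' using the prefix declarations in scope."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {curie!r}")
    if prefix not in prefixes:
        # The markup was copied without the declaration it depends on.
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + reference
```

The same string "dc:title" expands to a full identifier when the surrounding document declares "dc", and fails (or, worse, silently means something else) when the fragment is pasted into a document that does not declare it or binds the prefix differently.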



I believe Microformats were discarded as a solution because the  
proposed use case was as follows:


USE CASE: Annotate structured data that HTML has no semantics for,  
and which nobody has annotated before, and may never again, for  
private use or use in a small self-contained community.


But Microformats are only intended for widely used and generally  
agreed upon public vocabularies. The Microformats process is not  
applicable to private-use/small-community vocabularies. And  
Microformats define specific vocabularies, not a general way to add  
new kinds of semantic markup. I expect Microformats experts would  
agree with this assessment.



So I think it is clear why neither Microformats nor RDFa were seen as  
suitable solutions to the use case, even if the matter was addressed  
somewhat briefly.



Regards,
Maciej

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-18 Thread Eduard Pascual

On Mon, May 18, 2009 at 10:38 AM, Henri Sivonen  wrote:
> On May 14, 2009, at 23:52, Eduard Pascual wrote:
>
>> On Thu, May 14, 2009 at 3:54 PM, Philip Taylor 
>> wrote:
>> It doesn't matter one syntax or another. But if a syntax already
>> exists (RDFa), building a new syntax should be properly justified.
>
> It was at the start of this thread:
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html
Ian's initial message goes step by step through the creation of this
new syntax; but does *not* mention at all *why* it was being created
in the first place. The insight into the choices taken is indeed a
good thing, and I thank Ian for it; but he omitted to provide insight
into the first choice taken: discarding the multiple options already
available (not only Microformats and RDFa, but also other less
discussed ones such as eRDF, EASE, etc). Sure, there has been a lot of
discussion on this topic; and it's possible that the choice was taken
as part of such discussions. In any case, I think Ian should have
clearly stated the reasons to build a brand new solution when many
others have been out for a while and users have been able to try and
test them.
Please keep in mind that I'm not criticizing the choice itself (at
least, not now), but the lack of information and reasoning behind that
choice.
>
>> As
>> of now, the only supposed benefit I have heard of for this syntax is
>> that it avoids CURIEs... yet it replaces them with reversed domains??
>> Is that a benefit?
>
> There's no indirection. A decade of Namespaces in XML shows that both
> authors and implementors have trouble getting prefix-based indirection
> right.
Really? I haven't seen any hint about that. Sure, there will be some
people who have trouble understanding namespaces, just like there are
some people who have trouble understanding why something like
"foobar" is wrong.
Please, could you quote a source for that claim? I could also claim
something like "fifteen years of Java show that reversed domains are
error-prone and harmful", and even argue about it; but these kinds of
arguments, without a serious analysis or study to back them, are
completely meaningless and definitely subjective.
>
> (If we were limited to reasoning about something that we don't have
> experience with yet, I might believe that people can't be too inept to use
> prefix-based indirection. However, a decade of actual evidence shows that
> actual behavior defies reasoning here and prefix-based indirection is
> something that both authors and implementors get wrong over and over again.)
Curious: you refer to "a decade of actual evidence", but you fail to
refer to any actual evidence. I'm eager to see that evidence; could
you share it with us? Thank you.
>
>> I have been a Java programmer for some years, and
>> still find that convention absurd, horrible, and annoying. I'll agree
>> that CURIEs are ugly, and maybe hard to understand, but reversed
>> domains are equally ugly and hard to understand.
>
> Problems shared by CURIEs, URIs and reverse DNS names:
>  * Long.
>  * Identifiers outlive organization charts.
Ehm. CURIEs ain't really long: the main point of prefixes is to make
them as short as reasonably possible.
Good identifiers outlive bad organization charts. Good organization
outlives bad identifiers. Good organization and good identifiers tend
to outlive the context they are used in.
>
> Problems that reverse DNS names don't have but CURIEs and URIs do have:
>  * "http://"; 7 characters of even extra length.
>  * Affordance of dereferencability when mere identifier semantics are meant.
A CURIE (at least as typed by an author) doesn't have the "http://":
it is a prefix, a colon, and whatever goes after it. Once resolved
(ie: after replacing the prefix and colon by what the prefix
represents) what you get is no longer a CURIE, but a URI like the ones
you'd type in your browser or inside a link's href attribute.
Dereferenceability is not a problem in itself: having more than what is
strictly needed can be either irrelevant or an advantage, not a
problem. Of course, it *may* be the cause of some actual problem, but
in that case you should rather describe the problem itself, so it can
be evaluated.
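To make the resolution step described above concrete, here is a minimal
sketch in JavaScript. The prefix table and the expandCurie name are made
up for illustration, and real CURIE processing has more cases (a default
prefix, "safe" CURIEs in square brackets) that this ignores.

```javascript
// Hypothetical prefix table; in RDFa these mappings would come from the
// prefix declarations in the document itself.
const prefixes = {
  foaf: 'http://xmlns.com/foaf/0.1/',
};

// Expand "prefix:reference" into a full URI by replacing the prefix and
// the colon with the URI that the prefix stands for.
function expandCurie(curie, prefixMap) {
  const colon = curie.indexOf(':');
  if (colon === -1) {
    throw new Error('Not a CURIE (no colon): ' + curie);
  }
  const prefix = curie.slice(0, colon);
  const reference = curie.slice(colon + 1);
  if (!Object.prototype.hasOwnProperty.call(prefixMap, prefix)) {
    throw new Error('Undeclared prefix: ' + prefix);
  }
  return prefixMap[prefix] + reference;
}

console.log(expandCurie('foaf:name', prefixes));
// prints http://xmlns.com/foaf/0.1/name
```

What the author types is the short form; only the expanded result carries
the "http://".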
>
> Problems that reverse DNS names and URIs don't have but CURIEs have:
>  * Prefix-based indirection.
Indirection can't be taken as a problem when most currently used RDFa
tools don't use it at all (which proves that they can work without
relying on it). Sure, it's not as big an advantage as some may claim
it to be. But indirection itself, even if not 100%
guaranteed to work, is an actual advantage. As a real-world
example, I have been able to learn about vocabularies I didn't know by
following the "links" on prefix declarations in documents using them.
>  * Violation of the DOM Consistency Design Principle if xmlns:foo used.
*if* xmlns:foo is used. Very strong emphasis on the conditional, and
on the multiple possibilities that have already been propo

[whatwg] DOM3 Load and Save for simple parsing/serialization?


One more thought...

While it is great that innerHTML is being officially standardized, I'm 
afraid it would be rather hackish to have to use it for parsing and 
serializing dynamically created content which wasn't destined to make it 
immediately into the document, if at all.


Has any thought been given to standardizing on at least a part of DOM 
Level 3 Load and Save in HTML5?


The API, if simply applied to serialization, would look like this:

var ser = DOMImplementationLS.createLSSerializer();
var str = ser.writeToString(document);

and like this for parsing to the DOM:

// 1 for synchronous; null for no schema type
var lsParser = DOMImplementationLS.createLSParser(1, null);

var lsInput = DOMImplementationLS.createLSInput();
lsInput.stringData = '';
var doc = lsParser.parse(lsInput);

If a revision to the DOM3 module is not in order (which, e.g., 
simplifies the parsing from a string for simple cases) and the above is 
considered too cumbersome, maybe some other cross-browser standard could 
be agreed upon?


I think using DOM3 would facilitate readily adding additional aspects of 
the module in the future (as ECMAScript seems to be positively albeit 
slowly expanding to ever new uses) and offer familiarity for those 
working in other contexts with DOM Level 3, while ECMAScript users can 
still wrap these in their own simpler functions. However, I can also see 
the desire for something simpler (as I say, maybe an addendum to the L&S 
module). But I do hope something might be considered, since I find this 
to be a quite frequent need and do not like relying on feature-checking 
for non-standard methods in the various browsers as well as being 
unclear on how to future-proof my code to work with standards-compliant 
browsers...
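
As a rough sketch of the "simpler functions" a script author might wrap
around the DOM Level 3 Load and Save calls shown above: the function names
parseFromString and serializeToString are hypothetical, and the impl
argument is assumed to be an object implementing DOMImplementationLS
(presumably document.implementation in a browser that supported the
module).

```javascript
// Value of DOMImplementationLS.MODE_SYNCHRONOUS in the DOM Level 3 L&S spec.
var MODE_SYNCHRONOUS = 1;

// Parse a string of markup into a document.
// impl: assumed to implement DOMImplementationLS.
function parseFromString(impl, markup) {
  var parser = impl.createLSParser(MODE_SYNCHRONOUS, null); // null: no schema type
  var input = impl.createLSInput();
  input.stringData = markup;
  return parser.parse(input);
}

// Serialize a node (or a whole document) back to a string.
function serializeToString(impl, node) {
  return impl.createLSSerializer().writeToString(node);
}
```

Taking the implementation object as a parameter keeps the wrappers
independent of how a given environment exposes it.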


thanks,
Brett

Re: [whatwg] Link rot is not dangerous


On May 18, 2009, at 14:45, Dan Brickley wrote:


On 18/5/09 10:34, Henri Sivonen wrote:
It seems to me that the positions that RDF applications should "Follow
Their Nose" and that link rot is not dangerous (to RDF) are
contradictory positions.


That's a strong claim. There is certainly a balance to be found  
between taking advantage of de-referencable URIs and relying on  
their de-referencability. De-referencing is a privilege not a right,  
after all.


If there's value in apps dereferencing namespace URIs, those URIs  
going undereferencable leads to loss of value. Hence, link rot would  
cause loss of value i.e. be 'dangerous' by breaking something.


If I lost control of xmlns.com tomorrow, and it became un-rescuably  
owned by offshore spam-virus-malware pirates, that doesn't change  
history. For nine years, the FOAF documentation has lived there, and  
we can use URIs to ask other services about what they saw during  
that period: http://web.archive.org/web/*/http://xmlns.com/foaf/0.1/


Do any RDF consumer apps that dereference namespace URIs actually fall  
back on web.archive.org?


If I'm a FOAF author, what recourse do I have if URI dereferencing- 
based functionality breaks in some apps due to xmlns.com going  
unavailable when other apps have hard-coded xmlns.com URIs so if I  
simply changed my predicates I'd break existing apps? At least authors  
who rely on Y!/AOL/Google serving JS libraries can start using a copy  
of any JS library on another CDN without changing how the script runs.


Since there is useful information to know about FOAF properties and  
terms from its schema and human-oriented docs, it would be a shame  
if people ignored that. Since domain names can be lost, it would  
also be a shame if directly de-referencing URIs to the schema was  
the only way people could find that info. Fortunately, neither is  
the case.


I wasn't talking about people but about apps dereferencing NS URIs to  
enable their functionality.



That link rot hasn't been a practical problem to the Semantic Web
community suggests that applications don't really Follow Their Nose in
practice. Can anyone point me to a deployed end user application that
uses RDF internally and Follows Its Nose?


The search site, sindice.com does this:


Thanks.


Whether you consider sindice.com end-user facing or not, I don't know.


I wouldn't characterize it as an end-user app. It exposes terms like  
"RDF" and "triples" and shows qnames to the user.


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-18 Thread Julian Reschke


Henri Sivonen wrote:

The interesting question here is whether there's a better system.


 1) Centralized allocation of short names.


Sounds like "urn:" to me. Registry is defined in RFC 3406.

 2) Prefixing a short name by (an abbreviation of) the name of the 
vocabulary, which makes the probability of collision negligible once the 
designer has googled to check the probable absence of public collisions 
at minting time (e.g. "openid.delegate").


Too fragile for disambiguation for my taste.


That depends on the choice of the URI scheme.


I guess one could use e.g. "data:,foo" URIs as a namespace URI, but why 
not just use "foo"?



URIs give you the choice of having something easily referenceable (if you 
want), or not.



Problems that reverse DNS names and URIs don't have but CURIEs have:
* Prefix-based indirection.


HTML developers regularly have to deal with a much more complicated 
indirection mechanism (CSS).


This would be a persuasive argument if we were reasoning about a feature 
we don't have experience with yet. However, experience shows 
prefix-based indirection is too hard. If at the same time CSS isn't too 
hard, I just have to accept the evidence from the real world even if it 
defies reasoning.


No, I don't think we have evidence that prefix-based indirection is too 
hard. There are way too many people getting it right.



...
Either @prefix or RDFa-profiles would break the network effects of the 
deployment of outside-of-REC RDFa-in-XHTML-as-text/html, so if breaking 
network effects is on the table in the form of @prefix and 
RDFa-profiles, I don't see why microdata wouldn't be on the table as far 
as network effects go.


Introducing @prefix will be much simpler to deploy than introducing a 
completely different system.


That being said, I do agree that the current situation is a mess, and 
that the RDFa-in-XHTML spec has created it.


Given the current situation, the simplest possible solution probably is 
to live with it, and use xmlns declarations in HTML for the purpose of 
RDFa as well.


BR, Julian

Re: [whatwg] Link rot is not dangerous

2009-05-18 Thread Dan Brickley


On 18/5/09 10:34, Henri Sivonen wrote:

On May 15, 2009, at 19:20, Manu Sporny wrote:


There have been a number of people now that have gone to great lengths
to outline how awful link rot is for CURIEs and the semantic web in
general. This is a flawed conclusion, based on the assumption that there
must be a single vocabulary document in existence, for all time, at one
location.


The "flawed" conclusion flows out of "Follow Your Nose" advocacy, and is
not flawed if one takes "Follow Your Nose" seriously.

It seems to me that the positions that RDF applications should "Follow
Their Nose" and that link rot is not dangerous (to RDF) are
contradictory positions.


That's a strong claim. There is certainly a balance to be found between 
taking advantage of de-referencable URIs and relying on their 
de-referencability. De-referencing is a privilege not a right, after all.


If I lost control of xmlns.com tomorrow, and it became un-rescuably 
owned by offshore spam-virus-malware pirates, that doesn't change 
history. For nine years, the FOAF documentation has lived there, and we 
can use URIs to ask other services about what they saw during that 
period: http://web.archive.org/web/*/http://xmlns.com/foaf/0.1/
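
As an illustration only, the archive lookup above can be built
mechanically from a namespace URI. The function names here are
hypothetical, and no particular RDF application is claimed to work this
way; the sketch only shows the order in which a consumer could try
sources.

```javascript
// Prefix taken from the archive URL quoted above. The "*" asks the archive
// for its listing of captures rather than for one specific capture.
const ARCHIVE_PREFIX = 'http://web.archive.org/web/*/';

// Build the archive lookup URL for a given URI.
function archiveLookupUrl(uri) {
  return ARCHIVE_PREFIX + uri;
}

// Ordered list of places a (hypothetical) consumer could try:
// the namespace URI itself first, then the archive's record of it.
function lookupCandidates(namespaceUri) {
  return [namespaceUri, archiveLookupUrl(namespaceUri)];
}

console.log(lookupCandidates('http://xmlns.com/foaf/0.1/'));
```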


Since there is useful information to know about FOAF properties and 
terms from its schema and human-oriented docs, it would be a shame if 
people ignored that. Since domain names can be lost, it would also be a 
shame if directly de-referencing URIs to the schema was the only way 
people could find that info. Fortunately, neither is the case.



That link rot hasn't been a practical problem to the Semantic Web
community suggests that applications don't really Follow Their Nose in
practice. Can anyone point me to a deployed end user application that
uses RDF internally and Follows Its Nose?


The search site, sindice.com does this:

"Yes Sindice dereferences URIs it finds in RDF instance data, including 
class and property URIs. It performs OWL reasoning using the retrieved 
information, mostly to infer additional triples based on subclass and 
subproperty relationships. Doing this helps us to increase recall in 
queries." (from Richard Cyganiak, who I asked offlist for confirmation)
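
To give a feel for the kind of inference described in that quote, here is
a toy sketch of deriving extra type triples from subclass relationships.
It is not Sindice's code: the triple representation (arrays of three
strings), the inferTypes name and the sample data are all assumptions,
and subproperty reasoning is left out.

```javascript
const RDF_TYPE = 'rdf:type';
const SUBCLASS_OF = 'rdfs:subClassOf';

// Each triple is [subject, predicate, object]. Returns the input triples
// plus every rdf:type triple implied by the rdfs:subClassOf statements,
// repeating until nothing new can be added (so chains of subclasses work).
function inferTypes(triples) {
  const known = new Set(triples.map(t => t.join(' ')));
  const result = triples.slice();
  let changed = true;
  while (changed) {
    changed = false;
    for (const [s, p, o] of result.slice()) {
      if (p !== RDF_TYPE) continue;
      for (const [sub, p2, sup] of triples) {
        if (p2 !== SUBCLASS_OF || sub !== o) continue;
        const inferred = [s, RDF_TYPE, sup];
        const key = inferred.join(' ');
        if (!known.has(key)) {
          known.add(key);
          result.push(inferred);
          changed = true;
        }
      }
    }
  }
  return result;
}

// Sample data: the subclass statement is the sort of thing that would be
// retrieved by dereferencing the class URI.
const data = [
  ['foaf:Person', SUBCLASS_OF, 'foaf:Agent'],
  ['ex:alice', RDF_TYPE, 'foaf:Person'],
];
console.log(inferTypes(data));
```

A query for everything of type foaf:Agent would then also find ex:alice,
which is the increase in recall the quote refers to.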


Whether you consider sindice.com end-user facing or not, I don't know. I 
put it in roughly the same category as Google's Social Graph API. But it's 
a non-trivial implementation that aggregates and integrates a lot of data.


BTW here's another use case for identifying properties and classes by 
URI: we can decentralise the translation of their labels into other 
languages. Here are some Korean descriptions of FOAF, for example: 
http://svn.foaf-project.org/foaftown/foaf18n/foaf-kr.rdf


cheers,

Dan

Re: [whatwg] Annotating structured data that HTML has no semantics for


On May 18, 2009, at 12:18, Julian Reschke wrote:


Henri Sivonen wrote:
There's no indirection. A decade of Namespaces in XML shows that  
both authors and implementors have trouble getting prefix-based  
indirection right.


It's true that people get this wrong again and again. But it's also  
true that lots of developers understand it once and for all, and then  
consistently get it right.


The interesting question here is whether there's a better system.


 1) Centralized allocation of short names.
 2) Prefixing a short name by (an abbreviation of) the name of the  
vocabulary, which makes the probability of collision negligible once  
the designer has googled to check the probable absence of public  
collisions at minting time (e.g. "openid.delegate").



I have been a Java programmer for some years, and
still find that convention absurd, horrible, and annoying. I'll agree
that CURIEs are ugly, and maybe hard to understand, but reversed
domains are equally ugly and hard to understand.

Problems shared by CURIEs, URIs and reverse DNS names:
* Long.
* Identifiers outlive organization charts.


That depends on the choice of the URI scheme.


I guess one could use e.g. "data:,foo" URIs as a namespace URI, but  
why not just use "foo"?



Problems that reverse DNS names and URIs don't have but CURIEs have:
* Prefix-based indirection.


HTML developers regularly have to deal with a much more complicated  
indirection mechanism (CSS).


This would be a persuasive argument if we were reasoning about a  
feature we don't have experience with yet. However, experience shows  
prefix-based indirection is too hard. If at the same time CSS isn't  
too hard, I just have to accept the evidence from the real world even  
if it defies reasoning.


The syntax is simpler for the use cases it was designed for. It  
uses a simpler conceptual model (trees as opposed to graphs). It  
allows short token identifiers. It doesn't use prefix-based  
indirection. It doesn't violate the DOM Consistency Design Principle.


(devil's advocate argument) - so how does the syntax behave for  
those use cases it *hasn't* been designed for?


That's hard to test, because the use case search has been exhausted  
for the moment. It seems we'd need to wait for new use cases to pop  
up.



RDFa uses a data model that is an overkill for the use cases.


It would be interesting to understand which use cases that RDFa can  
do are not supported by "microdata" (I don't understand enough about  
the subject to try myself), and whether the potential advantage of  
having a simpler model outweighs the disadvantage of not using  
network effects and creating a competing syntax.


Are there use cases of RDFa that are currently known but that the call  
for use cases didn't turn up?


Either @prefix or RDFa-profiles would break the network effects of the  
deployment of outside-of-REC RDFa-in-XHTML-as-text/html, so if  
breaking network effects is on the table in the form of @prefix and  
RDFa-profiles, I don't see why microdata wouldn't be on the table as  
far as network effects go.


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/

Re: [whatwg] External document subset support


On May 18, 2009, at 11:50, Brett Zamir wrote:


Henri Sivonen wrote:

On May 18, 2009, at 09:36, Brett Zamir wrote:

Section 10.1, "Writing XHTML documents" observes: "According to  
the XML specification, XML processors are not guaranteed to  
process the external DTD subset referenced in the DOCTYPE."


While this is true, since no doubt the majority of web browsers  
are already able to process external stylesheets or scripts, might  
the very useful feature of external entity files, be employed by  
XHTML 5 as a stricter subset of XML (similar to how XML Namespaces  
re-annexed the colon character) in order to allow this useful  
feature to work for XHTML (to have access to HTML entities or  
other useful entities for one, as well as enable a poor man's  
localization, etc.)?


See http://hsivonen.iki.fi/no-dtd/ which explains why DTDs don't work for  
the Web in the general case.


While that is a thoughtful and helpful article, your arguments there  
mostly relate to validation from a central spec.


No, my arguments don't relate to validation but to having to  
dereference a URI that isn't under the author's control and that gets  
copied around as boilerplate.


Also, as far as heavy server loads for frequent DTDs, entities could  
be deliberately not defined at a resolvable URL.


There are existing XML doctypes out there with resolvable URIs, so  
you'd need a blacklist to bootstrap such a solution.


The same problems of denial-of-service could exist with stylesheet  
requests, script requests, etc.


No, styles and scripts are commonly site-specific, so there isn't a  
Web-wide single point of failure whose URI gets copied around as  
boilerplate.


Even some sites, like Yahoo, have encouraged referring to their  
frequently accessed external files to take advantage of caching.


At least the serving infrastructure for those URIs has been designed  
for high load unlike the server for many existing DTD URIs out there.  
Furthermore, JS libraries have obvious functionality in existing  
browsers, so it's unlikely that authors would reference JS libraries  
as part of boilerplate without actually intending to take the perf hit  
of loading the library.


The spec could even insist on same-domain, though I don't see any  
need for that.


Without same-origin (as in not even performing a CORS GET), you'd need  
to blacklist at least w3.org due to existing references out there.  
(Note that for security, same-origin/CORS is must-have anyway.)


I also disagree with throwing our hands up in the air about  
character entities (or thinking that the (English-based) HTML ones  
are sufficient).


That's a text input method issue that needs to be solved on the  
authoring side for text input of all kinds--not just text input for  
writing XML in a text editor.


Moreover, the browser with the largest market share offers such  
support already, and those who depend on it may already view other  
browsers not supporting the standard as "broken".


IE doesn't support XHTML or SVG which are the popular XML formats one  
might want to load into a browsing context.


Loading same-origin DTDs for the purpose of localization is a semi- 
defensible case, but it's a lot of complexity for a use case that  
is way on the wrong side of 80/20 on the Web scale.

How so?


Localized sites are a minority on the Web, and chances that localized  
Web apps would switch to a client-side localization method that relies  
on server-side negotiation of the localization and requires XML to  
work seem dim.


Even if it is a niche group which uses TEI, Docbook, etc. or who  
wants to be able to build say a browser extension which can take  
advantage of their rich semantics, this is still a use for citizens  
of the web.


If you need a browser extension for content, you shut out users of  
browsers that don't have the particular extension available. It's like  
using Flash.


If people can push forward with backwards-incompatible technologies  
like the video element, 3d-animation, or whatever, it seems not much  
to ask to support the humble external entity file... :)


The upside of video and 3D is much more significant than the upside of  
supporting external DTDs.


Besides, if the use case for DTDs is localization within an origin,  
the server can perform the XML parse and reserialize into DTDless  
XML. (That's how I've implemented this pattern in the past without  
client-side support.)
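
As a rough illustration of the server-side idea (not the implementation
referred to above, which is not shown in this thread), the sketch below
substitutes entity references from a per-language lookup table before the
document is sent. It is a much-simplified stand-in for a real XML parse
and reserialize: the tables, the expandEntities name and the
text-only treatment of replacement values are all assumptions.

```javascript
// Hypothetical per-language entity tables, standing in for localization DTDs.
const entityTables = {
  en: { greeting: 'Hello', farewell: 'Goodbye' },
  fi: { greeting: 'Hei', farewell: 'Hei hei' },
};

// The five entities predefined by XML itself are left untouched.
const PREDEFINED = new Set(['lt', 'gt', 'amp', 'quot', 'apos']);

// Escape a replacement value so it is safe as XML text or attribute content.
function escapeText(s) {
  return s.replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/"/g, '&quot;');
}

// Replace named entity references with values from the chosen table, so the
// output no longer depends on any DTD. Numeric character references such as
// &#160; do not match the pattern and pass through unchanged.
function expandEntities(source, table) {
  return source.replace(/&([A-Za-z_][\w.-]*);/g, (match, name) => {
    if (PREDEFINED.has(name)) return match;
    if (!Object.prototype.hasOwnProperty.call(table, name)) {
      throw new Error('Undefined entity: ' + name);
    }
    return escapeText(table[name]);
  });
}

console.log(expandEntities('<p>&greeting; &amp; &farewell;</p>', entityTables.en));
// prints <p>Hello &amp; Goodbye</p>
```

The server would pick the table according to the negotiated language and
serve the expanded, DTD-less result.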


That is assuming people are aware of scripting and have access to  
such resources.


Localization with DTDs but without scripting is already tricky, since  
one would need to tweak conneg. Furthermore, localization with DTDs  
makes more sense for Web app UIs than static content, and Web apps  
typically have server-side program code anyway.


Wasn't it one of the aims of the likes of XSL, XQuery, and XForms to  
use a syntax which doesn't require knowledge of an unrelated  
scripting language (and those are pretty complex examples unlike  
e

[whatwg] File package protocol and manifest support?

While this may be too late in the game to bring up, I'd very much be 
interested (and think others would be too) to have a standard means of 
representing not only individual files, but also groups of files on the web.


One application of this would be for a web user to be able to do the 
following (taking advantage of both offline applications and related 
somewhat to custom protocols):


1) Click a link in a particular protocol containing a list of files or 
leading to a manifest file which contains a list of files. Very 
importantly, the files would NOT need to be from the same site.
2) If the files have not been downloaded already, the browser accesses 
the files (possibly first decompressing them) to store for offline use.
3) If the files were XML/XHTML, take advantage of any attached XSL, 
XQuery, or CSS in reassembling them.
4) If the files were SQL, reassemble them in a table-agnostic 
manner--e.g., allow the user to choose which columns to view and in 
which order and how many records at a time (including allowing a 
single-record "flashcard"-like view), also allowing for automated 
generation of certain columns using JavaScript.
5) If the files included templates, use these for the display and 
populate for the user to view.
6) Bring the user to a particular view of the pages, starting for 
example, at a particular paragraph indicated by the link or manifest 
file, highlight the document or a portion of the targeted page with a 
certain font and color, etc.
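
To make steps 1, 2 and 6 a little more concrete, here is a purely
hypothetical sketch of what such a manifest could look like as data, and
of how a browser might decide which files still need fetching. Every
field name, URL and function here is invented for illustration; no
existing format is implied.

```javascript
// Hypothetical manifest: files may come from different sites, and the
// "view" entry says where to take the user once everything is assembled.
const manifest = {
  files: [
    { url: 'http://example.org/book/chapter1.xhtml', type: 'application/xhtml+xml' },
    { url: 'http://example.com/styles/book.css', type: 'text/css' },
  ],
  view: { url: 'http://example.org/book/chapter1.xhtml', fragment: 'para-12' },
};

// Step 2: return the URLs from the manifest that are not yet stored offline.
// cachedUrls is the list of URLs the browser already holds.
function filesToDownload(manifest, cachedUrls) {
  const cached = new Set(cachedUrls);
  return manifest.files
    .filter(file => !cached.has(file.url))
    .map(file => file.url);
}

console.log(filesToDownload(manifest, ['http://example.org/book/chapter1.xhtml']));
```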


It seems limiting that while we can reference individual sites' data at 
best targeting an existing anchor or predefined customizability, we do 
not have any built-in way to bookmark and share views of that data over 
the web.


In considering building a Firefox extension to try this as a proof of 
concept, METS (http://www.loc.gov/standards/mets/ ) seems to have many 
aspects which could be useful as a base in such a standard, including 
the useful potential of enabling links to be described for files which 
may not exist as hyperlinks within the files (i.e., XLink linkbases).


Besides this offline packages use, such a language might work just as 
well to build a standard for hierarchical sitemaps, linkbases, or Gopher 
2.0 (and not being limited to its usual web view, equivalent of "icon 
view" on the desktop, but conceivably allowing "column browser" or tree 
views for hierarchical data ranging from interlinked genealogies to 
directories along the lines of http://www.dmoz.org/ or 
http://dir.yahoo.com ), including for representing files on one's own 
local system yet leading to other sites. The same manifest files might 
be browseable directly (e.g., Gopher-mode), being targeted to 
contiguously lead to other such manifest file views until reaching a 
document (the Gopher-view could optionally remain in sight as the end 
document loaded), or, as mentioned above, as a cached and integrated 
offline application (especially where compressed files and SQL were 
involved).


Brett

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-18 Thread Julian Reschke


Henri Sivonen wrote:
There's no indirection. A decade of Namespaces in XML shows that both 
authors and implementors have trouble getting prefix-based indirection 
right.


It's true that people get this wrong again and again. But it's also true 
that lots of developers understand it once and for all, and then 
consistently get it right.


The interesting question here is whether there's a better system.


I have been a Java programmer for some years, and
still find that convention absurd, horrible, and annoying. I'll agree
that CURIEs are ugly, and maybe hard to understand, but reversed
domains are equally ugly and hard to understand.


Problems shared by CURIEs, URIs and reverse DNS names:
 * Long.
 * Identifiers outlive organization charts.


That depends on the choice of the URI scheme.


Problems that reverse DNS names don't have but CURIEs and URIs do have:
 * "http://"; 7 characters of even extra length.
 * Affordance of dereferencability when mere identifier semantics are 
meant.


Again, that depends on the URI scheme.


Problems that reverse DNS names and URIs don't have but CURIEs have:
 * Prefix-based indirection.


HTML developers regularly have to deal with a much more complicated 
indirection mechanism (CSS).



 * Violation of the DOM Consistency Design Principle if xmlns:foo used.


I think there is consensus that this is a drawback, but not about how 
significant this is.


The syntax is simpler for the use cases it was designed for. It uses a 
simpler conceptual model (trees as opposed to graphs). It allows short 
token identifiers. It doesn't use prefix-based indirection. It doesn't 
violate the DOM Consistency Design Principle.


(devil's advocate argument) - so how does the syntax behave for those 
use cases it *hasn't* been designed for?


Compared to microformats, microdata defines the processing model and 
conformance criteria. The microformats community has failed to provide a 
processing model and conformance criteria at a similar level of detail. 


Indeed.

The processing model side is perceived to be such a serious issue that 
the lack of a unified microformats parsing spec is cited as a motivation 
to use RDFa instead of microformats.


Indeed.


RDFa uses a data model that is an overkill for the use cases.


It would be interesting to understand which use cases that RDFa can do 
are not supported by "microdata" (I don't understand enough about the 
subject to try myself), and whether the potential advantage of having a 
simpler model outweighs the disadvantage of not using network effects 
and creating a competing syntax.



...


BR, Julian

Re: [whatwg] External document subset support


Henri Sivonen wrote:

On May 18, 2009, at 09:36, Brett Zamir wrote:

Section 10.1, "Writing XHTML documents" observes: "According to the 
XML specification, XML processors are not guaranteed to process the 
external DTD subset referenced in the DOCTYPE."


While this is true, since no doubt the majority of web browsers are 
already able to process external stylesheets or scripts, might the 
very useful feature of external entity files, be employed by XHTML 5 
as a stricter subset of XML (similar to how XML Namespaces re-annexed 
the colon character) in order to allow this useful feature to work 
for XHTML (to have access to HTML entities or other useful entities 
for one, as well as enable a poor man's localization, etc.)?


See http://hsivonen.iki.fi/no-dtd/ which explains why DTDs don't work for 
the Web in the general case.


While that is a thoughtful and helpful article, your arguments there 
mostly relate to validation from a central spec. Also, as far as heavy 
server loads for frequent DTDs, entities could be deliberately not 
defined at a resolvable URL. The same problems of denial-of-service 
could exist with stylesheet requests, script requests, etc. Even some 
sites, like Yahoo, have encouraged referring to their frequently 
accessed external files to take advantage of caching. The spec could 
even insist on same-domain, though I don't see any need for that. If I 
give my website out to Slashdot, I shouldn't be surprised when I get 
"slashdotted", and if I do, that's my fault, not the web's fault. A DTD 
doesn't need to reference a central location, nor would it be likely 
that major browsers would fail to use the PUBLIC identifier to avoid 
checking for the SYSTEM file.


I also disagree with throwing our hands up in the air about character 
entities (or thinking that the (English-based) HTML ones are 
sufficient). As I said, just because the original spec defined it as 
optional, does not mean we must perpetually remain stuck in the past, 
especially in the case of XML-on-the-web which is not going to break a 
whole lot of browsing uses at all if external DTDs are suddenly made 
possible. Moreover, the browser with the largest market share offers 
such support already, and those who depend on it may already view other 
browsers not supporting the standard as "broken".

Loading same-origin DTDs for the purpose of localization is a 
semi-defensible case, but it's a lot of complexity for a use case that 
is way on the wrong side of 80/20 on the Web scale. 

How so? And besides localization, there are many other uses such as 
providing a convenient tool for editors to avoid finding a copyright 
symbol, etc. Not everyone uses an IDE which makes these available or 
knows how to use it. I'm assisting such a project which has this issue. 
And I really don't buy the web/non-web dichotomy which some people make. 
If there's an offline use, there's an online use, pure and simple. And a 
client-side-only use as well--to be able to read my own documents, I'd 
like to do so in a browser--many others besides me like to "live in" 
their browsers.


Even if it is a niche group which uses TEI, Docbook, etc. or who wants 
to be able to build say a browser extension which can take advantage of 
their rich semantics, this is still a use for citizens of the web. If 
people can push forward with backwards-incompatible technologies like 
the video element, 3d-animation, or whatever, it seems not much to ask 
to support the humble external entity file... :)

Besides, if the use case for DTDs is localization within an origin, 
the server can perform the XML parse and reserialize into DTDless XML. 
(That's how I've implemented this pattern in the past without 
client-side support.)


That is assuming people are aware of scripting and have access to such 
resources. Wasn't it one of the aims of the likes of XSL, XQuery, and 
XForms to use a syntax which doesn't require knowledge of an unrelated 
scripting language (and those are pretty complex examples unlike entities)?


(Btw, you and I discussed this before, though I didn't get a response 
from you to my last post: 
https://bugzilla.mozilla.org/show_bug.cgi?id=22942#c109 ; I don't mean 
to go off-topic but you might wish to consider or respond to some of its 
points as well...)


best wishes,
Brett

Re: [whatwg] Annotating structured data that HTML has no semantics for


On May 14, 2009, at 23:52, Eduard Pascual wrote:

On Thu, May 14, 2009 at 3:54 PM, Philip Taylor wrote:

It doesn't matter one syntax or another. But if a syntax already
exists (RDFa), building a new syntax should be properly justified.


It was at the start of this thread:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html


As
of now, the only supposed benefit I have heard of for this syntax is
that it avoids CURIEs... yet it replaces them with reversed domains??
Is that a benefit?


There's no indirection. A decade of Namespaces in XML shows that both  
authors and implementors have trouble getting prefix-based indirection  
right.


(If we were limited to reasoning about something that we don't have  
experience with yet, I might believe that people can't be too inept to  
use prefix-based indirection. However, a decade of actual evidence  
shows that actual behavior defies reasoning here and prefix-based  
indirection is something that both authors and implementors get wrong  
over and over again.)
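
A minimal sketch of what "prefix-based indirection" means in practice, assuming a hypothetical expand_curie helper (not taken from any RDFa library) and a made-up rebinding of the "dc" prefix. The same token resolves to different identifiers depending on which declarations happen to be in scope, and a token copied without its declaration does not resolve at all:

```python
def expand_curie(curie: str, prefixes: dict[str, str]) -> str:
    """Expand 'prefix:reference' using the prefix map currently in scope."""
    prefix, _, reference = curie.partition(":")
    if prefix not in prefixes:
        # Typical copy-and-paste failure: the declaration was left behind.
        raise KeyError(f"undeclared prefix: {prefix}")
    return prefixes[prefix] + reference


# Two documents binding the same prefix differently.
doc_a = {"dc": "http://purl.org/dc/elements/1.1/"}
doc_b = {"dc": "http://example.org/my-terms/"}  # hypothetical rebinding

title_a = expand_curie("dc:title", doc_a)
title_b = expand_curie("dc:title", doc_b)

# A reverse-DNS-style name needs no lookup; the string is the identifier.
direct_name = "org.example.title"  # hypothetical identifier
```

Here title_a and title_b differ even though both documents wrote "dc:title", whereas direct_name means the same thing wherever it is pasted.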



I have been a Java programmer for some years, and
still find that convention absurd, horrible, and annoying. I'll agree
that CURIEs are ugly, and maybe hard to understand, but reversed
domains are equally ugly and hard to understand.


Problems shared by CURIEs, URIs and reverse DNS names:
 * Long.
 * Identifiers outlive organization charts.

Problems that reverse DNS names don't have but CURIEs and URIs do have:
 * "http://": 7 characters of extra length.
 * Affordance of dereferenceability when mere identifier semantics are
meant.


Problems that reverse DNS names and URIs don't have but CURIEs have:
 * Prefix-based indirection.
 * Violation of the DOM Consistency Design Principle if xmlns:foo used.

(I understand that if the microdata syntax offered no advantages over
RDFa, then it would be a wasted effort to diverge.

Which are the advantages it offers?


The syntax is simpler for the use cases it was designed for. It uses a  
simpler conceptual model (trees as opposed to graphs). It allows short  
token identifiers. It doesn't use prefix-based indirection. It doesn't  
violate the DOM Consistency Design Principle.


On May 15, 2009, at 14:11, Eduard Pascual wrote:

On Thu, May 14, 2009 at 10:17 PM, Maciej Stachowiak wrote:

[...]
From my cursory study, I think microdata could subsume many of the
use cases of both microformats and RDFa.

Maybe. But microformats and RDFa can handle *all* of these cases.
Again, which are the benefits of creating something entirely new to
replace what already exists while it can't even handle all the cases
of what it is replacing?


Compared to microformats, microdata defines the processing model and
conformance criteria. The microformats community has failed to provide
a processing model and conformance criteria at a similar level of detail.
The processing model side is perceived to be such a serious issue that
the lack of a unified microformats parsing spec is cited as a
motivation to use RDFa instead of microformats.


It seems to me that it avoids much of what microformats advocates  
find objectionable

Could you specify, please? Do you mean anything other than WHATWG's
almost irrational hate toward CURIEs and everything that involves
prefixes?


RDFa uses a data model that is overkill for the use cases.


but at the same time it seems it can represent a full RDF data
model.

No, it *can't* represent a full RDF model: it has already been shown
several times on this thread.


That's a feature.


Wait. Are you referring to microdata as an incremental improvement over
RDFa?? IMO, it's rather a decremental enworsement.


That depends on the point of view. I'm sensing two major points of view:

1) Graphs are more general than trees. Hence, being able to serialize  
graphs is better.


2) Graphs are more general than trees. Hence, graphs are harder to  
design UIs for, harder to traverse and harder for authors to grasp.  
Hence, if trees are enough to address use cases, we should only enable  
trees to be serialized.


I subscribe to view #2, and it seems that trees are indeed enough for  
the use cases (that were stipulated by the pro-graph people!).
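
To make the "harder to traverse" point concrete, here is a rough sketch with made-up data (the item names and the dictionary layout are assumptions, not any microdata or RDF API). Walking a tree is plain recursion that always terminates; walking a graph needs extra bookkeeping to survive cycles:

```python
from typing import Iterator


def walk_tree(node: dict) -> Iterator[str]:
    """Depth-first walk of a tree: plain recursion always terminates."""
    yield node["name"]
    for child in node.get("children", []):
        yield from walk_tree(child)


def walk_graph(start: str, edges: dict[str, list[str]]) -> Iterator[str]:
    """Depth-first walk of a graph: a visited set is needed to survive cycles."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already visited via another path or a cycle
        seen.add(name)
        yield name
        # Reverse so neighbours are visited in the order they are listed.
        stack.extend(reversed(edges.get(name, [])))


# Hypothetical data: an event item as a tree, and a small cyclic graph.
tree = {"name": "event", "children": [{"name": "summary"}, {"name": "location"}]}
graph = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": []}

tree_order = list(walk_tree(tree))
graph_order = list(walk_graph("alice", graph))
```

Without the seen set, the alice/bob cycle would keep walk_graph running forever; the tree walker never has to consider that case.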



- Microdata can't represent the full RDF data model (while RDFa can):
some complex structures are just not expressible with microdata.


That's not a use case. That's "theoretical purity".


- Microdata relies on reversed domains. While some people argue these
to be better than CURIEs, they are equally horrendous for the average
user, and have the additional disadvantage that they don't map to
anything useful (if they map to something at all), while CURIEs map to
the descriptions and/or definitions of what they represent.


I consider it an advantage that reverse domains don't suggest that you  
should try dereferencing identifiers as if they were addresses.


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/

Re: [whatwg] Link rot is not dangerous

On May 15, 2009, at 19:20, Manu Sporny wrote:

There have been a number of people now that have gone to great lengths
to outline how awful link rot is for CURIEs and the semantic web in
general. This is a flawed conclusion, based on the assumption that
there must be a single vocabulary document in existence, for all time,
at one location.

The "flawed" conclusion flows out of "Follow Your Nose" advocacy, and
is not flawed if one takes "Follow Your Nose" seriously.

It seems to me that the positions that RDF applications should "Follow
Their Nose" and that link rot is not dangerous (to RDF) are
contradictory positions.

That link rot hasn't been a practical problem to the Semantic Web
community suggests that applications don't really Follow Their Nose in
practice. Can anyone point me to a deployed end user application that
uses RDF internally and Follows Its Nose?

(For clarity: I'm not saying that link rot is dangerous to RDF apps.
I'm saying that taking the position that it is not dangerous
contradicts Follow Your Nose advocacy. I think "Follow Your Nose" is
impractical on the Web scale and is alien to naming schemes used in
technologies that have been successfully deployed on the Web scale
[e.g. HTML, CSS, JavaScript, DOM and Unicode].)

- RDFa parsers can be given an override list of legacy vocabularies
that will be loaded from disk (from a cached copy).

"Cache" means that you can still go find the original and the cache is
just nearer.

If a cached copy of the vocabulary cannot be found, it can be
re-created from scratch if necessary.

Do any end user applications that use RDF internally provide a UI for
installing local re-creations?

On May 15, 2009, at 20:25, Shelley Powers wrote:

Also don't lose sight that this is really no more serious an issue
than, say, a company originating "com.sun.*" being purchased by
another company, named "com.oracle.*". And you can't say, "Well
that's not the same", because it is.

It's not the same. A Java classloader doesn't "Follow Its Nose". A
classloader will find classes in my classpath even if there weren't a
server at sun.com. Likewise, http://sun.com/foo RDF predicates would
continue to work in applications that don't "Follow Their Nose" even
if the server at sun.com disappeared.

However, if the com.sun.* classes were renamed to com.oracle.* and the
com.sun.* copies withdrawn in a new release of a library, other
classes that have been compiled against com.sun.* classes would cease
to load. This is analogous to applications programmed to recognize
http://web.resource.org/cc/* predicates not recognizing
http://creativecommons.org/ns#* predicates. (You can't Follow Your Nose
from the former to the latter, BTW.)
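
A small sketch of the analogy, assuming a hypothetical application that recognizes predicates by plain string comparison (the recognizes function and the KNOWN_PREDICATES set are illustrative names only). Nothing is fetched, so a vanished server changes nothing, but a renamed namespace is simply not recognized:

```python
OLD_NS = "http://web.resource.org/cc/"
NEW_NS = "http://creativecommons.org/ns#"

# Predicates this hypothetical application was programmed to recognize.
KNOWN_PREDICATES = {OLD_NS + "license"}


def recognizes(predicate: str) -> bool:
    """Plain string comparison; no network access is involved."""
    return predicate in KNOWN_PREDICATES


old_ok = recognizes(OLD_NS + "license")
new_ok = recognizes(NEW_NS + "license")
```

Here old_ok is true regardless of whether web.resource.org still answers, while new_ok is false until someone updates the application, much like classes compiled against com.sun.* after a rename.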

--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/

Re: [whatwg] External document subset support


On May 18, 2009, at 09:36, Brett Zamir wrote:

Section 10.1, "Writing XHTML documents" observes: "According to the  
XML specification, XML processors are not guaranteed to process the  
external DTD subset referenced in the DOCTYPE."


While this is true, since no doubt the majority of web browsers are
already able to process external stylesheets or scripts, might the
very useful feature of external entity files be employed by XHTML 5
as a stricter subset of XML (similar to how XML Namespaces re-annexed
the colon character) in order to allow this useful feature
to work for XHTML (to have access to HTML entities or other useful
entities for one, as well as enable a poor man's localization, etc.)?


http://hsivonen.iki.fi/no-dtd/ explains why DTDs don't work for
the Web in the general case.


Loading same-origin DTDs for the purpose of localization is a
semi-defensible case, but it's a lot of complexity for a use case that is
way on the wrong side of 80/20 on the Web scale. Besides, if the use
case for DTDs is localization within an origin, the server can perform
the XML parse and reserialize into DTDless XML. (That's how I've
implemented this pattern in the past without client-side support.)
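
One way the server-side pattern could look, as a rough sketch only and not necessarily how it was implemented in the case mentioned above. It assumes Python's standard xml.etree.ElementTree, made-up entity declarations held in a LOCALE_ENTITIES dictionary (in practice they might be read from per-locale files), and a hypothetical localize function. The declarations are inlined as an internal subset so the parser expands them, and the serializer then emits XML with no DOCTYPE:

```python
import xml.etree.ElementTree as ET

# Hypothetical per-locale entity declarations; a real setup might read
# these from same-origin files instead of hard-coding them.
LOCALE_ENTITIES = {
    "en": '<!ENTITY greeting "Hello">',
    "fr": '<!ENTITY greeting "Bonjour">',
}

TEMPLATE_BODY = "<p>&greeting;, world</p>"


def localize(body: str, locale: str) -> str:
    """Expand entities on the server and return DTD-less XML."""
    # Inline the locale's declarations as an internal subset so the
    # parser can expand them without fetching an external DTD.
    source = f"<!DOCTYPE p [{LOCALE_ENTITIES[locale]}]>{body}"
    root = ET.fromstring(source)
    # ElementTree does not retain the DOCTYPE, so the output carries none.
    return ET.tostring(root, encoding="unicode")


french = localize(TEMPLATE_BODY, "fr")
english = localize(TEMPLATE_BODY, "en")
```

The client then receives ordinary XML with the text already substituted, so it never needs to process a DTD. A real XHTML document would also need its namespace handled on serialization, which this sketch leaves out.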


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/

Re: [whatwg] DOMTokenList is unordered but yet requires sorting