Ian Hickson wrote:
On Tue, 1 May 2007, Jonas Sicking wrote:
The latter is the option I'm following for now. Note that browsers all
do _different_ things for target="" than for href="". The spec has
made them act the same for now. I'm not sure this is workable; we'll
have to see when the browser vendors try to get this interoperable. I
can't imagine that it's a huge issue given that the browsers are so
far from each other in terms of what they do here. I'm going to do a
study of some subset of the Web to see how common this is (at least
the static case; I can't really do much about the scripted case).
I don't think this is a good solution actually. In general, I think it's
good to always make the DOM reflect the behavior of the document. I.e.
it shouldn't matter how you arrived at a specific DOM, be it through
parsing of an incoming HTML stream, or by using DOM-Core calls. Whenever
we make an exception for that rule I think we need to have a good reason
for it.
I think you misread what I wrote. Right now, there's no magic involved
here.
When you said "the latter is the option I'm following for now" I thought
you referred to "and Firefox and IE7/Win don't change any links". Is
that not the case?
Looking at the spec it doesn't mention anything special regarding DOM
mutations at all, so that would indeed make me think that links are
changed if a <base> element is inserted at the top of the <head> using
the DOM.
What I suggest is that we make the first or last <base> element in the
<head> be the one that sets both the base target and the base href for
the document (modulo all special handling needed when <base>s appear in
the body, described below). While this is not what IE or Firefox does
today, I doubt that it'll break enough pages to justify straying from the
act-like-the-DOM-looks principle.
Right now the href="" is from the first and the target="" is from the
last, but other than that that's what the spec says.
Why is the fact that the last target is the one used only defined in a
Note? Or am I missing it somewhere else?
Also, if we're going to be inconsistent in how current browsers and web
pages handle multiple <base>s, why not simply use the first <base> for
both href="" and target=""?
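To make the two options concrete, here is a rough Python sketch. Everything
in it (the Base class, the function names) is made up for illustration, and
it assumes that "first" and "last" mean the first or last <base> in the
<head> that actually carries the attribute in question, which the text above
does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Base:
    """Hypothetical stand-in for one <base> element in the <head>."""
    href: Optional[str] = None
    target: Optional[str] = None


def first_with(bases: List[Base], attr: str) -> Optional[str]:
    """Value of `attr` on the first <base> that sets it, else None."""
    for base in bases:
        value = getattr(base, attr)
        if value is not None:
            return value
    return None


def last_with(bases: List[Base], attr: str) -> Optional[str]:
    """Value of `attr` on the last <base> that sets it, else None."""
    return first_with(list(reversed(bases)), attr)


def current_draft(bases: List[Base]) -> Tuple[Optional[str], Optional[str]]:
    # As described above: href="" from the first, target="" from the last.
    return first_with(bases, "href"), last_with(bases, "target")


def first_for_both(bases: List[Base]) -> Tuple[Optional[str], Optional[str]]:
    # The alternative being asked about: use the first <base> for both.
    return first_with(bases, "href"), first_with(bases, "target")
```

With two <base> elements that each set both attributes, the two functions
agree on the href and differ only in which target they pick.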
One thing we unfortunately will have to deal with is <base> elements
appearing in the middle of the body of the document. What Mozilla had to
do was this: once we find a <base> element in the body of the document, we
tell the parser to remember the resolved href and/or target of that
<base> element. Then, for any element that uses base URIs (full list
at [1]), we set an internal member on the element that hardcodes the
element's base URI and/or base target.
For elements that don't get this property set on them, base href and
target resolution works as normal. For elements that have it set, base
href and target resolution uses only the saved properties.
Note that the parser only saves an href or target if the corresponding
attribute is set on the <base> element. So if a document contains <base
target="foo"> in the middle of the body, that does not set a saved href
in the parser.
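The mechanism described above can be sketched roughly as follows. This is
not Mozilla's actual code: the Parser and Element classes and all method
names are invented for illustration, hrefs are assumed to be already
resolved when they reach the parser, the saved values are stamped on every
element rather than only those that use base URIs, and href and target are
assumed to fall back independently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical element carrying the hardcoded base values, if any."""
    name: str
    saved_href: Optional[str] = None
    saved_target: Optional[str] = None


class Parser:
    """Hypothetical parser state for <base> elements found in the body."""

    def __init__(self) -> None:
        self.saved_href: Optional[str] = None
        self.saved_target: Optional[str] = None

    def saw_body_base(self, href: Optional[str] = None,
                      target: Optional[str] = None) -> None:
        # Only attributes actually present on the <base> update the state.
        if href is not None:
            self.saved_href = href
        if target is not None:
            self.saved_target = target

    def create_element(self, name: str) -> Element:
        # Stamp whatever is saved at this point onto the new element.
        return Element(name, self.saved_href, self.saved_target)


def base_href(el: Element, document_href: Optional[str]) -> Optional[str]:
    # A hardcoded value wins; otherwise resolution works as normal.
    return el.saved_href if el.saved_href is not None else document_href


def base_target(el: Element, document_target: Optional[str]) -> Optional[str]:
    return el.saved_target if el.saved_target is not None else document_target
```

An element created before any <base> in the body carries no saved values
and follows the document as usual; one created after <base target="foo">
has a saved target but still no saved href.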
This is deep magic, as far as the DOM goes. It also makes it hard to debug
-- e.g. dynamically modifying <base> elements, moving them, etc, has no
effect anymore.
Yup, I agree that this is deep magic as far as a DOM user goes.
HOWEVER, having said that, this is a tiny minority of pages. According to
a study I did of over 100,000,000 pages, 0.036% of pages have more than
one <base href=""> element (ignoring those that specify the same href=""
value more than once).
With <base href="">, you can get 404s, but in practice IE7 is already
doing that, and it doesn't seem to have affected adoption. Anecdotally,
most of these pages use absolute URIs, which might explain it.
It's much easier for IE to get away with breaking pages, mostly because
many people use IE as the yard-stick.
0.06% of pages have more than one <base target=""> element (again ignoring
duplicates). With <base target="">, the worst that can happen from the
user's point of view is that links will open in a new page instead of on
the same page, and in practice even that's not likely, since (anecdotally)
most pages with <base target=""> simply alternate between different names.
What do you think?
I would be hesitant to drop support for multiple <base>s in Firefox
actually. Implementation-wise it was very easy to do, and it is
known that many pages out there break without it. Though the percentage
is small, there are a lot of pages on the internet.
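As a back-of-the-envelope check on the percentages quoted above, here is the
arithmetic in Python. The study covered "over 100,000,000" pages, so treating
the sample as exactly 100,000,000 is an assumption that gives a lower bound;
the function name is made up.

```python
# Assumption: the sample is exactly 100,000,000 pages (the study says "over").
SAMPLE_PAGES = 100_000_000


def pages_affected(sample: int, per_100k: int) -> int:
    """Pages hit by a rate given as a count per 100,000 pages."""
    return sample * per_100k // 100_000


# 0.036% is 36 per 100,000; 0.06% is 60 per 100,000.
multi_href_pages = pages_affected(SAMPLE_PAGES, 36)
multi_target_pages = pages_affected(SAMPLE_PAGES, 60)

print(multi_href_pages)    # 36000
print(multi_target_pages)  # 60000
```

So even within the sample alone, that is tens of thousands of pages for each
attribute.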
It might be something we could restrict to quirks mode pages though,
that's not a bad idea at all.
/ Jonas