Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Wed, Sep 18, 2013 at 11:10 AM, L. David Baron wrote:
> In Gecko it's also implemented through CSS inheritance, but it's not
> exposed to Web content as a CSS property. (Internally it's
> '-x-lang', but that name isn't exposed.)
>
> We use the language for:
> * font selection
> * language-specific text-transform behavior
> * hyphenation (which doesn't work unless it's explicitly specified,
>   as required by http://dev.w3.org/csswg/css-text/#hyphens-property )

It seems my earlier point about inheritance of text direction remains. Base URLs however are obsolete as only Gecko implements xml:base.

If this is implemented through CSS, does it make sense to expose it through the DOM?

--
http://annevankesteren.nl/
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Wednesday 2013-05-01 01:01 -0700, Elliott Sprehn wrote:
> On Wed, Apr 24, 2013 at 9:22 AM, Peter Occil wrote:
> > I have no objection to the name "baseLang" rather than "language" as the
> > name of the DOM attribute.
> >
> > But if there isn't more interest or you decide not to add this DOM
> > attribute, I encourage you to at least:
> >
> >
> fwiw WebKit (and Blink) implement this through CSS inheritance since you
> need to know the lang for all kinds of things and walking up the DOM
> repeatedly would be expensive.
>
> -webkit-locale is inherited by default and contains the enclosing @lang
> value. You can query it through getComputedStyle(node).webkitLocale. That
> doesn't help your custom parser though.

In Gecko it's also implemented through CSS inheritance, but it's not exposed to Web content as a CSS property. (Internally it's '-x-lang', but that name isn't exposed.)

We use the language for:
* font selection
* language-specific text-transform behavior
* hyphenation (which doesn't work unless it's explicitly specified, as required by http://dev.w3.org/csswg/css-text/#hyphens-property )

-David

--
L. David Baron http://dbaron.org/
Mozilla https://www.mozilla.org/
Before I built a wall I'd ask to know
What I was walling in or walling out,
And to whom I was like to give offense.
- Robert Frost, Mending Wall (1914)
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Mon, 16 Sep 2013, Peter Occil wrote: > > - Localization of form controls in languages where browser support is > lacking, such as some minor languages. > > - Localization of HTML elements, especially date formatting of span and > div elements in the page's default language [...] > > You said these use cases were valid; how do you think so? I mean these are things that users want and that authors have to do. Your e-mail is still on my list of e-mails to deal with. (Specifically, it's in the pile of e-mail relating to new features.) It hasn't been forgotten, don't worry. :-) -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
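As a rough illustration of the date-formatting use case quoted above, the sketch below formats a date for a given language tag. It assumes the element's language has already been determined by some other means, which is exactly the part this thread is asking for. The function name formatDateForLanguage and the fallback policy are hypothetical; Intl.DateTimeFormat is the standard ECMAScript Internationalization API.

```javascript
// Sketch only: format a date for an element whose language tag is already known.
// "lang" may be empty or malformed, so fall back to the runtime's default locale.
function formatDateForLanguage(date, lang) {
  const options = { year: "numeric", month: "long", day: "numeric", timeZone: "UTC" };
  try {
    // An empty string is not a valid tag, so pass undefined to get the default locale.
    return new Intl.DateTimeFormat(lang || undefined, options).format(date);
  } catch (e) {
    // Structurally invalid tags raise RangeError; anything else is unexpected.
    if (!(e instanceof RangeError)) throw e;
    return new Intl.DateTimeFormat(undefined, options).format(date);
  }
}

const sampleDate = new Date(Date.UTC(2013, 8, 16));
console.log(formatDateForLanguage(sampleDate, "fi"));
console.log(formatDateForLanguage(sampleDate, "not a tag"));
```

The exact strings depend on the locale data shipped with the runtime, so no expected output is shown.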
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
Apparently, the use cases I mentioned before have not been discussed yet: " - Localization of form controls in languages where browser support is lacking, such as some minor languages. - Localization of HTML elements, especially date formatting of span and div elements in the page's default language, see especially [1]. " You said these use cases were valid; how do you think so? --Peter [1]: http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#input-impl-notes -Original Message- From: Ian Hickson Sent: Monday, September 16, 2013 6:05 PM To: Jukka K. Korpela Cc: whatwg@lists.whatwg.org Subject: Re: [whatwg] HTML: A DOM attribute that returns the language of a node On Fri, 2 Aug 2013, Jukka K. Korpela wrote: 2013-08-02 2:43, Ryosuke Niwa wrote: > > > > Are you saying that for HTML contenteditable-based editors that want > > to support drag-and-drop editing, they need to be able to annotate > > the outgoing HTML fragment with the effective language so that when > > it's embedded somewhere, the right fonts get used? > > Yes, but not just for drag and drop. This would mean that the editor would have to guess the language from the text or ask the user to specify it. Well presumably just getting the language out of the document would be a good first step. This is not as unrealistic as it may first seem. Microsoft Word does such things [...] Sure, there's plenty of examples of language identification. But regarding the effect of language markup on fonts, the effect is limited to situations where the font is not specified in a style sheet. Yes. That's a case we should probably make sure we handle. So it could be added, well, just because there is no good reason not to. There's always reasons not to add something: http://wiki.whatwg.org/wiki/FAQ#Where.27s_the_harm_in_adding.E2.80.94 -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Thu, 1 Aug 2013, Ryosuke Niwa wrote: > > > > Are you saying that for HTML contenteditable-based editors that want > > to support drag-and-drop editing, they need to be able to annotate the > > outgoing HTML fragment with the effective language so that when it's > > embedded somewhere, the right fonts get used? > > Yes, but not just for drag and drop. Sure, also for copy-and-paste, etc. The point is that browsers need to provide this for at least some of the cases. > > This seems like something that browsers should just do automatically > > for copy-and-paste and drag-and-drop, I wouldn't want to require that > > every contenteditable-based editor have to reimplement this. That > > seems like a lot of redundant work, and in particular, seems to be > > work that most editor implementors would forget. If the browsers just > > did the annotation automatically, then this would work even in editors > > whose implementors didn't worry about i18n. > > How are browsers supposed to do this if the author was simply using > innerHTML? How would an author use innerHTML here? I agree that there's a use case for providing language information, my point is just that this use case seems to need more than just that: it also needs that browsers add language annotations during drag-and-drop and copy-and-paste, at least. Is there anything else that it needs? Will putting language information on drag-and-drop or copy/paste content have any Web compat impact? -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Fri, 2 Aug 2013, Jukka K. Korpela wrote: > 2013-08-02 2:43, Ryosuke Niwa wrote: > > > > > > Are you saying that for HTML contenteditable-based editors that want > > > to support drag-and-drop editing, they need to be able to annotate > > > the outgoing HTML fragment with the effective language so that when > > > it's embedded somewhere, the right fonts get used? > > > > Yes, but not just for drag and drop. > > This would mean that the editor would have to guess the language from > the text or ask the user to specify it. Well presumably just getting the language out of the document would be a good first step. > This is not as unrealistic as it may first seem. Microsoft Word does > such things [...] Sure, there's plenty of examples of language identification. > But regarding the effect of language markup on fonts, the effect is > limited to situations where the font is not specified in a style sheet. Yes. That's a case we should probably make sure we handle. > So it could be added, well, just because there is no good reason not to. There's always reasons not to add something: http://wiki.whatwg.org/wiki/FAQ#Where.27s_the_harm_in_adding.E2.80.94 -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Aug 8, 2013, at 7:29 AM, Jukka K. Korpela wrote:
> 2013-08-08 2:57, Ryosuke Niwa wrote:
>
>> On Aug 2, 2013, at 6:10 AM, Jukka K. Korpela
>> wrote:
> [...]
>>> But regarding the effect of language markup on fonts, the effect is
>>> limited to situations where the font is not specified in a style
>>> sheet. This is a rather uncommon scenario these days; authors are
>>> more than eager to set fonts.
>>
>> Do you have actual statistics to support this point?
>
> No, it’s just an impression from looking at numerous pages and their coding
> as well as views presented in authors’ forums.
>
>> As far as I
>> checked, neither baidu.com nor yahoo.com.tw seems to explicitly
>> specify a Chinese font.
>
> They both have font-family settings, slightly different, but basically the
> most common (sorry, no statistic on this either) setup that uses Arial
> (possibly with Helvetica as second option, which does not change much). So,
> granted, they don’t specify a Chinese font in the sense of including any
> specific fonts containing CJK characters in the font-family list.
>
> Baidu doesn’t set lang either, so they seem to be accepting, for any
> characters not covered by Arial, whatever happens to be in each browser’s
> list of fallback fonts, when no information about content language is
> available. Yahoo.com.tw sets lang="zh-tw", so they do care, but only to the
> extent that the fallback font should be one intended for Traditional Chinese.
>
> So the lang markup may affect fonts, but only under some conditions. And if
> you care about fonts, as an author, then an explicit list of font
> alternatives has better chances of creating the desired result.

That's not a practical solution because we can't possibly know the list of Chinese & Japanese fonts available by default in all operating systems.

>>> It is true that they might specify a font list where none of the
>>> fonts supports some characters that might be entered, and then a
>>> fallback font would be used. However, using “annotations”
>>> (presumably, lang attributes, along with extra elements when
>>> needed) does not sound like a feasible approach to this.
>>
>> Whether it’s feasible or not, that’s what we have been doing due to
>> the Han unification. If we could, we’ll undo the Han unification and
>> use different glyphs for each character but we can’t do that at this
>> point in time.
>
> If a page contains texts to be rendered using different forms (Traditional
> Chinese, Simplified Chinese, Japanese, Korean) for Han characters, you will
> need to control the rendering somehow. Using lang markup might be
> theoretically most adequate, but it’s indirect and less effective than just
> setting different fonts (via font-family lists that contain reasonably many
> alternatives).

Controlling the rendering isn't the goal here. The point is to use the correct glyph in each language so that each character is recognizable by users. Again, specifying a font name is not a practical solution as authors have no way of knowing the list of Chinese & Japanese fonts provided by all current and future operating systems.

> But even if lang attributes are used, I don’t think the issue has much
> relevance to the original question here. A DOM attribute that returns the
> language of a node would be useful for the purpose only if you intend to
> affect rendering via JavaScript.

No. The point is that any code that attempts to move or copy contents must preserve the effective value of the lang attribute.

- R. Niwa
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
2013-08-08 2:57, Ryosuke Niwa wrote:

> On Aug 2, 2013, at 6:10 AM, Jukka K. Korpela wrote:
[...]
>> But regarding the effect of language markup on fonts, the effect is limited to situations where the font is not specified in a style sheet. This is a rather uncommon scenario these days; authors are more than eager to set fonts.
>
> Do you have actual statistics to support this point?

No, it’s just an impression from looking at numerous pages and their coding as well as views presented in authors’ forums.

> As far as I checked, neither baidu.com nor yahoo.com.tw seems to explicitly specify a Chinese font.

They both have font-family settings, slightly different, but basically the most common (sorry, no statistic on this either) setup that uses Arial (possibly with Helvetica as second option, which does not change much). So, granted, they don’t specify a Chinese font in the sense of including any specific fonts containing CJK characters in the font-family list.

Baidu doesn’t set lang either, so they seem to be accepting, for any characters not covered by Arial, whatever happens to be in each browser’s list of fallback fonts, when no information about content language is available. Yahoo.com.tw sets lang="zh-tw", so they do care, but only to the extent that the fallback font should be one intended for Traditional Chinese.

So the lang markup may affect fonts, but only under some conditions. And if you care about fonts, as an author, then an explicit list of font alternatives has better chances of creating the desired result.

>> It is true that they might specify a font list where none of the fonts supports some characters that might be entered, and then a fallback font would be used. However, using “annotations” (presumably, lang attributes, along with extra elements when needed) does not sound like a feasible approach to this.
>
> Whether it’s feasible or not, that’s what we have been doing due to the Han unification. If we could, we’ll undo the Han unification and use different glyphs for each character but we can’t do that at this point in time.

If a page contains texts to be rendered using different forms (Traditional Chinese, Simplified Chinese, Japanese, Korean) for Han characters, you will need to control the rendering somehow. Using lang markup might be theoretically most adequate, but it’s indirect and less effective than just setting different fonts (via font-family lists that contain reasonably many alternatives).

But even if lang attributes are used, I don’t think the issue has much relevance to the original question here. A DOM attribute that returns the language of a node would be useful for the purpose only if you intend to affect rendering via JavaScript.

Yucca
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Aug 2, 2013, at 6:10 AM, Jukka K. Korpela wrote:
> 2013-08-02 2:43, Ryosuke Niwa wrote:
>
>>> Are you saying that for HTML contenteditable-based editors that want to
>>> support drag-and-drop editing, they need to be able to annotate the
>>> outgoing HTML fragment with the effective language so that when it's
>>> embedded somewhere, the right fonts get used?
>>
>> Yes, but not just for drag and drop.
>
> This would mean that the editor would have to guess the language from the
> text or ask the user to specify it. This is not as unrealistic as it may
> first seem. Microsoft Word does such things, sometimes getting things right,
> often messing things up. It typically detects change of language too late,
> and often infers language from keyboard settings, making it rather impossible
> to use a multilingual keyboard easily.
>
> But regarding the effect of language markup on fonts, the effect is limited
> to situations where the font is not specified in a style sheet. This is a
> rather uncommon scenario these days; authors are more than eager to set fonts.

Do you have actual statistics to support this point? As far as I checked, neither baidu.com nor yahoo.com.tw seems to explicitly specify a Chinese font.

Also, I have just recently experienced the font type change on Gmail when I was conversing with a native Chinese speaker. Her mail client used Chinese fonts before Japanese fonts whereas mine had Japanese fonts before Chinese fonts.

> It is true that they might specify a font list where none of the fonts
> supports some characters that might be entered, and then a fallback font
> would be used. However, using “annotations” (presumably, lang attributes,
> along with extra elements when needed) does not sound like a feasible
> approach to this.

Whether it’s feasible or not, that’s what we have been doing due to the Han unification. If we could, we’ll undo the Han unification and use different glyphs for each character but we can’t do that at this point in time.

- R. Niwa
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
2013-08-02 2:43, Ryosuke Niwa wrote:

>> Are you saying that for HTML contenteditable-based editors that want to support drag-and-drop editing, they need to be able to annotate the outgoing HTML fragment with the effective language so that when it's embedded somewhere, the right fonts get used?
>
> Yes, but not just for drag and drop.

This would mean that the editor would have to guess the language from the text or ask the user to specify it. This is not as unrealistic as it may first seem. Microsoft Word does such things, sometimes getting things right, often messing things up. It typically detects change of language too late, and often infers language from keyboard settings, making it rather impossible to use a multilingual keyboard easily.

But regarding the effect of language markup on fonts, the effect is limited to situations where the font is not specified in a style sheet. This is a rather uncommon scenario these days; authors are more than eager to set fonts. It is true that they might specify a font list where none of the fonts supports some characters that might be entered, and then a fallback font would be used. However, using “annotations” (presumably, lang attributes, along with extra elements when needed) does not sound like a feasible approach to this.

But I guess the issue is still adding a DOM property for element nodes, specifying the language of the node, to the extent that it can be inferred from lang or xml:lang attribute or from HTTP headers (real or faked via <meta>). Although the use cases are somewhat rare and not particularly important, the property would be conceptually easy and presumably easy to implement in browsers. So it could be added, well, just because there is no good reason not to.

It may understandably irritate authors who need language information that they know that the browser has it (it needs it to implement :lang() in CSS) but does not give authors access to it.

Yucca
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Jul 26, 2013, at 11:20 AM, Ian Hickson wrote:
> On Wed, 24 Jul 2013, Ryosuke Niwa wrote:
>> On Jul 16, 2013, at 11:25 AM, Ian Hickson wrote:
>>> On Tue, 16 Jul 2013, Takayoshi Kochi (河内 隆仁) wrote:
>>>> IIUC WebKit uses internally node's language to determine which font to use to render text, e.g for Han unification (https://en.wikipedia.org/wiki/Han_unification) WebKit has to choose a proper glyph depending on its lang attribute for the same Unicode codepoint.
>>>
>>> Sure, but internal UA uses aren't use cases for the Web.
>>>
>>> The use cases Peter gave over the weekend are valid, though.
>>
>> The fact browsers use the "effective" language for font selection is
>> very relevant in HTML editing. For example, consider the following
>> document:
>>
>>
>>
>> å§å»å©
>>
>>
>>
>>
>>
>>
>>
>> If you were to get the innerHTML of #source and insert it into
>> #destination, the effective language changes from Chinese to Japanese
>> and the three characters transform their shapes because browsers will
>> use different fallback fonts.
>
> It's unclear to me what use case you are describing here.
>
> Are you saying that for HTML contenteditable-based editors that want to
> support drag-and-drop editing, they need to be able to annotate the
> outgoing HTML fragment with the effective language so that when it's
> embedded somewhere, the right fonts get used?

Yes, but not just for drag and drop.

> This seems like something that browsers should just do automatically for
> copy-and-paste and drag-and-drop, I wouldn't want to require that every
> contenteditable-based editor have to reimplement this. That seems like a
> lot of redundant work, and in particular, seems to be work that most
> editor implementors would forget. If the browsers just did the annotation
> automatically, then this would work even in editors whose implementors
> didn't worry about i18n.

How are browsers supposed to do this if the author was simply using innerHTML? I don't see how we can automatically annotate innerHTML.

- R. Niwa
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Wed, 24 Jul 2013, Ryosuke Niwa wrote:
> On Jul 16, 2013, at 11:25 AM, Ian Hickson wrote:
> > On Tue, 16 Jul 2013, Takayoshi Kochi (河内 隆仁) wrote:
> >>
> >> IIUC WebKit uses internally node's language to determine which font
> >> to use to render text, e.g for Han unification
> >> (https://en.wikipedia.org/wiki/Han_unification) WebKit has to choose
> >> a proper glyph depending on its lang attribute for the same Unicode
> >> codepoint.
> >
> > Sure, but internal UA uses aren't use cases for the Web.
> >
> > The use cases Peter gave over the weekend are valid, though.
>
> The fact browsers use the "effective" language for font selection is
> very relevant in HTML editing. For example, consider the following
> document:
>
>
>
> å§å»å©
>
>
>
>
>
>
>
> If you were to get the innerHTML of #source and insert it into
> #destination, the effective language changes from Chinese to Japanese
> and the three characters transform their shapes because browsers will
> use different fallback fonts.

It's unclear to me what use case you are describing here.

Are you saying that for HTML contenteditable-based editors that want to support drag-and-drop editing, they need to be able to annotate the outgoing HTML fragment with the effective language so that when it's embedded somewhere, the right fonts get used?

This seems like something that browsers should just do automatically for copy-and-paste and drag-and-drop, I wouldn't want to require that every contenteditable-based editor have to reimplement this. That seems like a lot of redundant work, and in particular, seems to be work that most editor implementors would forget. If the browsers just did the annotation automatically, then this would work even in editors whose implementors didn't worry about i18n.

--
Ian Hickson U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Jul 16, 2013, at 11:25 AM, Ian Hickson wrote:
> On Tue, 16 Jul 2013, Takayoshi Kochi (河内 隆仁) wrote:
>>
>> IIUC WebKit uses internally node's language to determine which font to use
>> to render text,
>> e.g for Han unification (https://en.wikipedia.org/wiki/Han_unification)
>> WebKit has to choose
>> a proper glyph depending on its lang attribute for the same Unicode
>> codepoint.
>
> Sure, but internal UA uses aren't use cases for the Web.
>
> The use cases Peter gave over the weekend are valid, though.

The fact browsers use the "effective" language for font selection is very relevant in HTML editing. For example, consider the following document:

$BANV}$(D7q(B

If you were to get the innerHTML of #source and insert it into #destination, the effective language changes from Chinese to Japanese and the three characters transform their shapes because browsers will use different fallback fonts.

- R. Niwa
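One way to picture the problem described above: code that moves a serialized fragment can wrap it in an element carrying the source's effective language, so the destination's language does not take over. The following is a minimal sketch under stated assumptions, not anything proposed in the thread. The helper names annotateFragment and escapeAttribute are hypothetical, the effective language is assumed to be known already, and wrapping in a span is only appropriate when the fragment is inline content.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttribute(value) {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Wrap a serialized fragment so it keeps the language it had at its source.
// An empty effectiveLang means "unknown", in which case nothing is added.
function annotateFragment(html, effectiveLang) {
  if (!effectiveLang) return html;
  return `<span lang="${escapeAttribute(effectiveLang)}">${html}</span>`;
}

console.log(annotateFragment("<b>text</b>", "zh"));
```

Without such a wrapper, the same string inserted under a lang="ja" ancestor would simply inherit "ja".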
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Tue, 16 Jul 2013, Takayoshi Kochi (河内 隆仁) wrote:
>
> IIUC WebKit uses internally node's language to determine which font to use
> to render text,
> e.g for Han unification (https://en.wikipedia.org/wiki/Han_unification)
> WebKit has to choose
> a proper glyph depending on its lang attribute for the same Unicode
> codepoint.

Sure, but internal UA uses aren't use cases for the Web.

The use cases Peter gave over the weekend are valid, though.

--
Ian Hickson U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
(resending from correct address)

On Tue, Jul 16, 2013 at 10:18 AM, Takayoshi Kochi (河内 隆仁) wrote:
> IIUC WebKit uses internally node's language to determine which font to use
> to render text,
> e.g for Han unification (https://en.wikipedia.org/wiki/Han_unification)
> WebKit has to choose
> a proper glyph depending on its lang attribute for the same Unicode
> codepoint.

Yes, WebKit does this using -webkit-locale which Elliott mentioned above. Firefox and IE also use language for font selection, but I'm not familiar with their implementation.
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
IIUC WebKit uses internally node's language to determine which font to use to render text, e.g for Han unification (https://en.wikipedia.org/wiki/Han_unification) WebKit has to choose a proper glyph depending on its lang attribute for the same Unicode codepoint. Matt (falken@) knows more about this. On Sun, Jul 14, 2013 at 12:39 AM, Ian Hickson wrote: > On Fri, 12 Jul 2013, Peter Occil wrote: > > > > Well, my true hope is that such a DOM attribute like "language" will be > > specified in the HTML or DOM spec. Especially since it's not currently > > possible to get the language of a node through JavaScript methods alone. > > What's the use case for having this in HTML? > > -- > Ian Hickson U+1047E)\._.,--,'``.fL > http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. > Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.' > -- Takayoshi Kochi
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
There are several use cases: - Localization of form controls in languages where browser support is lacking, such as some minor languages. - Localization of HTML elements, especially date formatting of span and div elements in the page's default language, see especially [1]. When it comes to retrieving the language of an element, a JavaScript implementation can do everything except retrieve the value of the Content-Language header of the document, so even providing a DOM attribute like "contentLanguage" will resolve this issue. [1]: http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#input-impl-notes -Original Message- From: Ian Hickson Sent: Saturday, July 13, 2013 11:39 AM To: Peter Occil Cc: WHATWG Subject: Re: [whatwg] HTML: A DOM attribute that returns the language of a node On Fri, 12 Jul 2013, Peter Occil wrote: Well, my true hope is that such a DOM attribute like "language" will be specified in the HTML or DOM spec. Especially since it's not currently possible to get the language of a node through JavaScript methods alone. What's the use case for having this in HTML? -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
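The split described above (script can inspect attributes, but not the Content-Language header) can be sketched as an ancestor walk with an externally supplied fallback. This is an illustration on plain objects rather than real DOM nodes. The node shape ({ attrs, parent }), the function name resolveLanguage, and the fallbackLanguage parameter are all hypothetical, and the "xml:lang" key here merely stands in for the lang attribute in the XML namespace.

```javascript
// Walk from the node towards the root and return the first declared language.
// fallbackLanguage stands for information script cannot obtain by itself,
// such as a Content-Language header value passed in by the caller.
function resolveLanguage(node, fallbackLanguage = "") {
  for (let n = node; n; n = n.parent) {
    const attrs = n.attrs || {};
    // When both are present on one element, xml:lang wins over lang.
    if ("xml:lang" in attrs) return attrs["xml:lang"];
    if ("lang" in attrs) return attrs.lang;
  }
  return fallbackLanguage;
}

const htmlNode = { attrs: {}, parent: null };
const bodyNode = { attrs: {}, parent: htmlNode };
const quoteNode = { attrs: { lang: "fr" }, parent: bodyNode };
const emNode = { attrs: {}, parent: quoteNode };

console.log(resolveLanguage(emNode));         // nearest declaration, from quoteNode
console.log(resolveLanguage(bodyNode, "fi")); // nothing declared, fallback is used
```

The fallback argument is the piece that a "contentLanguage"-style attribute would have to provide.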
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Fri, 12 Jul 2013, Peter Occil wrote: > > Well, my true hope is that such a DOM attribute like "language" will be > specified in the HTML or DOM spec. Especially since it's not currently > possible to get the language of a node through JavaScript methods alone. What's the use case for having this in HTML? -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
Well, my true hope is that such a DOM attribute like "language" will be specified in the HTML or DOM spec. Especially since it's not currently possible to get the language of a node through JavaScript methods alone. See [1] and its replies. --Peter [1]: http://lists.w3.org/Archives/Public/public-rdfa-wg/2013May/0064.html -Original Message- From: Ian Hickson Sent: Friday, July 12, 2013 3:48 PM To: Peter Occil Cc: WHATWG Subject: Re: [whatwg] HTML: A DOM attribute that returns the language of a node On Wed, 24 Apr 2013, Peter Occil wrote: Well in my case, I have written an HTML parser in Java and C# [1][2], which parses HTML documents and returns an object that implements a subset of the DOM, so far. As far as possible, I included only methods and attributes that were specified in the DOM or HTML specification, such as the characterSet attribute (which is called getCharacterSet on my DOM's IDocument interface), and more recently the innerHTML attribute (which is called getInnerHTML on my DOM's IElement interface) However, when I decided to implement an RDFa processor based on my HTML parser, I had need to include a method that returns the language of a node (see, for example, section 3.3 of reference [3]). As a result, I included a method called getLanguage on my DOM's INode interface (which may correspond to a possible--future--DOM attribute called "language" on the Node interface). I feel uneasy having to include this extension to what ought to be a subset of the HTML DOM. Implementations of HTML and the DOM are allowed to have internal methods to do things like this. There's no reason to limit yourself to the API visible to JavaScript. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Wed, 24 Apr 2013, Peter Occil wrote: > > Well in my case, I have written an HTML parser in Java and C# [1][2], > which parses HTML documents and returns an object that implements a > subset of the DOM, so far. As far as possible, I included only methods > and attributes that were specified in the DOM or HTML specification, > such as the characterSet attribute (which is called getCharacterSet on > my DOM's IDocument interface), and more recently the innerHTML attribute > (which is called getInnerHTML on my DOM's IElement interface) > > However, when I decided to implement an RDFa processor based on my HTML > parser, I had need to include a method that returns the language of a > node (see, for example, section 3.3 of reference [3]). As a result, I > included a method called getLanguage on my DOM's INode interface (which > may correspond to a possible--future--DOM attribute called "language" on > the Node interface). I feel uneasy having to include this extension to > what ought to be a subset of the HTML DOM. Implementations of HTML and the DOM are allowed to have internal methods to do things like this. There's no reason to limit yourself to the API visible to JavaScript. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Wed, May 1, 2013 at 9:01 AM, Elliott Sprehn wrote: > fwiw WebKit (and Blink) implement this through CSS inheritance since you > need to know the lang for all kinds of things and walking up the DOM > repeatedly would be expensive. > > -webkit-locale is inherited by default and contains the enclosing @lang > value. You can query it through getComputedStyle(node).webkitLocale. That > doesn't help your custom parser though. What mechanism did you use for base URLs? Or is the cost there deemed acceptable? For direction you need to have a similar mechanism available. (And you'll have the same problem roc pointed out with :lang() if you solve that through CSS as :dir(ltr) and :dir(rtl) exist these days.) If browsers already implement some kind of API for these things it might be worth to expose them in the same way. -- http://annevankesteren.nl/
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Wed, May 1, 2013 at 1:49 AM, Robert O'Callahan wrote:
> On Wed, May 1, 2013 at 8:01 PM, Elliott Sprehn wrote:
>
>> fwiw WebKit (and Blink) implement this through CSS inheritance since you
>> need to know the lang for all kinds of things and walking up the DOM
>> repeatedly would be expensive.
>>
>> -webkit-locale is inherited by default and contains the enclosing @lang
>> value. You can query it through getComputedStyle(node).webkitLocale. That
>> doesn't help your custom parser though.
>>
>
> Interesting. What does "body:lang(en) { -webkit-locale:fr; }" do? :-)
>

Nothing sensible. :) We still walk up the tree when doing selector
matching, but you'll confuse some render tree operations.

Looking through the code again we don't use -webkit-locale as much as I
thought, though perhaps we should and prevent user CSS from tampering
with it.

- E
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Wed, May 1, 2013 at 8:01 PM, Elliott Sprehn wrote:
> fwiw WebKit (and Blink) implement this through CSS inheritance since you
> need to know the lang for all kinds of things and walking up the DOM
> repeatedly would be expensive.
>
> -webkit-locale is inherited by default and contains the enclosing @lang
> value. You can query it through getComputedStyle(node).webkitLocale. That
> doesn't help your custom parser though.
>

Interesting. What does "body:lang(en) { -webkit-locale:fr; }" do? :-)

Rob
--
q"qIqfq qyqoquq qlqoqvqeq qtqhqoqsqeq qwqhqoq qlqoqvqeq qyqoquq,q qwqhqaqtq
qcqrqeqdqiqtq qiqsq qtqhqaqtq qtqoq qyqoquq?q qEqvqeqnq qsqiqnqnqeqrqsq
qlqoqvqeq qtqhqoqsqeq qwqhqoq qlqoqvqeq qtqhqeqmq.q qAqnqdq qiqfq qyqoquq
qdqoq qgqoqoqdq qtqoq qtqhqoqsqeq qwqhqoq qaqrqeq qgqoqoqdq qtqoq qyqoquq,q
qwqhqaqtq qcqrqeqdqiqtq qiqsq qtqhqaqtq qtqoq qyqoquq?q qEqvqeqnq
qsqiqnqnqeqrqsq qdqoq qtqhqaqtq.q"
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Wed, Apr 24, 2013 at 9:22 AM, Peter Occil wrote:
>
> I have no objection to the name "baseLang" rather than "language" as the
> name of the DOM attribute.
>
> But if there isn't more interest or you decide not to add this DOM
> attribute, I encourage you to at least:
>
>

fwiw WebKit (and Blink) implement this through CSS inheritance since you
need to know the lang for all kinds of things and walking up the DOM
repeatedly would be expensive.

-webkit-locale is inherited by default and contains the enclosing @lang
value. You can query it through getComputedStyle(node).webkitLocale. That
doesn't help your custom parser though.

- E
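For a custom parser, the inheritance idea described above can be approximated by resolving every node's language in one top-down pass, so later lookups don't walk up the tree. The following is an illustrative sketch only, not WebKit's or Gecko's code: the function name `resolveLanguages` is hypothetical, nodes are assumed to expose `childNodes` and (for elements) `getAttribute`, and only the "lang" attribute is considered.

```javascript
// Hypothetical helper, for illustration only.
// Walks the tree once from `root` and records each node's resolved
// language in a Map, the way an inherited property is passed down.
function resolveLanguages(root, inherited = "", resolved = new Map()) {
  let language = inherited;
  if (typeof root.getAttribute === "function") {
    const own = root.getAttribute("lang");
    // An element's own lang attribute overrides the inherited value.
    if (own !== null && own !== undefined) language = own;
  }
  resolved.set(root, language);
  for (const child of root.childNodes || []) {
    resolveLanguages(child, language, resolved);
  }
  return resolved;
}
```

After the pass, `resolved.get(node)` is a single map read per node. The trade-off is that the map has to be rebuilt or patched whenever the tree or a "lang" attribute changes.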
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
I have no objection to the name "baseLang" rather than "language" as the
name of the DOM attribute.

But if there isn't more interest or you decide not to add this DOM
attribute, I encourage you to at least:

* define a DOM attribute to get the value of the document's
  Content-Language header (or other fallback language), or if that's
  not acceptable,
* change the definition of the language of an HTML node to remove the
  requirement to use the document's Content-Language header, or if
  that's not acceptable,
* change section 3.3 of HTML+RDFa to say, for example, that only
  information in the "lang" and "xml:lang" attributes is relevant for
  determining language for the purposes of HTML+RDFa.

With any of these three alternatives, I can live with doing tree
traversal for finding the language of a node, both in my Java/C# DOM and
(if I get around to it) in JavaScript. Especially the last two may be
viable if it turns out that people use the Content-Language header even
less than people use the "lang" and "xml:lang" attributes on HTML and
XHTML documents.

-

In the meantime, is it advisable for me to extend my Java/C# DOM by
adding a "getLanguage" method? (See my previous message for context.)

--Peter

-Original Message-
From: Anne van Kesteren
Sent: Wednesday, April 24, 2013 10:03 AM
To: Peter Occil
Cc: WHATWG
Subject: Re: [whatwg] HTML: A DOM attribute that returns the language of a node

On Wed, Apr 24, 2013 at 6:49 AM, Peter Occil wrote:
> While a "language" attribute on Node may also be useful to
> HTML+RDFa processors in JavaScript, I have no plans to implement
> such a processor in JavaScript, though.

There's https://www.w3.org/Bugs/Public/show_bug.cgi?id=16489 fwiw.
Interest thus far seems fairly low, prolly due to what Kenny mentioned
with people not doing much language tagging at all.

--
http://annevankesteren.nl/
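A minimal sketch of the tree traversal discussed in this message, written in JavaScript. The function name `getLanguage` mirrors the method name proposed in the thread but is not a DOM API. The `fallbackLanguage` parameter is an assumption standing in for the Content-Language header or pragma, which the caller would have to obtain some other way. Looking up "xml:lang" as a plain attribute name is a simplification of the namespace rules.

```javascript
// Hypothetical helper, not part of any DOM specification.
// `node` is any object with `parentNode` and, for elements, `getAttribute`.
function getLanguage(node, fallbackLanguage = "") {
  for (let current = node; current; current = current.parentNode) {
    // Text nodes and the like carry no attributes; keep walking up.
    if (typeof current.getAttribute !== "function") continue;
    // When both are present, xml:lang takes precedence over lang.
    const xmlLang = current.getAttribute("xml:lang");
    if (xmlLang !== null && xmlLang !== undefined) return xmlLang;
    const lang = current.getAttribute("lang");
    if (lang !== null && lang !== undefined) return lang;
  }
  // No ancestor declared a language: use the caller-supplied fallback.
  return fallbackLanguage;
}
```

An empty attribute value (lang="") is returned as the empty string, which corresponds to "language unknown" and stops the walk instead of falling through to an ancestor.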
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
On Wed, Apr 24, 2013 at 6:49 AM, Peter Occil wrote:
> While a "language" attribute on Node may also be useful to
> HTML+RDFa processors in JavaScript, I have no plans to implement
> such a processor in JavaScript, though.

There's https://www.w3.org/Bugs/Public/show_bug.cgi?id=16489 fwiw.
Interest thus far seems fairly low, prolly due to what Kenny mentioned
with people not doing much language tagging at all.

--
http://annevankesteren.nl/
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
What's my use case?

Well in my case, I have written an HTML parser in Java and C# [1][2],
which parses HTML documents and returns an object that implements a
subset of the DOM, so far. As far as possible, I included only methods
and attributes that were specified in the DOM or HTML specification,
such as the characterSet attribute (which is called getCharacterSet on
my DOM's IDocument interface), and more recently the innerHTML attribute
(which is called getInnerHTML on my DOM's IElement interface)

However, when I decided to implement an RDFa processor based on my HTML
parser, I had need to include a method that returns the language of a
node (see, for example, section 3.3 of reference [3]). As a result, I
included a method called getLanguage on my DOM's INode interface (which
may correspond to a possible--future--DOM attribute called "language" on
the Node interface). I feel uneasy having to include this extension to
what ought to be a subset of the HTML DOM.

While a "language" attribute on Node may also be useful to HTML+RDFa
processors in JavaScript, I have no plans to implement such a processor
in JavaScript, though.

[1] https://github.com/peteroupc/HtmlParser
[2] https://github.com/peteroupc/HtmlParserCSharp
[3] http://www.w3.org/TR/rdfa-in-html/

-Original Message-
From: Kang-Hao (Kenny) Lu
Sent: Tuesday, April 23, 2013 11:08 PM
To: Peter Occil
Cc: WHAT Working Group
Subject: Re: [whatwg] HTML: A DOM attribute that returns the language of a node

(13/04/23 16:44), Peter Occil wrote:
> I believe there should be a DOM attribute that returns the language
> of a node, as defined in section 3.2.3.3 "The lang and xml:lang
> attributes".

What's your use case? If you want to style a particular language then
there's the CSS :lang() pseudo-class. Use cases are important because
otherwise I think there are very few pages with multiple lang
attributes...

> While there is a "lang" DOM attribute, it's inadequate because it's
> only affected by the element's "lang" content attribute.

That's true. However, if the case isn't important, we can do tree
traversal (modulo HTTP Content-Language header and pragma) or exhaust
element.matchesSelector(":lang(xxx)").

> Also, I don't see a way to get the "language of a node" otherwise,
> especially since it depends not only on "lang" and "xml:lang", but
> also on the HTTP Content-Language header, which may not be possible
> to retrieve with existing JavaScript methods, as far as I can tell.

Indeed.

Cheers,
Kenny
--
Web Specialist, Opera
Sphinx Game Force, Oupeng Browser, Beijing
Try Oupeng: http://www.oupeng.com/
Re: [whatwg] HTML: A DOM attribute that returns the language of a node
(13/04/23 16:44), Peter Occil wrote:
> I believe there should be a DOM attribute that returns the language
> of a node, as defined in section 3.2.3.3 "The lang and xml:lang
> attributes".

What's your use case? If you want to style a particular language then
there's the CSS :lang() pseudo-class. Use cases are important because
otherwise I think there are very few pages with multiple lang
attributes...

> While there is a "lang" DOM attribute, it's inadequate because it's
> only affected by the element's "lang" content attribute.

That's true. However, if the case isn't important, we can do tree
traversal (modulo HTTP Content-Language header and pragma) or exhaust
element.matchesSelector(":lang(xxx)").

> Also, I don't see a way to get the "language of a node" otherwise,
> especially since it depends not only on "lang" and "xml:lang", but
> also on the HTTP Content-Language header, which may not be possible
> to retrieve with existing JavaScript methods, as far as I can tell.

Indeed.

Cheers,
Kenny
--
Web Specialist, Opera
Sphinx Game Force, Oupeng Browser, Beijing
Try Oupeng: http://www.oupeng.com/
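The "exhaust matchesSelector" option mentioned above could look roughly like the sketch below. It is a sketch under stated assumptions: `guessLanguage` is a hypothetical name, the caller must supply the list of candidate language tags, and the element is assumed to expose `matches` (the name under which `matchesSelector` was later standardized) or `matchesSelector` itself. Because :lang(en) also matches "en-US", longer tags are tried first.

```javascript
// Hypothetical helper: probes an element with :lang() selectors.
// Returns the most specific candidate that matches, or null if none does.
function guessLanguage(element, candidates) {
  const matches = element.matches || element.matchesSelector;
  // Try longer (more specific) tags first, since :lang(en) also
  // matches elements whose language is "en-US".
  const ordered = [...candidates].sort((a, b) => b.length - a.length);
  for (const tag of ordered) {
    if (matches.call(element, `:lang(${tag})`)) return tag;
  }
  return null;
}
```

This can only recover a language that appears in `candidates`, so it suits checking a known set of languages and not discovering an arbitrary tag.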