Re: [whatwg] Sentence structure
From: Ian Hickson [mailto:i...@hixie.ch]
> On Thu, 10 Jan 2013, Thomas A. Fine wrote:
> > Use Cases:
> > 4. Clarifying sentence boundaries would be an aid in machine translation software.
>
> Do you have any evidence supporting this? I've spoken with engineers who work on machine translation software and while they've certainly had requests (whence the translate attribute), they've never asked for a way to mark up sentences.

I'm doing some related work that requires machine translation on the lines of export/import HTML snippets. Human language content boundaries are directly determined by author's grammatical punctuation skills at the sentence level.

HTML is everything to-do tied-up with GUI web-browsers, so machine translation, screen readers, so forth are supported through other living standards GRDDL XSLT RDFa that also work with HTML as one of multiple possible host, however their relationship with XML serialization as dependency for proper functioning might cause browser engine makers to promote sticking to microdata, unless someday we get Google SilverFlash.java Safari plug-in so that one size will fit all.

As HTML is host language in wide-spread use (my apologies for lacking statistics that I compensate by deriving statements from common sense), perhaps this is starting point for raising concerns that may be redirected into other specs too. It's the only opening for those rare use cases as the story of Emperor's New Clothes.

Getting back to business, for larger content fragments there's the p element. An immediate citation is search results cut-off abrupt fragments in content preview. For improvising on such fragment indices they've come up with schema.org vocab which I just had to remind here. They've got provision to specialize from their general pre-defined types, so ThingWebPageElement can be used to get ThingWebPageElementParagraphSentence

This can be expressed using html5 microdata itemtype attribute as:

<span itemscope="itemscope" itemtype="http://www.schema.org/thing/webpage/webpageelement/paragraph/sentence">One whole sentence!</span>

HTML5 without XML serialization will allow to skip ="itemscope" too! saves 12 characters, savings comparable to those recommended by minifying. :-)
Re: [whatwg] Sentence structure
On Sat, 12 Jan 2013, Vipul S. Chawathe wrote:
> I'm doing some related work that requires machine translation on the lines of export/import HTML snippets. Human language content boundaries are directly determined by author's grammatical punctuation skills at the sentence level.

Sure, but if the author isn't competent enough to use punctuation, I think we're probably not going to be able to rely on them using <sentence> correctly either, at the end of the day.

> HTML is everything to-do tied-up with GUI web-browsers, so machine translation, screen readers, so forth are supported through other living standards GRDDL XSLT RDFa that also work with HTML as one of multiple possible host, however their relationship with XML serialization as dependency for proper functioning might cause browser engine makers to promote sticking to microdata, unless someday we get Google SilverFlash.java Safari plug-in so that one size will fit all.
>
> As HTML is host language in wide-spread use (my apologies for lacking statistics that I compensate by deriving statements from common sense), perhaps this is starting point for raising concerns that may be redirected into other specs too. It's the only opening for those rare use cases as the story of Emperor's New Clothes.
>
> Getting back to business, for larger content fragments there's the p element. An immediate citation is search results cut-off abrupt fragments in content preview. For improvising on such fragment indices they've come up with schema.org vocab which I just had to remind here. They've got provision to specialize from their general pre-defined types, so ThingWebPageElement can be used to get ThingWebPageElementParagraphSentence
>
> This can be expressed using html5 microdata itemtype attribute as:
>
> <span itemscope="itemscope" itemtype="http://www.schema.org/thing/webpage/webpageelement/paragraph/sentence">One whole sentence!</span>
>
> HTML5 without XML serialization will allow to skip ="itemscope" too! saves 12 characters, savings comparable to those recommended by minifying. :-)

I'm sorry, but I've no idea what you're saying here.

--
Ian Hickson               U+1047E
http://ln.hixie.ch/       U+263A
Things that are impossible just take longer.
Re: [whatwg] Sentence structure
On Thu, 10 Jan 2013, Thomas A. Fine wrote:
> Use Cases:
> 1. Formatting sentence spacing to approximate the look of almost all books in English from 1650-1950.

This is possible today, using <span class=sentence>. Unless approximating the formatting of a small minority of old books becomes much more common than it is now, this use case probably doesn't justify using a dedicated element.

> 2. Formatting sentence spacing because it is very likely an aid to scanning text, and there are some indications that it is helpful for new readers, readers learning a new language, and readers with visual scanning issues and other learning disabilities.

Browsers can do this without markup (sentences are detectable by some relatively simple heuristics), so this wouldn't justify adding a markup-level feature.

Incidentally, do you have any research to support this claim? My understanding is that in practice the double-spacing at the end of sentences is considered an antiquated practice that doesn't actually help with reading much, certainly not as much as slightly increased line spacing, clear punctuation, and the like.

> 3. Formatting sentence spacing because I like it that way.

This is possible today, using <span class=sentence>. Unless your preference here becomes much more common than it is now, this use case probably doesn't justify using a dedicated element.

> 4. Clarifying sentence boundaries would be an aid in machine translation software.

Do you have any evidence supporting this? I've spoken with engineers who work on machine translation software and while they've certainly had requests (whence the translate attribute), they've never asked for a way to mark up sentences.

> 5. Clarifying sentence boundaries would be an aid to screen readers to help provide correct inflection.

Screen readers must have excellent sentence ending detections regardless of what features we provide, because most Web pages (and there are trillions already) don't include such markup. So adding an element would not solve this problem.

Since the use cases do not currently support adding an element for this purpose, I have not added the element to the language.

--
Ian Hickson               U+1047E
http://ln.hixie.ch/       U+263A
Things that are impossible just take longer.
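
For readers following along, the two points in this reply (sentences can be found heuristically, and they can already be marked up with a class on a span) might be combined roughly as in the Python sketch below. Nothing here comes from the thread itself: the regular expression, the function names split_sentences and wrap_sentences, and the choice of Python are all assumptions made for illustration, and the heuristic is deliberately naive (it will, for example, split after abbreviations such as "Dr." when a capitalised word follows).

```python
import html
import re

# Assumed heuristic (not from the thread): a sentence ends at '.', '!' or '?'
# when that character is followed by whitespace and then an uppercase letter,
# a digit, or an opening quote/bracket.
_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z0-9"\'(\[])')


def split_sentences(text):
    """Split plain text into sentences using the assumed heuristic."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


def wrap_sentences(text):
    """Wrap each detected sentence in <span class="sentence">, escaping the text."""
    return " ".join(
        '<span class="sentence">{}</span>'.format(html.escape(s))
        for s in split_sentences(text)
    )
```

A stylesheet could then target the sentence class for spacing. A production detector would need abbreviation lists and language-specific rules, which is outside the scope of this sketch.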
Re: [whatwg] Sentence structure
I guess I was just way too long-winded. Buried in there were some good ideas, and I'm no longer strictly advocating just a sentence tag. I read more about how things are supposed to work, and I focused on what is needed in general terms, and then as many different possible solutions and their pros and cons.

I still think a sentence tag is a good idea, but I would now really favor an approach that allows CSS to interpret a pair of spaces following terminal punctuation directly as a sentence break, and then provide a mechanism to format that directly. If I had to narrow things down to just one choice rather than a spectrum of available approaches it would be that one.

It's practical for content developers, straightforward to implement, can be easily applied to previously generated content, and does not ugly up the HTML (in fact the HTML wouldn't even change at all, only a tiny bit of CSS would be added). It's not ideal for semantic sentence detection, but is at least a significant improvement there.

tom
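
The rule proposed above (two spaces after terminal punctuation mark a sentence break) can be illustrated outside of CSS with a small Python sketch. This is only an illustration of the detection rule under stated assumptions: the function name split_on_double_space and the exact regular expression are hypothetical, and no CSS mechanism of this kind is described in the thread beyond the proposal itself. The rule relies on the source text keeping both spaces even though browsers normally collapse runs of whitespace when rendering.

```python
import re

# Assumed reading of the proposal: a sentence break is a run of two or more
# spaces immediately after '.', '!' or '?'. A single space is never a break.
_DOUBLE_SPACE_BREAK = re.compile(r'(?<=[.!?]) {2,}')


def split_on_double_space(text):
    """Split text at double spaces that follow terminal punctuation."""
    return [s for s in _DOUBLE_SPACE_BREAK.split(text) if s]
```

For example, split_on_double_space("Dr. Smith arrived.  He sat down.") keeps "Dr. Smith" together because only a single space follows the abbreviation, which is the advantage this approach has over purely heuristic detection. It depends entirely on authors typing the two spaces consistently.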
Re: [whatwg] Sentence structure
On Thu, 10 Jan 2013, Thomas A. Fine wrote:
> I guess I was just way too long-winded. Buried in there were some good ideas, and I'm no longer strictly advocating just a sentence tag. I read more about how things are supposed to work, and I focused on what is needed in general terms, and then as many different possible solutions and their pros and cons.
>
> I still think a sentence tag is a good idea, but I would now really favor an approach that allows CSS to interpret a pair of spaces following terminal punctuation directly as a sentence break, and then provide a mechanism to format that directly. If I had to narrow things down to just one choice rather than a spectrum of available approaches it would be that one.
>
> It's practical for content developers, straightforward to implement, can be easily applied to previously generated content, and does not ugly up the HTML (in fact the HTML wouldn't even change at all, only a tiny bit of CSS would be added). It's not ideal for semantic sentence detection, but is at least a significant improvement there.

I don't know if the use cases justify adding a feature to CSS, but I'll let the CSS editors and browser vendors be the judges of that. :-)

The CSS spec is discussed on the www-st...@w3.org list.

HTH,
--
Ian Hickson               U+1047E
http://ln.hixie.ch/       U+263A
Things that are impossible just take longer.
Re: [whatwg] Sentence structure
On 1/10/13 11:36 PM, Ian Hickson wrote:
> I don't know if the use cases justify adding a feature to CSS, but I'll let the CSS editors and browser vendors be the judges of that. :-)
>
> The CSS spec is discussed on the www-st...@w3.org list.

Sorry then, I was under the impression that WHATWG covered a broader spectrum than just the HTML piece.

tom
Re: [whatwg] Sentence structure
On Thu, 10 Jan 2013, Thomas A. Fine wrote:
> On 1/10/13 11:36 PM, Ian Hickson wrote:
> > I don't know if the use cases justify adding a feature to CSS, but I'll let the CSS editors and browser vendors be the judges of that. :-)
> >
> > The CSS spec is discussed on the www-st...@w3.org list.
>
> Sorry then, I was under the impression that WHATWG covered a broader spectrum than just the HTML piece.

We currently cover the following specs:

http://whatwg.org/specs

Cheers,
--
Ian Hickson               U+1047E
http://ln.hixie.ch/       U+263A
Things that are impossible just take longer.