[whatwg] Editorial: ASCII case-insensitive string comparison

2012-05-12 Thread Øistein E . Andersen
o the corresponding characters in the range U+0061 to U+007A ([... a] to [... z])’.) Øistein E. Andersen
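
The mapping described in this snippet (U+0041 to U+005A onto U+0061 to U+007A, with nothing else touched) can be sketched roughly as follows. The function names are illustrative only and do not come from the thread or the specification.

```python
def ascii_lower(text: str) -> str:
    """Map only U+0041..U+005A (A-Z) to U+0061..U+007A (a-z)."""
    return "".join(
        chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch
        for ch in text
    )


def ascii_case_insensitive_equal(a: str, b: str) -> bool:
    """Compare two strings, ignoring case for ASCII letters only."""
    return ascii_lower(a) == ascii_lower(b)
```

Unlike `str.lower()`, this sketch leaves non-ASCII characters alone, so U+212A KELVIN SIGN does not compare equal to "k".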

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-12 Thread Øistein E . Andersen
: > File "", line 1, in > UnicodeDecodeError: 'big5' codec can't decode bytes in position 0-1: illegal > multibyte sequence >>>> b'\xf9\xe9'.decode('big5-hkscs') > '╞' Python also says: >>> b'\xf9\xe9'.decode('cp950') u'\u255e' > Are there any sites that use these line drawing characters that would be > fixed by this? If not, I'm quite willing to accept the historical accidents > and move on :) Probably not many. Still, it seems safe to fix these four mappings if the characters are ever added to Unicode. Øistein E. Andersen
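
The comparison quoted above can be repeated with a small helper, sketched here under the assumption that the local Python build ships the `big5`, `big5-hkscs` and `cp950` codecs. The helper name `try_decode` is hypothetical, and the results depend on the Python version's codec tables, so none are asserted here.

```python
from typing import Optional


def try_decode(data: bytes, codec: str) -> Optional[str]:
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


# The byte pair discussed in the thread, tried against each codec in turn.
for codec in ("big5", "big5-hkscs", "cp950"):
    print(codec, repr(try_decode(b"\xf9\xe9", codec)))
```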

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-10 Thread Øistein E . Andersen
On 8 Apr 2012, at 18:03, Philip Jägenstedt wrote: > On Sat, 07 Apr 2012 16:04:55 +0200, Øistein E. Andersen wrote: > >> [1] <http://coq.no/character-tables/eten1.pdf> >> <http://coq.no/character-tables/eten1.js> > > What is the source for the

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-08 Thread Øistein E . Andersen
On 8 Apr 2012, at 18:03, Philip Jägenstedt wrote: > On Sat, 07 Apr 2012 16:04:55 +0200, Øistein E. Andersen wrote: > >> [...] >> [1] <http://coq.no/character-tables/eten1.pdf> >> <http://coq.no/character-tables/eten1.js> > > What is the source f

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-07 Thread Øistein E . Andersen
On 7 Apr 2012, at 15:04, Øistein E. Andersen wrote: > Suggested reverse mappings: > [...] > C6DE <= U+3003 > C6DF <= U+4EDD Sorry, these are different from the other C6xx (ETen-1) mappings. Correction: A1B2 <= U+3003 C969 <= U+4EDD Rationale: These codepoints

Re: [whatwg] Encoding: big5 and big5-hkscs

2012-04-07 Thread Øistein E . Andersen
ackwards compatibility' in the HKSCS-2008 standard, but no Unicode mappings are provided: 9EAC 9EC4 9EF4 9F4E 9FAD 9FB1 9FC0 9FC8 9FDA 9FE6 9FEA 9FEF A054 A057 A05A A062 A072 A0A5 A0AD A0AF A0D3 A0E1 I assume some systems will render at least these as potentially meaningful Han characters. -- Øistein E. Andersen

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-23 Thread Øistein E . Andersen
On 23 Oct 2009, at 04:20, Ian Hickson wrote: On Wed, 21 Oct 2009, Øistein E. Andersen wrote: ASCII-compatibility: The note in ‘2.1.5 Character encodings’ seems to say that [...] ISO-2022’[-*] are ASCII-compatible, whereas HZ-GB-2312 is not, and I cannot find anything in Section 2.1.5

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-22 Thread Øistein E . Andersen
On 22 Oct 2009, at 22:45, Philip Taylor wrote: On Thu, Oct 22, 2009 at 9:23 PM, Øistein E. Andersen wrote: On 22 Oct 2009, at 17:15, NARUSE, Yui wrote: Finally, Why ISO 2022 series is discouraged is not clear. We agree on this point. The string "숍訊昱穿" encoded as ISO-2022-KR is th

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-22 Thread Øistein E . Andersen
uch appreciated. -- Øistein E. Andersen

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-21 Thread Øistein E . Andersen
On 19 Oct 2009, at 05:52, Ian Hickson wrote: I've noted your e-mail here [...] and moved the whole thing out of the spec. That does not seem to apply to the last part of the original e-mail, quoted below. Øistein E. Andersen Other character encoding i

[whatwg] Potentially avoidable tokeniser/treebuilder dependency

2009-09-22 Thread Øistein E . Andersen
plementation is sufficiently close than to follow the specification; perhaps more importantly, removing unnecessary dependencies and allowing the tokeniser to run on its own would also make it easier to develop and test a tokeniser for use as part of a full parser.) -- Øistein E. Andersen

[whatwg] Quoted (') and (") appear as ('''') and (''''')

2009-09-17 Thread Øistein E . Andersen
ot;' (two occurrences of each, excluding an unrelated unproblematic instance inside a script) should be changed since they appear confusingly as ''''' and '''' in a sans-serif typeface. -- Øistein E. Andersen

Re: [whatwg] Surrogate pairs and character references

2009-09-16 Thread Øistein E . Andersen
s are parse errors. The phrase "characters and code points" (in the second sentence) is awkward given that all characters are in fact code points. -- Øistein E. Andersen

Re: [whatwg] Editorial: Colloquial contractions

2009-09-15 Thread Øistein E . Andersen
On 15 Sep 2009, at 02:37, Ian Hickson wrote: On Tue, 8 Sep 2009, Øistein E. Andersen wrote: The spec currently contains a few occurrences of colloquial contractions like "can't", "won't" and "there's", which should be changed to "cannot&

Re: [whatwg] Surrogate pairs and character references

2009-09-15 Thread Øistein E . Andersen
only, the appropriate term to use would probably be "Unicode scalar value". -- Øistein E. Andersen
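
For readers unfamiliar with the term, a Unicode scalar value is any code point outside the surrogate range. A minimal sketch of that definition, together with the standard UTF-16 surrogate-pair arithmetic, might look like this (function names are illustrative, not taken from the thread):

```python
def is_scalar_value(cp: int) -> bool:
    """True for code points U+0000..U+10FFFF excluding surrogates U+D800..U+DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def combine_surrogates(high: int, low: int) -> int:
    """Combine a high and a low surrogate into the scalar value they encode."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```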

[whatwg] Initial carriage return in and

2009-09-10 Thread Øistein E . Andersen
§ 9.1.2.5 "Restrictions on content models" mentions that an initial line feed (\n) character inside and will be removed. Should it not cover carriage return (\r) and \r\n as well? -- Øistein E. Andersen
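
What the question proposes could be sketched as below: drop one initial newline whether it is written as LF, CR or CRLF. This illustrates the suggestion only, not the behaviour the specification defined at the time, and the function name is hypothetical.

```python
def strip_initial_newline(text: str) -> str:
    """Remove a single leading LF, CR, or CRLF, if present."""
    if text.startswith("\r\n"):
        return text[2:]
    if text.startswith(("\n", "\r")):
        return text[1:]
    return text
```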

[whatwg] Typo: 'possibly' as adjective

2009-09-10 Thread Øistein E . Andersen
"[P]ossibly algorithms" in the adoption agency algorithm note should be "possible algorithms". -- Øistein E. Andersen

Re: [whatwg] Surrogate pairs and character references

2009-09-09 Thread Øistein E . Andersen
On 8 Sep 2009, at 23:39, I wrote: UTF-16BE Actually, endianness is immaterial. Please read this as "UTF-16" instead. Sorry for the extra message. -- Øistein E. Andersen

[whatwg] U+FEFF (BOM) stripping in UTF-16BE and UTF-16LE

2009-09-08 Thread Øistein E . Andersen
y harmless and potentially useful to deal with mislabelled documents, but it might be worth adding an explanatory note. -- Øistein E. Andersen

[whatwg] Surrogate pairs and character references

2009-09-08 Thread Øistein E . Andersen
U+FFFD, so the mixed form may be interpreted as U+1,. -- Øistein E. Andersen

[whatwg] Ambiguous ampersand

2009-09-08 Thread Øistein E . Andersen
the most consistent solution would probably be to remove the parse error by setting the "additional allowed character" to '>' when encountering an ampersand in the "Attribute value (unquoted)" state. Also, making the sequence "&<" confor

[whatwg] Editorial: Colloquial contractions

2009-09-08 Thread Øistein E . Andersen
The spec currently contains a few occurrences of colloquial contractions like "can't", "won't" and "there's", which should be changed to "cannot", "will not", "there is" etc. for consistency. -- Øistein E. Andersen

[whatwg] Editorial: "Character reference data" tokeniser state name

2009-09-08 Thread Øistein E . Andersen
The "Character reference data" tokeniser state should probably be renamed to "Character reference in data". Adding "in" would arguably make the name more accurately descriptive and furthermore consistent with the "Character reference in attribute" state. -- Øistein E. Andersen

Re: [whatwg] Space characters: VT and FF

2009-09-02 Thread Øistein E . Andersen
). -- Øistein E. Andersen

[whatwg] EDITORIAL - Suggested corrections

2009-08-28 Thread Øistein E . Andersen
ake it easier to sidestep the problem: Either a and b share the same form owner, or neither of them has one. -- Øistein E. Andersen

[whatwg] Space characters: VT and FF

2009-08-28 Thread Øistein E . Andersen
consistent to treat form feed ('\f') in the same way? (Firefox handles both as non-space characters, IE and Safari handle both as space characters, and handling these two slightly exotic C0 white-space characters differently seems surprising.) -- Øistein E. Andersen

Re: [whatwg] Fwd: Entity parsing

2009-07-17 Thread Øistein E . Andersen
On 5 Jun 2009, at 00:49, Ian Hickson wrote: Could you give an example of what you mean? I'm having trouble following your description On Fri, 24 Apr 2009, Øistein E. Andersen wrote: Let &IE4 (resp. &HTML4, &HTML5) be a non-semicolon-terminated named character refer

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-07-17 Thread Øistein E . Andersen
uthors’ using legacy encodings’ or better ‘advise authors against using legacy encodings’. -- Øistein E. Andersen

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-06-11 Thread Øistein E . Andersen
have any questions. -- Øistein E. Andersen

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-06-09 Thread Øistein E . Andersen
Le 3 juin 09 à 23h19, Ian Hickson écrivit : On Tue, 14 Apr 2009, Øistein E. Andersen wrote: HTML5 currently contains a table of encodings aliases, [...] GB2312 and GB_2312-80 technically refer to the *character set* GB 2312-80, [...]. GBK, on the other hand, is an encoding. [...] There is

Re: [whatwg] Vulgar fractions

2009-06-08 Thread Øistein E . Andersen
y for this feature to be considered for addition to Firefox (apart from actually implementing it myself). -- Øistein E. Andersen

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-06-08 Thread Øistein E . Andersen
On Tue, 14 Apr 2009, Øistein E. Andersen wrote: Shift_JIS < Windows-31J [...] Shift-JIS < Windows-932 Le 5 juin 09, Anne van Kesteren écrivit : Is the implication here that Shift_JIS and Shift-JIS are distinct [...]? No, Shift-JIS and Windows-932 are commonly used names/labels f

[whatwg] Fwd: Entity parsing

2009-04-24 Thread Øistein E . Andersen
On 23 May 2008, at 03:50, Ian Hickson wrote: On Thu, 28 Jun 2007, Øistein E. Andersen wrote: 1) Is it useful to handle unterminated entities followed by an alphanumerical character like IE does? [...] 2) HTML 4.01 allows the semicolon to be omitted in certain cases. [...] Firefox and Safari

Re: [whatwg] HTML as a text format: Should be optional?

2009-04-18 Thread Øistein E . Andersen
so make it slightly more difficult to add (certain classes of) CSS, since a doctype would have to be added to give the expected rendering (and for the document to remain conforming). -- Øistein E. Andersen

[whatwg] HTML as a text format: Should be optional?

2009-04-17 Thread Øistein E . Andersen
PE is unfortunate, but seems impossible to get rid of at this point. A is usually a good idea, but is it really necessary to require this for conformance? After all, a is not something which an author is likely to forget, and leaving it out has no unexpected consequences. -- Øist

[whatwg] Vulgar fractions

2009-04-16 Thread Øistein E . Andersen
tly. (I am aware that fractions have been proposed earlier in the context of mathematical formulae, but I have not been able to find any previous discussion regarding vulgar fractions.) -- Øistein E. Andersen

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-04-13 Thread Øistein E . Andersen
X 1001:1992 (non-hangul) - All possible hangul (including those in KS X 1001:1992) This encoding contains the same characters as Windows-949, but arranged more systematically. Unfortunately, the encoding is not compatible with EUC-KR. Opera does not support Johab. Safari does not render my test page at all. -- Øistein E. Andersen

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-04-12 Thread Øistein E . Andersen
On 2 Sep 2008, at 06:06, Ian Hickson wrote: On Wed, 30 Jul 2008, Øistein E. Andersen wrote: 1. Opera, Firefox and Safari all handle US-ASCII as Windows-1252. IE7, on the other hand, simply ignores the high bit (as it does for a few other 7-bit encodings, by the way). Perhaps this

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-04-11 Thread Øistein E . Andersen
problem? On Thu, 13 Mar 2008, Øistein E. Andersen wrote: Note: Similarly, IE apparently handles CS-ISO-2022-JP as distinct from ISO-2022-JP. This is something to keep in mind when looking at multi-byte encodings. What should we say about this? The issue seems to be that IE&#

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2008-07-29 Thread Øistein E . Andersen
if it can be shown that documents containing the letter Ў/ў (only in KOI8-RU) are frequently mislabelled as KOI8-U. > Do you have input on the EUC-JP issue? Not yet, but you can expect some input on CJK encodings at some point in the future. -- Øistein E. Andersen

[whatwg] [Slightly OT(?)] Programmatically defined styles [Re: Superset encodings [Re: ISO-8859-* and the C1 control range]]

2008-05-30 Thread Øistein E . Andersen
are actually used in each document) is certainly feasible. However, this solution would not seem to be practical for a colour scheme using a larger number of colours. Would your mantra remain the same given, e.g., 256^2 or 64^3 distinct shades of colour? If not, where should the boundary be drawn? -- Øistein E. Andersen

Re: [whatwg] Supporting MathML and SVG in text/html, and related topics

2008-04-11 Thread Øistein E . Andersen
On Thursday 10th April 2008, Ian Hickson wrote: > SVG radicals aren't typographically acceptable either. > You really want to use fonts for this. Current browsers are clearly better at rendering TrueType and PostScript fonts at small sizes than equivalent shapes expressed as SVG paths. (This may

[whatwg] Turkish encodings: ISO 8859-9 < CP1254

2008-04-03 Thread Øistein E . Andersen
As suggested earlier, ISO 8859-9 is a proper subset of CP1254, and IE7 always uses the superset. [Actually, the name shown in the menu varies -- Turkish (ISO) v. Turkish (Windows) --, but the underlying encoding vector appears to be the same.] Test pages (identical data, different Charset headers
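
The subset claim can be checked against Python's own codec tables, which are assumed here to match the mappings discussed in the thread. The helper name is illustrative. The sketch compares the two encodings over the range 0xA0 to 0xFF, where ISO 8859-9 defines graphic characters.

```python
def high_half_matches(codec_a: str, codec_b: str) -> bool:
    """True if bytes 0xA0..0xFF decode identically under both codecs."""
    return all(
        bytes([value]).decode(codec_a) == bytes([value]).decode(codec_b)
        for value in range(0xA0, 0x100)
    )


print(high_half_matches("iso8859_9", "cp1254"))
# The two differ only in 0x80..0x9F: C1 controls in one, printable characters in the other.
print(repr(bytes([0x80]).decode("iso8859_9")), repr(bytes([0x80]).decode("cp1254")))
```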

Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2008-03-16 Thread Øistein E . Andersen
dering. -- Øistein E. Andersen

[whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2008-03-12 Thread Øistein E . Andersen
On 5th June 2007, Øistein E. Andersen wrote: > (To do this properly, what we really ought to do is look for > C1 and undefined characters in all IANA charsets and semi-official > mappings to Unicode and check 1) whether the gaps can be filled > by borrowing from other encodings, an

Re: [whatwg] several messages about handling encodings in HTML

2008-03-03 Thread Øistein E . Andersen
te, as well as U+FDD0 to U+FDDF and the non-characters *FE and *FF when these are expressed as character references. Would it be possible to (dis)allow the same set of characters in both cases? -- Øistein E. Andersen

Re: [whatwg] several messages about the HTML syntax

2008-03-03 Thread Øistein E . Andersen
kely an error: [...] >> >> To make the notion of conformance more useful for authors (that is, to >> make conformance checking catch unintentional stuff), I suggest making >> starting an unquoted attribute value with a = a parse error. > > Done. > > > On Mon, 1

Re: [whatwg] Unicode mappings for ⟨ and ⟩

2007-07-01 Thread Øistein E . Andersen
L. David Baron wrote: > What's wrong with these mappings, and why shouldn't they > also be the mappings in HTML5? The problem is that they are canonically equivalent to CJK characters. http://www.unicode.org/reports/tr15/ describes Unicode normalisation in general and mentions singleton decomposi

Re: [whatwg] Unicode mappings for ⟨ and ⟩

2007-07-01 Thread Øistein E . Andersen
I wrote: > full-width East-Asian characters (" <"/" >"). That should be " <"/"> ". -- Øistein E. Andersen

[whatwg] Unicode mappings for ⟨ and ⟩

2007-07-01 Thread Øistein E . Andersen
HTML5 currently maps ⟨ and ⟩ to U+3008 LEFT ANGLE BRACKET, U+3009 RIGHT ANGLE BRACKET, both belonging to `CJK angle brackets' in U+3000--U+303F CJK Symbols and Punctuation. HTML 4.01 maps them to U+2329 LEFT-POINTING ANGLE BRACKET, U+232A RIGHT-POINTING ANGLE BRACKET from `Angle

Re: [whatwg] Entity parsing

2007-06-28 Thread Øistein E . Andersen
sound, such content demonstrably exists, and available data do not support the presupposition that doing exactly what IE does is actually the best solution for handling existing content. -- Øistein E. Andersen

Re: [whatwg] Entity parsing

2007-06-27 Thread Øistein E . Andersen
s adopt existing conformance criteria and parsing rules? 4) Similar considerations for entities in attribute values. -- Øistein E. Andersen

Re: [whatwg] Entity parsing

2007-06-27 Thread Øistein E . Andersen
(1) is quite rare compared to (2), all the correctly encoded variants. Whether 0.0005% should be regarded as significant (supposing that résumé is representative) may be a contentious issue, but it is interesting to note that other errors — unwanted conversion of & to &amp; in (3) and a typical encoding problem in (4) — are actually significantly more common, and these cannot be corrected at all. -- Øistein E. Andersen

Re: [whatwg] Entity parsing [trema/diaeresis vs umlaut]

2007-06-27 Thread Øistein E . Andersen
the first one is used in English. They are both used in English, actually (and the spelling with a ligature should not be considered obsolete in words borrowed from French, unlike those of Latin origin). -- Øistein E. Andersen

[whatwg] Editorial: typo (spelling)

2007-06-27 Thread Øistein E . Andersen
The verb `precede' does not follow the same pattern as `succeed' and `proceed'. s/precee/prece/g would correct the current misspellings. -- Øistein E. Andersen
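
The sed-style substitution in this message has a direct equivalent in Python's `re` module. The sketch below uses a hypothetical function name, applies the same global replacement, and leaves `proceed' and `succeed' untouched.

```python
import re


def fix_precede(text: str) -> str:
    """Apply the equivalent of s/precee/prece/g to the given text."""
    return re.sub(r"precee", "prece", text)
```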

Re: [whatwg] Entity parsing [trema/diaeresis vs umlaut]

2007-06-26 Thread Øistein E . Andersen
latin-1 require a semicolon in IE, even in cases where it is optional according to SGML (and therefore will pass HTML 4.01 validation, I might add). -- Øistein E. Andersen

Re: [whatwg] Entity parsing [trema/diaeresis vs umlaut]

2007-06-26 Thread Øistein E . Andersen
probably be taken off list. -- Øistein E. Andersen

Re: [whatwg] Entity parsing

2007-06-25 Thread Øistein E . Andersen
On 25 Jun 2007, at 8:28AM, Ian Hickson wrote: > On Sun, 24 Jun 2007, Øistein E. Andersen wrote: > >> HTML5 currently follows IE7 much more closely than Safari, >>Firefox and Opera do, which seems to suggest that some of the quirks >>could be dispensed with. > > It&

Re: [whatwg] Entity parsing

2007-06-25 Thread Øistein E . Andersen
On 25 Jun 2007, at 11:57AM, Kristof Zelechovski wrote: > Inconsistently, as of IE7: I got &ge verbatim from your test. &ge; is /not/ a latin-1 entity. -- Øistein E. Andersen

Re: [whatwg] Entity parsing [trema/diæresis vs umlaut]

2007-06-25 Thread Øistein E . Andersen
lightly more common in Dutch. It would be interesting to see whether 19th-c. German actually made a distinction between umlaut on a, o, u and diæresis on e, i (e.g., Rhomboïd), but I do not know how consistently the diæresis was used, and words requiring it are typically foreign words that, unlike the rest, will not have been printed in Fraktur... -- Øistein E. Andersen

Re: [whatwg] Entity parsing

2007-06-23 Thread Øistein E . Andersen
p;" + entity name + ";" (i.e., expand the entity). Of course, conformance checkers would be more than welcome to signal that a certain current browser is unable to handle "A &mdash B" as expected, but this need not mean that all future browsers should be required not to handle it "properly" (as per arguably [in the original sense] more sensible SGML rules). -- Øistein E. Andersen

Re: [whatwg] Entity parsing [trema/diæresis vs umlaut]

2007-06-23 Thread Øistein E . Andersen
and a, o, u can all be umlauted (ä, ö, ü in German). Moreover, the double-dot accent also has other uses (e.g., ä and ë both designate a stressed schwa in Luxembourgeois), so it is probably not advisable to attempt a complete classification in HTML. -- Øistein E. Andersen *) possibly only in

Re: [whatwg] 9.2.2: replacement characters. How many?

2007-06-22 Thread Øistein E . Andersen
Ian Hickson wrote: > On Fri, 3 Nov 2006, Elliotte Harold wrote: > >> Section 9.2.2 of the current Web Apps 1.0 draft states: >> >>> Bytes or sequences of bytes in the original byte stream that could not >>> be converted to Unicode characters must be converted to U+FFFD >>> REPLACEMENT CHARACTER

Re: [whatwg] ISO-8859-* and the C1 control range

2007-06-05 Thread Øistein E . Andersen
Neither "ISO-8859-11" nor "Windows-874" appears in the list of IANA-approved character sets: http://www.iana.org/assignments/character-sets On the other hand, "TIS-620" (identical to ISO-8859-11 except that 0xA0 is left undefined) has been sanctioned by IANA. Perhaps Henri Sivonen could add a t

Re: [whatwg] ISO-8859-* and the C1 control range

2007-06-05 Thread Øistein E . Andersen
On Jun 5, 2007, at 11:38, Kristof Zelechovski wrote: > And why not:? > 2c) If the declared encoding was ISO-8859-2, replace that > character with the [corresponding] character [... from] Windows-1250. On Jun 5, 2007, at 11:51, Henri Sivonen wrote: > that's not what [browsers] do, so apparently

Re: [whatwg] ISO-8859-* and the C1 control range

2007-06-04 Thread Øistein E . Andersen
; in UTF-8 and UTF-16. > What does IE7 do? IE7 does not seem to do this either, which indeed suggests that specific C1 treatment not be needed outside ISO-8859-*. -- Øistein E. Andersen

Re: [whatwg] ISO-8859-* and the C1 control range

2007-06-02 Thread Øistein E . Andersen
tain number of selected ISO-8859-* encodings. As suggested earlier [1], a simpler solution seems to be to treat C1 bytes and NCRs from /all/ ISO-8859-* and Unicode encodings as Windows-1252. [1] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2006-November/007804.html -- Øistein E. Andersen
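
The simpler rule suggested here, reinterpreting anything in the C1 range as the Windows-1252 character at the same position, might be sketched as follows. The function name is hypothetical. Leaving the five positions that Python's `cp1252` codec treats as undefined unchanged is an assumption of this sketch, not something stated in the thread.

```python
def remap_c1(code_point: int) -> str:
    """Return the character for a code point, reading 0x80..0x9F as Windows-1252."""
    if 0x80 <= code_point <= 0x9F:
        try:
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # Position undefined in Python's cp1252 table: keep the C1 control.
            return chr(code_point)
    return chr(code_point)
```

Under this sketch a numeric character reference such as `&#128;` would yield U+20AC EURO SIGN rather than a C1 control character.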

Re: [whatwg] The m element [ and ]

2007-03-02 Thread Øistein E . Andersen
aphical emphasis, a technique that is arguably more effective than overemphasis. Again, the obvious alternative Typography does not seem quite right. -- Øistein E. Andersen *) Full title: Lexique des règles typographiques en usage à l’Imprimerie nationale

[whatwg] The m element [em and strong]

2007-02-08 Thread Øistein E . Andersen
f emphasis) are used for a vast variety of different purposes, and we cannot possibly devise a different element for each (which would be the only truly semantic solution). My main point was and remains that importance and emphasis are intimately related. Therefore, defining as denoting importance and pretending that the two are completely dissociated entities is unlikely to be productive. -- Øistein E. Andersen

Re: [whatwg] The m element [em and strong]

2007-02-08 Thread Øistein E . Andersen
arbitrary conventions in modern Western typography, is not helpful either. -- Øistein E. Andersen

Re: [whatwg] Hyphenation

2007-01-11 Thread Øistein E . Andersen
nimally extended version of the TeX algorithm can deal with irregular hyphenation without any extraneous mark-up, i.e., without any unnecessary burden on the author. Perhaps an idea for Prince7? Anyway, the preliminary conclusion seems to be that a element in HTML is unnecessary, so this discussion should probably continue somewhere else. [1] http://www.fi.muni.cz/usr/sojka/papers/tug95.pdf -- Øistein E. Andersen

Re: [whatwg] Hyphenation

2007-01-11 Thread Øistein E . Andersen
s (which is better than Plain TeX's default [unaccented English letters only], but less flexible) and uses a special rule to treat hyphens. Is this a correct assumption? Can I find more information on such details somewhere? -- Øistein E. Andersen

Re: [whatwg] Hyphenation

2007-01-11 Thread Øistein E . Andersen
e: > This format exists. It was pioneered by TeX and is now widely used by > other applications. You seem to be referring to TeX's hyphenation patterns, which are only one (important) part of TeX's hyphenation system. The missing parts need to be defined somehow, and a certain generalisation would be welcome, as discussed above. -- Øistein E. Andersen

Re: [whatwg] Hyphenation [Correction concerning Opera]

2007-01-11 Thread Øistein E . Andersen
y only affects the Macintosh platform. With newer builds and on other platforms, Opera handles &shy; just as correctly as Safari and IE do, and I clearly should have checked this before posting. Sorry for the misinformation. -- Øistein E. Andersen

[whatwg] Hyphenation

2007-01-08 Thread Øistein E . Andersen
how, and explicit markup seems to be unavoidable at least in some cases. I hope this can lead to a fruitful discussion. -- Øistein E. Andersen

[whatwg] Editorial: typo (spelling)

2007-01-07 Thread Øistein E . Andersen
In section 2.5.2. Dynamic markup insertion in HTML, in the paragraph `Escaping a string', the word `occurrences' is systematically misspelt with -ra- instead of -rre-. -- Øistein E. Andersen

Re: [whatwg] The parsing section doesn't require "HTML" in uppercase (doctype)

2006-12-04 Thread Øistein E . Andersen
On Tue, 5 Dec 2006, Simon Pieters wrote: >>AFAICT, the parsing section doesn't require uppercase "HTML" On 5 Dec 2006, at 2:2AM, Ian Hickson wrote: > Search for "If the name of the DOCTYPE token is exactly the four letters > "HTML", then mark the token as being correct. Otherwise, mark it as be

Re: [whatwg] Probable typo in section 5.2.2.

2006-12-03 Thread Øistein E . Andersen
On 3 Dec 2006, at 11:7PM, Ian Hickson wrote: > On Sat, 2 Dec 2006, Øistein E. Andersen wrote: >> No one ever replied to this, and the draft remains unchanged [on this point]. > > I will reply to this in due course. [...] > I can't necessarily do it all in real time! :-) Of course not; I did not

Re: [whatwg] Probable typo in section 5.2.2.

2006-12-02 Thread Øistein E . Andersen
On 3 Nov 2006, at 9:51PM, Øistein E. Andersen wrote: > In section 5.2.2., `chickenkïwi.soup' (with diaeresis) appears twice [...], > as does `chickenkiwi.soup' (without diaeresis). No one ever replied to this, and the draft remains unchanged. (If this is /not/ a typo, this should probably be po

Re: [whatwg] Allow trailing slash in always-empty HTML5 elements?

2006-11-30 Thread Øistein E . Andersen
Trailing slashes in void elements are clearly unnecessary from a syntactic point of view, but I think it can be argued that allowing them actually makes HTML more internally consistent. Current versions of HTML allow many unnecessary closing tags to be omitted (e.g., ), and for authors exploiting

Re: [whatwg] Handling of illegal byte-sequences (typically in UTF-8)

2006-11-24 Thread Øistein E . Andersen
On 24 Nov 2006, at 10:33AM, Henri Sivonen wrote: > > On Nov 24, 2006, at 04:11, Øistein E. Andersen wrote: >> >> Section 8.1.4: >>> Bytes [->] U+FFFD >> >> Section 9.2.2: >>> Bytes or sequences of bytes [->] U+FFFD > > I'm inclined to think that interop[erability] in error situations doesn't > ne

[whatwg] Handling of illegal byte-sequences (typically in UTF-8)

2006-11-23 Thread Øistein E . Andersen
Section 8.1.4: > Bytes that are not valid UTF-8 sequences must be interpreted as [...] U+FFFD Section 9.2.2: > Bytes or sequences of bytes [...] that could not be converted to Unicode > characters > must be converted to U+FFFD If I read this correctly, section 8.1.4 requires that an illegal UTF-
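
As a point of comparison only, Python's built-in UTF-8 decoder with `errors="replace"` shows one possible replacement policy. How many U+FFFD characters a longer ill-formed sequence produces is exactly the kind of detail the two quoted sections leave open, so only the single-byte case is relied on below; the function name is illustrative.

```python
def decode_with_replacement(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for bytes that cannot be decoded."""
    return data.decode("utf-8", errors="replace")


# A lone 0xFF byte can never occur in UTF-8 and is replaced by one U+FFFD.
print(repr(decode_with_replacement(b"a\xffb")))
# A truncated three-byte sequence; the number of U+FFFD is decoder-specific.
print(repr(decode_with_replacement(b"\xe2\x82")))
```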

Re: [whatwg] Custom elements and attributes

2006-11-06 Thread Øistein E . Andersen
On 5 Nov 2006, at 1:7PM, Lachlan Hunt wrote: > At the very least, ISO-8859-1 must be treated as Windows-1252. I'm not sure > about the other ISO-8859 encodings. Numeric and hex character references from > 128 to 159 must also be treated as Windows-1252 code points. I think this actually implies

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Øistein E . Andersen
On 6 Nov 2006, at 2:53AM, Elliotte Harold wrote: > The URL timed out when I tried to use [Sivonen's] validator You may want to try the following (the results seem to be similar): http://validator.w3.org/check?uri=http%3A%2F%2Fcafe.elharo.com%2Fweb%2Fmokka%2F -- Øistein E. Andersen

[whatwg] Entity parsing

2006-11-05 Thread Øistein E . Andersen
>From section 9.2.3.1. Tokenising entities: > For some entities, UAs require a semicolon, for others they don't. This applies to IE. FWIW, the entities not requiring a semicolon are the ones encoding Latin-1 characters, the other HTML 3.2 entities (&, > and <), as well as " and the uppercase va

Re: [whatwg] Custom elements and attributes

2006-11-04 Thread Øistein E . Andersen
> I think conforming text/html documents should not be allowed to parse into > a DOM that contains characters that are not allowed in XML 1.0. [...] I am > inclined to prefer [...] U+FFFD I perfectly agree. (Actually, I think that U+7F (delete) and the C1 control characters should be excluded [tr

[whatwg] Probable typo in section 5.2.2.

2006-11-03 Thread Øistein E . Andersen
In section 5.2.2., `chickenkïwi.soup' (with diaeresis) appears twice (once encoded as chickenk%C3%AFwi.soup), as does `chickenkiwi.soup' (without diaeresis). -- Øistein E. Andersen
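
The relation between the two spellings with a diaeresis is plain UTF-8 percent-encoding, which can be reproduced with Python's standard `urllib.parse` module. The variable names below are illustrative.

```python
from urllib.parse import quote, unquote

# U+00EF is the letter i with diaeresis, written as an escape for portability.
name = "chickenk\u00efwi.soup"

encoded = quote(name)       # UTF-8 bytes C3 AF become %C3%AF
decoded = unquote(encoded)  # round-trips back to the original string

print(encoded)
print(decoded == name)
```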

Re: [whatwg] Custom elements and attributes

2006-11-03 Thread Øistein E . Andersen
On 31 Oct 2006, at 11:46AM, Henri Sivonen wrote: > If you add custom *elements* and use the HTML parser, the system does not > ensure that the custom elements would not adversely interact with tag > inference > or error handling in browsers. [...] If you add custom elements, you just > have to >

Re: [whatwg] Dialogue and inline quotations

2006-10-31 Thread Øistein E . Andersen
On 31 Oct 2006, at 9:26PM, Henri Sivonen wrote: > If printed text in French (and other languages) works with the dialog dash > style > without visual hints where you put the and tags, why would an author > want to go though the trouble of tagging the dialog like that and then making > sure > t

Re: [whatwg] Custom elements and attributes

2006-10-30 Thread Øistein E . Andersen
On 23 Oct 2006, at 12:43PM, Henri Sivonen wrote: > Using custom schemas with the HTML parser is for experts only > and produces very wrong results unless the schema is suitable. Indeed so, but then any tool can potentially be misused. Still, I do realise that this is not a priority, of course. >

[whatwg] Custom elements and attributes

2006-10-17 Thread Øistein E . Andersen
Hello, I just tried to check out how custom element and attribute names work in current browsers and how they are supposed to work in HTML5, and some issues seem unclear to me. Given the following fairly minimal document: > > > HTML 5 > > x\:red{ background-color: red; } > x\:bl

Re: [whatwg] Mathematics in HTML5

2006-06-19 Thread Øistein E . Andersen
On 17 Jun 2006, at 2:15PM, White Lynx wrote: >Oistein E. Andersen wrote: >>The current proposal does not seem to include the following elements of >>ISO-12083: >>- with arbitrary delimiters (possibly not a good idea) >Probably it is better to list number of delimiters explicitly like in LaTe

Re: [whatwg] Mathematics in HTML5

2006-06-17 Thread Øistein E . Andersen
On 16 Jun 2006, at 2:27PM, White Lynx wrote: >Oistein E. Andersen wrote: >>The proposal states that should be used to mark resizable operators, >>but this presumably does not mean that the size of such operators is actually intended to change. >It is intended to be larger. Yes, but the si

Re: [whatwg] Mathematics in HTML5

2006-06-16 Thread Øistein E . Andersen
On 14 Jun 2006, at 11:8AM, White Lynx wrote: >Oistein E. Andersen wrote: >>Quotes from "Wikipedia TeX in HTML5" >>http://xn--istein-9xa.com/HTML5/WikiTeX.pdf >>2.5 Big operators >>Remark: Is the following the intended use of under/over and opgrp? >Yes. In fact I would be more appropriate to use

Re: [whatwg] Mathematics in HTML5

2006-06-11 Thread Øistein E . Andersen
[EMAIL PROTECTED] >this may be difficult to achieve in practice, because TeX >conversors reading TeX sources are unable to provide correct MathML markup >for prescripts. Conversion to MathML is obviously more difficult because the base has to be found and encoded explicitly. Still, I do _not_ sa

Re: [whatwg] Mathematics in HTML5

2006-06-11 Thread Øistein E . Andersen
On 10 Jun 2006, at 10:1AM, White Lynx wrote: >Oistein E. Andersen wrote: >>traditional French typographical conventions for mathematics require lowercase >>variables in italic, but uppercase ones in roman. >Do we need extra values like text-transform:french-italic; and >french-bold-italic; >that

Re: [whatwg] Mathematics in HTML5

2006-06-11 Thread Øistein E . Andersen
On 8 Jun 2006, at 10:3AM, White Lynx wrote: >Can anyone specify what steps should be made to assure this compatibility, As a first step, I have tried to transform the TeX code used on Wikipedia (http://en.wikipedia.org/wiki/Help:Formula) into HTML5. This raises some issues, see http://xn--istein

Re: [whatwg] Mathematics in HTML5

2006-06-09 Thread Øistein E . Andersen
On 9 Jun 2006, at 11:0AM, [EMAIL PROTECTED] wrote: >Øistein E. Andersen wrote: >>2) Fight verbosity >>, [...] 23 and 3125 [are] clearly >>better suited than , 23 and >>3125. >However 23 is an shorthand for the full markup, >because structures of kind {2 \over 3} are even to be avoided in TeX.

Re: [whatwg] Mathematics in HTML5

2006-06-09 Thread Øistein E . Andersen
On 8 Jun 2006, at 10:3AM, White Lynx wrote: >Oistein E. Andersen wrote: >>each mark-up element must be kept as short as possible. >Some people argue that short element names being misleading and not >intuitive does not actually improve readability, some people like short element >names as they ar
