Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-16 Thread Steven Atreju
Doug Ewell d...@ewellic.org wrote: |Steven Atreju wrote: | | If Unicode *defines* that the so-called BOM is in fact a Unicode- | indicating tag that MUST be present, | |But Unicode does not define that. Nope. On http://unicode.org/faq/utf_bom.html i read: Q: Why do some of the UTFs

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-16 Thread Leif Halvard Silli
Steven Atreju, Mon, 16 Jul 2012 13:35:04 +0200: Doug Ewell d...@ewellic.org wrote: And: Q: Is the UTF-8 encoding scheme the same irrespective of whether the underlying processor is little endian or big endian? ... Where a BOM is used with UTF-8, it is only used as an encoding

RE: pre-HTML5 and the BOM

2012-07-16 Thread Doug Ewell
Leif Halvard Silli xn dash dash mlform dash iua at xn dash dash mlform dash iua dot no wrote: So, in a way, the ZWNBSP - or any other non-ASCII character (it would in fact be better to use U+200B, to reserve the U+FEFF for its designated BOM purpose) could serve as a UTF-8 sniff character not

Re: pre-HTML5 and the BOM

2012-07-16 Thread Leif Halvard Silli
Doug Ewell, Sat, 14 Jul 2012 15:14:10 -0600: Philippe Verdy wrote: It would break if the only place where to place a BOM is just the start of a file. But as I propose, we allow BOMs to occur anywhere to specify which encoding to use to decode what follows each one, even shell scripts would

Copyleft

2012-07-16 Thread Jean-François Colson
Recently, the Canadian symbols 🅪 (marque de commerce) and 🅫 (marque déposée) have been added to Unicode at U+1F16A and U+1F16B. Would it be possible to add the copyleft symbol in the neighbourhood? It looks like a reversed ©. Today, to type it, I use a reversed c with a combining enclosing

Re: pre-HTML5 and the BOM

2012-07-16 Thread Jean-François Colson
Le 14/07/12 23:14, Doug Ewell a écrit : A related question, though, is why some people think the sky will fall if a text file contains loose zero-width no-break spaces. U+FEFF is the very model of a default ignorable code point. I don’t think the sky will fall but I say there still are a few

Re: Copyleft

2012-07-16 Thread Leo Broukhis
Ↄ⃝ may be a better approximation. Leo On Mon, Jul 16, 2012 at 10:47 AM, Jean-François Colson j...@colson.eu wrote: Recently, the Canadian symbols 🅪 (marque de commerce) and 🅫 (marque déposée) have been added to Unicode at U+1F16A and U+1F16B. Would it be possible to add the copyleft symbol

RE: Copyleft

2012-07-16 Thread Doug Ewell
There was a discussion on this list around May 2000 regarding the so-called copyleft symbol. There were concerns that it was not really a symbol with legal standing, like © and ® and ™, but more of a logo, notably one worn on T-shirts by followers of a sort of social movement. Eventually it was

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-16 Thread Doug Ewell
Steven Atreju wrote: Q: Is the UTF-8 encoding scheme the same irrespective of whether the underlying processor is little endian or big endian? ... Where a BOM is used with UTF-8, it is only used as an encoding signature to distinguish UTF-8 from other encodings — it has nothing

Re: pre-HTML5 and the BOM

2012-07-16 Thread Philippe Verdy
2012/7/16 Leif Halvard Silli xn--mlform-...@xn--mlform-iua.no: html element, then Chrome will sniff it as UTF-8 encoded. Whereas IE, Webkit, Opera, Firefox will default to ISO-8858-1/Windows-1252. Actually ISO 885**9**-1. But we've also been told that, given the C1 controls are simply invalid

Re: pre-HTML5 and the BOM

2012-07-16 Thread Philippe Verdy
2012/7/15 David Starner prosfil...@gmail.com: /tmp $ echo -n a > file1 /tmp $ echo b > file2 /tmp $ cat file1 file2 > file3 /tmp $ echo ab | diff -q - file3 Once again the problem is the /bin/cat tool which is used for everything and agnostic about preserving text semantics. Using another cat