The UTF-8, UTF-16, UTF-32 BOM FAQ
http://www.unicode.org/faq/utf_bom.html
has also been updated for clarity,
Very nice, but i wonder why the paragraph on noncharacters is
found under UTF-16 instead of under some generic, non-Microsoft-specific
topic.
Thanks
Steven
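Noncharacters are defined on code points, independent of any encoding form, which is the substance of the question above. A minimal Python sketch of the membership test follows; the function name `is_noncharacter` is hypothetical, and the two ranges checked are the 32 code points U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacter code points.

    Hypothetical helper for illustration; it works on code points, so the
    answer is the same whether the text is stored as UTF-8, UTF-16 or UTF-32.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # 32 contiguous noncharacters: U+FDD0..U+FDEF.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+xxFFFE and U+xxFFFF.
    return (cp & 0xFFFE) == 0xFFFE


if __name__ == "__main__":
    total = sum(is_noncharacter(cp) for cp in range(0x110000))
    print(total)  # 66 (32 + 17 planes * 2)
```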
|I was trying to follow this:
All that you say, but…
|Karl May for many decades, so it's not like it's a new thing.
For me there is only one drunken criminal, and that is Hemingway
and his famous book, «Captain Iglo and the fish sticks».
|I am not sure I can follow the discourse structure
| Here is a minimal pair to illustrate that point:
| Er hat in Moskau liebe Genossen.
| Er hat in Moskau Liebe genossen.
| which translates to:
| In Moscow, he has dear comrades.
| In Moscow, he has enjoyed love.
|
| A classic joke is these two newspaper headlines:
Philippe Verdy verd...@wanadoo.fr wrote:
[.]
|then the real catastrophe occurred 394 years ago, in 1618, just because of
|the conquest of America by Spanish troops, which meant the mass death of
|many Amerindians (most of them due to imported infections, to which
Terrible and ridiculous
Martin J. Dürst due...@it.aoyama.ac.jp wrote:
|I'm looking for a (preferably online) tool that converts Unicode
|characters to Unicode character names. Richard Ishida's tools
|(http://rishida.net/tools/conversion/) do a lot of conversions, but not
|names.
For what it's worth, that sounds
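As an offline alternative to an online converter, Python's standard `unicodedata` module can map characters to their Unicode names. A minimal sketch; the helper name `describe` and the `<unnamed>` placeholder are hypothetical choices, and this is not one of the tools discussed in the thread.

```python
import unicodedata


def describe(text: str) -> list:
    """Return 'U+XXXX NAME' for each character in text (hypothetical helper)."""
    lines = []
    for ch in text:
        # unicodedata.name() raises ValueError for code points without a
        # Name property (e.g. C0 controls) unless a default is supplied.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X} {name}")
    return lines


if __name__ == "__main__":
    for line in describe("\u00dc\u20ac"):
        print(line)
    # U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS
    # U+20AC EURO SIGN
```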
Jeroen Ruigrok van der Werven asmo...@in-nomine.org wrote:
|For those of you who do research for orthographies and the like based on
|historical pieces, the Dutch Rijksmuseum has recently launched their
|Rijksstudio. You can search through their entire collection of high
|resolution images
Jeroen Ruigrok van der Werven asmo...@in-nomine.org wrote:
|-On [20121101 11:48], Steven Atreju (snatr...@googlemail.com) wrote:
|Really fantastic. They should rework that Flash, or whatever it is, since
|i got a harlequin statue when i wanted to get a close-up of
|Vincent van Gogh.
|
|At least
On Tue, Aug 14, 2012 at 12:48 PM, Karl Pentzlin
karl-pentz...@acssoft.de wrote:
On Monday, 13 August 2012 at 20:53, Hans Aberg wrote:
HA The German WP mentions that in the context of the now
HA discontinued Bildschirmtext, it was called Raute:
HA
Hi all,
Philippe Verdy verd...@wanadoo.fr wrote:
|2012/8/13 Otto Stolz otto.st...@uni-konstanz.de:
| Hello,
|
| On 2012-08-13 20:48, Leif Halvard Silli wrote:
|
| The word 'Raute' reminds me of the Norwegian 'rute' - and my Norwegian
| book on etymology assumes that 'rute' is derived from
Leif H Silli xn--mlform-...@xn--mlform-iua.no wrote:
|We now have some data that indicates that what Unicode says about the UTF-8
|BOM is worded in a way that is possible to misunderstand. I support you in
Yeah! Yeah! Yeah! That is good to read, black on #FCFCF9.
|Steven replied:
|
|In
Doug Ewell d...@ewellic.org wrote:
|Steven Atreju wrote:
|
|^Z as an EOF marker for text files was part of the MS-DOS legacy from
|CP/M, where all files were written to a multiple of the disk block size
|(I think 128 for CP/M and 512 for MS-DOS 1.x), and there had to be some
|way to tell
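A minimal sketch of how a reader can recover the logical end of such a block-padded text file, assuming (as described above) that the unused tail of the last block is filled with ^Z (0x1A) and that the first ^Z marks end-of-text. The 128-byte record size is taken from Doug's own recollection; the helper name is hypothetical.

```python
CTRL_Z = 0x1A  # ASCII SUB, used as the end-of-text marker in CP/M-era files


def strip_ctrl_z_padding(data: bytes) -> bytes:
    """Return the bytes before the first ^Z, or all of data if there is none.

    Hypothetical helper: assumes ^Z never occurs inside the real text.
    """
    end = data.find(bytes([CTRL_Z]))
    return data if end == -1 else data[:end]


if __name__ == "__main__":
    # One 128-byte record: 7 bytes of text, the rest padded with ^Z.
    record = b"HELLO\r\n".ljust(128, b"\x1a")
    print(len(record), strip_ctrl_z_padding(record))  # 128 b'HELLO\r\n'
```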
Rick McGowan r...@unicode.org wrote:
|No. That wasn't CP/M... It was a different OS.
Oh yes, according to Wikipedia my memory was wrong. Sorry.
Doug Ewell d...@ewellic.org wrote:
|Steven Atreju wrote:
|
| I'm learning in this thread.
| (And CP/M was that thing that Microsoft bought
Leif H Silli xn--mlform-...@xn--mlform-iua.no wrote:
|Steven Atreju on 28/7/'12, 0:22:
| Doug Ewell wrote:
|
| | Well, i still see a bug in the Unicode Standard here.
| | Whereas for the multioctet UTFs there is «The BOM is not
| | considered part of the content of the text
Asmus Freytag asm...@ix.netcom.com wrote:
|On 7/25/2012 2:45 PM, Jukka K. Korpela wrote:
| . One might even argue that the BOM is useful here, too, since it
| immediately signals that there is something wrong, and “” is an
| encoding error signature, so to speak.
|
|
|+8
|
|A./
Well,
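To illustrate the "encoding error signature" idea: when the three UTF-8 BOM bytes EF BB BF are read with a legacy single-byte codec such as Windows-1252 or ISO-8859-1, they show up as the visible string "ï»¿". A minimal Python sketch; `misread_bom` is a hypothetical helper name.

```python
UTF8_BOM = "\ufeff".encode("utf-8")  # the three bytes EF BB BF


def misread_bom(codec: str = "cp1252") -> str:
    """Decode the UTF-8 BOM bytes with a (wrong) single-byte legacy codec."""
    return UTF8_BOM.decode(codec)


if __name__ == "__main__":
    print(UTF8_BOM)                # b'\xef\xbb\xbf'
    print(misread_bom())           # ï»¿
    print(misread_bom("latin-1"))  # ï»¿
```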
Leif H Silli xn--mlform-...@xn--mlform-iua.no wrote:
|Asmus Freytag on 26/7/'12, 1:10
| On 7/25/2012 2:45 PM, Jukka K. Korpela wrote:
| . One might even argue that the BOM is useful here, too, since it
| immediately signals that there is something wrong, and “” is an
| encoding error
The tone was rude.
|Steven Atreju wrote:
|
| Well, i still see a bug in the Unicode Standard here.
| Whereas for the multioctet UTFs there is «The BOM is not
| considered part of the content of the text» (Conformance, 3.10,
| D98, D101), i cannot find any such clarifying text for its usage
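If a reader applies the multi-octet rule quoted above to UTF-8 as well, it treats one leading BOM as a signature and drops it, while keeping any U+FEFF that appears later in the text. A minimal sketch of that reader policy (one possible choice, not a claim about what the Standard mandates for UTF-8); the function name is hypothetical, and Python's built-in `utf-8-sig` codec behaves the same way on decode.

```python
import codecs


def decode_utf8_text(data: bytes) -> str:
    """Decode UTF-8, treating a single leading BOM as a signature, not content.

    Hypothetical helper; equivalent to data.decode("utf-8-sig").
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # A U+FEFF later in the stream is ordinary content and is kept.
    return data.decode("utf-8")


if __name__ == "__main__":
    print(repr(decode_utf8_text(b"\xef\xbb\xbfabc")))  # 'abc'
    print(repr(decode_utf8_text(b"abc")))              # 'abc'
```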
So, dear list, i'm really sorry for this disturbance.
I don't want to start a thread, but i can't help it and thus
want to pass this on to you.
I had problems with my bicycle and sent a mail asking for help.
This is a really large company (www.mifa.de).
|Received: from
Except that the internet is almost unusable without cookies
and scripting, lynx(1) works very well, too, if it is linked
against the ncursesw library (and the terminal font supports
Unicode characters). Funny that it writes garbage for
|<html><body><p>ä.ü.ö.</p></body></html>
but uses UTF-8 by default for
Original Message
Date: Wed, 18 Jul 2012 13:45:59 +0200
From: Steven Atreju snatr...@googlemail.com
To: Doug Ewell d...@ewellic.org
Subject: Re: UTF-8 BOM (Re: Charset declaration in HTML)
Doug Ewell wrote:
|For those who haven't had enough of this debate yet, here's a link
Philippe Verdy verd...@wanadoo.fr wrote:
|2012/7/16 Steven Atreju snatr...@googlemail.com:
| Fifteen years ago i think i would have put effort into including the
| BOM after reading this, for complete correctness! I'm pretty sure
| that i really would have done so.
|
|Fifteen years ago I
Doug Ewell d...@ewellic.org wrote:
|Steven Atreju wrote:
|
| If Unicode *defines* that the so-called BOM is in fact a Unicode-
| indicating tag that MUST be present,
|
|But Unicode does not define that.
Nope. On http://unicode.org/faq/utf_bom.html i read:
Q: Why do some of the UTFs
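Because no BOM is required, a reader can at most sniff for one and must treat its absence as "unknown", not as "not Unicode". A minimal sketch of signature sniffing using the BOM constants from Python's `codecs` module; the helper name is hypothetical, and note the known ambiguity that UTF-16LE text starting with a BOM followed by U+0000 looks like a UTF-32LE BOM.

```python
import codecs
from typing import Optional, Tuple

# Check the UTF-32 signatures first: the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, BOM length) if data starts with a BOM, else (None, 0)."""
    for bom, encoding in _SIGNATURES:
        if data.startswith(bom):
            return encoding, len(bom)
    # No signature: the encoding is simply unknown from the bytes alone.
    return None, 0


if __name__ == "__main__":
    print(sniff_bom(b"\xef\xbb\xbfhi"))  # ('utf-8', 3)
    print(sniff_bom(b"plain ASCII"))     # (None, 0)
```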
Eli Zaretskii e...@gnu.org wrote:
| Date: Fri, 13 Jul 2012 22:07:54 +0200
| From: Steven Atreju snatr...@googlemail.com
| Cc: unicode@unicode.org
|
| this time without reply-in-same-charset and
| encoding=8bit and i bet it comes out as UTF-8 on the other end:
|
|Yes, it does.
..cheer
Philippe Verdy verd...@wanadoo.fr wrote:
|2012/7/12 Steven Atreju snatr...@googlemail.com:
| UTF-8 is a bytestream, not multioctet(/multisequence).
|Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
|bytes. It has a lot of internal semantics and constraints. Some things
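The constraints Philippe mentions can be checked mechanically: well-formed UTF-8 forbids truncated sequences, overlong forms, encoded surrogates, and anything above U+10FFFF. A minimal sketch that relies on Python's strict UTF-8 decoder as the validity check (the helper name is hypothetical):

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data is well-formed UTF-8 according to Python's strict decoder."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    samples = {
        b"\xc3\xbc": "U+00FC, well-formed",
        b"\xc3": "truncated two-byte sequence",
        b"\xc0\xaf": "overlong encoding of '/'",
        b"\xed\xa0\x80": "encoded surrogate U+D800",
        b"\xf4\x90\x80\x80": "beyond U+10FFFF",
        b"\xff": "byte that never occurs in UTF-8",
    }
    for raw, label in samples.items():
        print(raw, is_valid_utf8(raw), label)
```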
Eli Zaretskii e...@gnu.org wrote:
| For example, this mail is
| written in a UTF-8-enabled vi(1), basically from 1986, in UTF-8
| encoding («Schöne Überraschung, gelle?»
|
|No, it isn't:
|
|Content-Type: text/plain; charset=ISO-8859-1
Oh, it's really terrible. I do have
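The charset label matters because the same text has different bytes in the two encodings: each of ö and Ü takes two bytes in UTF-8 but one in ISO-8859-1, so bytes read under the wrong label turn into mojibake. A minimal sketch, using part of the sample phrase from the quoted mail:

```python
sample = "Schöne Überraschung"

utf8_bytes = sample.encode("utf-8")
latin1_bytes = sample.encode("iso-8859-1")

# 19 characters: 21 bytes as UTF-8, 19 bytes as ISO-8859-1.
print(len(sample), len(utf8_bytes), len(latin1_bytes))  # 19 21 19

# Decoding UTF-8 bytes under an ISO-8859-1 label does not round-trip.
misread = utf8_bytes.decode("iso-8859-1")
print(misread == sample)  # False
```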
Philippe Verdy verd...@wanadoo.fr wrote:
|2012/7/13 Steven Atreju snatr...@googlemail.com:
| Philippe Verdy verd...@wanadoo.fr wrote:
|
| |2012/7/12 Steven Atreju snatr...@googlemail.com:
| | UTF-8 is a bytestream, not multioctet(/multisequence).
| |Not even. UTF-8 is a text-stream
| As for editors: if your own editor has no problems with the BOM, then
| what? But I think Notepad could also save as UTF-8 without the BOM;
| it should be possible to get an option for choosing that when you save
| it.
|
|Perhaps there should be such an option in Notepad, but there
Leif Halvard Silli xn--mlform-...@xn--mlform-iua.no wrote:
|Steven Atreju, Thu, 12 Jul 2012 12:32:46 +0200:
|
| Meanwhile, the UTF-8 BOM is in the standard and thus
| contradicts forty years of (well) good (Unix/POSIX) engineering
| and craftsmanship. Where a file is a file
Denis Jacquerye wrote [2012-03-26 13:35+0200]:
The fact [.] doesn't make it any saner.
The same could be said [.]
Denis Moyogo Jacquerye
Are you trying to say that extra tables and exact additional
knowledge besides UnicodeData.txt should not be necessary?
In the end you wanna make it a
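For context on what UnicodeData.txt itself provides: each line holds 15 semicolon-separated fields for one code point (code point, name, General_Category, and so on, as laid out in UAX #44). A minimal sketch that extracts a few of them; the `UnicodeDataEntry` type and its field names are hypothetical, and everything else on the line is ignored here.

```python
from typing import NamedTuple


class UnicodeDataEntry(NamedTuple):
    """Hypothetical record holding a few fields of one UnicodeData.txt line."""
    code_point: int
    name: str
    general_category: str
    simple_lowercase: str  # hex code point as text, or "" if none


def parse_line(line: str) -> UnicodeDataEntry:
    """Parse one UnicodeData.txt line (15 semicolon-separated fields)."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return UnicodeDataEntry(
        code_point=int(fields[0], 16),  # field 0: code point in hex
        name=fields[1],                 # field 1: character name
        general_category=fields[2],     # field 2: General_Category
        simple_lowercase=fields[13],    # field 13: simple lowercase mapping
    )


if __name__ == "__main__":
    print(parse_line("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"))
```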