Re: Corrigendum #9 clarifies noncharacter usage in Unicode

2013-02-21 Thread Steven Atreju
.. The UTF-8, UTF-16, UTF-32 BOM FAQ http://www.unicode.org/faq/utf_bom.html has also been updated for clarity, Very nice, but i wonder why the paragraph on noncharacters can be found under UTF-16 instead of under some generic, non-Microsoft specific topic. Thanks Steven

Re: Capitalization in German

2013-02-21 Thread Steven Atreju
|I was trying to follow this: All that you say, but… |Karl May for many decades, so it's not like it's a new thing. For me there is only one drunken criminal, and that is Hemingway and his famous book, «Captain Iglo and the fish sticks». |I am not sure I can follow the discourse structure

Re: Capitalization in German

2013-02-20 Thread Steven Atreju
| Here is a minimal pair to illustrate that point: | Er hat in Moskau liebe Genossen. | Er hat in Moskau Liebe genossen. | which translates to: | In Moscow, he’s got dear comrades. | In Moscow, he has enjoyed love. | | A classic joke is these two newspaper headlines:

Re: I missed my self-imposed deadline for the Mayan numeral proposal

2012-12-25 Thread Steven Atreju
Philippe Verdy verd...@wanadoo.fr wrote: [.] |then the real catastrophe occurred 394 years ago, in 1618, just because of |the conquest of America by Spanish troops : which meant a massive death of |lots of Amerindians (most of them due to imported infections, to which Terrible and ridiculous

Re: Tool to convert characters to character names

2012-12-20 Thread Steven Atreju
Martin J. Dürst due...@it.aoyama.ac.jp wrote: |I'm looking for a (preferably online) tool that converts Unicode |characters to Unicode character names. Richard Ishida's tools |(http://rishida.net/tools/conversion/) do a lot of conversions, but not |names. For what it's worth, that sounds

Re: Rijksmuseum launches Rijksstudio

2012-11-01 Thread Steven Atreju
Jeroen Ruigrok van der Werven asmo...@in-nomine.org wrote: |For those of you that do research for orthographies and the likes based on |historical pieces, the Dutch Rijksmuseum has recently launched their |Rijksstudio. You can search through their entire collection of high |resolution images

Re: Rijksmuseum launches Rijksstudio

2012-11-01 Thread Steven Atreju
Jeroen Ruigrok van der Werven asmo...@in-nomine.org wrote: |-On [20121101 11:48], Steven Atreju (snatr...@googlemail.com) wrote: |Really fantastic. 'Should rework that Flash or what it is since |i got a harlequin statue when i wanted to get a close up of |Vincent van Gogh.. | |At least

Re: U+25CA LOZENGE - why is it in the Mac OS Roman character set (and therefore widespread in current fonts)?

2012-08-15 Thread Steven Atreju
On Tue, Aug 14, 2012 at 12:48 PM, Karl Pentzlin karl-pentz...@acssoft.de wrote: On Monday, 13 August 2012 at 20:53, Hans Aberg wrote: HA The German WP mentions that in the context of the now HA discontinued Bildschirmtext, it was called Raute: HA

Re: German »Raute« (was: U+25CA LOZENGE)

2012-08-14 Thread Steven Atreju
Hi all, Philippe Verdy verd...@wanadoo.fr wrote: |2012/8/13 Otto Stolz otto.st...@uni-konstanz.de: | Hello, | | on 2012-08-13 20:48, Leif Halvard Silli wrote: | | The word 'Raute' reminds me of the Norwegian 'rute' - and my Norwegian | book on etymology assumes that 'rute' is derived from

Re: (Informational only: UTF-8 BOM and the real life)

2012-07-30 Thread Steven Atreju
Leif H Silli xn--mlform-...@xn--mlform-iua.no wrote: |We now have some data that indicates that what Unicode says about the UTF-8 |BOM is worded in a way that is possible to misunderstand. I support you in Yeah! Yeah! Yeah!, that is good to read black on #FCFCF9. |Steven replied: | |In

Re: (Informational only: UTF-8 BOM and the real life)

2012-07-30 Thread Steven Atreju
Doug Ewell d...@ewellic.org wrote: |Steven Atreju wrote: | |^Z as an EOF marker for text files was part of the MS-DOS legacy from |CP/M, where all files were written to a multiple of the disk block size |(I think 128 for CP/M and 512 for MS-DOS 1.x), and there had to be some |way to tell

Re: (Informational only: UTF-8 BOM and the real life)

2012-07-30 Thread Steven Atreju
Rick McGowan r...@unicode.org wrote: |No. That wasn't CP/M... It was a different OS. Oh yes, according to Wikipedia my remembrance was wrong. Sorry. Doug Ewell d...@ewellic.org wrote: |Steven Atreju wrote: | | I'm learning in this thread. | (And CP/M was that thing that Microsoft bought

Re: (Informational only: UTF-8 BOM and the real life)

2012-07-28 Thread Steven Atreju
Leif H Silli xn--mlform-...@xn--mlform-iua.no wrote: |Steven Atreju on 28/7/'12, 0:22: | Doug Ewell wrote: | | | Well, i still see a bug in the Unicode Standard here. | | Whereas for the multioctet UTFs there is «The BOM is not | | considered part of the content of the text

Re: (Informational only: UTF-8 BOM and the real life)

2012-07-27 Thread Steven Atreju
Asmus Freytag asm...@ix.netcom.com wrote: |On 7/25/2012 2:45 PM, Jukka K. Korpela wrote: | . One might even argue that the BOM is useful here, too, since it | immediately signals that there is something wrong, and “ï»¿” is an | encoding error signature, so to say. | | |+8 | |A./ Well,

Re: (Informational only: UTF-8 BOM and the real life)

2012-07-27 Thread Steven Atreju
Leif H Silli xn--mlform-...@xn--mlform-iua.no wrote: |Asmus Freytag on 26/7/'12, 1:10 | On 7/25/2012 2:45 PM, Jukka K. Korpela wrote: | . One might even argue that the BOM is useful here, too, since it | immediately signals that there is something wrong, and “ï»¿” is an | encoding error

Re: (Informational only: UTF-8 BOM and the real life)

2012-07-27 Thread Steven Atreju
. The tone was rude. |Steven Atreju wrote: | | Well, i still see a bug in the Unicode Standard here. | Whereas for the multioctet UTFs there is «The BOM is not | considered part of the content of the text» (Conformance, 3.10, | D98, D101), i cannot find any such clarifying text for its usage

(Informational only: UTF-8 BOM and the real life)

2012-07-25 Thread Steven Atreju
So, dear list, i'm really sorry for this distress. I don't want to start any thread, but i can't help it and thus want to pass this through to you. I had problems with my bicycle and sent a mail asking for help. This is a really large company (www.mifa.de). |Received: from

Re: pre-HTML5 and the BOM

2012-07-18 Thread Steven Atreju
Except that the internet is almost unusable without cookies and scripting, lynx(1) works very well, too, if the ncursesw library is linked against (and the terminal font supports Unicode characters). Funny that it writes garbage for |<html><body><p>ä.ü.ö.</p></body></html> but uses UTF-8 by default for

Fwd: Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-18 Thread Steven Atreju
Original Message Date: Wed, 18 Jul 2012 13:45:59 +0200 From: Steven Atreju snatr...@googlemail.com To: Doug Ewell d...@ewellic.org Subject: Re: UTF-8 BOM (Re: Charset declaration in HTML) Doug Ewell wrote: |For those who haven't yet had enough of this debate yet, here's a link

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-17 Thread Steven Atreju
Philippe Verdy verd...@wanadoo.fr wrote: |2012/7/16 Steven Atreju snatr...@googlemail.com: | Fifteen years ago i think i would have put effort into including the | BOM after reading this, for complete correctness! I'm pretty sure | that i really would have done so. | |Fifteen years ago I

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-16 Thread Steven Atreju
Doug Ewell d...@ewellic.org wrote: |Steven Atreju wrote: | | If Unicode *defines* that the so-called BOM is in fact a Unicode- | indicating tag that MUST be present, | |But Unicode does not define that. Nope. On http://unicode.org/faq/utf_bom.html i read: Q: Why do some of the UTFs

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-14 Thread Steven Atreju
Eli Zaretskii e...@gnu.org wrote: | Date: Fri, 13 Jul 2012 22:07:54 +0200 | From: Steven Atreju snatr...@googlemail.com | Cc: unicode@unicode.org | | this time without reply-in-same-charset and | encoding=8bit and i bet it comes out as UTF-8 on the other end: | |Yes, it does. ..cheer

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-13 Thread Steven Atreju
Philippe Verdy verd...@wanadoo.fr wrote: |2012/7/12 Steven Atreju snatr...@googlemail.com: | UTF-8 is a bytestream, not multioctet(/multisequence). |Not even. UTF-8 is a text-stream, not made of arbitrary sequences of |bytes. It has a lot of internal semantics and constraints. Some things

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-13 Thread Steven Atreju
Eli Zaretskii e...@gnu.org wrote: | For example, this mail is | written in a UTF-8-enabled vi(1) basically from 1986, in UTF-8 | encoding («Schöne Überraschung, gelle?» | |No, it isn't: | |Content-Type: text/plain; charset=ISO-8859-1 Oh, it's really terrible. I do have

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-13 Thread Steven Atreju
Philippe Verdy verd...@wanadoo.fr wrote: |2012/7/13 Steven Atreju snatr...@googlemail.com: | Philippe Verdy verd...@wanadoo.fr wrote: | | |2012/7/12 Steven Atreju snatr...@googlemail.com: | | UTF-8 is a bytestream, not multioctet(/multisequence). | |Not even. UTF-8 is a text-stream

UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-12 Thread Steven Atreju
| As for editors: If your own editor has no problems with the BOM, then | what? But I think Notepad can also save as UTF-8 but without the BOM - | it should be possible to get an option for choosing when you save | it. | |Perhaps there should be such an option in Notepad, but there

Re: UTF-8 BOM (Re: Charset declaration in HTML)

2012-07-12 Thread Steven Atreju
Leif Halvard Silli xn--mlform-...@xn--mlform-iua.no wrote: |Steven Atreju, Thu, 12 Jul 2012 12:32:46 +0200: | | In the meanwhile the UTF-8 BOM is in the standard and thus | contradicts forty years of (well) good (Unix/POSIX) engineering | and craftsmanship. Where a file is a file

Re: Combining latin small letters with diacritics

2012-03-26 Thread Steven Atreju
Denis Jacquerye wrote [2012-03-26 13:35+0200]: The fact [.] doesn't make it any saner. The same could be said [.] Denis Moyogo Jacquerye Are you trying to say that extra tables and exact additional knowledge besides UnicodeData.txt should not be necessary? In the end you wanna make it a