And those early versions of Notepad for 16/32-bit Windows were not
even Unicode-compliant (Unicode support was minimal; in fact, Unicode
was only partly supported on top of the old ANSI/OEM APIs, without
support in the filesystem, and with lots of quirks at the
kernel level caused by conver
Steven Atreju wrote:
> Funny that a program that cannot handle files larger than 0x7FFF
> bytes (last time i've used it, 95B) has such a large impact.
Notepad hasn't had this limitation since Windows Me. That was many, many
years ago.
--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.
Original Message
Date: Wed, 18 Jul 2012 13:45:59 +0200
From: Steven Atreju
To: "Doug Ewell"
Subject: Re: UTF-8 BOM (Re: Charset declaration in HTML)
Doug Ewell wrote:
|For those who haven't yet had enough of this debate, here's a link
|to an informa
Hello Doug,
On 2012/07/18 0:35, Doug Ewell wrote:
For those who haven't yet had enough of this debate, here's a link
to an informative blog (with some informative comments) from Michael
Kaplan:
"Every character has a story #4: U+feff (alternate title: UTF-8 is the
BOM, dude!)"
http://blogs.
Hello Philippe,
On 2012/07/18 3:37, Philippe Verdy wrote:
2012/7/17 Julian Bradfield:
On 2012-07-16, Philippe Verdy wrote:
I am also convinced that even Shell interpreters on Linux/Unix should
recognize and accept the leading BOM before the hash/bang starting
line (which is commonly used for
2012/7/17 Julian Bradfield :
> On 2012-07-16, Philippe Verdy wrote:
>> I am also convinced that even Shell interpreters on Linux/Unix should
>> recognize and accept the leading BOM before the hash/bang starting
>> line (which is commonly used for filetype identification and runtime
> The kernel do
On 2012-07-16, Philippe Verdy wrote:
> I am also convinced that even Shell interpreters on Linux/Unix should
> recognize and accept the leading BOM before the hash/bang starting
> line (which is commonly used for filetype identification and runtime
> behavior), without claiming that they don't kno
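The byte-level problem behind this proposal can be illustrated with a short Python sketch. It assumes the usual Unix behavior where a file is only treated as an interpreter script when its first two bytes are literally `#!`; a UTF-8 BOM (EF BB BF) placed in front of them defeats that test. The function `has_shebang_bom_tolerant` is hypothetical and only models what a BOM-tolerant loader, as proposed here, might do.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded in UTF-8


def has_shebang(data: bytes) -> bool:
    """Byte-level check: the file must literally start with '#!'."""
    return data[:2] == b"#!"


def has_shebang_bom_tolerant(data: bytes) -> bool:
    """Hypothetical variant that skips one leading UTF-8 BOM first."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return has_shebang(data)


script = UTF8_BOM + b"#!/bin/sh\necho hello\n"
print(has_shebang(script))               # False: the first bytes are EF BB
print(has_shebang_bom_tolerant(script))  # True
```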
For those who haven't yet had enough of this debate, here's a link
to an informative blog (with some informative comments) from Michael
Kaplan:
"Every character has a story #4: U+feff (alternate title: UTF-8 is the
BOM, dude!)"
http://blogs.msdn.com/b/michkap/archive/2005/01/20/357028.aspx
Wh
Philippe Verdy wrote:
|2012/7/16 Steven Atreju :
|> Fifteen years ago i think i would have put effort in including the
|> BOM after reading this, for complete correctness! I'm pretty sure
|> that i really would have done so.
|
|Fifteen years ago I would not have advocated it. Simply becau
Steven Atreju wrote:
> Q: Is the UTF-8 encoding scheme the same irrespective of whether
> the underlying processor is little endian or big endian?
> ...
> Where a BOM is used with UTF-8, it is only used as an encoding
> signature to distinguish UTF-8 from other encodings — it has
> noth
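The FAQ's point that a BOM in UTF-8 can only act as a signature (UTF-8 has no byte order to mark) can be seen in a small Python sketch that sniffs the common signatures using the BOM constants from the standard `codecs` module. The ordering and the name `sniff_signature` are illustrative assumptions; real-world detectors combine this with further heuristics.

```python
import codecs

# UTF-32 signatures are checked before UTF-16 ones because UTF-32LE's
# FF FE 00 00 begins with UTF-16LE's FF FE.
SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_signature(data: bytes):
    """Return (encoding name, signature length), or (None, 0) if none found."""
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# U+FEFF always encodes to the same three bytes in UTF-8, whatever the
# processor's endianness, so here it can only serve as a signature.
data = "\ufeffHello".encode("utf-8")
print(data[:3])                  # b'\xef\xbb\xbf'
print(sniff_signature(data))     # ('utf-8', 3)
print(data.decode("utf-8-sig"))  # Hello (the "utf-8-sig" codec drops the signature)
```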
2012/7/16 Steven Atreju :
> Fifteen years ago i think i would have put effort in including the
> BOM after reading this, for complete correctness! I'm pretty sure
> that i really would have done so.
Fifteen years ago I would not have advocated it. Simply because
support for UTF-8 was very poor (a
Steven Atreju, Mon, 16 Jul 2012 13:35:04 +0200:
> "Doug Ewell" wrote:
> And:
>
> Q: Is the UTF-8 encoding scheme the same irrespective of whether
> the underlying processor is little endian or big endian?
> ...
> Where a BOM is used with UTF-8, it is only used as an encoding
> signature
"Doug Ewell" wrote:
|Steven Atreju wrote:
|
|> If Unicode *defines* that the so-called BOM is in fact a Unicode-
|> indicating tag that MUST be present,
|
|But Unicode does not define that.
Nope. On http://unicode.org/faq/utf_bom.html i read:
Q: Why do some of the UTFs have a BE or LE
Steven Atreju wrote:
If Unicode *defines* that the so-called BOM is in fact a Unicode-
indicating tag that MUST be present,
But Unicode does not define that.
I know that, in Germany, many, many small libraries are being closed
because there is not enough money available to keep up with the
digi
Eli Zaretskii wrote:
|> Date: Fri, 13 Jul 2012 22:07:54 +0200
|> From: Steven Atreju
|> Cc: unicode@unicode.org
|>
|> this time without reply-in-same-charset and
|> encoding=8bit and i bet it comes out as UTF-8 on the other end:
|
|Yes, it does.
..cheer..
Steven
> Date: Fri, 13 Jul 2012 22:07:54 +0200
> From: Steven Atreju
> Cc: unicode@unicode.org
>
> this time without reply-in-same-charset and
> encoding=8bit and i bet it comes out as UTF-8 on the other end:
Yes, it does.
Philippe Verdy wrote:
|2012/7/13 Steven Atreju :
|> Philippe Verdy wrote:
|>
|> |2012/7/12 Steven Atreju :
|> |> UTF-8 is a bytestream, not multioctet(/multisequence).
|> |Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
|> |bytes. It has a lot of internal semantic
Eli Zaretskii wrote:
|> For example, this mail is
|> written in an UTF-8 enabled vi(1) basically from 1986, in UTF-8
|> encoding («Schöne Überraschung, gelle?»
|
|No, it isn't:
|
|Content-Type: text/plain; charset=ISO-8859-1
Oh, it's really terrible. I do have 'reply-in-same-charset'
> Date: Fri, 13 Jul 2012 16:04:44 +0200
> From: Steven Atreju
>
> For example, this mail is
> written in an UTF-8 enabled vi(1) basically from 1986, in UTF-8
> encoding («Schöne Überraschung, gelle?»
No, it isn't:
User-Agent: S-nail <12.5 7/5/10;s-nail-9-g517ac44-dirty>
MIME-Version: 1.
2012/7/13 Steven Atreju :
> Philippe Verdy wrote:
>
> |2012/7/12 Steven Atreju :
> |> UTF-8 is a bytestream, not multioctet(/multisequence).
> |Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
> |bytes. It has a lot of internal semantics and constraints. Some things
> |are
Philippe Verdy wrote:
|2012/7/12 Steven Atreju :
|> UTF-8 is a bytestream, not multioctet(/multisequence).
|Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
|bytes. It has a lot of internal semantics and constraints. Some things
|are very meaningful, some play absolutely
2012/7/12 Steven Atreju :
> UTF-8 is a bytestream, not multioctet(/multisequence).
Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
bytes. It has a lot of internal semantics and constraints. Some things
are very meaningful, some play absolutely no role at all and could
even be d
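To make the "internal semantics and constraints" concrete: not every byte sequence is well-formed UTF-8. A minimal Python sketch, relying on the fact that Python's default (strict) UTF-8 decoder rejects ill-formed input; the helper name `is_well_formed_utf8` is hypothetical.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8 (the default error handler)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


samples = [
    ("Schöne".encode("utf-8"), True),  # ö is the two-byte sequence C3 B6
    (b"\xc0\xaf", False),              # overlong encoding of '/'
    (b"\xed\xa0\x80", False),          # encoded surrogate U+D800
    (b"\xc3", False),                  # truncated two-byte sequence
    (b"\xff", False),                  # byte value never used in UTF-8
]

for data, expected in samples:
    print(data, is_well_formed_utf8(data))
```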
Right. Unix was unique when it was created, as it was built to handle
all files as unstructured binary files. The history is a lot
different, and text files have always used another paradigm, based on
line records. Line ends were initially not really control
characters. And even today the Unix-sty
Leif Halvard Silli wrote:
|Steven Atreju, Thu, 12 Jul 2012 12:32:46 +0200:
|
|> In the meanwhile the UTF-8 BOM is in the standard and thus
|> contradicts forty years of (well) good (Unix/POSIX) engineering
|> and craftsmanship. Where a file is a file and everything is a
|> file, holistica
On 2012-07-12, Steven Atreju wrote:
> In the future simple things like '$ cat File1 File2 > File3' will
> no longer work that easily. Currently this works for *whatever* file,
> and even program code that has been written more than thirty years
> ago will work correctly. No! You have to modify cont
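The `cat` concern can be reproduced without a shell: byte-level concatenation of two files that each begin with a UTF-8 BOM leaves a second U+FEFF in the middle of the result, where it is no longer a signature but a stray ZERO WIDTH NO-BREAK SPACE. A minimal Python sketch; `concat_dropping_inner_boms` is a hypothetical workaround for illustration, not an existing tool.

```python
BOM = "\ufeff"
BOM_BYTES = BOM.encode("utf-8")  # b'\xef\xbb\xbf'

file1 = (BOM + "first\n").encode("utf-8")
file2 = (BOM + "second\n").encode("utf-8")

# What `cat File1 File2 > File3` does: plain byte concatenation.
file3 = file1 + file2
text = file3.decode("utf-8-sig")  # strips only the leading BOM
print(text.count(BOM))            # 1: the BOM from file2 is now mid-text


def concat_dropping_inner_boms(*blobs: bytes) -> bytes:
    """Hypothetical BOM-aware concatenation: keep only the first input's BOM."""
    out = bytearray()
    for i, blob in enumerate(blobs):
        if i > 0 and blob.startswith(BOM_BYTES):
            blob = blob[len(BOM_BYTES):]
        out += blob
    return bytes(out)


fixed = concat_dropping_inner_boms(file1, file2).decode("utf-8-sig")
print(fixed.count(BOM))  # 0
```

The workaround illustrates the cost being argued about: every byte-oriented tool would need this kind of special case.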
On Thu, Jul 12, 2012 at 4:06 AM, Leif Halvard Silli
wrote:
> I guess you get the same problem with UTF-16 files also, then?
UTF-16 isn't a text file in the Unix world; it's a binary file. UTF-8
is the only standard Unicode encoding that acts like text to a Unix
system, basically because it was de
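A short Python sketch of why byte-oriented Unix tools see UTF-16 as binary: each ASCII character becomes two bytes, one of them NUL, so newline splitting and NUL-terminated C strings no longer line up with characters. UTF-8, by contrast, encodes ASCII as the identical bytes.

```python
line = "a\nb"

utf8 = line.encode("utf-8")
utf16 = line.encode("utf-16-le")  # explicit byte order, so no BOM is written

print(utf8)   # b'a\nb': identical to the ASCII bytes
print(utf16)  # b'a\x00\n\x00b\x00': every ASCII character gains a NUL byte

# Splitting on the newline byte, as line-oriented tools do, leaves
# stray NULs attached to the wrong "lines" in the UTF-16 case.
print(utf8.split(b"\n"))   # [b'a', b'b']
print(utf16.split(b"\n"))  # [b'a\x00', b'\x00b\x00']
```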
Steven Atreju, Thu, 12 Jul 2012 12:32:46 +0200:
> In the meanwhile the UTF-8 BOM is in the standard and thus
> contradicts forty years of (well) good (Unix/POSIX) engineering
> and craftsmanship. Where a file is a file and everything is a
> file, holistically. Where small tools which do their t
|> As for editors: If your own editor has no problems with the BOM, then
|> what? But I think Notepad can also save as UTF-8 but without the BOM -
|> it should be possible to get an option for choosing when you save
|> it.
|
|Perhaps there should be such an option in Notepad, but there is