And those early versions of Notepad for 16/32-bit Windows were not
even Unicode compliant (the support for Unicode was minimalist, in
fact Unicode was only partly supported on top of the old ANSI/OEM
APIs; without support for the filesystem, and lots of quirks at the
kernel level caused by conver
Steven Atreju wrote:
> Funny that a program that cannot handle files larger than 0x7FFF
> bytes (last time i've used it, 95B) has such a large impact.
Notepad hasn't had this limitation since Windows Me. That was many, many
years ago.
--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.
Original Message
Date: Wed, 18 Jul 2012 13:45:59 +0200
From: Steven Atreju
To: "Doug Ewell"
Subject: Re: UTF-8 BOM (Re: Charset declaration in HTML)
Doug Ewell wrote:
|For those who haven't had enough of this debate yet, here's a link
|to an informa
Hello Doug,
On 2012/07/18 0:35, Doug Ewell wrote:
For those who haven't had enough of this debate yet, here's a link
to an informative blog (with some informative comments) from Michael
Kaplan:
"Every character has a story #4: U+feff (alternate title: UTF-8 is the
BOM, dude!)"
http://blogs.
Hello Philippe,
On 2012/07/18 3:37, Philippe Verdy wrote:
2012/7/17 Julian Bradfield:
On 2012-07-16, Philippe Verdy wrote:
I am also convinced that even Shell interpreters on Linux/Unix should
recognize and accept the leading BOM before the hash/bang starting
line (which is commonly used for
2012/7/17 Julian Bradfield :
> On 2012-07-16, Philippe Verdy wrote:
>> I am also convinced that even Shell interpreters on Linux/Unix should
>> recognize and accept the leading BOM before the hash/bang starting
>> line (which is commonly used for filetype identification and runtime
> The kernel do
On 2012-07-16, Philippe Verdy wrote:
> I am also convinced that even Shell interpreters on Linux/Unix should
> recognize and accept the leading BOM before the hash/bang starting
> line (which is commonly used for filetype identification and runtime
> behavior), without claiming that they don't kno
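
A minimal sketch of the point being argued in this snippet, written in Python rather than as a real kernel or shell check. The helper names `split_bom` and `starts_with_shebang` and the sample script are hypothetical; the only assumption is that a strict check looks for `#!` at byte offset 0, so the three BOM bytes hide it unless they are skipped first.

```python
import codecs
from typing import Tuple


def split_bom(data: bytes) -> Tuple[bool, bytes]:
    """Return (had_bom, rest) for bytes that may start with a UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


def starts_with_shebang(data: bytes, tolerate_bom: bool = False) -> bool:
    """Look for "#!" at offset 0, optionally skipping one leading BOM."""
    if tolerate_bom:
        _, data = split_bom(data)
    return data.startswith(b"#!")


# Hypothetical script saved with a UTF-8 BOM in front of the hash/bang line.
script = codecs.BOM_UTF8 + b"#!/bin/sh\necho hello\n"

print(starts_with_shebang(script))                     # False: BOM hides "#!"
print(starts_with_shebang(script, tolerate_bom=True))  # True
```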
For those who haven't had enough of this debate yet, here's a link
to an informative blog (with some informative comments) from Michael
Kaplan:
"Every character has a story #4: U+feff (alternate title: UTF-8 is the
BOM, dude!)"
http://blogs.msdn.com/b/michkap/archive/2005/01/20/357028.aspx
Wh
Philippe Verdy wrote:
|2012/7/16 Steven Atreju :
|> Fifteen years ago i think i would have put effort in including the
|> BOM after reading this, for complete correctness! I'm pretty sure
|> that i really would have done so.
|
|Fifteen years ago I would not have advocated it. Simply becau
Steven Atreju wrote:
> Q: Is the UTF-8 encoding scheme the same irrespective of whether
> the underlying processor is little endian or big endian?
> ...
> Where a BOM is used with UTF-8, it is only used as an encoding
> signature to distinguish UTF-8 from other encodings — it has
> noth
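
The quoted FAQ answer can be checked with a short Python sketch (an illustration only, not part of the FAQ). It encodes U+FEFF in three encoding schemes: the UTF-8 form is one fixed byte sequence, while the two UTF-16 forms differ only in byte order, which is where the "byte order mark" name comes from.

```python
import codecs

bom = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, used as the BOM / signature

utf8 = bom.encode("utf-8")
utf16_le = bom.encode("utf-16-le")
utf16_be = bom.encode("utf-16-be")

print(utf8.hex())      # efbbbf  (same on every machine)
print(utf16_le.hex())  # fffe
print(utf16_be.hex())  # feff

# The UTF-8 signature is a constant, independent of processor endianness.
print(utf8 == codecs.BOM_UTF8)  # True
```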
2012/7/16 Steven Atreju :
> Fifteen years ago i think i would have put effort in including the
> BOM after reading this, for complete correctness! I'm pretty sure
> that i really would have done so.
Fifteen years ago I would not have advocated it. Simply because
support of UTF-8 was very poor (a
Steven Atreju, Mon, 16 Jul 2012 13:35:04 +0200:
> "Doug Ewell" wrote:
> And:
>
> Q: Is the UTF-8 encoding scheme the same irrespective of whether
> the underlying processor is little endian or big endian?
> ...
> Where a BOM is used with UTF-8, it is only used as an encoding
> signature
"Doug Ewell" wrote:
|Steven Atreju wrote:
|
|> If Unicode *defines* that the so-called BOM is in fact a Unicode-
|> indicating tag that MUST be present,
|
|But Unicode does not define that.
Nope. On http://unicode.org/faq/utf_bom.html i read:
Q: Why do some of the UTFs have a BE or LE
Hey, Philippe,
Your input is much appreciated. So, in a nutshell, I don't have to worry.
One of these days I need to crunch down (minify) the CSS and JavaScript
pages. I left them readily readable so that techs like you could easily
read them in place in any browser without having to pretty print.
Steven Atreju wrote:
If Unicode *defines* that the so-called BOM is in fact a Unicode-
indicating tag that MUST be present,
But Unicode does not define that.
I know that, in Germany, many, many small libraries become closed
because there is not enough money available to keep up with the
digi
On Tue, Jul 10, 2012 at 11:58 PM, Leif Halvard Silli <
xn--mlform-...@xn--mlform-iua.no> wrote:
> Naena Guru, Tue, 10 Jul 2012 01:40:19 -0500:
>
> > HTML5 assumes UTF-8 as the character set if you do not declare one
> > explicitly. My current pages are in HTML 4.
>
> There is in principle no diffe
Eli Zaretskii wrote:
|> Date: Fri, 13 Jul 2012 22:07:54 +0200
|> From: Steven Atreju
|> Cc: unicode@unicode.org
|>
|> this time without reply-in-same-charset and
|> encoding=8bit and i bet it comes out as UTF-8 on the other end:
|
|Yes, it does.
..cheer..
Steven
> Date: Fri, 13 Jul 2012 22:07:54 +0200
> From: Steven Atreju
> Cc: unicode@unicode.org
>
> this time without reply-in-same-charset and
> encoding=8bit and i bet it comes out as UTF-8 on the other end:
Yes, it does.
Philippe Verdy wrote:
|2012/7/13 Steven Atreju :
|> Philippe Verdy wrote:
|>
|> |2012/7/12 Steven Atreju :
|> |> UTF-8 is a bytestream, not multioctet(/multisequence).
|> |Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
|> |bytes. It has a lot of internal semantic
Eli Zaretskii wrote:
|> For example, this mail is
|> written in an UTF-8 enabled vi(1) basically from 1986, in UTF-8
|> encoding («Schöne Überraschung, gelle?»
|
|No, it isn't:
|
|Content-Type: text/plain; charset=ISO-8859-1
Oh, it's really terrible. I do have 'reply-in-same-charset'
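
A small sketch of how the declared charset of a message can be inspected, using Python's standard `email` package. The header block below is a hypothetical example built around the `Content-Type` line quoted above, not the actual message; the second half shows what would happen if a UTF-8 body were decoded under that label.

```python
from email import message_from_string

# Hypothetical message; only the Content-Type line comes from the thread.
raw = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "body\n"
)

msg = message_from_string(raw)
declared = msg.get_content_charset()  # value is coerced to lower case
print(declared)  # iso-8859-1

# If the bytes were really UTF-8 but read under the declared label,
# each non-ASCII character would turn into two Latin-1 characters.
mojibake = "Schöne".encode("utf-8").decode(declared)
print(mojibake)  # SchÃ¶ne
```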
> Date: Fri, 13 Jul 2012 16:04:44 +0200
> From: Steven Atreju
>
> For example, this mail is
> written in an UTF-8 enabled vi(1) basically from 1986, in UTF-8
> encoding («Schöne Überraschung, gelle?»
No, it isn't:
User-Agent: S-nail <12.5 7/5/10;s-nail-9-g517ac44-dirty>
MIME-Version: 1.
2012/7/13 Steven Atreju :
> Philippe Verdy wrote:
>
> |2012/7/12 Steven Atreju :
> |> UTF-8 is a bytestream, not multioctet(/multisequence).
> |Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
> |bytes. It has a lot of internal semantics and constraints. Some things
> |are
Philippe Verdy wrote:
|2012/7/12 Steven Atreju :
|> UTF-8 is a bytestream, not multioctet(/multisequence).
|Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
|bytes. It has a lot of internal semantics and constraints. Some things
|are very meaningful, some play absolutely
2012/7/12 Steven Atreju :
> UTF-8 is a bytestream, not multioctet(/multisequence).
Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
bytes. It has a lot of internal semantics and constraints. Some things
are very meaningful, some play absolutely no role at all and could
even be d
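
The claim that UTF-8 is not made of arbitrary byte sequences can be illustrated with a minimal Python sketch. The helper name `is_valid_utf8` and the sample byte strings are made up for illustration; the check simply relies on Python's strict UTF-8 decoder.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as well-formed UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


samples = {
    "plain ASCII": b"hello",
    "two-byte sequence (U+00F6)": b"\xc3\xb6",
    "stray continuation byte": b"\x80",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded surrogate U+D800": b"\xed\xa0\x80",
    "truncated three-byte sequence": b"\xe2\x82",
}

for label, data in samples.items():
    print(f"{label}: {is_valid_utf8(data)}")
```

Only the first two samples are accepted; the other four are byte sequences that no conforming UTF-8 encoder produces.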
Right. Unix was unique when it was created as it was built to handle
all files as unstructured binary files. The history is a lot
different, and text files have always used another paradigm, based on
line records. End of lines initially were not really control
characters. And even today the Unix-sty
Naena Guru, Tue, 10 Jul 2012 01:40:19 -0500:
> As I said, I use HTML-Kit (and Tools).
Your problem appears to be that HTML-Kit does not directly support
UTF-8. But are you aware that you can still work with UTF-8 with it?
You only need to use UnicodePad in the Unicode menu of the Tools menu,
s
Leif Halvard Silli wrote:
|Steven Atreju, Thu, 12 Jul 2012 12:32:46 +0200:
|
|> In the meanwhile the UTF-8 BOM is in the standard and thus
|> contradicts forty years of (well) good (Unix/POSIX) engineering
|> and craftsmanship. Where a file is a file and everything is a
|> file, holistica
On 2012-07-12, Steven Atreju wrote:
> In the future simple things like '$ cat File1 File2 > File3' will
> no longer work that easily. Currently this works *whatever* file,
> and even program code that has been written more than thirty years
> ago will work correctly. No! You have to modify cont
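
The concatenation concern can be sketched in Python. This is an illustration under assumptions, not how `cat` works: `naive_cat` mimics plain byte-wise concatenation, and `bom_aware_cat` is a hypothetical variant that drops the leading signature of each part. The two in-memory "files" are made up.

```python
import codecs
from typing import Iterable

BOM = codecs.BOM_UTF8


def naive_cat(parts: Iterable[bytes]) -> bytes:
    """Byte-wise concatenation, like '$ cat File1 File2 > File3'."""
    return b"".join(parts)


def bom_aware_cat(parts: Iterable[bytes]) -> bytes:
    """Concatenate, dropping one leading UTF-8 BOM from every part."""
    out = []
    for part in parts:
        out.append(part[len(BOM):] if part.startswith(BOM) else part)
    return b"".join(out)


# Two hypothetical files, each saved with a UTF-8 signature.
file1 = BOM + b"first\n"
file2 = BOM + b"second\n"

joined = naive_cat([file1, file2])
# 'utf-8-sig' strips only the first signature; the second one survives
# as a U+FEFF character in the middle of the text.
print("\ufeff" in joined.decode("utf-8-sig"))  # True
print(bom_aware_cat([file1, file2]))           # b'first\nsecond\n'
```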
On Thu, Jul 12, 2012 at 4:06 AM, Leif Halvard Silli
wrote:
> I guess you get the same problem with UTF-16 files also, then?
UTF-16 isn't a text file in the Unix world; it's a binary file. UTF-8
is the only standard Unicode encoding that acts like text to a Unix
system, basically because it was de
Steven Atreju, Thu, 12 Jul 2012 12:32:46 +0200:
> In the meanwhile the UTF-8 BOM is in the standard and thus
> contradicts forty years of (well) good (Unix/POSIX) engineering
> and craftsmanship. Where a file is a file and everything is a
> file, holistically. Where small tools which do their t
|> As for editors: If your own editor has no problems with the BOM, then
|> what? But I think Notepad can also save as UTF-8 but without the BOM -
|> it should be possible to get an option for choosing when you save
|> it.
|
|Perhaps there should be such an option in Notepad, but there is
Leif Halvard Silli wrote:
As for editors: If your own editor has no problems with the BOM, then
what? But I think Notepad can also save as UTF-8 but without the BOM -
it should be possible to get an option for choosing when you save
it.
Perhaps there should be such an option in Notepad, bu
Philippe Verdy, Wed, 11 Jul 2012 14:15:39 +0200:
> 2012/7/11 Jean-François Colson
>> If your document only contains
>>
>> > header("location:http://unicode.org");
>> ?>
>>
>> but you save it with a BOM, the BOM will be sent and you’ll get an
>> error message like
>>
>> Warning: Cannot modify
Le 11/07/12 14:15, Philippe Verdy a écrit :
2012/7/11 Jean-François Colson <j...@colson.eu>
If your document only contains
http://unicode.org");
?>
but you save it with a BOM, the BOM will be sent and you’ll get an
error message like
Warning: Cannot modify hea
2012/7/11 Jean-François Colson
> If your document only contains
>
> header("location:http://unicode.org");
> ?>
>
> but you save it with a BOM, the BOM will be sent and you’ll get an error
> message like
>
> Warning: Cannot modify header information - headers already sent by
> (output started
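
The mechanism behind that warning is that PHP sends everything outside its `<?php ... ?>` tags to the client as output, so a BOM sitting before the opening tag counts as output that precedes the `header()` call. A small Python sketch can flag such files; the helper name `bytes_before_php_tag` and the sample script are hypothetical.

```python
import codecs


def bytes_before_php_tag(source: bytes) -> bytes:
    """Return whatever precedes the first '<?php' tag (all of it if no tag)."""
    idx = source.find(b"<?php")
    return source if idx < 0 else source[:idx]


# Hypothetical one-line script, with and without a UTF-8 signature.
clean = b'<?php header("Location: http://unicode.org"); ?>'
with_bom = codecs.BOM_UTF8 + clean

print(bytes_before_php_tag(clean))     # b''
print(bytes_before_php_tag(with_bom))  # b'\xef\xbb\xbf'
```

Any non-empty result means some bytes would be emitted before the script gets a chance to set headers.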
Le 11/07/12 06:32, Philippe Verdy a écrit :
2012/7/10 Naena Guru <naenag...@gmail.com>
I wanted to see how hard it is to edit a page in Notepad. So I
made a copy of my LIYANNA page and replaced the character entities
I used for Unicode Sinhala, accented Pali and Sanskrit with
Philippe Verdy, Wed, 11 Jul 2012 07:36:56 +0200:
> 2012/7/11 Leif Halvard Silli:
>> In VIM, you set or unset the BOM via the commands
>>
>> set bomb
>> set nobomb
>
> Should these commands specify if your computer will explode when saving
> the file ?
>
> :'o
Probably signals the
2012/7/11 Leif Halvard Silli :
> it. Else you can use the free Notepad++. And many others. In VIM, you
> set or unset the BOM via the commands
>
> set bomb
> set nobomb
Should these commands specify if your computer will explode when saving
the file ?
:'o
set bom
set nobom
Sorry,
Naena Guru, Tue, 10 Jul 2012 01:40:19 -0500:
> HTML5 assumes UTF-8 as the character set if you do not declare one
> explicitly. My current pages are in HTML 4.
There is in principle no difference between what HTML5-parsers assume
and what HTML4-parsers assume: All of them default to the default
2012/7/10 Naena Guru
> I wanted to see how hard it is to edit a page in Notepad. So I made a copy
> of my LIYANNA page and replaced the character entities I used for Unicode
> Sinhala, accented Pali and Sanskrit with their raw letters. Notepad forced
> me to save the file in UTF-8 format. I ran i
Thank you Otto.
Sorry for delay in replying. I spent the entire Sunday replying Jaques
twins.
You are absolutely right about choice between ISO-8859-1 and UTF-8. I
shouldn't have said 'using ISO-8859-1 is advantageous over UTF-8' It is
efficient if your pages are written in a language that uses s
Hello Naena Guru,
on 2012-07-04, you wrote:
The purpose of
declaring the character set as iso-8859-1 rather than utf-8 is to avoid doubling
and trebling the size of the page by utf-8. I think, if you have characters
outside iso-8859-1 and declare the page as such, you get
Character-not-found for those
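
The size argument can be put into numbers with a short Python sketch. The helper name `encoded_sizes` and the sample strings are made up for illustration: ASCII text is the same size in both encodings, accented Latin-1 letters take two bytes each in UTF-8, and Sinhala letters take three bytes each in UTF-8 but cannot be encoded in ISO-8859-1 at all.

```python
from typing import Dict, Optional


def encoded_sizes(text: str) -> Dict[str, Optional[int]]:
    """Byte length of text in UTF-8 and ISO-8859-1 (None if not encodable)."""
    sizes: Dict[str, Optional[int]] = {"utf-8": len(text.encode("utf-8"))}
    try:
        sizes["iso-8859-1"] = len(text.encode("iso-8859-1"))
    except UnicodeEncodeError:
        sizes["iso-8859-1"] = None
    return sizes


ascii_text = "plain ascii"
german = "Schöne Überraschung"
sinhala = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"  # five Sinhala code points

print(encoded_sizes(ascii_text))  # {'utf-8': 11, 'iso-8859-1': 11}
print(encoded_sizes(german))      # {'utf-8': 21, 'iso-8859-1': 19}
print(encoded_sizes(sinhala))     # {'utf-8': 15, 'iso-8859-1': None}
```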