Re: Why do binary files contain text but text files don't contain binary?

Ken Whistler via Unicode Fri, 21 Feb 2020 08:31:22 -0800


On 2/21/2020 7:53 AM, Costello, Roger L. via Unicode wrote:

Text files may indeed contain binary (i.e., bytes that are not interpretable as characters). Namely, text files may contain newlines, tabs, and some other invisible things.
Question: "characters" are defined as only the visible things, right?

No. You've gone astray right there. Please read Chapter 2 of the Unicode Standard, and in particular, Section 2.4, Code Points and Characters:


https://www.unicode.org/versions/Unicode12.0.0/ch02.pdf#G25564

All of those types of characters can occur in Unicode plain text. (With the exception of surrogate code points.)
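
As a rough illustration of that point, here is a minimal Python sketch. It assumes Python's built-in `unicodedata` module and its UTF-8 codec as a stand-in for "Unicode plain text"; the helper names `is_surrogate` and `encodes_as_utf8` are made up for this example. It shows that tab, line feed, and a zero-width space are characters with assigned general categories, while a lone surrogate code point is rejected when encoding.

```python
import unicodedata

# Invisible does not mean "not a character": each of these has a
# general category in the Unicode Character Database.
for ch in ["\t", "\n", "\u200b", "A"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch))


def is_surrogate(code_point: int) -> bool:
    """True for code points in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= code_point <= 0xDFFF


def encodes_as_utf8(text: str) -> bool:
    """True if Python's strict UTF-8 codec accepts the string."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


print(encodes_as_utf8("a\tb\n"))   # control characters are fine
print(encodes_as_utf8("\ud800"))   # a lone surrogate is not
```

The general category values come from whichever Unicode version the Python build ships with, which may be newer than 12.0.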

I conclude:

Binary files may contain arbitrary text.

Binary files can contain *whatever*, including text.


Text files may contain binary, but only a restricted set of binary.

The distinction is definitional. A text file contains *only* characters, interpretable by a specific character encoding (usually Unicode, these days).
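
One way to make that definition concrete is a small Python sketch, under the assumption that "interpretable by a specific character encoding" can be approximated by "decodes without error under a strict codec". The function name `decodes_as` is hypothetical and the byte strings are illustrative, not taken from any real file.

```python
def decodes_as(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if every byte sequence in `data` is valid in `encoding`."""
    try:
        data.decode(encoding)  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


plain = b"line one\n\tline two\n"   # newlines and tabs are still characters
png_signature = b"\x89PNG\r\n\x1a\n"  # first bytes of a PNG image

print(decodes_as(plain))                     # valid UTF-8
print(decodes_as(png_signature))             # 0x89 cannot start a UTF-8 sequence
print(decodes_as(png_signature, "latin-1"))  # every byte maps to something in Latin-1
```

The last call is the reason the distinction is definitional: whether bytes count as text depends on which encoding you declare, and decoding cleanly does not by itself show that the content was meant as text.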

But a text file need not be "plain text". An HTML file is an example of a text file (it contains only a sequence of characters, whose identity and interpretation is all clearly specified by looking them up in the Unicode Standard), but it is not *plain* text. It is *rich* text, consisting of markup tags interspersed with runs of plain text.

Another distinction that may be leading you astray is the distinction between binary file transfer and text file transfer. If you are using ftp, for example, you can specify use of binary file transfer, *even if* the file you are transferring is actually a text file. That simply means that the file transfer will agree to treat the entire file as a binary blob and transfer it byte-for-byte intact. A text file transfer, on the other hand, may look for "lines" in a text file and may adjust line endings to suit the receiving platform conventions.
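
The difference between the two transfer modes can be sketched in a few lines of Python. This is a simplified model, not how any particular ftp client or server is implemented; the function names `transfer_binary` and `transfer_text` and the `target_newline` parameter are assumptions made for the illustration.

```python
def transfer_binary(data: bytes) -> bytes:
    """Binary mode: the bytes arrive exactly as they were sent."""
    return data


def transfer_text(data: bytes, target_newline: bytes = b"\n") -> bytes:
    """Text mode (simplified): rewrite line endings for the receiver."""
    # Collapse CRLF and bare CR to LF first, then emit the target convention.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", target_newline)


windows_text = b"first line\r\nsecond line\r\n"
print(transfer_binary(windows_text) == windows_text)  # byte-for-byte intact
print(transfer_text(windows_text))                    # line endings rewritten

# The same rewrite applied to non-text data changes its bytes.
png_signature = b"\x89PNG\r\n\x1a\n"
print(transfer_text(png_signature) == png_signature)
```

Sending a text file in binary mode is always safe in this model, since nothing is altered. Sending non-text data in text mode is not, which is the practical reason the two modes exist.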

Do you agree?

No.

--Ken
