On Fri, Mar 20, 2020 at 5:48 AM Adam Borowski via Unicode <unicode@unicode.org> wrote:
> On Fri, Mar 20, 2020 at 12:21:26PM +0000, Costello, Roger L. via Unicode wrote:
> > [Definition] Property: an attribute, quality, or characteristic of something.
> >
> > JPEG is a binary data format.
> > CSV is a text data format.
> >
> > Question #1: Is the binaryness/textness of a data format a property?
> >
> > Question #2: If the answer to Question #1 is yes, then what is the name of
> > this binaryness/textness property?
>
> I'm afraid this question is too fuzzy to have a proper answer.
>
> For example, most Unix-heads will tell you that UTF-16LE is a binary rather
> than text format. Microsoft employees and some members of this list will
> disagree.
>
> Then you have PostScript -- nothing but basic ASCII, yet utterly unreadable
> for a (sane) human.
>
> If you want _my_ definition of a file being _technically_ text, it's:
> * no bytes 0..31 other than newlines and tabs (even form feeds are out
>   nowadays)
> * correctly encoded for the expected charset (and nowadays, if that's not
>   UTF-8 Unicode, you're doing it wrong)
> * no invalid characters

Just a minor note... In the case of UTF-8, this means bytes 0xF8-0xFF will
never be used: every byte of valid UTF-8 has at least one 0 bit among its
five high-order bits. (Strictly, 0xC0, 0xC1 and 0xF5-0xF7 can never appear
either.)

I wouldn't be so picky about "no bytes 0..31", though, because \t, \n and
\x1b (used by ANSI escape codes) are all quite usable...

> But besides this narrow technical meaning -- is a Word document "text"?
> And if it is, why not PowerPoint? This all falls apart.
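For concreteness, Adam's three rules can be turned into a small checker, with the \x1b point from the reply as an opt-in flag. This is only a sketch under assumptions not stated in the thread: "newlines" is read as LF only, and "invalid characters" is read as Unicode noncharacters (strict UTF-8 decoding already rejects overlong forms and encoded surrogates). The helper names `is_technically_text` and `bytes_never_in_utf8` are hypothetical. The second function brute-forces the byte-level claim by encoding every Unicode scalar value and collecting the bytes that never show up.

```python
# Sketch of the "technically text" test from the thread.
# Assumptions (ours, not the thread's): "newlines" means LF only, and
# "invalid characters" means Unicode noncharacters.

ALLOWED_CONTROLS = {0x09, 0x0A}  # tab and LF


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_technically_text(data: bytes, allow_escape: bool = False) -> bool:
    allowed = set(ALLOWED_CONTROLS)
    if allow_escape:
        allowed.add(0x1B)  # ESC, which introduces ANSI escape sequences
    # Rule 1: no bytes 0..31 except the allowed controls. In UTF-8 these
    # byte values only ever encode the matching ASCII code points.
    if any(b < 0x20 and b not in allowed for b in data):
        return False
    # Rule 2: well-formed UTF-8. Python's strict decoder rejects overlong
    # forms, encoded surrogates and sequences beyond U+10FFFF.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Rule 3: no noncharacters (our reading of "invalid characters").
    return not any(is_noncharacter(ord(ch)) for ch in text)


def bytes_never_in_utf8() -> list[int]:
    """Byte values that no Unicode scalar value's UTF-8 encoding uses."""
    used = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not scalar values, so not encodable
        used.update(chr(cp).encode("utf-8"))
    return sorted(set(range(256)) - used)
```

As a usage example, `is_technically_text("hello".encode("utf-16-le"))` returns False under rule 1, since every other byte is NUL, which is one concrete way of seeing why Unix-heads call UTF-16LE binary. `bytes_never_in_utf8()` yields exactly 0xC0, 0xC1 and 0xF5 through 0xFF, which includes the 0xF8-0xFF range mentioned above.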