But there's nothing wrong with proposing a higher-level protocol; indeed, that's what Ken Whistler was saying: you need a protocol to transmit this information.  It's metadata, so it will perforce be a higher-level protocol of some kind, whether transmitting actually out-of-band or reserving a piece of the file for metadata.  That's fine.  I'm not sure what the advantage is of using circled characters instead of plain old ASCII.  You have to set off your reserved area somehow, and I don't think using circled chars is the least obtrusive way to do it.  You could use XML; that would be pretty well-suited to the task, but maybe it's overkill.  If all you need is to reference some "standard" PUA interpretation (per James Kass' take on this, not William Overington's), then just a header like "[PUA00001]" would work just fine.  (Compare Emacs with things like "-*- coding: utf-8 -*-" or whatever.)
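
To make the header idea concrete, here is a rough Python sketch of how a reader might peel such a line off the top of a file.  The exact shape of the header ("[PUA" plus digits plus "]", alone on the first line) and the name split_pua_header are my assumptions for illustration only; nothing here is a defined format.

```python
import re

# Assumed header shape (hypothetical): "[PUA" + digits + "]" alone on line 1.
_HEADER = re.compile(r"^\[PUA(\d+)\]$")

def split_pua_header(text):
    """Return (registry_id, body); registry_id is None when no header is found."""
    first, _, rest = text.partition("\n")
    match = _HEADER.match(first.strip())
    if match is None:
        # No recognizable header: the whole file is ordinary plain text.
        return None, text
    return match.group(1), rest
```

An application that doesn't know the protocol just shows one short bracketed line at the top and the rest of the text unchanged.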

For larger chunks of meta-info, XML might be a good choice, but even then, it could be an XML *header* to an otherwise ordinary text file.  Yes, you'd have to delimit it somehow, and probably have a top header (a "magic number") to signal the protocol, but that's doable.  For applications not supporting this protocol, such a setup is probably easier for the eye to skip past (even if it's long) than a bunch of circled letters.
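
Something along these lines might do for the XML-header variant.  This is only a sketch: the magic line "%PUA-META 1", the closing delimiter "%END-META", and the function name are all made up for the example, and a real design would have to pick its own.

```python
import xml.etree.ElementTree as ET

MAGIC = "%PUA-META 1"  # hypothetical "magic number" line signalling the protocol
END = "%END-META"      # hypothetical delimiter closing the XML header

def read_with_xml_header(text):
    """Return (xml_root, body); xml_root is None if there is no valid header."""
    lines = text.split("\n")
    if lines[0] != MAGIC:
        return None, text
    try:
        end = lines.index(END)
        # Everything between the magic line and the delimiter is the XML header.
        meta = ET.fromstring("\n".join(lines[1:end]))
    except (ValueError, ET.ParseError):
        # Unterminated or malformed header: fall back to plain text.
        return None, text
    return meta, "\n".join(lines[end + 1:])
```

The body after the delimiter stays an otherwise ordinary text file, which is the point of keeping the XML confined to a header.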

A protocol like that is outside of Unicode's scope (just like XML is), but it's certainly something you could write up and try to standardize and get used, with or without the support of ISO.  People are coming up with file formats all the time.  And if you really want to use circled characters, go ahead; that's something for you to consider in the design phase of the project.

~mark


On 08/27/2018 05:20 PM, Rebecca Bettencourt via Unicode wrote:

            > That sounds like a non-conformant use of characters in the U+24xx block.

            Well, you are an expert on these things and I do not understand as to with what it would be non-conformant.


A conformant process must interpret ⓅⓊⒶⒹⒶⓉⒶ as the characters ⓅⓊⒶⒹⒶⓉⒶ and not as a signal to process what follows as anything other than plain text.

What you are proposing is a higher-level protocol, whether you realize it or not. Unfortunately your higher-level protocol has a serious flaw in that it cannot represent the string "ⓅⓊⒶⒹⒶⓉⒶ". Also, seeing a bunch of circled alphanumeric characters in a document ⓘⓢ◯ⓕⓐⓡ◯ⓕⓡⓞⓜ◯ⓤⓝⓞⓑⓣⓡⓤⓢⓘⓥⓔ.
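
A small Python sketch of the flaw, under the assumption (a simplification, not the actual proposal) that everything after the first marker is treated as metadata; naive_split is a hypothetical name:

```python
MARKER = "ⓅⓊⒶⒹⒶⓉⒶ"

def naive_split(text):
    """Treat whatever follows the first MARKER as metadata (simplified scheme)."""
    body, found, meta = text.partition(MARKER)
    return body, (meta if found else None)

# A document that merely mentions the marker gets cut in two,
# because the scheme has no way to escape a literal occurrence.
body, meta = naive_split("The marker is ⓅⓊⒶⒹⒶⓉⒶ, not metadata.")
```

Here body comes back as "The marker is " and the rest of the sentence is swallowed as metadata.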

There are plenty of already-existing higher-level protocols (you mentioned one: XML) that could be used to provide information about PUA characters, and they are all much better suited to that purpose than what you are proposing.

