Hello again everyone,

Though I initially took the shoo-away, there have been some comments made since then that I feel compelled to rebut. To avoid spamming the list, I’ve combined my responses into a single message.
Before that, I will say, again, for the record: I know this NOOP idea is unlikely to ever happen. Certainly not with the responses I've gotten. I haven't submitted it, nor even looked into how to. I know it would be rejected. This is a thought experiment, nothing more. If that doesn't interest you, please disregard this message.

And again, the hypothetical NOOP is a character whose canonical equivalent is the absence of a character. The logical consequences of that statement apply fully.

On Wed, Jul 3, 2019 at 8:00 PM Shawn Steele via Unicode <unicode@unicode.org> wrote:
>
> Even more complicated is that, as pointed out by others, it's pretty much
> impossible to say "these n codepoints should be ignored and have no meaning"
> because some process would try to use codepoints 1-3 for some private
> meaning. Another would use codepoint 1 for their own thing, and there'd be a
> conflict.

This is so utterly, completely, and severely missing the point I'm starting to feel like a madman screaming to the heavens, "Why can't they just understand?!"

Yes, a different process will have a different private meaning for the codepoint. That is not a bug, it is a feature. A conflict is always resolved by the current process saying, "I'm holding the string now. The old NOOPs are gone, canonically decomposed to nothing. The new ones mean what I want them to mean, as long as I or my buddies hold the string. If you didn't want that, you shouldn't have given the string to me!"

This conflict-resolution mechanism is the special sauce. If a process needs a private marker that will be preserved in interchange, there are plenty of PUA characters to use, and even a couple of private control characters.

> I also think that the conversation has pretty much proven that such a system
> is mathematically impossible. (You can't have a "private" no-meaning
> codepoint that won't conflict with other "private" uses in a public space).

No such thing has been proven in the slightest.
Any conflict is resolved, in the default case, by normalizing all NOOPs to nothing.

On Wed, Jul 3, 2019 at 5:46 PM Mark E. Shoulson via Unicode <unicode@unicode.org> wrote:
>
> Um... How could you be sure that process X would get the no-ops that process
> W wrote? After all, it's *discardable*, like you said, and the database
> programs and libraries aren't in on the secret.

Yes, there is a requirement that W and X communicate via some "NOOP-preserving path" (call it a NOOPPP). Such paths would generally be very short and direct, because NOOPs are intended to be ephemeral, not archival! They wouldn't be hard to come by. Memory mappings or pipes. Direct inter-process comms. Anything that operates at byte-level.

Even simple persisting mechanisms like file storage or databases can preserve NOOP by doing... nothing. "Discardable" doesn't mean it must be discarded, merely that it can be. Where there are no security implications or other need, strings containing NOOP can simply be passed through and stored as-is.

Where any interface, library, or process does not preserve NOOP, it cannot be part of a NOOPPP. Tough luck.

> Moreover, as you say, what about when Process Z (or its companions) comes
> along and is using THE SAME MECHANISM for something utterly different? How
> does it know that process W wasn't writing no-ops for it, but was writing
> them for Process X?

It is the responsibility of Process Z (and any process that interprets NOOPs non-trivially) to be aware of the context/source of what it's receiving. Prior agreement or advertised contract.

On Wed, Jul 3, 2019 at 2:06 PM Rebecca Bettencourt <beckie...@gmail.com> wrote:
>
> And the database driver filters out the U+000F completely as a matter of best
> practice and security-in-depth.

I'm struggling to see the security implication of "store this string, verbatim, in your regular VARCHAR (or whatever) text field".
I can store the string "DROP TABLE [STUDENTS];" in a text field and unless the database is horribly broken it will store that without issue. A database could strip NOOP out of text fields and still claim to be Unicode conformant. But I wonder why it would bother to do that. And even then, you could just store the string in a VARBINARY field or whatever just accepts bytes.

> You can't say "this character should be ignored everywhere" and "this
> character should be preserved everywhere" at the same time. That's the
> contradiction.

I have not said "this character should be preserved everywhere". That statement is completely false. Unfortunately, that means what I said is still not being understood at all. Forgive me for being frustrated.

Finally, a general comment: I think people are getting hung up on this idea because they’re still thinking in terms of what is being guaranteed, while this is explicitly about an inversion of that concept. Not a guarantee, but a disclaimer. I called it an “ephemeral private sentinel” because that name captures what it is. It’s not for archiving or interchange, except for extremely short and direct cases under special conditions. Most objections I’ve gotten so far arise out of misunderstanding and attempts to force normal character behaviour on it. I can take criticism, but not when it’s based on a completely false premise.

Define a character that is canonically equivalent to the absence of a character. Make it so a conforming receiving process is able to purge it whenever convenient. That's not hard to implement, especially in relation to other existing requirements. But would it be useful? I claim it would be very useful indeed. Many things that can be done with ordinary characters will not be possible with this one. That's fine. Other things will be possible.

This idea isn’t really dissimilar to the original intended meanings of SYN or NUL or DEL, or for that matter to Unicode noncharacters.
In fact, if the standard had enforced purging noncharacters during interchange (instead of vacillating about their illegality before currently recommending they be preserved or at least U+FFFDed) we’d already be 99% of the way to what I suggested. The ideal opportunity to define this behaviour (for a single code point or a set) was almost three decades ago, but it definitely could have been done, and it would not have been expensive. I just hold onto this idea for the day I get a time machine.
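
For anyone who prefers code to prose, the purge-and-reclaim behaviour described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a proposal for an actual API: U+000F is borrowed from the example quoted earlier as a stand-in for the hypothetical NOOP, and the names purge_noops, noop_equivalent and take_ownership are invented for this sketch.

```python
# Assumption: U+000F stands in for the hypothetical NOOP code point.
NOOP = "\u000f"


def purge_noops(text: str) -> str:
    """Default handling: every NOOP decomposes to nothing."""
    return text.replace(NOOP, "")


def noop_equivalent(a: str, b: str) -> bool:
    """Two strings are equivalent if they differ only by NOOPs."""
    return purge_noops(a) == purge_noops(b)


def take_ownership(text: str, positions: list[int]) -> str:
    """Conflict resolution by the current holder of the string.

    Any NOOPs left by a previous process are purged first, then this
    process inserts its own sentinels at the given offsets (offsets are
    measured in the purged string).
    """
    clean = purge_noops(text)
    pieces = []
    prev = 0
    for pos in sorted(set(positions)):
        pieces.append(clean[prev:pos])
        pieces.append(NOOP)
        prev = pos
    pieces.append(clean[prev:])
    return "".join(pieces)
```

A process that does not care about NOOPs either calls purge_noops or passes the string through untouched; a process that does care calls take_ownership and can only rely on its sentinels for as long as it, or a cooperating process, holds the string.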