Tex asked:

> But does the standard address their removal by receivers (or
> intermediaries), and does removing them include removing the contained
> annotation?
Yes and yes. p. 326:

"On input, a plain text receiver should either preserve all characters
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
or remove the interlinear annotation characters as well as the annotating
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
text..."
^^^^

>
> I can imagine an application that doesn't support I.A. deciding the
> annotation is out of band and can't be preserved in its plain text
> output, and so justifiably strips it as well.
> Does the standard say what to do with "for internal use" only
> characters?

Yes. Unicode 3.1:

D7b: Noncharacter: a code point that is permanently reserved for
     internal use, and that should never be interchanged.

C10: A process shall make no change in a valid coded character
     representation other than the possible replacement of character
     sequences by their canonical-equivalent sequences or the deletion
     of noncharacter code points, if that process purports not to
     modify the interpretation of that coded character sequence.

The interlinear annotation characters fall in a gray zone, since they
are not noncharacters, but by rights ought to have been. Since they are
standard characters though, the standard has to provide some
guidelines -- and it is simply safer, if you encounter and delete them,
to also delete the annotation. You would be changing the interpretation
of the text, but in a knowing, intended manner.

>
> I would have thought the rule was to ignore and pass along.

In general, yes, as for everything else, including unassigned code
points. If your role in life is as a database, for example, or some
other kind of data source or data pipe, then minimal meddling with the
bytes is safest. But other kinds of processes will do graduated
manipulations, depending on what they are aiming for.

--Ken
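
For illustration only, here is a minimal Python sketch of the two kinds
of deletion discussed above: removing the interlinear annotation
characters (U+FFF9, U+FFFA, U+FFFB) together with the annotating text,
and deleting noncharacter code points. The function names are
hypothetical, not anything the standard defines, and the handling of
malformed input (a stray separator or terminator is simply dropped; an
unterminated annotation swallows text to the end of the string) is an
assumption of this sketch, not a rule from the standard.

```python
ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR


def strip_interlinear_annotations(text):
    """Remove the three annotation characters and the annotating text,
    keeping the annotated (base) text. Hypothetical helper."""
    out = []
    # One entry per open annotation; the entry becomes True once that
    # annotation's separator has been seen (we are in annotating text).
    stack = []
    for ch in text:
        if ch == ANCHOR:
            stack.append(False)
        elif ch == SEPARATOR:
            if stack:
                stack[-1] = True
        elif ch == TERMINATOR:
            if stack:
                stack.pop()
        elif not any(stack):
            # Not inside any annotating text: keep the character.
            out.append(ch)
    return "".join(out)


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def delete_noncharacters(text):
    """Drop noncharacter code points, the one deletion C10 permits."""
    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))
```

A data pipe that only wants to pass text along would call neither
function; a plain text receiver that cannot represent annotations might
call strip_interlinear_annotations() on input, knowingly changing the
interpretation of the text as described above.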