On 28/11/2004 00:21, Mark E. Shoulson wrote:

...
Well, that's the difference under discussion. The "plain text" would seem to be either the qere or the ketiv (but not the combined "blended" form), since each of those is somewhat sensible. Peter Kirk's point is that the blended form is what is in fact written and has been so for centuries, so he claims that *it* should be considered the plain text.


But who says the plain text has to be sensible? Unicode is concerned with representing the text as written, not with its meaning. The following string is meaningless and not sensible at all, but it is still plain text: gxyfcwx bfzkgf ikxz bgcuyxukb kbcghjkshxcbnhjkc b bhb jksdfncfuhikc. (It's not a code, by the way; it comes from random typing.)

Asmus basically agreed with me, but added:

In scripts with complex layout, of course, not all random character soup would be rendered the same by all systems. Which, I think, is the point here. If this is a rather commonly used device, then in principle it's possible to ask why this cannot be part of plain text.

If the necessary mechanisms to do this are cheap and simple, the answer is often to bring such things under the plain text umbrella. If it's complicated, the answer should be to leave it to mechanisms such as markup that deal well in (whatever required kind of) complexity.


If there were in fact a need for complex mechanisms to support Ketiv/Qere blended forms in plain text, then I might agree that alternative markup mechanisms need to be looked at. But in this case, as I see it, only two special mechanisms are required:

1) Allowing multiple vowel points with a single base character. The issues concerning this one were discussed at some length on this list last year, concerning the form Yerushala(y)im which is the commonest such form. The solution which was agreed for this form works well with the other rare forms in this category.

2) Allowing floating vowel points (and sometimes accents) with a blank base character. This usually, but not always, happens at the beginning of a word. The mechanism for doing this seems to have been clarified by the UTC: use NBSP as the base character.
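
To make the two mechanisms concrete, here is a minimal Python sketch of what the character sequences might look like. It is only an illustration under stated assumptions: the post does not name the solution agreed for Yerushala(y)im, so the use of COMBINING GRAPHEME JOINER (U+034F) between the two vowel points is an assumption here, and the choice of lamed, patah and hiriq is just an example. The NBSP base for a floating point follows mechanism 2 as described above.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH (combining class 17)
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ (combining class 14)
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER (assumed separator, see lead-in)
NBSP = "\u00A0"   # NO-BREAK SPACE, used as a blank base character


def nfc(text: str) -> str:
    """Return the NFC-normalized form of text."""
    return unicodedata.normalize("NFC", text)


def names(text: str) -> list[str]:
    """List the Unicode character names in text, in stored order."""
    return [unicodedata.name(ch) for ch in text]


# Mechanism 1: two vowel points on a single base character.
# Without a separator, canonical reordering swaps the points,
# because hiriq has a lower combining class than patah.
two_vowels = LAMED + PATAH + HIRIQ
two_vowels_cgj = LAMED + PATAH + CGJ + HIRIQ

# Mechanism 2: a floating vowel point carried by a blank base.
floating_vowel = NBSP + HIRIQ

if __name__ == "__main__":
    print(names(nfc(two_vowels)))      # points come out reordered
    print(names(nfc(two_vowels_cgj)))  # order is preserved
    print(names(nfc(floating_vowel)))  # NBSP base survives NFC
```

Note that the sketch uses NFC only. Under NFKC or NFKD the NBSP would be replaced by an ordinary space, so anyone relying on mechanism 2 would need to avoid compatibility normalization.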

So can't we leave it that these mechanisms can be used for representation of these forms by those who wish to represent them in plain text, whereas those who want to use other mechanisms are free to do so?

In answer to the possible objection that this leaves alternative ways to represent the same text, I note that the same alternatives already apply with e.g. superscript digits which may be represented either in plain text with the Unicode superscript digit characters, or as marked up text using superscript markup.
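
The superscript case can be illustrated the same way. In this small sketch the variable names and the HTML-style `<sup>` markup are illustrative choices, not anything prescribed by Unicode; it simply shows that the plain-text form is a single encoded character which keeps its identity under NFC, while compatibility normalization folds it to an ordinary digit.

```python
import unicodedata

# Plain-text alternative: SUPERSCRIPT TWO (U+00B2) as an encoded character.
plain_text_form = "x\u00B2"

# Markup alternative: an ordinary digit wrapped in superscript markup
# (HTML-style tag chosen purely for illustration).
marked_up_form = "x<sup>2</sup>"

# NFC keeps the superscript character as it is.
nfc_form = unicodedata.normalize("NFC", plain_text_form)

# NFKC applies the compatibility mapping and loses the superscript styling.
nfkc_form = unicodedata.normalize("NFKC", plain_text_form)

if __name__ == "__main__":
    print(unicodedata.name(plain_text_form[1]))
    print(nfc_form == plain_text_form)
    print(nfkc_form)
```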

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




