2013/2/7 Richard Wordingham <richard.wording...@ntlworld.com>:
> You said, on 5 February,
>
> "A process can be FULLY conforming by preserving the canonical
> equivalence and treating ALL strings that are canonically equivalent,
> without having to normalize them in any recommanded form, or
> performing any reordering in its backing store, or it can choose to
> normalize to any other form that is convenient for that process (so it
> could be NFC or NFD, or something else)"
>
> There's no qualification there disqualifying collation at the secondary
> level from being a 'process' which may or may not be conforming.
In the email you cite, the restriction to the primary level was stated before this sentence and was implied by it; you just did not quote it along with the rest. Be careful about taking sentences out of their context, when the whole thread started by speaking only about the primary level for basic searches.

OK, there are some pathological cases, but they are really constructed and do not arise in modern languages (except for a few Indic ones, as you noted), and none of them concern the Latin script: your <TILDE+V> example collating like <N> is not a real-world example; it is fully constructed and not found in the CLDR.

If you consider only the initial question, having to decompose letters in order to "recompose" them in defective ways, just to form rare single collation elements, remains a very borderline case for applications like browsers that merely perform a plain-text search at the primary level on a web page. Even if the implementation really uses a full decomposition, I doubt it has any tailoring that would recognize those defective collation elements, for example when it is used on old medieval texts where tildes serve as abbreviation marks, whose meaning is unclear anyway and would be more safely interpreted like the abbreviation dot we more commonly see today.

There are other notations for abbreviations that even a full UCA implementation will not recognize, notably superscripts or subscripts written not with superscript or subscript characters but with ordinary baseline characters plus styling, such as HTML sub/sup elements or spans with CSS styles, with no encoded invisible control to mark the superscript as an abbreviation. Similar issues occur with other styles such as strokes (line-through), underlines, or overlines (i.e. text-decoration in CSS), which a plain-text-only search will not recognize (and certainly not if it works only at collation level 1).
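To make concrete what a basic primary-level plain-text search roughly amounts to, here is a minimal Python sketch. It is only a crude approximation under my own assumptions, not a UCA implementation: the helper names (canonically_equal, primary_fold, primary_contains) are hypothetical, accents are ignored by dropping nonspacing marks after NFD, and letters without a canonical decomposition are left as they are. It shows that canonical equivalence can be honoured without storing text in any particular normalization form, and that without a tailoring nothing makes <TILDE+V> match <N>.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent.

    Comparing NFD forms suffices: NFD applies canonical decomposition and
    canonical reordering of combining marks, so the inputs may be stored
    in any form.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def primary_fold(s: str) -> str:
    """Crude stand-in for primary-strength matching (hypothetical helper).

    Case-fold, decompose to NFD, then drop nonspacing combining marks
    (general category Mn). This is NOT a UCA sort key: it applies no CLDR
    tailoring and handles no contractions or expansions.
    """
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def primary_contains(haystack: str, needle: str) -> bool:
    """Primary-level-like substring search (hypothetical helper)."""
    return primary_fold(needle) in primary_fold(haystack)


if __name__ == "__main__":
    # Precomposed vs decomposed forms are canonically equivalent.
    print(canonically_equal("\u00f1", "n\u0303"))
    # Marks typed in non-canonical order are reordered by NFD.
    print(canonically_equal("s\u0307\u0323", "\u1e69"))
    # Accents and case are ignored, as at the primary level.
    print(primary_contains("Crème brûlée", "CREME BRULEE"))
    # No tailoring: a tilde over v is not treated like n.
    print(primary_contains("v\u0303", "n"))
```

A real browser or search engine would rather use a UCA/CLDR collator at primary strength (for instance through ICU), which also gives proper weights to letters like "ø" or "æ"; but neither that nor this sketch sees through HTML sub/sup elements or CSS text-decoration, since the styling is not part of the plain text being searched.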