At 03:41 PM 12/29/01 -0500, David J. Perry wrote:
>The ancient Roman monetary unit sestertius is not yet in Unicode. It might
>well be accepted if proposed, but would be given one codepoint. However,
>this unit appears in a variety of ways in inscriptions: IIS, HS, II with a
>horizontal line through, S or SS with horizontal line, etc. Epigraphers
>frequently like to preserve information such as the exact glyph used in an
>inscription. One could create an OpenType font with one sestertius
>character and alternative glyphs that could be used for printing or web
>pages. But would there be any way to preserve such information in, let's
>say, a database of inscriptions if only one codepoint was available? The
>Runic block that was added to Unicode 3.0 also comes to mind here. TUS 3
>states that the glyphs used in a given context may vary from those presented
>in the charts; so what were the intentions of those who proposed this block?
>This seems to be the same issue as the one I raised regarding the
>sestertius.
This is something you cannot do in plain text; it's a fundamental limitation. In the same way, you cannot maintain a database of instances where the dollar sign or the Yen/Yuan sign appears with single or double strokes by using only U+0024 or U+00A5.

To a limited extent, it makes sense to encode more than one historic form of a character. This is usually considered when such historic forms can be regarded as a historic script (or historic alphabet) in their own right, and when separating the historic periods solves more problems than it creates. (So it's a judgement call above all.)

Sometimes, if the forms are very different in appearance, and especially if at least one of them might be used with an unrelated meaning, it might make sense to encode both.

Finally, where currencies are written with simple letter digraphs, there is no need to encode a single character. If your examples IIS and HS don't have lines through them, there is no need to encode them; the strings IIS or HS should serve.

So, given my understanding of your example, I could see at most two candidate forms: II with a line through and SS with a line through. If these are substitutable for each other (except for capturing the 'exact' appearance of an inscription), then they should share a single character code.

In a few cases, where clear glyph alternatives exist and there is a strong requirement to preserve them in plain text, the use of a Variation Selector character can be defined, allowing one to express the distinction. This is a very useful facility that makes it unnecessary to encode many borderline characters, but it should not be abused as a general glyph-description mechanism. One would need to show in each case that the distinction must be preserved (at least sometimes) in plain text, even though there is no ordinary distinction in meaning.

A./
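For illustration only, here is a minimal Python sketch of the variation-selector mechanism described above. It assumes a hypothetical pairing of U+0024 DOLLAR SIGN with U+FE00 VARIATION SELECTOR-1 to mark a "double-stroke" form; that pairing is not a standardized variation sequence and is used here purely to show the idea. The stored text keeps the distinction, while a search key with the variation selectors folded away treats both records as the same character. The helper name `strip_variation_selectors` is likewise hypothetical.

```python
import unicodedata

# HYPOTHETICAL pairing: U+0024 + U+FE00 (VS1) standing for a "double-stroke"
# dollar sign. This is NOT a standardized variation sequence; it only
# illustrates how a base character plus a selector is stored.
VS1 = "\uFE00"

single = "\u0024"        # plain dollar sign; glyph choice is left to the font
double = "\u0024" + VS1  # base character followed by a variation selector


def strip_variation_selectors(s: str) -> str:
    """Remove VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    return "".join(
        ch for ch in s
        if not ("\uFE00" <= ch <= "\uFE0F" or "\U000E0100" <= ch <= "\U000E01EF")
    )


records = ["100" + single, "250" + double]

# The stored strings differ: the second carries one extra code point.
print([len(r) for r in records])                        # [4, 5]
# A folded search key treats both as the same base character.
print([strip_variation_selectors(r) for r in records])  # ['100$', '250$']
print(unicodedata.name(VS1))                            # VARIATION SELECTOR-1
```

The design point is that the selector adds information without changing the identity of the base character, so plain-text processes that ignore it still see an ordinary U+0024.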