> >00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;
> >10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;;;;;N;;;;;
> >16EB;RUNIC SINGLE PUNCTUATION;Po;0;L;;;;;N;;;;;

> I was meaning to ask about this.  I'm all over not encoding Yet Another 
> middle dot, but I was wondering.  In my research on Samaritan, I've 
> found that they frequently write (you guessed it) a middle dot to 
> separate words (they like to use space to enable them to do this cool 
> columnar writing thing).  I was assuming that this could be conflated 
> with someone else's middle-dot-word-separator; would that be U+10101?

As far as I am concerned, U+00B7 should be sufficient for that.

But if you are looking for a punctuation mark distinct from
U+00B7, specifically for archaic textual practice, my choice
would be U+16EB (along with the Runic double dot, U+16EC).
Scripts.txt assigns these to the Common script, as punctuation:

16EB..16ED    ; Common # Po   [3] RUNIC SINGLE PUNCTUATION..RUNIC CROSS PUNCTUATION
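
To check that kind of entry programmatically, here is a minimal
sketch that parses a Scripts.txt-style data line and looks up a
code point. The function names (parse_scripts_line, script_of) are
made up for illustration, and the table holds only the one line
quoted above, not the full file.

```python
import re

# A Scripts.txt data line looks like: "START[..END] ; Script # comment"
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")

def parse_scripts_line(line):
    """Return (start, end, script) for a data line, or None otherwise.

    Hypothetical helper for illustration only.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # comment line, blank line, or unrecognized syntax
    start = int(m.group(1), 16)
    end = int(m.group(2), 16) if m.group(2) else start
    return start, end, m.group(3)

def script_of(cp, entries):
    """Look up a code point in the parsed (start, end, script) entries."""
    for start, end, script in entries:
        if start <= cp <= end:
            return script
    return None  # not covered by this (deliberately tiny) sample table

# Only the single line quoted in this message, not the whole file.
SAMPLE = ("16EB..16ED    ; Common # Po   [3] "
          "RUNIC SINGLE PUNCTUATION..RUNIC CROSS PUNCTUATION")
entries = [parse_scripts_line(SAMPLE)]

print(script_of(0x16EB, entries))
```

With the real Scripts.txt you would feed every non-comment line
through the same parser before doing lookups.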

Unfortunately, some software makes over-aggressive assumptions
about script identity, and that can cause trouble for text that
borrows punctuation from another script's block.
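
To make the failure mode concrete, here is a rough sketch of the
difference between guessing script from the block and consulting
the Script property. The tables are hand-copied and deliberately
tiny, all function names are hypothetical, and "Samaritan" is used
only as a label for the surrounding text run. It is not how any
particular implementation works.

```python
# Assumption: tiny hand-made tables, just enough for this one example.
RUNIC_BLOCK = range(0x16A0, 0x1700)        # the Runic block, 16A0..16FF
COMMON_IN_RUNIC = range(0x16EB, 0x16EE)    # 16EB..16ED, Common per Scripts.txt

def script_by_block(cp):
    """Over-aggressive guess: anything in the Runic block is 'Runic'."""
    return "Runic" if cp in RUNIC_BLOCK else None

def script_by_property(cp):
    """Consult the Script property first (here, only the range above)."""
    if cp in COMMON_IN_RUNIC:
        return "Common"
    # Simplification: fall back to the block guess for everything else.
    return script_by_block(cp)

def breaks_run(char_script, run_script):
    """Common characters may sit inside a run of any script."""
    return char_script not in ("Common", run_script)

cp = 0x16EB  # RUNIC SINGLE PUNCTUATION used as a word separator
print(breaks_run(script_by_block(cp), "Samaritan"))     # True: run is split
print(breaks_run(script_by_property(cp), "Samaritan"))  # False: run is kept
```

The block-based guess splits the run at the borrowed dot, while
the property-based lookup leaves it intact.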

Note that as part of the ongoing work to cover Greek paleographic
needs, a large number of multiple dot punctuation characters are
currently under ballot for addition to 10646 (and Unicode). See
2056, 2058..205E at:

http://www.unicode.org/alloc/Pipeline.html

These are (proposed to be) encoded in the General Punctuation block to 
ensure that *everyone* is clear that their intended use is general, so we
don't have to keep cloning more and more such dot combinations
to handle the dot punctuation for each different paleographic
tradition.

--Ken

