Dear Unicoders

There are some characters that have no precedent in existing encodings and are
also hard to attest directly from printed sources. Can one still make a solid
case for encoding those in Unicode?

I am thinking of characters that are either invisible (most of the time) or can 
become invisible under certain circumstances.

Precedents
----------

- HYPHEN U+2010 is *always* rendered as a hyphen (i.e. a centered horizontal bar glyph),  
  which may look identical to HYPHEN-MINUS U+002D.

- SOFT HYPHEN (SHY) U+00AD is *only* rendered as a hyphen *when* it appears at the end of a line.

- At least four existing math operators are *never* rendered with a visible glyph  
  and only explicitly encode semantics where syntax is potentially ambiguous otherwise:

  * FUNCTION APPLICATION U+2061  
    is used where no multiplication is implied,  
    e.g. between an alphabetic function variable and an opening parenthesis: f(x).
  * INVISIBLE TIMES U+2062  
    is used where multiplication by either MULTIPLICATION SIGN U+00D7 or MIDDLE DOT U+00B7 is implied,  
    e.g. between a number and an alphabetic variable, constant or parenthesis: 2πr(a+b).
  * INVISIBLE SEPARATOR U+2063  
    is used where enumeration by a COMMA U+002C or SEMICOLON U+003B (and possibly whitespace) is implied,  
    e.g. between two single-letter variable indices: aᵢⱼ.
  * INVISIBLE PLUS U+2064  
    is used where addition by PLUS SIGN U+002B is implied,  
    e.g. between an integer and a vulgar fraction: 1⅔.
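
For anyone who wants to check these precedents, here is a minimal Python sketch that looks up the name and general category of each character listed above. It relies only on the standard `unicodedata` module; the helper name `describe` is my own and not part of any API.

```python
import unicodedata

# The existing characters cited as precedents in the list above.
PRECEDENTS = [
    "\u002D",  # HYPHEN-MINUS
    "\u2010",  # HYPHEN
    "\u00AD",  # SOFT HYPHEN
    "\u2061",  # FUNCTION APPLICATION
    "\u2062",  # INVISIBLE TIMES
    "\u2063",  # INVISIBLE SEPARATOR
    "\u2064",  # INVISIBLE PLUS
]


def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in PRECEDENTS:
    print(*describe(ch))
```

The two visible hyphens come out as dash punctuation (Pd), while SHY and the four invisible operators are format characters (Cf), which is presumably the category the suggestions below would fall into as well.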

Suggestions
-----------

- INVERSE SOFT HYPHEN (ISHY) or SOFT INVISIBLE HYPHEN (SIHY)  
  is *always* rendered as a hyphen *unless* it appears at the end of a line. 

- INVISIBLE HYPHEN (IHY) or ZERO-WIDTH HYPHEN (ZWH)  
  is *never* rendered as a hyphen,  
  *but* the word it appears in is treated as if it contained one at its position.

- INVERSE SOFT COMMA (ISC) or SOFT INVISIBLE COMMA (SIC)  
  is *always* rendered as a comma *unless* it appears at the end of a line. 

- INVISIBLE OPEN PARENTHESIS (IOP) and INVISIBLE CLOSE PARENTHESIS (ICP)  
  *should not* be rendered with a visible glyph, but *may* be for inline fallback.

ISHY/SIHY is especially useful for encoding (German) noun compounds in wrapped
titles, for instance on product labels, where hyphens are often suppressed for
stylistic reasons. The orthographically correct forms _Spargelsuppe_,
_Spargel-Suppe_ (U+002D) or _Spargel‐Suppe_ (U+2010) may appear on the label as
_Spargel␤Suppe_, which could then be encoded as _Spargel<ISHY>Suppe_.

Like the existing invisible math operators, IHY/ZWH is used where the presence
of its visible counterpart (i.e. HYPHEN) would be required syntactically (i.e.
orthographically), but can be derived from context and convention (at least by
human readers). This is useful for spell-checking, line-breaking etc., e.g. for
words (commercial names in particular) with internal capital letters that would
otherwise break orthographic rules and that should be broken at the end of a
line without a hyphen added (i.e. like ISHY/SIHY, not SHY). This is very
similar to ZERO-WIDTH SPACE (ZWSP) and WORD JOINER (WJ), except that ZWSP
separates two words whereas IHY/ZWH joins them into one, and that IHY/ZWH,
unlike WJ, still allows a line break.

ISC/SIC is particularly useful in wrapping table headers where a possible line 
break can take on the separating role of a comma.
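
The rendering rules for SHY and for the suggested ISHY, IHY and ISC can be summarized in a small Python sketch. None of the suggested characters is encoded, so the private-use code points below are purely hypothetical stand-ins, and `render` with its `break_at` parameter is an illustrative helper of my own, not how any real layout engine works.

```python
HYPHEN = "\u2010"  # visible glyph used for every rendered hyphen
SHY = "\u00AD"     # real SOFT HYPHEN
ISHY = "\uE000"    # hypothetical stand-in: INVERSE SOFT HYPHEN
IHY = "\uE001"     # hypothetical stand-in: INVISIBLE HYPHEN
ISC = "\uE002"     # hypothetical stand-in: INVERSE SOFT COMMA

# For each character: (glyph shown inside a line, glyph shown at end of line)
GLYPHS = {
    SHY: ("", HYPHEN),
    ISHY: (HYPHEN, ""),
    IHY: ("", ""),
    ISC: (",", ""),
}


def render(text, break_at=None):
    """Render text as a string; break_at is the index of the one special
    character (if any) at which the line is broken."""
    out = []
    for i, ch in enumerate(text):
        if ch in GLYPHS:
            inside, at_end = GLYPHS[ch]
            out.append(at_end + "\n" if i == break_at else inside)
        else:
            out.append(ch)
    return "".join(out)


label = "Spargel" + ISHY + "Suppe"
print(render(label))              # unbroken: hyphen is shown
print(render(label, break_at=7))  # broken at ISHY: hyphen is suppressed
```

With these rules the unbroken label shows _Spargel‐Suppe_, while the broken one shows _Spargel␤Suppe_ with no hyphen; SHY behaves exactly the other way round, and IHY never shows a glyph in either case.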

IOP and ICP have two uses. They enclose mathematical expressions to override
the operator precedence that would otherwise apply, e.g. a sum in the numerator
or denominator of a fraction, and they enclose textual annotation that should
be displayed outside the normal row of characters, e.g. ruby/furigana
pronunciation hints. Both *may* be rendered inline where advanced typographic
functionality is unavailable and should then be parenthesized for clarity.
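
The inline fallback could look roughly like the following Python sketch. Again, IOP and ICP are not encoded, so the private-use code points and both function names are hypothetical; a real renderer with fraction or ruby support would do actual layout instead of merely dropping the characters.

```python
IOP = "\uE003"  # hypothetical stand-in: INVISIBLE OPEN PARENTHESIS
ICP = "\uE004"  # hypothetical stand-in: INVISIBLE CLOSE PARENTHESIS


def inline_fallback(text):
    """Plain inline display: make the invisible grouping visible."""
    return text.replace(IOP, "(").replace(ICP, ")")


def without_grouping(text):
    """Display with advanced layout: the grouping is conveyed by the
    layout itself, so the characters produce no glyph."""
    return text.replace(IOP, "").replace(ICP, "")


# A sum in the numerator of a fraction, written linearly.
fraction = IOP + "a+b" + ICP + "/c"
print(inline_fallback(fraction))
```

Here the fallback yields _(a+b)/c_, which keeps the intended precedence readable even without a built-up fraction.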
