The hyphenation exception list in TUGboat has been accumulating more
and more words used in chemistry and similar fields -- pharmacology,
medicine, etc. Although the TUGboat list is directed toward English,
I believe that a similar problem exists in other languages as well.
(In fact, a discussion on this topic has appeared on the German TeX
list, tex-d-l.)
It's well past time to update the TUGboat list, and I hope to do so
with the next issue, to appear in early April. I'm trying to answer
this question before that happens: Is it possible, or reasonable,
to treat "chemistry" as a separate language, with its own patterns,
or should the "chemical terms" just continue to be provided as a
separate exception list? (I believe it's a bad idea to interfile this
category with "ordinary" words. Most (La)TeX users won't need them.)
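In TeX, a separate language means a separate set of hyphenation patterns, selected by its own \language number. For readers less familiar with how such patterns work, here is a minimal Python sketch of Liang-style pattern matching, the scheme TeX uses. Each pattern puts digits between letters. The highest digit at each gap wins. An odd value allows a break and an even value suppresses one. The patterns below are hypothetical toy patterns chosen to reproduce one dictionary hyphenation. They do not come from any real pattern file, because real patterns are generated by patgen from a large hyphenated word list.

```python
def hyphenate(word, patterns, left_min=2, right_min=3):
    """Insert '-' wherever Liang-style patterns allow a break in word.

    Each pattern is letters with optional digits between them, e.g. "e1r";
    a '.' in a pattern matches the start or end of the word. The defaults
    mirror plain TeX's lefthyphenmin=2 and righthyphenmin=3.
    """
    # Map each pattern's letter string to the digit value at every gap.
    table = {}
    for pat in patterns:
        letters = "".join(ch for ch in pat if not ch.isdigit())
        values = [0] * (len(letters) + 1)
        pos = 0
        for ch in pat:
            if ch.isdigit():
                values[pos] = int(ch)
            else:
                pos += 1
        table[letters] = values

    # Slide every pattern over ".word."; keep the highest value per gap.
    text = "." + word.lower() + "."
    gaps = [0] * (len(text) + 1)
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            values = table.get(text[start:end])
            if values is not None:
                for k, v in enumerate(values):
                    gaps[start + k] = max(gaps[start + k], v)

    # gaps[i + 1] is the gap just before word[i]; odd values allow a break.
    out = []
    for i, ch in enumerate(word):
        if left_min <= i <= len(word) - right_min and gaps[i + 1] % 2 == 1:
            out.append("-")
        out.append(ch)
    return "".join(out)


# Hypothetical toy patterns, not taken from any real pattern file.
print(hyphenate("aldosterone", ["l1d", "o1s", "e1r"]))          # al-do-ste-rone
print(hyphenate("aldosterone", ["l1d", "o1s", "e1r", "al2d"]))  # aldo-ste-rone
```

The second call shows how a longer pattern with an even digit ("al2d") overrides a shorter one that allowed the break. This override mechanism is what lets one pattern set encode many special cases compactly.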
A significant part of the problem is that much of this terminology is
heavily compounded, and proper hyphenation ideally chooses a break
between, rather than within, elements, although the latter choice
should be available when necessary for even spacing in justified text.
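TeX applies the same \hyphenpenalty to every hyphenation point it finds, so it has no built-in way to prefer one break over another within a word. The two-grade preference described above can be modelled by giving each kind of break its own cost. The Python sketch below is a crude illustration and does not reproduce TeX's line breaker. It uses the "=" (between elements) and "-" (within an element) notation from the comparison later in this message. The penalty values and the quadratic charge for unused space are invented for the example.

```python
# Hypothetical penalties: a break between elements ("=") costs nothing,
# a break within an element ("-") costs more, so it wins only when it
# saves enough space on the line.
PENALTY = {"=": 0, "-": 50}


def parse(marked):
    """Split a marked word into its letters and (position, kind) breaks."""
    letters, breaks = [], []
    for ch in marked:
        if ch in PENALTY:
            breaks.append((len(letters), ch))
        else:
            letters.append(ch)
    return "".join(letters), breaks


def choose_break(marked, room):
    """Return (first_part_with_hyphen, rest), or None if no break fits.

    room is the number of letters that still fit on the current line;
    leftover space is charged quadratically as a crude badness measure.
    """
    word, breaks = parse(marked)
    fitting = [(pos, kind) for pos, kind in breaks if pos <= room]
    if not fitting:
        return None
    pos, _ = min(fitting, key=lambda b: PENALTY[b[1]] + (room - b[0]) ** 2)
    return word[:pos] + "-", word[pos:]


print(choose_break("glu-co=neo=gen-e-sis", 9))  # ('gluconeo-', 'genesis')
print(choose_break("glu-co=neo=gen-e-sis", 4))  # ('glu-', 'coneogenesis')
```

With room for nine letters, the break between "neo" and "gen" wins. With room for only four letters, the within-element break "glu-" is still available, which is the fallback the paragraph above asks for.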
German is also heavily compounded, and the German hyphenation patterns
seem to cope well with the situation, although I have no idea how that
is accomplished. The hyphenation dictionary used to develop the patterns
for UK English is (allegedly) based on etymology rather than on
pronunciation (the basis for US English hyphenation), and it marks
breakpoints at more than one level of preference.
As an example, I've chosen a small collection of terms (not yet in the
TUGboat list), checked the recommended hyphenation in the authoritative
US dictionary, and compared it with the output of \showhyphens using
the US patterns. The results follow; preferred breakpoints are marked
with "=".
-- TeX (US) --           -- correct --
al-dos-terone            al=do-ste-rone
cat-e-cholamine          cat-e=chol=amine
dysau-tono-mia           dys=au-to=no-mia
glu-co-neo-ge-n-e-sis    glu-co=neo=gen-e-sis
glycogenol-y-sis         gly-co=gen-ol-y-sis
hy-per-v-olemia          hy-per=vol-emia
hy-per-v-olemic          hy-per=vol-emic
hy-potha-la-mic          hy-po=tha-lam-ic
hy-potha-la-mus          hy-po=thal-a-mus
hy-po-v-olemia           hy-po=vo-le-mia
hy-po-v-olemic           hy-po=vo-le-mic
neu-roen-docrine         neu-ro=en-do-crine
neu-ro-hy-pophysial      neu-ro=hy-po=phy-si-al
neu-ro-hy-poph-ysis      neu-ro=hy-poph-y-sis
neu-ro-pathic            neur-o=pa-thic
neu-ropa-thy(ies)        neu-rop-a-thy(ies)
os-more-cep-tor          os-mo=re-cep-tor
parasym-pa-thetic        para=sym-pa-thet-ic
pheochro-mo-cy-toma      pheo=chro-mo=cy-to-ma
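For a longer list, a short script makes the differences easier to tally. The Python sketch below is only an illustration. The row list is a hand-copied subset of the table above, and the category names are invented for this sketch. For each word, it reports the breaks TeX makes that the dictionary does not allow, along with the dictionary breaks, preferred ("=") or ordinary ("-"), that TeX misses. Positions count the letters before each break.

```python
# (TeX with US patterns, dictionary form): a few rows copied from the table above.
ROWS = [
    ("al-dos-terone", "al=do-ste-rone"),
    ("hy-per-v-olemia", "hy-per=vol-emia"),
    ("os-more-cep-tor", "os-mo=re-cep-tor"),
]


def letters(marked):
    """Strip the break marks, leaving the bare word."""
    return "".join(ch for ch in marked if ch not in "-=")


def breakpoints(marked):
    """Map each break to its kind; keys count letters before the break."""
    points, n = {}, 0
    for ch in marked:
        if ch in "-=":
            points[n] = ch
        else:
            n += 1
    return points


def compare(tex, correct):
    """Classify the differences between TeX's output and the dictionary."""
    if letters(tex) != letters(correct):
        raise ValueError(f"spelling differs: {tex!r} vs {correct!r}")
    got, want = breakpoints(tex), breakpoints(correct)
    return {
        # breaks TeX makes that the dictionary does not allow
        "wrong": sorted(p for p in got if p not in want),
        # preferred ("=") dictionary breaks that TeX misses
        "missed_preferred": sorted(p for p, k in want.items() if k == "=" and p not in got),
        # ordinary ("-") dictionary breaks that TeX misses
        "missed_other": sorted(p for p, k in want.items() if k == "-" and p not in got),
    }


for tex, correct in ROWS:
    print(letters(correct), compare(tex, correct))
```

For "osmoreceptor", TeX introduces no wrong breaks, but it misses the preferred break after "osmo". That is exactly the between-elements break the list is meant to supply.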
One consideration must be the TeX limit on the number of hyphenation
exceptions, which I believe is 65535 (the actual value varies between
TeX implementations). Another consideration is that dedicated patterns,
if devised, should not have an adverse effect on any of the
natural-language patterns.
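If the chemical terms stay a separate exception list, they must be converted into TeX's \hyphenation format. That format accepts only ordinary hyphens, so the "=" preference level is lost. The Python sketch below shows one way to do the conversion and check the count. The function name and the handling of variant suffixes such as "(ies)" are assumptions made for illustration. The limit is passed in by the caller, and the 65535 used here is simply the figure quoted above. The check counts only the new list, whereas TeX's exception table also holds every exception already loaded.

```python
def to_tex_exceptions(entries, limit):
    """Turn dictionary-style entries into a TeX hyphenation-exception block.

    TeX exceptions record only where a break is allowed, so both '=' and
    '-' become '-'. Entries carrying a variant suffix such as '(ies)' are
    set aside for expansion by hand. limit is the exception capacity of
    the target TeX, supplied by the caller (an assumed value).
    """
    words, manual = [], []
    for entry in entries:
        if "(" in entry:
            manual.append(entry)
        else:
            words.append(entry.replace("=", "-"))
    if len(words) > limit:
        raise ValueError(f"{len(words)} exceptions exceed the limit of {limit}")
    block = "\\hyphenation{\n" + "".join(f"  {w}\n" for w in words) + "}"
    return block, manual


block, manual = to_tex_exceptions(
    ["al=do-ste-rone", "para=sym-pa-thet-ic", "neu-rop-a-thy(ies)"], limit=65535
)
print(block)
print("expand by hand:", manual)
```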
Suggestions and discussion are solicited. I am subscribed to this list,
so it's not necessary to address me separately.
--bb