The hyphenation exception list in TUGboat has been accumulating more
and more words used in chemistry and similar fields -- pharmacology,
medicine, etc.  Although the TUGboat list is directed toward English,
I believe that a similar problem exists in other languages as well.
(In fact, a discussion on this topic has appeared on the German TeX
list, tex-d-l.)

It's well past time to update the TUGboat list, and I hope to do so
with the next issue, to appear in early April.  I'm trying to answer
this question before that happens:  Is it possible, or reasonable, to
treat "chemistry" as a separate language, with its own patterns, or
should the "chemical terms" just continue to be provided as a separate
exception list?  (I believe it's a bad idea to interfile this category
with "ordinary" words.  Most (La)TeX users won't need them.)
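
To make the "separate language" option concrete, here is a minimal
plain TeX sketch.  It is only an illustration under stated assumptions:
the name \chemistry is hypothetical, no patterns are loaded for it
(with Knuth's TeX that has to happen when the format is built), and so
only the listed exceptions would hyphenate while that language is
selected.

```tex
% Plain TeX sketch; \chemistry is a hypothetical language name.
\newlanguage\chemistry

% Exceptions are stored for whatever \language is current
% when \hyphenation is read.
\language=\chemistry
\hyphenation{al-do-ste-rone cat-e-chol-amine os-mo-re-cep-tor}
\language=0  % back to the default language of plain TeX

% Narrow column, so that breaks are likely to be needed.
\hsize=1.5in
The hormone {\language=\chemistry aldosterone} and the
{\language=\chemistry catecholamine} family are typical of the
terms in question.
\bye
```

Having to mark each term by hand, as in the sketch, is one of the
practical costs of this approach; the exception-list approach needs no
markup but loads the terms for everyone who uses the list.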

A significant part of the problem is that much of this terminology is
heavily compounded, and proper hyphenation ideally chooses a break
between, rather than within, elements, although the latter choice
should be available when necessary for even spacing in justified text.

German is heavily compounded, and the hyphenation patterns there seem
to cope well with the situation, although I have no idea how that is
accomplished.  The hyphenation dictionary used to develop the patterns
for UK English is (allegedly) based on etymology, rather than on
pronunciation (the basis for US English hyphenation), and shows more
than one level for choice of breakpoints.

As an example, I've chosen a small collection of terms (not yet in the
TUGboat list), checked the recommended hyphenation in the authoritative
US dictionary, and compared that with the result of using \showhyphens
with the US patterns.  The results follow; I've indicated the preferred
breakpoints by "=".

   -- TeX (US) --         -- correct --
al-dos-terone           al=do-ste-rone
cat-e-cholamine         cat-e=chol=amine
dysau-tono-mia          dys=au-to=no-mia
glu-co-neo-ge-n-e-sis   glu-co=neo=gen-e-sis
glycogenol-y-sis        gly-co=gen-ol-y-sis
hy-per-v-olemia         hy-per=vol-emia
hy-per-v-olemic         hy-per=vol-emic
hy-potha-la-mic         hy-po=tha-lam-ic
hy-potha-la-mus         hy-po=thal-a-mus
hy-po-v-olemia          hy-po=vo-le-mia
hy-po-v-olemic          hy-po=vo-le-mic
neu-roen-docrine        neu-ro=en-do-crine
neu-ro-hy-pophysial     neu-ro=hy-po=phy-si-al
neu-ro-hy-poph-ysis     neu-ro=hy-poph-y-sis
neu-ro-pathic           neur-o=pa-thic
neu-ropa-thy(ies)       neu-rop-a-thy(ies)
os-more-cep-tor         os-mo=re-cep-tor
parasym-pa-thetic       para=sym-pa-thet-ic
pheochro-mo-cy-toma     pheo=chro-mo=cy-to-ma
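
For anyone who wants to repeat the comparison, the check can be
sketched in plain TeX as below.  This is only an illustration using
three of the terms above.  The "=" marks are written as ordinary
hyphens, since \hyphenation accepts only letters and "-".  With the US
patterns, the first \showhyphens should report the left-hand column of
the table; the second should reflect the exceptions just supplied.

```tex
% Plain TeX; \showhyphens writes its result to the terminal and log.
\showhyphens{aldosterone catecholamine osmoreceptor}

% Supply the dictionary breaks as exceptions, then check again.
\hyphenation{al-do-ste-rone cat-e-chol-amine os-mo-re-cep-tor}
\showhyphens{aldosterone catecholamine osmoreceptor}
\bye
```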

One consideration must be the TeX limit on the number of exceptions,
which I believe is 65535.  Another consideration is that, if dedicated
patterns are devised, they should not have an adverse effect on any of
the natural-language patterns.

Suggestions and discussion are solicited.  I am subscribed to this list,
so it's not necessary to address me separately.
                                                --bb
