On 08.06.25 at 08:30, r.ermers--- via tex-hyphen wrote:
Dear sirs,
As a turcologist doing research on Kazakh, I am trying to set up hyphenation
patterns for Kazakh in order to use them in LaTeX.
As you may know, Kazakh is an agglutinating Turkic language, related to
Turkish, Turkmen, Kyrgyz and Tatar, among others. Unlike Turkmen and Turkish,
Kazakh is (still) written in the Cyrillic script.
With the help of generate-patterns-kk.rb I generated a file hyph-kk.tex, with
which I would like to conduct my first experiments.
One question is whether it is useful to divide vowels into back and front, as
the developer of the Turkmen patterns has done. A second question is
how to implement exceptions, e.g. loanwords from Russian. Thirdly, I need
advice on how to install the patterns in a TeX Live installation.
It goes without saying that I will make the patterns and other files available
for the benefit of future LaTeX users.
Can you advise me how to proceed further?
Dear Robert,
There are two ways to generate hyphenation patterns for TeX.
The first way needs a list of hyphenated words which is sufficiently
long and representative. The hyphenation patterns are generated from
this list by several runs of the patgen command line program, which is
part of the TeX distributions. This method has been used by Knuth for
his (American) English patterns. It is also used for German (based on a
list of more than 500,000 German words). Unfortunately, I cannot give a
complete list of languages using the patgen method.
The second way is not based on concrete words, but on abstract rules.
These rules can be expressed by suitable hyphenation patterns, which can
be written either entirely by hand or with the help of a simple script
like your generate-patterns-kk.rb.
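
To illustrate what such rule-based patterns look like, here is a minimal
sketch in the syntax of a pattern file. The three patterns are invented for
illustration and have not been checked against Kazakh orthography. Only the
notation is standard: an odd digit permits a break at that position, an even
digit forbids it, the higher digit wins where patterns overlap, and a dot
marks a word boundary.

```latex
% Illustrative sketch only: these are NOT vetted Kazakh patterns.
\patterns{
а1т      % odd digit 1: a break is allowed between "а" and "т"
а2т.     % even digit 2 outranks the 1 above: no break before a word-final "т"
.қ2      % no break directly after a word-initial "қ"
}
```

A script such as generate-patterns-kk.rb would presumably emit many patterns
of this shape, one per letter combination covered by a rule.
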
The advantage of the first way is that normally no exceptions are
needed. All irregular words can become part of the input list, which
causes no problem as long as enough ordinary words are present there.
The disadvantage is that it can take a lot of work to gather enough
hyphenated words.
The second way’s advantage is that the pattern generation is quite
straightforward, while the disadvantage is that (many) exceptions may be
needed for foreign and compound words. Unfortunately, I know next to
nothing about Turkic languages, so I can’t judge if compounds are an
issue. There are two ways of handling exceptions (that may be combined):
It is possible to add patterns for the irregular cases after having
written the regular patterns. Look into the French pattern file
hyph-fr.tex to get an idea of this method: the indented patterns treat
compound words. The other way to treat exceptions is the \hyphenation{}
command. This is used for example in the Dutch pattern file hyph-nl.tex.
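
As a rough sketch of the two exception mechanisms, the fragment below treats
the loanword "компьютер" both ways. The word, its break points and the
pattern are assumptions chosen for illustration and are not taken from
hyph-fr.tex, hyph-nl.tex or the generated hyph-kk.tex.

```latex
\patterns{
% ... regular, rule-based patterns go here ...
%
% Irregular cases, appended after the regular patterns.
% Hypothetical pattern: the odd digit 3 outranks lower digits that
% regular patterns may place at these two positions.
  .ком3пью3тер
}
% Alternative: an explicit exception list.
% Hypothetical entry, matched against the complete word only.
\hyphenation{
  ком-пью-тер
}
```

One point to weigh for an agglutinating language: an entry in \hyphenation{}
applies only to that exact word form, so suffixed forms of the same stem would
each need their own entry, whereas a pattern without a trailing dot also
matches inside longer words.
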
The question concerning back and front vowels is of a linguistic nature,
and I cannot help with that.
To test the patterns, create a document based on the example given here:
https://latex3.github.io/babel/guides/locale-kazakh.html
You may want to load the showhyphenation package (LuaLaTeX only) that
marks all hyphenation points with a small red triangle.
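
A minimal test document might look like the sketch below. The preamble is an
assumption and may differ from the example in the linked guide: it loads the
Kazakh locale through babel's provide=* option and picks DejaVu Serif merely
as one font that covers Kazakh Cyrillic. The sample words are arbitrary.

```latex
% Compile with LuaLaTeX (showhyphenation is LuaLaTeX only).
\documentclass{article}
\usepackage[kazakh, provide=*]{babel}
% Assumption: any installed font with Kazakh Cyrillic glyphs will do.
\babelfont{rm}{DejaVu Serif}
\usepackage{showhyphenation}
\begin{document}
Қазақстан Республикасы, қазақ тілі, кітапхана, университет.
\end{document}
```

If no marks appear in any word, the patterns are probably not being loaded
for the language that is active in the document.
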
Then make sure that hyph-kk.tex contains no duplicate patterns. I have
found that there are about 20 duplicates and this leads to errors in the
following process.
Then move the hyph-kk.tex file as well as a suitable loadhyph file to
the TEXMFLOCAL/tex/generic/ directory. TEXMFLOCAL is the local TeX
tree, normally /usr/local/texlive/texmf-local on Unix systems and
C:\texlive\texmf-local on Windows. I have appended a simple loadhyph
file for Kazakh without support for 8-bit engines.
Then create the file TEXMFLOCAL/tex/generic/config/language-local.dat if
not already present and add the line
kazakh loadhyph-kk.tex
to the file (the name of the language followed by a tab character,
followed by the name of the loadhyph file).
Then run the following commands as administrator:
mktexlsr
and
tlmgr generate --rebuild-sys language
Now the Kazakh hyphenation should work.
Good luck,
Keno
% filename: loadhyph-kk.tex
% language: kazakh
%
% Loader for hyphenation patterns, generated by
% source/generic/hyph-utf8/generate-pattern-loaders.rb
% See also http://tug.org/tex-hyphen
%
% Copyright 2008-2025 TeX Users Group.
% You may freely use, modify and/or distribute this file.
% (But consider adapting the scripts if you need modifications.)
%
% Once it turns out that more than a simple definition is needed,
% these lines may be moved to a separate file.
%
\begingroup
% Test for pTeX
\ifx\kanjiskip\undefined
% Test for native UTF-8 (which gets only a single argument)
% That's Tau (as in Taco or ΤΕΧ, Tau-Epsilon-Chi), a 2-byte UTF-8 character
\def\testengine#1#2!{\def\secondarg{#2}}\testengine Τ!\relax
\ifx\secondarg\empty
    % Unicode-aware engine (such as XeTeX or LuaTeX) only sees a single (2-byte) argument
    \message{UTF-8 Kazakh hyphenation patterns}
    \input hyph-kk.tex
\else
    % 8-bit engine (such as TeX or pdfTeX)
    % do nothing for now
\fi\else
    % pTeX
    % do nothing for now
\fi
\endgroup