On 08.06.25 at 08:30, r.ermers--- via tex-hyphen wrote:

Dear sirs,

As a Turkologist doing research on Kazakh, I am trying to set up hyphenation 
patterns for Kazakh for use in LaTeX.

As you may know, Kazakh is an agglutinating Turkic language, related to 
Turkish, Turkmen, Kyrgyz and Tatar, among others. Unlike Turkmen and Turkish, 
Kazakh is (still) written in the Cyrillic script.

With the help of generate-patterns-kk.rb I generated a file hyph-kk.tex, with 
which I would like to conduct my first experiments.

One question is whether it is useful to divide the vowels into back and front 
vowels, as the developer of the Turkmen patterns has done. A second question is 
how to handle exceptions, e.g. loanwords from Russian. Thirdly, I need advice 
on how to integrate the patterns into a TeX Live installation.

It goes without saying that I will make the patterns and other files available 
for the benefit of future LaTeX users.

Can you advise me how to proceed further?

Dear Robert,

There are two ways to generate hyphenation patterns for TeX.

The first way requires a sufficiently long and representative list of hyphenated words. The hyphenation patterns are generated from this list by several runs of the patgen command-line program, which is part of the standard TeX distributions. Knuth used this method for his (American) English patterns, and it is also used for German (based on a list of more than 500,000 German words). Unfortunately, I cannot give a complete list of the languages whose patterns were made with patgen.

The second way is not based on concrete words but on abstract rules. These rules can be expressed as hyphenation patterns. The patterns can either be written entirely by hand or be generated with the help of a simple script such as your generate-patterns-kk.rb.
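Here is a minimal sketch of what rule-based patterns look like, written in the \patterns syntax of the hyph-*.tex files. It implements a single illustrative rule: allow a break before a consonant that is followed by a vowel. The letters shown are assumptions for illustration. This is not the actual output of generate-patterns-kk.rb.

```latex
% Hypothetical excerpt in the format of a hyph-*.tex file (illustration
% only, NOT the content of hyph-kk.tex; only a few letters are shown).
% Rule: allow a break before a consonant that is followed by a vowel.
% An odd digit (here 1) marks an allowed break at that position.
\patterns{
1ба 1бе 1бы 1бі
1ла 1ле 1лы 1лі
1ра 1ре 1ры 1рі
}
```

With only these patterns, TeX should hyphenate балалар ("children") as ба-ла-лар. The pattern 1ла matches twice and allows a break before each л. The value in front of the initial б is never used, because TeX does not break before the first letter of a word.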

The advantage of the first way is that normally no exceptions are needed. Unusual words can simply be included in the input list, which causes no problems as long as the list also contains enough ordinary words. The disadvantage is that collecting enough hyphenated words can be a lot of work.

The advantage of the second way is that generating the patterns is quite straightforward. The disadvantage is that (many) exceptions may be needed for foreign words and compounds. Unfortunately, I know next to nothing about Turkic languages, so I can't judge whether compounds are an issue.

There are two ways of handling exceptions, and they can be combined:

First, you can add patterns for the irregular cases after writing the regular patterns. Look at the French pattern file hyph-fr.tex to get an idea of this method; there, the indented patterns handle compound words.

Second, you can list exceptions with the \hyphenation{} command. This is done, for example, in the Dutch pattern file hyph-nl.tex.
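A sketch of both exception mechanisms as they might appear in a hyph-*.tex file follows. The letter sequences and words are hypothetical examples. The break points for the two Russian loanwords follow Russian usage, which is an assumption for Kazakh.

```latex
% Hypothetical excerpt showing both (combinable) exception mechanisms.
\patterns{
% regular rule: a break is allowed before т followed by а
1та
% irregular case: the even value 2 before т in "стан" is higher than
% the 1 from "1та", so the break before т is suppressed in this sequence
с2тан
}
\hyphenation{
% whole-word exceptions (assumed examples): the hyphens mark the only
% permitted break points, and only for exactly these word forms
уни-вер-си-тет
ре-во-лю-ция
}
```

Because \hyphenation entries only match the exact word form, every inflected form (for example, with case suffixes) would need its own entry. For an agglutinating language, this may make pattern-based exceptions the more practical choice. For a single document, babel also offers \babelhyphenation[kazakh]{...}, which adds exceptions without editing the pattern file.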

The question concerning back and front vowels is a linguistic one, and I cannot help with it.

To test the patterns, create a document based on the example given here: https://latex3.github.io/babel/guides/locale-kazakh.html

You may want to load the showhyphenation package (LuaLaTeX only), which marks all hyphenation points with a small red triangle.
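Below is a minimal test document along these lines, offered as a sketch. It assumes that the patterns have been installed as described in the steps below. The font DejaVu Serif is an assumption; any installed font that covers Kazakh Cyrillic will do. The sample words are illustrative.

```latex
% Minimal test document (sketch). Compile with lualatex.
% Assumptions: the Kazakh patterns are installed, and the font
% DejaVu Serif is available (any font covering Kazakh Cyrillic works).
\documentclass{article}
\usepackage{babel}
\babelprovide[import, main]{kazakh} % load babel's Kazakh locale
\babelfont{rm}{DejaVu Serif}        % a font with Cyrillic glyphs
\usepackage{showhyphenation}        % marks all hyphenation points
\begin{document}
% A narrow box makes TeX actually use some of the break points.
\begin{minipage}{2.5cm}
балалар университет балалар университет
\end{minipage}

% Also write the hyphenation found for these words to the log file.
\showhyphens{балалар университет}
\end{document}
```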

Then make sure that hyph-kk.tex contains no duplicate patterns. I found about 20 duplicates in your file, and these lead to errors in the installation steps below.
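The hypothetical excerpt below shows what TeX counts as a duplicate. As far as I can tell from TeX's source (tex.web), two patterns collide when their letter sequences are identical, even if their digits differ. A generating script can therefore produce duplicates when two different rules yield the same letters. Engines built on Knuth's code, such as XeTeX, stop with "! Duplicate pattern."; LuaTeX's behaviour may differ.

```latex
% Hypothetical excerpt: both entries have the letter sequence "ба",
% so TeX reports the second one as a duplicate pattern.
\patterns{
1ба
б1а
}
% Merged replacement with the same effect: TeX uses the maximum
% value at each position, so the digits can simply be combined.
% \patterns{ 1б1а }
```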

Then move the hyph-kk.tex file and a suitable loadhyph file to the TEXMFLOCAL/tex/generic/ directory. TEXMFLOCAL is the local TeX tree, normally /usr/local/texlive/texmf-local on Unix systems and C:\texlive\texmf-local on Windows. I have appended a simple loadhyph file for Kazakh (loadhyph-kk.tex) that does not support 8-bit engines.

Then create the file TEXMFLOCAL/tex/generic/config/language-local.dat if it is not already present, and add the line
  kazakh  loadhyph-kk.tex
to it. The line consists of the name of the language, followed by a tab character, followed by the name of the loadhyph file. When tlmgr regenerates language.dat, it appends the contents of language-local.dat to it.

Then run the commands
  mktexlsr
and
  tlmgr generate --rebuild-sys language
with administrator rights. Kazakh hyphenation should then work.

Good luck,
Keno
% filename: loadhyph-kk.tex
% language: kazakh
%
% Loader for hyphenation patterns, generated by
%     source/generic/hyph-utf8/generate-pattern-loaders.rb
% See also http://tug.org/tex-hyphen
%
% Copyright 2008-2025 TeX Users Group.
% You may freely use, modify and/or distribute this file.
% (But consider adapting the scripts if you need modifications.)
%
% Once it turns out that more than a simple definition is needed,
% these lines may be moved to a separate file.
%
\begingroup
% Test for pTeX
\ifx\kanjiskip\undefined
% Test for native UTF-8 (which gets only a single argument)
% That's Tau (as in Taco or ΤΕΧ, Tau-Epsilon-Chi), a 2-byte UTF-8 character
\def\testengine#1#2!{\def\secondarg{#2}}\testengine Τ!\relax
\ifx\secondarg\empty
    % Unicode-aware engine (such as XeTeX or LuaTeX) only sees a single (2-byte) argument
    \message{UTF-8 Kazakh hyphenation patterns}
    \input hyph-kk.tex
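\else
    % 8-bit engine (such as TeX or pdfTeX): no patterns are loaded
    \message{No Kazakh hyphenation patterns - only for Unicode engines}
\fi\else
    % pTeX: no patterns are loaded
    \message{No Kazakh hyphenation patterns - only for Unicode engines}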
\fi
\endgroup
