Re: [XeTeX] hyphenation in Ethiopian languages
On 11 May 2011, at 23:46, Arthur Reutenauer wrote:

>> That doesn't surprise me; I'd expect you to get the font's .notdef glyph (which might be a blank space, as in this example, or a box, or some other symbol).
>
> Thanks for the explanation, that makes sense.
>
>> What you want is a character that has a zero-width, invisible glyph; if the font supports any of the Unicode characters such as ZWNBSP or ZWNJ or WJ or CGJ, etc., that ought to work.
>
> Yes, that's what I thought too, but it doesn't provide a font-independent solution.
>
>> Or character 13 (CR) is a likely bet, too.
>
> Note that Mojca remarked that using character 10 (LF) produced the desired result in that particular font (Abyssinica SIL). Is there any reason why one would prefer the former over the latter, or why either of these characters would be a safer bet in general? I would have thought that both of them, being control characters (sort of), would precisely have no glyph in most fonts; after all, who would want to set a glyph for a character that's supposed to indicate the end of a line of text?

Hmm, looking at Microsoft's recommendations[1], it sounds like you should be aiming for glyph 1, and character codes that should map to that glyph include U+0000 (null), U+0008 (backspace) and U+001D (group separator). They say that U+000D (CR) should have a positive advance width (which is not what you want), although I think I recall seeing somewhat different recommendations in the past, perhaps from Apple.

With U+000A (LF), there's a greater risk that it will map to .notdef and show up as a box, I think. This certainly used to be fairly common in TrueType fonts, and showed up as boxes at the start of each line when a DOS-originated text file with CRLF line-ends was loaded into a classic MacOS application that treated CR alone as the line ending, and didn't filter out the LF characters.

So to sum up, I think U+ ought to work if fonts carefully follow the MS recommendations; if it doesn't, other control-char codes are worth a try, but there's no guarantee that you'll find a universal, font-independent solution.

JK

[1] http://www.microsoft.com/typography/otspec/recom.htm

-- 
Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] hyphenation in Ethiopian languages
> Hmm, looking at Microsoft's recommendations[1], it sounds like you should be aiming for glyph 1, and character codes that should map to that glyph include U+0000 (null), U+0008 (backspace) and U+001D (group separator).

Thanks Jonathan, that's most useful. Sadly, all of these characters seem to map to .notdef in Abyssinica, like all of the Unicode characters you mentioned earlier, apart from ZWNJ and ZWJ. (Useless piece of trivia: did you know that as of Unicode 6.1, only four characters have a name starting with ZERO WIDTH? They've all been mentioned in that thread.) Carriage return and line feed both have a zero-width glyph, as does tabulation (U+0009), again against the recommendation that says that its glyph should have the same width as the one for space. That's most disconcerting.

> With U+000A (LF), there's a greater risk that it will map to .notdef and show up as a box, I think. This certainly used to be fairly common in TrueType fonts, and showed up as boxes at the start of each line when a DOS-originated text file with CRLF line-ends was loaded into a classic MacOS application that treated CR alone as the line ending, and didn't filter out the LF characters.

Amusing :-)

> So to sum up, I think U+ ought to work if fonts carefully follow the MS recommendations; if it doesn't, other control-char codes are worth a try, but there's no guarantee that you'll find a universal, font-independent solution.

Indeed not. In fact, what you've just said proves that it's probably hopeless to expect font designers to follow the recommendation in that particular area. Better to poke around by trying out a list of possible characters that could have zero width.

Arthur
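A minimal sketch of that trial-and-error approach, assuming XeLaTeX with fontspec and the Abyssinica SIL font used throughout this thread. The macro \trysethyphenchar and the names \candidate, \iffoundhyphenchar and \ethiopicfont are hypothetical helpers, not part of XeTeX, fontspec or polyglossia. A candidate is skipped if \iffontchar (the e-TeX test, which XeTeX extends to Unicode code points) reports that the font has no glyph for it, so it cannot fall back to .notdef, or if its glyph is not zero-width; the first survivor becomes the hyphen character. The candidates are simply the characters mentioned in this thread, and their order is a guess.

\documentclass[12pt]{article}
\usepackage{fontspec}
\newfontfamily\ethiopicfont[Script = Ethiopic, Scale = 1.3]{Abyssinica SIL}

% Hypothetical helper, not part of XeTeX, fontspec or polyglossia.
% \trysethyphenchar{<code point>} makes <code point> the \hyphenchar of
% the current font if (a) no earlier candidate has been accepted,
% (b) the font maps the code point to a real glyph (not .notdef), and
% (c) that glyph has zero width.
\newcount\candidate
\newif\iffoundhyphenchar
\newcommand\trysethyphenchar[1]{%
  \iffoundhyphenchar\else
    \candidate=#1\relax
    \iffontchar\font\candidate
      \setbox0=\hbox{\char\candidate}%
      \ifdim\wd0=0pt
        \hyphenchar\font=\candidate
        \foundhyphenchartrue
      \fi
    \fi
  \fi}

\begin{document}
\ethiopicfont
\foundhyphencharfalse
\trysethyphenchar{"0000}% NULL (Microsoft's recommendation)
\trysethyphenchar{"0008}% BACKSPACE
\trysethyphenchar{"001D}% GROUP SEPARATOR
\trysethyphenchar{"FEFF}% ZWNBSP
\trysethyphenchar{"200C}% ZWNJ
\trysethyphenchar{"2060}% WJ
\trysethyphenchar{"034F}% CGJ
\trysethyphenchar{13}% CR
\trysethyphenchar{10}% LF
\iffoundhyphenchar\else
  \hyphenchar\font=-1 % no candidate qualified: turn hyphenation off
\fi
እስመ፡አግዚአብሔር፡አምላክ፡ማእምር፡ውእቱ።
\end{document}

If nothing qualifies, the sketch falls back to \hyphenchar\font=-1, which disables hyphenation for that font instead of typesetting a visible or .notdef hyphen.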
Re: [XeTeX] hyphenation in Ethiopian languages
> And another nasty issue (that might deserve its own thread). We wanted to have no hyphenchar at all, but using \hyphenchar\font=0 has a nasty consequence that lines with broken words are not properly justified (some extra space is squeezed between the last character in line and the non-existent hyphen char).

Actually, what we've observed is that XeTeX seems to produce a glyph in the output even if we set \hyphenchar to a character for which there is no glyph in the current font. In the attached example, the fourth line from the end ends in a word that has been hyphenated, and the trailing white space can actually be copy-pasted from the PDF file and yields Unicode character U+ (oddly enough).

Arthur

\documentclass[12pt]{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{amharic}
\newfontfamily\amharicfont[Script = Ethiopic, Scale = 1.3]{Abyssinica SIL}
\XeTeXlinebreaklocale en
\XeTeXlinebreakskip 0pt plus 5em
\begin{document}
\title{Sample in Gǝ`ǝz}
\maketitle
\hsize=8cm
\begin{amharic}
\hyphenchar\font=0
እስመ፡አግዚአብሔር፡አምላክ፡ማእምር፡ውእቱ።እግዚአብሔር፡አስተደወ፡መንብሮ።ወአድከመ፡ቅሥተ፡ኀያላን።ወአቅነቶሙ፡ኀይለ፡ለድኩማን።ጽጉማን፡እክል፡ርኅቡ።ወርኁባን፡ጸግቡ።እስመ፡መካን፡ወለደት፡ሰብዐተ፡ወወለድሰ፡ስእነት፡ወሊደ፡እግዚአብሔር፡ይቀትል፡ወየሐዩ።ያወርድኒ፡ውስተ፡ሲእል፡ወየዐርግ።እግዚአብሔር፡ያነዲ፡ወያብዕል።ያኀስርሂ፡ወያከብር፡ዘያነሥኦ፡እምድር፡ለነዳይ።ከመ፡ያንብሮ፡ምስለ፡ዓበይ[ተ]~።
\end{amharic}
\end{document}

[Attachment: ethiop-linebreaklocale.pdf (Adobe PDF document)]
Re: [XeTeX] hyphenation in Ethiopian languages
On 11 May 2011, at 18:02, Arthur Reutenauer wrote:

>> And another nasty issue (that might deserve its own thread). We wanted to have no hyphenchar at all, but using \hyphenchar\font=0 has a nasty consequence that lines with broken words are not properly justified (some extra space is squeezed between the last character in line and the non-existent hyphen char).
>
> Actually, what we've observed is that XeTeX seems to produce a glyph in the output even if we set \hyphenchar to a character for which there is no glyph in the current font.

That doesn't surprise me; I'd expect you to get the font's .notdef glyph (which might be a blank space, as in this example, or a box, or some other symbol).

What you want is a character that has a zero-width, invisible glyph; if the font supports any of the Unicode characters such as ZWNBSP or ZWNJ or WJ or CGJ, etc., that ought to work. Or character 13 (CR) is a likely bet, too.

JK
Re: [XeTeX] hyphenation in Ethiopian languages
> That doesn't surprise me; I'd expect you to get the font's .notdef glyph (which might be a blank space, as in this example, or a box, or some other symbol).

Thanks for the explanation, that makes sense.

> What you want is a character that has a zero-width, invisible glyph; if the font supports any of the Unicode characters such as ZWNBSP or ZWNJ or WJ or CGJ, etc., that ought to work.

Yes, that's what I thought too, but it doesn't provide a font-independent solution.

> Or character 13 (CR) is a likely bet, too.

Note that Mojca remarked that using character 10 (LF) produced the desired result in that particular font (Abyssinica SIL). Is there any reason why one would prefer the former over the latter, or why either of these characters would be a safer bet in general? I would have thought that both of them, being control characters (sort of), would precisely have no glyph in most fonts; after all, who would want to set a glyph for a character that's supposed to indicate the end of a line of text?

Arthur
Re: [XeTeX] hyphenation in Ethiopian languages
On Fri, May 6, 2011 at 19:24, Jonathan Kew wrote:
> For line-breaking after the word separators, doesn't it work to just set
>
> \XeTeXlinebreaklocale en
> \XeTeXlinebreakskip 0pt plus 1pt
>
> or similar?

Yes, thanks a lot. This does work. However, there are two problems with it:

- Only ETHIOPIC WORDSPACE has the line-break class BA (Break After), while ETHIOPIC FULL STOP has AL (Alphabetic), so text won't break after the full stop. This is probably a bug in the Unicode standard.
- We cannot control the space before the Ethiopic wordspace with that, just the space after it. Without some stretching glue it is impossible to align/justify text.

And another nasty issue (that might deserve its own thread). We wanted to have no hyphenchar at all, but using \hyphenchar\font=0 has a nasty consequence that lines with broken words are not properly justified (some extra space is squeezed between the last character in line and the non-existent hyphen char).

It took me a while to realize that \hyphenchar\font=10 solves the issue somehow, but I still find that totally weird, and I'm not sure whether using number 10 only solved the issue for that particular font or whether that is stable behaviour for other fonts as well.

I wanted to compare the behaviour with pdfTeX, but I realized that pdfTeX doesn't offer any option to really remove the hyphen char; one can only disable hyphenation with -1 or use a number between 0 and 255 (which usually points to an existing glyph).

Mojca
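One possible workaround for the first problem, sketched under the assumption that intercharacter tokens can be combined with the linebreak-locale settings quoted above: give U+1362 ETHIOPIC FULL STOP its own intercharacter class and insert the same stretchable glue after it by hand. The class name \ethifullstop and the font command \ethiopicfont are hypothetical; the glue simply copies the 0pt plus 1pt suggestion, and class 0 is XeTeX's default class, which Ethiopic syllables keep unless something reassigns them.

\documentclass[12pt]{article}
\usepackage{fontspec}
\newfontfamily\ethiopicfont[Script = Ethiopic, Scale = 1.3]{Abyssinica SIL}
% ICU line breaking with stretchable glue at every break opportunity
% it finds (this already handles U+1361 ETHIOPIC WORDSPACE).
\XeTeXlinebreaklocale en
\XeTeXlinebreakskip 0pt plus 1pt
% U+1362 ETHIOPIC FULL STOP has line-break class AL, so ICU finds no
% break after it. Give it a class of its own (hypothetical name) ...
\newXeTeXintercharclass\ethifullstop
\XeTeXcharclass"1362=\ethifullstop
\XeTeXinterchartokenstate=1
% ... and insert the same glue by hand whenever it is followed by a
% character of the default class 0.
\XeTeXinterchartoks\ethifullstop 0 = {\hskip 0pt plus 1pt\relax}
\begin{document}
\ethiopicfont
እስመ፡አግዚአብሔር፡አምላክ፡ማእምር፡ውእቱ።እግዚአብሔር፡አስተደወ፡መንብሮ።ወአድከመ፡ቅሥተ፡ኀያላን።
\end{document}

This does nothing for the second problem (space before the wordspace); that still needs tokens between a syllable and the separator.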
Re: [XeTeX] hyphenation in Ethiopian languages
On Thu, Nov 4, 2010 at 14:42, Adam McCollum wrote:
> Dear list members,
>
> I've recently drawn up a short document in Ge`ez (classical Ethiopic) using Polyglossia and I see that the hyphenation is wrong. As some of you know, languages that use the Ethiopic script, including Ge`ez and Amharic, place a word divider (it looks somewhat like a thick colon) between each word and two of these dividers side by side between sentences; see some Amharic examples here. That being the case, a word may be broken at any syllable (the script is a syllabary, not an alphabet) at the end of a line, but there is nothing corresponding to a hyphen. An additional matter of importance is that no line should begin with the single or double word divider. How should this be fixed?

Dear Adam,

We have submitted Ethiopic hyphenation patterns to CTAN (and TL) a while ago, so once you update your TeX Live, it should work out of the box. However, there is a nasty limitation in XeTeX: words hyphenate only up to 64 characters, so unless somebody fixes XeTeX, you need other tricks and workarounds. The code below inserts a breakable space before every word separator (and thus allows XeTeX to start breaking the next word from scratch). In addition to that, you also need to make sure that:

- there is no hyphenation character at the end of line
- lines are properly aligned
- you might want (or not) some extra space around word and sentence delimiters

Together with Arthur we created the following working example, but it would be great if François would include some of that code into Polyglossia.

If you want to have space around word delimiters, you need to create some non-breakable space in front of the delimiter and some breakable space after the delimiter. The amount of space might need to be configurable. My estimates might not be the best ones (0.4 +/- 0.1 em), so feel free to adjust them to the most suitable values. Apart from that, you might want to have both spaces of equal size (I wasn't sure how to achieve that).

\documentclass[12pt]{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{amharic}
\newfontfamily\amharicfont[Script = Ethiopic, Scale = 1.3]{Abyssinica SIL}

\newXeTeXintercharclass \ethiletter
\newXeTeXintercharclass \ethispace

% \setclass[<first>-<last>]<class>: assign <class> to every code point
% in the hexadecimal range <first>..<last>
\newcount\tmp
\def\setclass[#1-#2]#3{%
  \tmp="#1
  \XeTeXcharclass\tmp=#3
  \loop\ifnum\tmp<"#2
    \advance\tmp by 1
    \XeTeXcharclass\tmp=#3
  \repeat}
\setclass[1200-139F]\ethiletter

\XeTeXinterchartokenstate=1
\XeTeXcharclass"1361=\ethispace % ETHIOPIC WORDSPACE
\XeTeXcharclass"1362=\ethispace % ETHIOPIC FULL STOP
\XeTeXinterchartoks \ethispace \ethiletter = {\egroup\hskip.4em plus .1em minus .1em}
\XeTeXinterchartoks \ethiletter \ethispace = {\kern.4em\bgroup}

\begin{document}
\title{Sample in Gǝ`ǝz}
\maketitle
% \hsize=8cm
\begin{amharic}
\hyphenchar\font=0
እስመ፡አግዚአብሔር፡አምላክ፡ማእምር፡ውእቱ።እግዚአብሔር፡አስተደወ፡መንብሮ።ወአድከመ፡ቅሥተ፡ኀያላን።ወአቅነቶሙ፡ኀይለ፡ለድኩማን።ጽጉማን፡እክል፡ርኅቡ።ወርኁባን፡ጸግቡ።እስመ፡መካን፡ወለደት፡ሰብዐተ፡ወወለድሰ፡ስእነት፡ወሊደ፡እግዚአብሔር፡ይቀትል፡ወየሐዩ።ያወርድኒ፡ውስተ፡ሲእል፡ወየዐርግ።እግዚአብሔር፡ያነዲ፡ወያብዕል።ያኀስርሂ፡ወያከብር፡ዘያነሥኦ፡እምድር፡ለነዳይ።ከመ፡ያንብሮ፡ምስለ፡ዓበይ[ተ]~።
\end{amharic}
\end{document}

Please let us know if that works the way you want it to work. If you need a LuaTeX solution, please let us know as well.

Mojca

PS: You could also simply use

\XeTeXinterchartoks \ethiletter \ethiletter = {\hskip0pt}

and thus avoid the need for any hyphenation patterns at all.
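Mojca's PS can be tried in isolation with a sketch like the following, assuming XeLaTeX with fontspec and Abyssinica SIL. It keeps the \ethiletter and \ethispace classes and the 0.4em +/- 0.1em estimate from the example above, but replaces \setclass with an inline \loop over a hypothetical counter \ethicode and leaves out the \bgroup/\egroup trick. Breaks inside words then come only from the zero-width glue between syllables, so no hyphen character is ever typeset and no hyphenation patterns are needed.

\documentclass[12pt]{article}
\usepackage{fontspec}
\newfontfamily\ethiopicfont[Script = Ethiopic, Scale = 1.3]{Abyssinica SIL}
\newXeTeXintercharclass\ethiletter
\newXeTeXintercharclass\ethispace
% Assign the Ethiopic (U+1200-U+137F) and Ethiopic Supplement
% (U+1380-U+139F) blocks to \ethiletter, one code point at a time.
\newcount\ethicode
\ethicode="1200
\loop
  \XeTeXcharclass\ethicode=\ethiletter
  \ifnum\ethicode<"139F
  \advance\ethicode by 1
\repeat
% Then move the word space and the full stop into their own class.
\XeTeXcharclass"1361=\ethispace
\XeTeXcharclass"1362=\ethispace
\XeTeXinterchartokenstate=1
% Separator followed by a syllable: stretchable, breakable space.
\XeTeXinterchartoks\ethispace\ethiletter={\hskip .4em plus .1em minus .1em\relax}
% Syllable followed by syllable: zero-width glue, i.e. a legal line
% break that adds no hyphen character.
\XeTeXinterchartoks\ethiletter\ethiletter={\hskip 0pt\relax}
% No tokens for syllable -> separator: never a break before a separator.
\begin{document}
\ethiopicfont
እስመ፡አግዚአብሔር፡አምላክ፡ማእምር፡ውእቱ።እግዚአብሔር፡አስተደወ፡መንብሮ።ወአድከመ፡ቅሥተ፡ኀያላን።
\end{document}

Because the zero-width glue cannot stretch, justification is carried entirely by the space after each separator.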
Re: [XeTeX] hyphenation in Ethiopian languages
On 6 May 2011, at 18:03, Mojca Miklavec wrote:

> On Thu, Nov 4, 2010 at 14:42, Adam McCollum wrote:
>> Dear list members,
>>
>> I've recently drawn up a short document in Ge`ez (classical Ethiopic) using Polyglossia and I see that the hyphenation is wrong. As some of you know, languages that use the Ethiopic script, including Ge`ez and Amharic, place a word divider (it looks somewhat like a thick colon) between each word and two of these dividers side by side between sentences; see some Amharic examples here. That being the case, a word may be broken at any syllable (the script is a syllabary, not an alphabet) at the end of a line, but there is nothing corresponding to a hyphen. An additional matter of importance is that no line should begin with the single or double word divider. How should this be fixed?
>
> Dear Adam,
>
> We have submitted Ethiopic hyphenation patterns to CTAN (and TL) a while ago, so once you update your TeX Live, it should work out of the box.
> ..

For line-breaking after the word separators, doesn't it work to just set

\XeTeXlinebreaklocale en
\XeTeXlinebreakskip 0pt plus 1pt

or similar?

JK
Re: [XeTeX] hyphenation in Ethiopian languages
Dear Jonathan,

On Fri, May 6, 2011 at 19:24, Jonathan Kew wrote:
> On 6 May 2011, at 18:03, Mojca Miklavec wrote:
>> On Thu, Nov 4, 2010 at 14:42, Adam McCollum wrote:
>>> Dear list members,
>>>
>>> I've recently drawn up a short document in Ge`ez (classical Ethiopic) using Polyglossia and I see that the hyphenation is wrong. As some of you know, languages that use the Ethiopic script, including Ge`ez and Amharic, place a word divider (it looks somewhat like a thick colon) between each word and two of these dividers side by side between sentences; see some Amharic examples here. That being the case, a word may be broken at any syllable (the script is a syllabary, not an alphabet) at the end of a line, but there is nothing corresponding to a hyphen. An additional matter of importance is that no line should begin with the single or double word divider. How should this be fixed?
>>
>> Dear Adam,
>>
>> We have submitted Ethiopic hyphenation patterns to CTAN (and TL) a while ago, so once you update your TeX Live, it should work out of the box.
>> ..
>
> For line-breaking after the word separators, doesn't it work to just set
>
> \XeTeXlinebreaklocale en
> \XeTeXlinebreakskip 0pt plus 1pt

Hm. Quite possible. None of us (or at least not me) knew about linebreaklocale and linebreakskip, or at least we didn't quite think of them. We'll test; thanks a lot for the hint.

What exactly does \XeTeXlinebreaklocale en do? (After all, we need breaking of Ethiopic text, not English.) And where is 0pt plus 1pt applied? Between all characters or just at the end? How is the end of a line determined?

Interestingly enough, one of the first hits brings me back to "Word wrapping in Lao":
http://tug.org/pipermail/xetex/2010-April/016331.html
which is also being heavily discussed off-list recently. We are experiencing exactly the same problem there: runs of text without spaces that are too long for the hyphenation algorithm to work properly. We are aware of ICU, but nobody knows how to write ICU code, even if the algorithm is somewhat straightforward.

I hope to have Lao hyphenation patterns ready soon, and then we will try to apply some \XeTeXinterchartoks-based breaks between letters that always start or end a syllable, only hoping that there will be enough such letters to cut the remaining text into sequences shorter than 64 characters.

Is there really no way to increase the limit for hyphenation in XeTeX from 64 characters to something safer? LuaTeX sets the limit at 256.

Mojca
Re: [XeTeX] hyphenation in Ethiopian languages
Hello.

On 4 Nov 2010, at 21:04, Mojca Miklavec wrote:

> The problem of colons that may not start a new line has to be solved on a different level. You could write like that:
>
> እስመ~፡ አግዚአብሔር~፡ አምላክ~፡ ማእምር~፡ ውእቱ~። እግዚአብሔር~፡ አስተደወ~፡ መንብሮ~። ወአድከመ~፡ ቅሥተ~፡ ኀያላን~። ወአቅነቶሙ~፡ ኀይለ~፡ ለድኩማን~። ጽጉማን~፡ እክል~፡ ርኅቡ~። ወርኁባን~፡ ጸግቡ~። እስመ~፡ መካን~፡ ወለደት~፡ ሰብዐተ~፡ ወወለድሰ~፡ ስእነት~፡ ወሊደ~፡ እግዚአብሔር~፡ ይቀትል~፡ ወየሐዩ~። ያወርድኒ~፡ ውስተ~፡ ሲእል~፡ ወየዐርግ~። እግዚአብሔር~፡ ያነዲ~፡ ወያብዕል~። ያኀስርሂ~፡ ወያከብር~፡ ዘያነሥኦ~፡ እምድር~፡ ለነዳይ~። ከመ~፡ ያንብሮ~፡ ምስለ~፡ ዓበይ[ተ]~።
>
> This works perfectly fine, but you probably don't want to write like that. I leave it up to others to solve that problem. The hyphenchar can easily be changed to nothing though.

Can't this be treated like double punctuation in French? To get "እስመ ፡" you type "እስመ፡" and polyglossia adds a kern before the word divider, except when the preceding character is itself a word divider.

Regards,
Yves
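Yves's suggestion can be approximated with intercharacter tokens rather than a polyglossia option. This is an illustrative sketch, not polyglossia's actual mechanism: \ethidivider and \ethiopicfont are hypothetical names, and the .2em kern and the .4em +/- .1em space are placeholder values. A kern that is directly followed by a character is not a legal break point, so a divider can never start a line; and because no tokens are defined for a divider followed by a divider, nothing is added when the preceding character is itself a divider.

\documentclass[12pt]{article}
\usepackage{fontspec}
\newfontfamily\ethiopicfont[Script = Ethiopic, Scale = 1.3]{Abyssinica SIL}
\newXeTeXintercharclass\ethidivider
\XeTeXcharclass"1361=\ethidivider % ETHIOPIC WORDSPACE
\XeTeXcharclass"1362=\ethidivider % ETHIOPIC FULL STOP
\XeTeXinterchartokenstate=1
% Ordinary character (default class 0) followed by a divider: a kern,
% which is not a break point here, so the divider stays on this line.
\XeTeXinterchartoks 0 \ethidivider = {\kern .2em\relax}
% Divider followed by an ordinary character: breakable, stretchable space.
\XeTeXinterchartoks \ethidivider 0 = {\hskip .4em plus .1em minus .1em\relax}
% Divider followed by divider: no tokens defined, so nothing is inserted.
\begin{document}
\ethiopicfont
እስመ፡አግዚአብሔር፡አምላክ፡ማእምር፡ውእቱ።እግዚአብሔር፡አስተደወ፡መንብሮ።
\end{document}

This assumes the input is typed without spaces around the dividers, as in the examples that use "እስመ፡"; an explicit space before a divider would bypass the kern.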
Re: [XeTeX] hyphenation in Ethiopian languages
Dear Adam,

Line 7 of gloss-amharic.ldf in the polyglossia package has hyphennames={amharic,nohyphenation}, which I take to mean that you'll get no hyphenation wherever 'amharic' is active. The next line is commented out, %hyphenmins={2,2}, so I presume that some rules were intended (François?). If the rules are that hyphenation can occur anywhere, I'm sure this would be fairly easy to implement.

Gareth.

Adam McCollum wrote:
> Dear list members,
>
> I've recently drawn up a short document in Ge`ez (classical Ethiopic) using Polyglossia and I see that the hyphenation is wrong. As some of you know, languages that use the Ethiopic script, including Ge`ez and Amharic, place a word divider (it looks somewhat like a thick colon) between each word and two of these dividers side by side between sentences; see some Amharic examples here:
> http://books.google.com/books?id=r87yh5z66TECprintsec=frontcoverdq=amharichl=enei=U7TSTIX-Ds2r8AaT6LxFsa=Xoi=book_resultct=book-thumbnailresnum=6ved=0CEwQ6wEwBQ#v=onepageqf=false
> That being the case, a word may be broken at any syllable (the script is a syllabary, not an alphabet) at the end of a line, but there is nothing corresponding to a hyphen. An additional matter of importance is that no line should begin with the single or double word divider. How should this be fixed?
>
> Here is a minimal example:
>
> \documentclass[12pt]{article}
> \usepackage{fontspec}
> \usepackage{polyglossia}
> \setmainlanguage{english}
> \setotherlanguage{amharic}
> \newfontfamily\amharicfont[Script = Ethiopic, Scale = 1.3]{Abyssinica SIL}
> \begin{document}
> \title{Sample in Gǝ`ǝz}
> \maketitle
> \begin{amharic}
> እስመ ፡ አግዚአብሔር ፡ አምላክ ፡ ማእምር ፡ ውእቱ ። እግዚአብሔር ፡ አስተደወ ፡ መንብሮ ። ወአድከመ ፡ ቅሥተ ፡ ኀያላን ። ወአቅነቶሙ ፡ ኀይለ ፡ ለድኩማን ። ጽጉማን ፡ እክል ፡ ርኅቡ ። ወርኁባን ፡ ጸግቡ ። እስመ ፡ መካን ፡ ወለደት ፡ ሰብዐተ ፡ ወወለድሰ ፡ ስእነት ፡ ወሊደ ፡ እግዚአብሔር ፡ ይቀትል ፡ ወየሐዩ ። ያወርድኒ ፡ ውስተ ፡ ሲእል ፡ ወየዐርግ ። እግዚአብሔር ፡ ያነዲ ፡ ወያብዕል ። ያኀስርሂ ፡ ወያከብር ፡ ዘያነሥኦ ፡ እምድር ፡ ለነዳይ ። ከመ ፡ ያንብሮ ፡ ምስለ ፡ ዓበይ[ተ] ።
> \end{amharic}
> \end{document}
>
> With many thanks in advance for the help,
>
> Adam McCollum, Ph.D.
> Lead Cataloger, Eastern Christian Manuscripts
> Hill Museum Manuscript Library
> Saint John's University
> P.O. Box 7300
> Collegeville, MN 56321
> (320) 363-2075 (phone)
> (320) 363-3222 (fax)
> www.hmml.org

-- 
Gareth Hughes
Doctoral candidate in Syriac studies
Department of Eastern Christianity
Oriental Institute
Pusey Lane
Oxford OX1 2LE
Re: [XeTeX] hyphenation in Ethiopian languages
On Thu, Nov 4, 2010 at 15:53, Gareth Hughes wrote:
> If the rules are that hyphenation can occur anywhere, I'm sure this would be fairly easy to implement.

I agree. We could add new hyphenation patterns simply listing

\patterns{ for $x in list_of_syllables $x2 }

From http://www.ancientscripts.com/ethiopic.html I read: "Each sign is a syllable (consonant plus vowel), except any sign on the sixth column (ə) represents either the consonant plus the middle central vowel /ə/ or no vowel at all (in which case it is used as a pure consonant in a consonant cluster)."

This makes it only slightly more complicated, but not by much. But in any case we need someone who would be willing to test (extensively).

Mojca
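The pseudo-loop above can also be written in TeX itself. A minimal sketch, assuming it is read by IniTeX while fmtutil builds the format (\patterns is not allowed anywhere else) and that the plain/LaTeX \loop macro is already defined at that point. It follows what Mojca describes for the attached hyph-am.tex, a 1 ("break allowed") after each code point from U+1200 to U+135A. Unlike a hand-made list it also covers the unassigned code points in that range, so it gives each one a temporary non-zero \lccode, which \patterns requires. \ethipatterns is a hypothetical name, and the sketch assumes the format already gives Ethiopic syllables non-zero \lccode values at typesetting time, as XeTeX formats normally do for Unicode letters.

% Hypothetical pattern generator: a "1" (break allowed) after every
% code point from U+1200 to U+135A. Must be read by IniTeX.
\begingroup
  \def\ethipatterns{}%
  \count255="1200
  \loop
    % \patterns rejects characters whose \lccode is 0 ("Nonletter"),
    % so give each code point (including unassigned ones) its own code.
    \lccode\count255=\count255
    % Turn the placeholder X into the current code point and append
    % "<char>1 " to the pattern list.
    \begingroup
      \lccode`\X=\count255
      \lowercase{\endgroup\edef\ethipatterns{\ethipatterns X1 }}%
    \ifnum\count255<"135A
    \advance\count255 by 1
  \repeat
  \expandafter\patterns\expandafter{\ethipatterns}%
\endgroup

With such patterns, hyphenmins={1,1} (as in step 4 of the installation instructions) is still needed so that a break is allowed after the first and before the last syllable of a word.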
Re: [XeTeX] hyphenation in Ethiopian languages
(I'm adding the TeX hyphenation mailing list to the recipients; I apologise for cross-posting. Hyphenation-pattern-related discussion may continue on the hyphenation list (or off-list if needed). XeLaTeX issues, in particular how to avoid starting a line with a word or sentence separator, may stay on the XeTeX list, since that's more or less engine- and polyglossia-related.)

On Thu, Nov 4, 2010 at 15:53, Gareth Hughes wrote:
> Dear Adam,
>
> Line 7 of gloss-amharic.ldf in the polyglossia package has hyphennames={amharic,nohyphenation}, which I take to mean that you'll get no hyphenation wherever 'amharic' is active. The next line is commented out, %hyphenmins={2,2}, so I presume that some rules were intended (François?). If the rules are that hyphenation can occur anywhere, I'm sure this would be fairly easy to implement.

An example of hyphenation patterns is attached. I do not claim that the patterns work perfectly (they probably don't, but it might be a starting point). I simply added a number 1 after each valid Unicode character between U+1200 and U+135A (without removing characters that do not exist in Amharic, and without using those from Unicode 6, 2D80–2DDF).

1.) You need to put the file hyph-am.tex into
    /usr/local/texlive/2010/texmf-dist/tex/generic/hyph-utf8/patterns/tex/hyph-am.tex
2.) Put loadhyph-am.tex into
    /usr/local/texlive/2010/texmf-dist/tex/generic/hyph-utf8/loadhyph/loadhyph-am.tex
3.) Add the line
      amharic loadhyph-am.tex
    to /usr/local/texlive/2010/texmf-var/tex/generic/config/language.dat
4.) Change
      %hyphenmins={2,2},
    into
      hyphenmins={1,1},
    in /usr/local/texlive/2010/texmf-dist/tex/xelatex/polyglossia/gloss-amharic.ldf
5.) Run
      sudo fmtutil-sys --byfmt xelatex

You can also test with the following (keep the rest of the document unchanged):

\newdimen\savehsize
\savehsize\hsize
\def\test#1{\endgraf\hsize=1pt\noindent #1\endgraf\hsize=\savehsize}
\begin{amharic}
\test{እስመ ፡ አግዚአብሔር ፡ አምላክ ፡ ማእምር ፡ ውእቱ ። እግዚአብሔር ፡ አስተደወ ፡ መንብሮ ። ወአድከመ ፡ ቅሥተ ፡ ኀያላን ። ወአቅነቶሙ ፡ ኀይለ ፡ ለድኩማን ። ጽጉማን ፡ እክል ፡ ርኅቡ ። ወርኁባን ፡ ጸግቡ ። እስመ ፡ መካን ፡ ወለደት ፡ ሰብዐተ ፡ ወወለድሰ ፡ ስእነት ፡ ወሊደ ፡ እግዚአብሔር ፡ ይቀትል ፡ ወየሐዩ ። ያወርድኒ ፡ ውስተ ፡ ሲእል ፡ ወየዐርግ ። እግዚአብሔር ፡ ያነዲ ፡ ወያብዕል ። ያኀስርሂ ፡ ወያከብር ፡ ዘያነሥኦ ፡ እምድር ፡ ለነዳይ ። ከመ ፡ ያንብሮ ፡ ምስለ ፡ ዓበይ[ተ] ።}
\end{amharic}

The problem of colons that may not start a new line has to be solved on a different level. You could write like that:

እስመ~፡ አግዚአብሔር~፡ አምላክ~፡ ማእምር~፡ ውእቱ~። እግዚአብሔር~፡ አስተደወ~፡ መንብሮ~። ወአድከመ~፡ ቅሥተ~፡ ኀያላን~። ወአቅነቶሙ~፡ ኀይለ~፡ ለድኩማን~። ጽጉማን~፡ እክል~፡ ርኅቡ~። ወርኁባን~፡ ጸግቡ~። እስመ~፡ መካን~፡ ወለደት~፡ ሰብዐተ~፡ ወወለድሰ~፡ ስእነት~፡ ወሊደ~፡ እግዚአብሔር~፡ ይቀትል~፡ ወየሐዩ~። ያወርድኒ~፡ ውስተ~፡ ሲእል~፡ ወየዐርግ~። እግዚአብሔር~፡ ያነዲ~፡ ወያብዕል~። ያኀስርሂ~፡ ወያከብር~፡ ዘያነሥኦ~፡ እምድር~፡ ለነዳይ~። ከመ~፡ ያንብሮ~፡ ምስለ~፡ ዓበይ[ተ]~።

This works perfectly fine, but you probably don't want to write like that. I leave it up to others to solve that problem. The hyphenchar can easily be changed to nothing though.

Mojca

[Attachments: hyph-am.tex, loadhyph-am.tex]