Re: [XeTeX] hyphenation in Ethiopian languages

2011-05-12 Thread Jonathan Kew
On 11 May 2011, at 23:46, Arthur Reutenauer wrote:

 That doesn't surprise me; I'd expect you to get the font's .notdef glyph 
 (which might be a blank space, as in this example, or a box, or some other 
 symbol).
 
 Thanks for the explanation, that makes sense.
 
 What you want is a character that has a zero-width, invisible glyph; if the 
 font supports any of the Unicode characters such as ZWNBSP or ZWNJ or WJ or 
 CGJ, etc., that ought to work.
 
  Yes, that's what I thought too, but it doesn't provide a font-independent 
 solution.
 
 Or character 13 (CR) is a likely bet, too.
 
  Note that Mojca remarked that using character 10 (LF) produced the desired 
 result in that particular font (Abyssinica SIL).  Is there any reason why one 
 would prefer the former over the latter, or why either of these characters 
 would be a safer bet in general?  I would have thought that both of them, 
 being control characters (sort of), would precisely have no glyph in most 
 fonts; after all, who would want to set a glyph for a character that's 
 supposed to indicate the end of a line of text?

Hmm, looking at Microsoft's recommendations[1], it sounds like you should be
aiming for glyph 1, and character codes that should map to that glyph include
U+0000 (null), U+0008 (backspace) and U+001D (group separator). They say that
U+000D (CR) should have a positive advance width (which is not what you want),
although I think I recall seeing somewhat different recommendations in the
past, perhaps from Apple.

With U+000A (LF), there's a greater risk that it will map to .notdef and show 
up as a box, I think. This certainly used to be fairly common in TrueType 
fonts, and showed up as boxes at the start of each line when a DOS-originated 
text file with CRLF line-ends was loaded into a classic MacOS application 
that treated CR alone as the line ending, and didn't filter out the LF 
characters.

So to sum up, I think U+0000 ought to work if fonts carefully follow the MS
recommendations; if it doesn't, other control-char codes are worth a try, but 
there's no guarantee that you'll find a universal, font-independent solution.
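To see which glyph a particular font actually uses for these code points, you can ask XeTeX directly. The following is a minimal sketch. It assumes a XeTeX that provides \XeTeXcharglyph, which reports the glyph index the current font assigns to a character, with 0 meaning .notdef. The macro name \showglyphfor is hypothetical.

```latex
% Hypothetical diagnostic: write to the log which glyph index the current
% font uses for a given code point (0 = .notdef, i.e. not supported).
\def\showglyphfor#1{% #1 = four hex digits, e.g. 0008
  \message{[U+#1 -> glyph \the\XeTeXcharglyph"#1]}}

% In the document body, once the Ethiopic font is active:
\begin{amharic}
\showglyphfor{0000}% NULL: glyph 1 per the recommendations cited above
\showglyphfor{0008}% BACKSPACE
\showglyphfor{001D}% GROUP SEPARATOR
\showglyphfor{000D}% CR
\showglyphfor{000A}% LF
\end{amharic}
```

A result of 0 in the log means that a \hyphenchar set to that code would show the font's .notdef glyph.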

JK

[1] http://www.microsoft.com/typography/otspec/recom.htm




--
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex


Re: [XeTeX] hyphenation in Ethiopian languages

2011-05-12 Thread Arthur Reutenauer
 Hmm, looking at Microsoft's recommendations[1], it sounds like you should be 
 aiming for glyph 1, and character codes that should map to that glyph include 
 U+0000 (null), U+0008 (backspace) and U+001D (group separator).

  Thanks Jonathan, that's most useful.  Sadly, all of these characters
seem to map to .notdef in Abyssinica, like all of the Unicode characters
you mentioned earlier, apart from ZWNJ and ZWJ.  (Useless piece of
trivia: did you know that as of Unicode 6.1, only four characters have a
name starting with ZERO WIDTH?  They've all been mentioned in this
thread.)  Carriage return and line feed both have a zero-width glyph, as
does the tab character (U+0009), again contrary to the recommendation
that its glyph should have the same width as the space glyph.  That's
most disconcerting.

 With U+000A (LF), there's a greater risk that it will map to .notdef and show 
 up as a box, I think. This certainly used to be fairly common in TrueType 
 fonts, and showed up as boxes at the start of each line when a DOS-originated 
 text file with CRLF line-ends was loaded into a classic MacOS application 
 that treated CR alone as the line ending, and didn't filter out the LF 
 characters.

  Amusing :-)

 So to sum up, I think U+0000 ought to work if fonts carefully follow the MS
 recommendations; if it doesn't, other control-char codes are worth a try, but 
 there's no guarantee that you'll find a universal, font-independent solution.

  Indeed not.  In fact, what you've just said proves that it's probably
hopeless to expect font designers to follow the recommendation in that
particular area.  Better to poke around by trying out a list of possible
characters that could have zero width.
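You can automate the poking around. The following minimal sketch steps through the candidate characters mentioned in this thread. It selects the first one that the current font maps to a real glyph (\iffontchar) with zero advance width (\fontcharwd). The names \trycandidate, \setzerowidthhyphenchar, \zwcandidate and \ifzwfound are hypothetical and are not part of XeTeX or polyglossia. The test checks only the advance width, not whether the glyph is actually blank, so you still need to inspect the PDF.

```latex
% Hypothetical helper, not part of XeTeX or polyglossia: choose a
% \hyphenchar whose glyph in the current font exists and has zero width.
\newcount\zwcandidate
\newif\ifzwfound
\def\trycandidate#1{% #1 = a character code such as "200C
  \ifzwfound\else
    \iffontchar\font#1\relax         % the font maps #1 to a real glyph
      \ifdim\fontcharwd\font#1=0pt   % ... with zero advance width
        \zwcandidate=#1\relax
        \zwfoundtrue
      \fi
    \fi
  \fi}
\def\setzerowidthhyphenchar{%
  \zwfoundfalse
  \trycandidate{"0000}% NULL
  \trycandidate{"0008}% BACKSPACE
  \trycandidate{"001D}% GROUP SEPARATOR
  \trycandidate{"FEFF}% ZWNBSP
  \trycandidate{"200C}% ZWNJ
  \trycandidate{"2060}% WJ
  \trycandidate{"034F}% CGJ
  \trycandidate{"000D}% CR
  \trycandidate{"000A}% LF
  \ifzwfound
    \hyphenchar\font=\zwcandidate
  \else
    \hyphenchar\font=-1 % nothing suitable: disable hyphenation instead
  \fi}
```

Call \setzerowidthhyphenchar inside the amharic environment, in place of \hyphenchar\font=0. TeX always assigns \hyphenchar globally, because it is a property of the font, so one call per font is enough.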

Arthur




Re: [XeTeX] hyphenation in Ethiopian languages

2011-05-11 Thread Arthur Reutenauer
 And another nasty issue (that might deserve its own thread). We wanted
 to have no hyphenchar at all, but using \hyphenchar\font=0 has a nasty
 consequence that lines with broken words are not properly justified
 (some extra space is squeezed between the last character in line and
 the non-existent hyphen char).

  Actually, what we've observed is that XeTeX seems to produce a glyph
in the output even if we set \hyphenchar to a character for which there
is no glyph in the current font.  In the attached example, the fourth
line from the end ends in a word that has been hyphenated, and the
trailing white space can actually be copy-pasted from the PDF file and
yields Unicode character U+0000 (oddly enough).

Arthur
\documentclass[12pt]{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{amharic}
\newfontfamily\amharicfont[Script = Ethiopic, Scale = 1.3]{Abyssinica SIL}

\XeTeXlinebreaklocale en
\XeTeXlinebreakskip 0pt plus 5em

\begin{document}
\title{Sample in Gǝ`ǝz}
\maketitle

\hsize=8cm

\begin{amharic}

\hyphenchar\font=0
እስመ፡አግዚአብሔር፡አምላክ፡ማእምር፡ውእቱ።እግዚአብሔር፡አስተደወ፡መንብሮ።ወአድከመ፡ቅሥተ፡ኀያላን።ወአቅነቶሙ፡ኀይለ፡ለድኩማን።ጽጉማን፡እክል፡ርኅቡ።ወርኁባን፡ጸግቡ።እስመ፡መካን፡ወለደት፡ሰብዐተ፡ወወለድሰ፡ስእነት፡ወሊደ፡እግዚአብሔር፡ይቀትል፡ወየሐዩ።ያወርድኒ፡ውስተ፡ሲእል፡ወየዐርግ።እግዚአብሔር፡ያነዲ፡ወያብዕል።ያኀስርሂ፡ወያከብር፡ዘያነሥኦ፡እምድር፡ለነዳይ።ከመ፡ያንብሮ፡ምስለ፡ዓበይ[ተ]~።
\end{amharic}

\end{document}


ethiop-linebreaklocale.pdf
Description: Adobe PDF document




Re: [XeTeX] hyphenation in Ethiopian languages

2011-05-11 Thread Jonathan Kew
On 11 May 2011, at 18:02, Arthur Reutenauer wrote:

 And another nasty issue (that might deserve its own thread). We wanted
 to have no hyphenchar at all, but using \hyphenchar\font=0 has a nasty
 consequence that lines with broken words are not properly justified
 (some extra space is squeezed between the last character in line and
 the non-existent hyphen char).
 
  Actually, what we've observed is that XeTeX seems to produce a glyph
 in the output even if we set \hyphenchar to a character for which there
 is no glyph in the current font.

That doesn't surprise me; I'd expect you to get the font's .notdef glyph (which 
might be a blank space, as in this example, or a box, or some other symbol). 
What you want is a character that has a zero-width, invisible glyph; if the 
font supports any of the Unicode characters such as ZWNBSP or ZWNJ or WJ or 
CGJ, etc., that ought to work. Or character 13 (CR) is a likely bet, too.

JK






Re: [XeTeX] hyphenation in Ethiopian languages

2011-05-11 Thread Arthur Reutenauer
 That doesn't surprise me; I'd expect you to get the font's .notdef glyph 
 (which might be a blank space, as in this example, or a box, or some other 
 symbol).

 Thanks for the explanation, that makes sense.

 What you want is a character that has a zero-width, invisible glyph; if the 
 font supports any of the Unicode characters such as ZWNBSP or ZWNJ or WJ or 
 CGJ, etc., that ought to work.

  Yes, that's what I thought too, but it doesn't provide a font-independent 
solution.

 Or character 13 (CR) is a likely bet, too.

  Note that Mojca remarked that using character 10 (LF) produced the desired 
result in that particular font (Abyssinica SIL).  Is there any reason why one 
would prefer the former over the latter, or why either of these characters 
would be a safer bet in general?  I would have thought that both of them, being 
control characters (sort of), would precisely have no glyph in most fonts; 
after all, who would want to set a glyph for a character that's supposed to 
indicate the end of a line of text?

Arthur
 





Re: [XeTeX] hyphenation in Ethiopian languages

2011-05-10 Thread Mojca Miklavec
On Fri, May 6, 2011 at 19:24, Jonathan Kew wrote:

 For line-breaking after the word separators, doesn't it work to just set

  \XeTeXlinebreaklocale en
  \XeTeXlinebreakskip 0pt plus 1pt

 or similar?

Yes, thanks a lot. This does work. However, there are two problems with it:

- Only ETHIOPIC WORDSPACE has the line-break class BA (Break After), while
ETHIOPIC FULL STOP has class AL (Alphabetic), so text won't break after
the full stop. This is probably a bug in the Unicode standard.

- We cannot control the space before ETHIOPIC WORDSPACE with that, only
the space after it. Without some stretchable glue it is impossible to
align/justify text.
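For the first problem, here is a minimal sketch. It keeps the locale-based breaking Jonathan suggested and uses XeTeX's inter-character tokens to add the missing break opportunity after ETHIOPIC FULL STOP. It assumes that Ethiopic syllables keep XeTeX's default character class 0 and that nothing else in the document redefines these classes. The class name \ethifullstop is hypothetical.

```latex
% Assumes Ethiopic syllables keep XeTeX's default character class 0.
\XeTeXlinebreaklocale en
\XeTeXlinebreakskip 0pt plus 1pt
\XeTeXinterchartokenstate=1
% Hypothetical class for U+1362 ETHIOPIC FULL STOP, which (line-break
% class AL) gets no break opportunity from the locale-based breaking.
\newXeTeXintercharclass\ethifullstop
\XeTeXcharclass"1362=\ethifullstop
% Full stop followed by an ordinary character: the same breakable glue
% that \XeTeXlinebreakskip adds after ETHIOPIC WORDSPACE.
\XeTeXinterchartoks \ethifullstop 0 = {\hskip 0pt plus 1pt\relax}
```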



And another nasty issue (that might deserve its own thread). We wanted
to have no hyphenchar at all, but using \hyphenchar\font=0 has a nasty
consequence that lines with broken words are not properly justified
(some extra space is squeezed between the last character in line and
the non-existent hyphen char). It took me a while to realize
that
\hyphenchar\font=10
somehow solves the issue, but I still find that totally weird and I'm
not sure whether number 10 only solved the issue for that particular
font or whether that is stable behaviour for other fonts as well.

I wanted to compare the behaviour with pdfTeX, but I realized that
pdfTeX doesn't offer any option to really remove the hyphen char; one
can only disable hyphenation with -1 or use a number between 0 and 255
(which usually points to an existing glyph).

Mojca





Re: [XeTeX] hyphenation in Ethiopian languages

2011-05-06 Thread Mojca Miklavec
On Thu, Nov 4, 2010 at 14:42, Adam McCollum wrote:
 Dear list members,
 I've recently drawn up a short document in Ge`ez (classical Ethiopic) using
 Polyglossia and I see that the hyphenation is wrong. As some of you know,
 languages that use the Ethiopic script, including Ge`ez and Amharic, place a
 word divider—it looks somewhat like a thick colon—between each word and two
 of these dividers side by side between sentences; see some Amharic examples
 here. That being the case, a word may be broken at any syllable (the script
 is a syllabary, not an alphabet) at the end of a line, but there is nothing
 corresponding to a hyphen. An additional matter of importance is that no
 line should begin with the single or double word divider. How should this be
 fixed?

Dear Adam,

We have submitted Ethiopic hyphenation patterns to CTAN (and TL) a
while ago, so once you update your TeX Live, it should work out of the
box.

However, there is a nasty limitation in XeTeX: only words of up to 64
characters get hyphenated, so unless somebody fixes XeTeX, you need other
tricks and workarounds. The code below inserts a breakable space
after every word separator (and thus allows XeTeX to start breaking
the next word from scratch). In addition to that, you also need to make
sure that:
- there is no hyphenation character at the end of line
- lines are properly aligned
- you might want (or not) some extra space around word and sentence delimiters

Together with Arthur we created the following working example, but it
would be great if François would include some of that code into
Polyglossia.

If you want to have space around word delimiters, you need to insert
some non-breakable space in front of each delimiter and some breakable
space after it. The amount of space might need to be configurable. My
estimates (0.4 +/- 0.1 em) might not be the best ones, so feel free to
change them to more suitable values. Apart from that, you might want
both spaces to be of equal size (I wasn't sure how to achieve that).

\documentclass[12pt]{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{amharic}
\newfontfamily\amharicfont[Script = Ethiopic, Scale = 1.3]{Abyssinica SIL}

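\newXeTeXintercharclass \ethiletter % Ethiopic syllables
\newXeTeXintercharclass \ethispace  % word and sentence separators
\newcount\tmp
% \setclass[<first>-<last>]<class>: assign <class> to every character
% in the hexadecimal range U+<first> to U+<last>
\def\setclass[#1-#2]#3{%
  \tmp="#1
  \XeTeXcharclass\tmp=#3
  \loop\ifnum\tmp<"#2
    \advance\tmp by 1
    \XeTeXcharclass\tmp=#3
  \repeat}
\setclass[1200-139F]\ethiletter

\XeTeXinterchartokenstate=1
\XeTeXcharclass"1361=\ethispace % ETHIOPIC WORDSPACE
\XeTeXcharclass"1362=\ethispace % ETHIOPIC FULL STOP

% separator followed by a letter: breakable, stretchable space
\XeTeXinterchartoks \ethispace \ethiletter = {\egroup\hskip.4em plus .1em minus .1em}
% letter followed by a separator: fixed, unbreakable space
\XeTeXinterchartoks \ethiletter \ethispace = {\kern.4em\bgroup}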

\begin{document}
\title{Sample in Gǝ`ǝz}
\maketitle

% \hsize=8cm

\begin{amharic}

\hyphenchar\font=0
እስመ፡አግዚአብሔር፡አምላክ፡ማእምር፡ውእቱ።እግዚአብሔር፡አስተደወ፡መንብሮ።ወአድከመ፡ቅሥተ፡ኀያላን።ወአቅነቶሙ፡ኀይለ፡ለድኩማን።ጽጉማን፡እክል፡ርኅቡ።ወርኁባን፡ጸግቡ።እስመ፡መካን፡ወለደት፡ሰብዐተ፡ወወለድሰ፡ስእነት፡ወሊደ፡እግዚአብሔር፡ይቀትል፡ወየሐዩ።ያወርድኒ፡ውስተ፡ሲእል፡ወየዐርግ።እግዚአብሔር፡ያነዲ፡ወያብዕል።ያኀስርሂ፡ወያከብር፡ዘያነሥኦ፡እምድር፡ለነዳይ።ከመ፡ያንብሮ፡ምስለ፡ዓበይ[ተ]~።
\end{amharic}

\end{document}

Please let us know if that works the way you want it to work. If you
need a LuaTeX solution, please let us know as well.

Mojca

PS: You could also simply use
\XeTeXinterchartoks \ethiletter \ethiletter = {\hskip0pt}
and thus avoid the need for any hyphenation patterns at all.
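This addresses the earlier question of how to make both spaces equal. Within one line, TeX stretches or shrinks all glue of the same order by the same ratio, so two glue items with identical specifications always end up the same width. The following sketch could replace the two \XeTeXinterchartoks lines in the example above. It assumes the \ethiletter and \ethispace classes defined there. The macro \ethiwordskip is hypothetical, and this variant omits the \bgroup/\egroup pair of the original.

```latex
% Hypothetical replacement for the two \XeTeXinterchartoks lines in the
% example above; \ethiletter and \ethispace are the classes defined there.
% A macro (not a \skip register) so that "em" is measured in the font
% that is current where the glue is inserted.
\def\ethiwordskip{.4em plus .1em minus .1em}
% Letter followed by a separator: \nobreak (\penalty10000) forbids a
% line break at the glue that follows it.
\XeTeXinterchartoks \ethiletter \ethispace = {\nobreak\hskip\ethiwordskip\relax}
% Separator followed by a letter: the same glue, but breakable.
\XeTeXinterchartoks \ethispace \ethiletter = {\hskip\ethiwordskip\relax}
```

Both glue items have identical natural width, stretch and shrink, so they end up the same size on any given line. When a line ends after a separator, only the breakable glue is discarded.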





Re: [XeTeX] hyphenation in Ethiopian languages

2011-05-06 Thread Jonathan Kew
On 6 May 2011, at 18:03, Mojca Miklavec wrote:

 On Thu, Nov 4, 2010 at 14:42, Adam McCollum wrote:
 Dear list members,
 I've recently drawn up a short document in Ge`ez (classical Ethiopic) using
 Polyglossia and I see that the hyphenation is wrong. As some of you know,
 languages that use the Ethiopic script, including Ge`ez and Amharic, place a
 word divider—it looks somewhat like a thick colon—between each word and two
 of these dividers side by side between sentences; see some Amharic examples
 here. That being the case, a word may be broken at any syllable (the script
 is a syllabary, not an alphabet) at the end of a line, but there is nothing
 corresponding to a hyphen. An additional matter of importance is that no
 line should begin with the single or double word divider. How should this be
 fixed?
 
 Dear Adam,
 
 We have submitted Ethiopic hyphenation patterns to CTAN (and TL) a
 while ago, so once you update your TeX Live, it should work out of the
 box.
 ..


For line-breaking after the word separators, doesn't it work to just set

  \XeTeXlinebreaklocale en
  \XeTeXlinebreakskip 0pt plus 1pt

or similar?

JK






Re: [XeTeX] hyphenation in Ethiopian languages

2011-05-06 Thread Mojca Miklavec
Dear Jonathan,

On Fri, May 6, 2011 at 19:24, Jonathan Kew  wrote:
 On 6 May 2011, at 18:03, Mojca Miklavec wrote:

 On Thu, Nov 4, 2010 at 14:42, Adam McCollum wrote:
 Dear list members,
 I've recently drawn up a short document in Ge`ez (classical Ethiopic) using
 Polyglossia and I see that the hyphenation is wrong. As some of you know,
 languages that use the Ethiopic script, including Ge`ez and Amharic, place a
 word divider—it looks somewhat like a thick colon—between each word and two
 of these dividers side by side between sentences; see some Amharic examples
 here. That being the case, a word may be broken at any syllable (the script
 is a syllabary, not an alphabet) at the end of a line, but there is nothing
 corresponding to a hyphen. An additional matter of importance is that no
 line should begin with the single or double word divider. How should this be
 fixed?

 Dear Adam,

 We have submitted Ethiopic hyphenation patterns to CTAN (and TL) a
 while ago, so once you update your TeX Live, it should work out of the
 box.
 ..


 For line-breaking after the word separators, doesn't it work to just set

  \XeTeXlinebreaklocale en
  \XeTeXlinebreakskip 0pt plus 1pt

Hm. Quite possible. None of us (or at least not I) knew about
\XeTeXlinebreaklocale and \XeTeXlinebreakskip, or at least we didn't
think of them. We'll test; thanks a lot for the hint.

What exactly does \XeTeXlinebreaklocale en do? (After all, we need
breaking of Ethiopic text, not English text.) And where is 0pt plus
1pt applied? Between all characters or just at the end? How is the
end of a line determined?

Interestingly enough, one of the first search hits brings me back to the
thread "Word wrapping in Lao":
http://tug.org/pipermail/xetex/2010-April/016331.html
which has also been heavily discussed off-list recently. We are
experiencing exactly the same problem there: runs of text that are too
long for the hyphenation algorithm to work properly.

We are aware of ICU, but none of us knows how to write ICU code, even
though the algorithm is fairly straightforward.

I hope to have Lao hyphenation patterns ready soon and then we will
try to apply some XeTeXinterchartoks-based breaks between letters that
always start or end a syllable, only hoping that there will be enough
of such letters to cut the remaining text into
shorter-than-64-character sequences.

Is there really no way to increase the limit for hyphenation in XeTeX
from 64 characters to something safer? LuaTeX sets the limit at 256.

Mojca





Re: [XeTeX] hyphenation in Ethiopian languages

2010-11-05 Thread Yves Codet
Hello.

On 4 Nov 2010, at 21:04, Mojca Miklavec wrote:

 The problem of colons that may not start a new line has to be solved
 on a different level. You could write like that:
 
 እስመ~፡ አግዚአብሔር~፡ አምላክ~፡ ማእምር~፡ ውእቱ~። እግዚአብሔር~፡ አስተደወ~፡ መንብሮ~። ወአድከመ~፡
 ቅሥተ~፡ ኀያላን~። ወአቅነቶሙ~፡ ኀይለ~፡ ለድኩማን~። ጽጉማን~፡ እክል~፡ ርኅቡ~። ወርኁባን~፡ ጸግቡ~።
 እስመ~፡ መካን~፡ ወለደት~፡ ሰብዐተ~፡ ወወለድሰ~፡ ስእነት~፡ ወሊደ~፡ እግዚአብሔር~፡ ይቀትል~፡ ወየሐዩ~።
 ያወርድኒ~፡ ውስተ~፡ ሲእል~፡ ወየዐርግ~። እግዚአብሔር~፡ ያነዲ~፡ ወያብዕል~። ያኀስርሂ~፡ ወያከብር~፡
 ዘያነሥኦ~፡ እምድር~፡ ለነዳይ~። ከመ~፡ ያንብሮ~፡ ምስለ~፡ ዓበይ[ተ]~።
 
 This works perfectly fine, but you probably don't want to write like
 that. I leave it up to others to solve that problem. The hyphenchar
 can easily be changed to nothing though.

Can't this be treated like the double punctuation marks in French? To get
እስመ ፡ you type እስመ፡ and polyglossia adds a kern before the word divider,
except when the preceding character is itself a word divider.

Regards,

Yves






Re: [XeTeX] hyphenation in Ethiopian languages

2010-11-04 Thread Gareth Hughes
Dear Adam,

Line 7 of gloss-amharic.ldf in the polyglossia package has

  hyphennames={amharic,nohyphenation},

which I take to mean that you'll get no hyphenation wherever 'amharic'
is active. The next line is commented out

  %hyphenmins={2,2},

so I presume that some rules were intended (François?). If the rules are
that hyphenation can occur anywhere, I'm sure this would be fairly
easy to implement.

Gareth.

Adam McCollum wrote:
 Dear list members,
 
 I've recently drawn up a short document in Ge`ez (classical Ethiopic) using
 Polyglossia and I see that the hyphenation is wrong. As some of you know,
 languages that use the Ethiopic script, including Ge`ez and Amharic, place a
 word divider—it looks somewhat like a thick colon—between each word and two
 of these dividers side by side between sentences; see some Amharic examples
 here <http://books.google.com/books?id=r87yh5z66TEC&printsec=frontcover&dq=amharic&hl=en&ei=U7TSTIX-Ds2r8AaT6LxF&sa=X&oi=book_result&ct=book-thumbnail&resnum=6&ved=0CEwQ6wEwBQ#v=onepage&q&f=false>.
 That being the case, a word may be broken at any syllable (the script is a
 syllabary, not an alphabet) at the end of a line, but there is nothing
 corresponding to a hyphen. An additional matter of importance is that no
 line should begin with the single or double word divider. How should this be
 fixed?
 
 Here is a minimal example:
 
 \documentclass[12pt]{article}
 
 \usepackage{fontspec}
 \usepackage{polyglossia}
 
 \setmainlanguage{english}
 \setotherlanguage{amharic}
 
 \newfontfamily\amharicfont[Script = Ethiopic, Scale = 1.3]{Abyssinica SIL}
 
 \begin{document}
 
 \title{Sample in Gǝ`ǝz}
 \maketitle
 
 \begin{amharic}
 እስመ ፡ አግዚአብሔር ፡ አምላክ ፡ ማእምር ፡ ውእቱ ። እግዚአብሔር ፡ አስተደወ ፡ መንብሮ ። ወአድከመ ፡ ቅሥተ ፡
 ኀያላን ። ወአቅነቶሙ ፡ ኀይለ ፡ ለድኩማን ። ጽጉማን ፡ እክል ፡ ርኅቡ ። ወርኁባን ፡ ጸግቡ ። እስመ ፡ መካን ፡
 ወለደት ፡ ሰብዐተ ፡ ወወለድሰ ፡ ስእነት ፡ ወሊደ ፡ እግዚአብሔር ፡ ይቀትል ፡ ወየሐዩ ። ያወርድኒ ፡ ውስተ ፡ ሲእል
 ፡ ወየዐርግ ። እግዚአብሔር ፡ ያነዲ ፡ ወያብዕል ። ያኀስርሂ ፡ ወያከብር ፡ ዘያነሥኦ ፡ እምድር ፡ ለነዳይ ። ከመ ፡
 ያንብሮ ፡ ምስለ ፡ ዓበይ[ተ] ።
 \end{amharic}
 
 \end{document}
 
 With many thanks in advance for the help,
 
 Adam McCollum, Ph.D.
 Lead Cataloger, Eastern Christian Manuscripts
 Hill Museum & Manuscript Library
 Saint John's University
 P.O. Box 7300
 Collegeville, MN 56321
 
 (320) 363-2075 (phone)
 (320) 363-3222 (fax)
 www.hmml.org
 
 
 
 
 
 

-- 
Gareth Hughes
Doctoral candidate in Syriac studies

Department of Eastern Christianity
Oriental Institute
Pusey Lane
Oxford
OX1 2LE




Re: [XeTeX] hyphenation in Ethiopian languages

2010-11-04 Thread Mojca Miklavec
On Thu, Nov 4, 2010 at 15:53, Gareth Hughes wrote:
 If the rules are
 that hyphenation can occur anywhere, I'm sure this would be fairly
 easy to implement.

I agree. We could add new hyphenation patterns simply listing

\patterns{
   for $x in list_of_syllables
   $x1
}

From http://www.ancientscripts.com/ethiopic.html I read: Each sign is
a syllable (consonant plus vowel), except any sign on the sixth column
(ə) represents either the consonant plus the middle central vowel /ə/
or no vowel at all (in which case it is used as a pure consonant in a
consonant cluster).

This makes it only slightly more complicated, but not that much.
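The following hypothetical excerpt shows the pseudocode's output for a single consonant, HA (U+1200 to U+1206). It treats the sixth order (ህ) like the others, ignoring the subtlety described above. The full list would have to cover every syllable in the block.

```latex
% Hypothetical excerpt of a generated pattern file: the seven orders of the
% consonant HA, each followed by "1" (odd digit = break allowed after it).
% \patterns is only allowed while a format is being built (initex), so such
% a file has to be loaded via language.dat and the format regenerated.
\patterns{
ሀ1
ሁ1
ሂ1
ሃ1
ሄ1
ህ1
ሆ1
}
```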

But in any case we need someone who would be willing to test (extensively).

Mojca



--
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex


Re: [XeTeX] hyphenation in Ethiopian languages

2010-11-04 Thread Mojca Miklavec
(I'm adding the TeX hyphenation mailing list to recipients; I
apologise for cross-posting. Hyphenation-patterns-related discussion
may continue on hyphenation list (or off-list if needed). XeLaTeX
issues, in particular how not to start the line with
word-or-sentence-separator may stay on the XeTeX list since that's
more or less engine- and polyglossia-related.)

On Thu, Nov 4, 2010 at 15:53, Gareth Hughes wrote:
 Dear Adam,

 Line 7 of gloss-amharic.ldf in the polyglossia package has

  hyphennames={amharic,nohyphenation},

 which I take to mean that you'll get no hyphenation wherever 'amharic'
 is active. The next line is commented out

  %hyphenmins={2,2},

 so I presume that some rules were intended (François?). If the rules are
 that hyphenation can occur anywhere, I'm sure this would be fairly
 easy to implement.

An example of hyphenation patterns is attached. I do not claim that
the patterns work perfectly (they probably don't, but they might be a
starting point). I simply added a number 1 after each valid Unicode
character between U+1200 and U+135A (without removing characters that
do not exist in Amharic and without using those from Unicode 6,
U+2D80–U+2DDF).

1.) You need to put the file hyph-am.tex into
    /usr/local/texlive/2010/texmf-dist/tex/generic/hyph-utf8/patterns/tex/hyph-am.tex
2.) Put loadhyph-am.tex into
    /usr/local/texlive/2010/texmf-dist/tex/generic/hyph-utf8/loadhyph/loadhyph-am.tex
3.) Add the line
    amharic loadhyph-am.tex
    to
    /usr/local/texlive/2010/texmf-var/tex/generic/config/language.dat
4.) Change %hyphenmins={2,2}, into hyphenmins={1,1}, in
    /usr/local/texlive/2010/texmf-dist/tex/xelatex/polyglossia/gloss-amharic.ldf
5.) Run
    sudo fmtutil-sys --byfmt xelatex


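You can also test with the following (keep the rest of the document unchanged):

\newdimen\savehsize
\savehsize\hsize
% Typeset #1 in a paragraph only 1pt wide, so that TeX breaks at (nearly)
% every permitted break point, then restore the original \hsize.
\def\test#1{\endgraf\hsize=1pt\noindent #1\endgraf\hsize=\savehsize}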

\begin{amharic}
\test{እስመ ፡ አግዚአብሔር ፡ አምላክ ፡ ማእምር ፡ ውእቱ ። እግዚአብሔር ፡ አስተደወ ፡ መንብሮ ።
ወአድከመ ፡ ቅሥተ ፡ ኀያላን ። ወአቅነቶሙ ፡ ኀይለ ፡ ለድኩማን ። ጽጉማን ፡ እክል ፡ ርኅቡ ። ወርኁባን ፡
ጸግቡ ። እስመ ፡ መካን ፡ ወለደት ፡ ሰብዐተ ፡ ወወለድሰ ፡ ስእነት ፡ ወሊደ ፡ እግዚአብሔር ፡ ይቀትል ፡
ወየሐዩ ። ያወርድኒ ፡ ውስተ ፡ ሲእል ፡ ወየዐርግ ። እግዚአብሔር ፡ ያነዲ ፡ ወያብዕል ። ያኀስርሂ ፡
ወያከብር ፡ ዘያነሥኦ ፡ እምድር ፡ ለነዳይ ። ከመ ፡ ያንብሮ ፡ ምስለ ፡ ዓበይ[ተ] ።}
\end{amharic}

The problem of colons that may not start a new line has to be solved
on a different level. You could write like that:

እስመ~፡ አግዚአብሔር~፡ አምላክ~፡ ማእምር~፡ ውእቱ~። እግዚአብሔር~፡ አስተደወ~፡ መንብሮ~። ወአድከመ~፡
ቅሥተ~፡ ኀያላን~። ወአቅነቶሙ~፡ ኀይለ~፡ ለድኩማን~። ጽጉማን~፡ እክል~፡ ርኅቡ~። ወርኁባን~፡ ጸግቡ~።
እስመ~፡ መካን~፡ ወለደት~፡ ሰብዐተ~፡ ወወለድሰ~፡ ስእነት~፡ ወሊደ~፡ እግዚአብሔር~፡ ይቀትል~፡ ወየሐዩ~።
ያወርድኒ~፡ ውስተ~፡ ሲእል~፡ ወየዐርግ~። እግዚአብሔር~፡ ያነዲ~፡ ወያብዕል~። ያኀስርሂ~፡ ወያከብር~፡
ዘያነሥኦ~፡ እምድር~፡ ለነዳይ~። ከመ~፡ ያንብሮ~፡ ምስለ~፡ ዓበይ[ተ]~።

This works perfectly fine, but you probably don't want to write like
that. I leave it up to others to solve that problem. The hyphenchar
can easily be changed to nothing though.

Mojca



hyph-am.tex
Description: TeX document


loadhyph-am.tex
Description: TeX