On Mon, Nov 22, 2010 at 8:05 PM, Arthur Reutenauer <arthur.reutena...@normalesup.org> wrote:
>> If Indic scripts hyphenate in the same way in all the languages that
>> use the script
>
> I've seen no evidence to let me think that they do, but I'm happy
> about any input. Santhosh, since you obviously used Yves' hyphenation
> patterns for Sanskrit as a basis for your files, can you tell us a bit
> more about that? I'm curious in particular about the rule "do not break
> before a final consonant", which you stripped.

Hi all,

As far as I know, for Indian languages it is true that languages using the same script have the same hyphenation patterns. So there should be no difference between Sanskrit and Hindi (Devanagari script) or between Assamese and Bengali (Bengali script). Across Indian scripts the basic rules are almost the same, but not entirely: Tamil, for example, has major differences from Malayalam.

Arthur, "do not break before a final consonant or cluster" is not valid as far as I know. At least for my mother tongue, Malayalam, I am sure that this rule does not exist. For other languages I relied on input from my friends, and I have not come across this rule so far. Even so, the rule often gets applied in practice when applications set the "minimum characters after break" option that many of them provide.

One thing should be noted when discussing a single pattern file for all Indic scripts. The patterns are used by many applications other than TeX, and it is reasonable for them to rely on the system locale, the detected script, or a user-supplied language code to find out which hyphenation rules to use. So it is a reasonable use case for a user to search for the hyphen-ml_IN package in a distro if they want to use Malayalam hyphenation in OpenOffice. Most popular GNU/Linux distros have a metapackage for language support; for example, language-support-ml installs everything required for Malayalam. For the maintainers of such a package, it is easy to link it to a particular language's hyphenation package. So I don't see much benefit in merging all of them.

I think we can compare this with how Indic fonts are packaged and maintained in Linux distros. Debian used to have a ttf-indic-fonts package. Now that is a metapackage with dependencies on ttf-malayalam-fonts, ttf-tamil-fonts, ttf-hindi-fonts, etc., which makes the task of maintainers and bug reporters easy.
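
To illustrate the point about the "minimum characters after break" setting, here is a minimal sketch in Python. It is not taken from any pattern file or application: the function name `apply_min_after_break` is hypothetical, the candidate break positions are hand-chosen for the example, and counting in Unicode code points (rather than in clusters or glyphs) is an assumption, since applications may count differently.

```python
def apply_min_after_break(word, break_points, min_after):
    """Keep only the break points that leave at least `min_after`
    code points after the break (hypothetical helper)."""
    return [p for p in break_points if len(word) - p >= min_after]


# Devanagari "bhagavan" ending in a final consonant with virama:
# BHA, GA, VA, AA sign, NA, VIRAMA (6 code points).
word = "\u092d\u0917\u0935\u093e\u0928\u094d"

# Hand-chosen candidate breaks, given as the number of code points
# before the break: bha-ga-vaa-n
candidates = [1, 2, 4]

print(apply_min_after_break(word, candidates, 2))  # [1, 2, 4]
print(apply_min_after_break(word, candidates, 3))  # [1, 2]
```

With a minimum of 3, the break before the final consonant disappears even though the patterns themselves contain no such rule, which is the effect described above.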

PS: The git repo I maintain for Indic hyphenation patterns (http://git.savannah.gnu.org/cgit/smc/hyphenation.git) is the upstream repo for Fedora, OpenOffice, etc.

Thanks
Santhosh Thottingal
http://thottingal.in

--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex