Re: [XeTeX] Hyphenated, transliterated Sanskrit.
Thanks. I've signed up.

On 25 November 2010 11:35, Arthur Reutenauer <arthur.reutena...@normalesup.org> wrote:

> > Should we have a separate list for this sort of thing?
>
> There is the tex-hyphen list (http://tug.org/mailman/listinfo/tex-hyphen);
> this kind of discussion is certainly welcome there.
>
> Arthur

--
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
> Should we have a separate list for this sort of thing? There is the tex-hyphen list (http://tug.org/mailman/listinfo/tex-hyphen); this kind of discussion is certainly welcome there. Arthur -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
Sanskrit ka-rman -> kar-man Should we have a separate list for this sort of thing? Dominik -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
On Tue, Nov 23, 2010 at 01:00, Manuel B. wrote: > >>> If Indic scripts hyphenate in the same way in all the languages that >>> use the script > >>I've seen no evidence to let me think that they do, but I'm happy >>about any input. > > Hmm... I think this discussion could be brought to an end more quickly > by falsification: we need an example of two Indian languages with > different hyphenation rules in the same script. We don't really need to elaborate any further unless somebody wants to typeset in a language that is not supported yet. The author of hyphenation patterns says: On Mon, Nov 22, 2010 at 17:11, Santhosh Thottingal wrote: > > As far as I know, for Indian languages, it is true that languages > using the same script have same hyphenation patterns. So there should > not be a difference between Sanskrit and Hindi(Devanagari script) or > Assamese and Bengali(Bengali script). > > And for Indian scripts, the basic rules are almost same, but not all. > Tamil got major differences from Malayalam for example. I would rather not try to be too clever and do modifications on my own. At the moment there are at most two languages with the same patterns, even though there are probably more of them that are not yet supported by Polyglossia. I would say: once we get requests to support another dozen of languages written in the same script, we may start thinking about using per-script patterns to reduce the number of preloaded languages. Mojca -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
> But first of all the question: what would be the biggest benefit? New
> languages?

My idea was that the biggest benefit of a single hyphenation file for several Indic scripts would be easier maintenance: only one file has to be updated when a pattern needs changing, not many. But I'm ready to admit that this view of things might be naive.

I think Arthur has a good point in saying that it is probably not worth the effort to merge the hyphenation files into one. And I didn't know that there is a correspondence to the OOo hyphenation files. In that case I absolutely agree that this correspondence should be preserved, despite the duplication of identical data.

>> If Indic scripts hyphenate in the same way in all the languages that
>> use the script
>
> I've seen no evidence to let me think that they do, but I'm happy
> about any input.

Hmm... I think this discussion could be brought to an end more quickly by falsification: we need an example of two Indian languages with different hyphenation rules in the same script.

Cheers,
Manuel

2010/11/22 BPJ:
> 2010-11-22 18:24, Dominik Wujastyk wrote:
>>
>> Those who write both transliterated Hindi and Sanskrit in the
>> same publication will be glad of the ISO standard, I suppose.
>
> You have the problem in transliterated Hindi on its own, since
> both graphemes occur there. In fact they are in complementary
> distribution, and in a way which would be easy to automatize,
> but being different graphemes they should be transliterated
> differently. Retransliteration shouldn't require linguistic
> analysis.
>
>> Typical standard's work: result of a committee that has a
>> certain limited logic to it, but pays not enough attention to
>> usage amongst professional groups, and consequently leaves
>> nobody actually happy.
>
> Agreed. I'm definitely not a friend of standards for
> standards' sake, but that applies to century-old standards
> founded by people not considering modern languages too!
> Of course you _can_ use different transliterations for Sanskrit and Hindi,
> but IMHO transliteration should be by script and not
> by language. But let's be thankful nobody came up with d̤ for ड़
> since IPA uses d̤ for ध!
>
> /bpj
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
On Mon, Nov 22, 2010 at 8:05 PM, Arthur Reutenauer wrote:
>> If Indic scripts hyphenate in the same way in all the languages that
>> use the script
>
> I've seen no evidence to let me think that they do, but I'm happy
> about any input. Santhosh, since you obviously used Yves' hyphenation
> patterns for Sanskrit as a basis for your files, can you tell us a bit
> more about that? I'm curious in particular about the rule "do not break
> before a final consonant", which you stripped.

Hi all,

As far as I know, for Indian languages it is true that languages using the same script have the same hyphenation patterns. So there should not be a difference between Sanskrit and Hindi (Devanagari script) or Assamese and Bengali (Bengali script).

Across Indian scripts, the basic rules are almost the same, but not entirely. Tamil differs considerably from Malayalam, for example.

Arthur, "do not break before a final consonant or cluster" is not a valid rule as far as I know. At least for my mother tongue, Malayalam, I am sure that this rule does not exist. For other languages I relied on input from my friends, and have not come across this rule so far. Even so, the rule often ends up being applied in practice through the "minimum characters after break" setting that many applications provide.

There is one thing to note when discussing a single pattern file for all Indic scripts. The patterns are used by many applications other than TeX, and it is reasonable for them to rely on the system locale, the detected script, or a user-supplied language code to find out which hyphenation rules to use. So it is a reasonable use case for a user to search for the hyphen-ml_IN package in a distro if they want Malayalam hyphenation in OpenOffice. In most popular GNU/Linux distros there is a metapackage for language support; for example, language-support-ml installs everything required for Malayalam. For the maintainers of this package, it is easy to link it to a particular language's hyphenation package. So I don't see much benefit in merging all of them.

I think we can compare this with how Indic fonts are packaged in Linux distros. Debian used to have a ttf-indic-fonts package. Now that is a metapackage with dependencies on ttf-malayalam-fonts, ttf-tamil-fonts, ttf-hindi-fonts etc., which makes the task easier for maintainers and bug reporters.

PS: The git repo I maintain for Indic hyphenation patterns (http://git.savannah.gnu.org/cgit/smc/hyphenation.git) is the upstream repo for Fedora, OpenOffice etc.

Thanks
Santhosh Thottingal
http://thottingal.in
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
2010-11-22 18:24, Dominik Wujastyk skrev: Those who write both transliterated Hindi and Sanskrit in the same publication will be glad of the ISO standard, I suppose. You have the problem in transliterated Hindi on its own, since both graphemes occur there. In fact they are in complementary distribution, and in a way which would be easy to automatize, but being different graphemes they should be transliterated differently. Retransliteration shouldn't require linguistic analysis. Typical standard's work: result of a committee that has a certain limited logic to it, but pays not enough attention to usage amongst professional groups, and consequently leaves nobody actually happy. Agreed. I'm definitely not a friend of standards for standards' sake, but that applies to century-old standards founded by people not considering modern languages too! Of course you _can_ use different transliterations for Sanskrit and Hindi, but IMHO transliteration should be by script and not by language. But let's be thankful nobody came up with d̤ for ड़ since IPA uses d̤ for ध! /bpj -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
Sanskritists have been using ṛ (r-underdot) for over a century. Promulgating a new standard that changes this usage to r-undercircle is far from being an obvious choice, in my view. But we're irrevocably lumbered with it now. :-( Though I note that most Sanskritists pay no attention to the ISO standard, and continue with IAST, which has been standard in professional journals and book publications since the nineteenth century. Of course Hindi flap and Sanskrit vocalic-r have to be distinguished, but the long-established uniform usage of Sanskritists, present in literally thousands of publications, should have been given greater weight. Most Sanskritists view m-overdot (for anusvāra) as obsolete usage, weakly referential to the Nāgarī orthography, and now strongly deprecated. Again, it isn't used in any professional publications, and hasn't been for a hundred years or more. Those who write both transliterated Hindi and Sanskrit in the same publication will be glad of the ISO standard, I suppose. Typical standard's work: result of a committee that has a certain limited logic to it, but pays not enough attention to usage amongst professional groups, and consequently leaves nobody actually happy. Dominik On 22 November 2010 18:03, BPJ wrote: > 2010-11-21 10:22, Manuel B. skrev: > > 1) I saw that that all diacritics used for IAST appear in the pattern, >> while some of them (for example ṛ and ṝ) are marked as "non standart >> transliteration". That is OK, insofar as IAST is not a standart in the >> official sense. But IAST is most commonly used and the "standart" >> transliteration of vocalic r in IAST is ṛ, not r̥. >> > > The problem is that since for Hindi and other modern > Indic languages ṛ is used for the retroflex flap > -- ḍ with underdot in Nagari -- modeled on the > Urdu letter for that sound. In a strict > transliteration you need a way to distinguish > between the two, and between ri and r̥. 
Since > Indo-Europeanists have been using r̥ for over a > century that's obviously the best choice. > > /bpj > > > > -- > Subscriptions, Archive, and List information, etc.: > http://tug.org/mailman/listinfo/xetex > -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
2010-11-21 10:22, Manuel B. skrev: 1) I saw that that all diacritics used for IAST appear in the pattern, while some of them (for example ṛ and ṝ) are marked as "non standart transliteration". That is OK, insofar as IAST is not a standart in the official sense. But IAST is most commonly used and the "standart" transliteration of vocalic r in IAST is ṛ, not r̥. The problem is that since for Hindi and other modern Indic languages ṛ is used for the retroflex flap -- ḍ with underdot in Nagari -- modeled on the Urdu letter for that sound. In a strict transliteration you need a way to distinguish between the two, and between ri and r̥. Since Indo-Europeanists have been using r̥ for over a century that's obviously the best choice. /bpj -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
On 22 Nov. 2010 at 14:23, Arthur Reutenauer wrote:

>> Debatable, I'm not sure :) Gustibus et coloribus non est disputandum.
>> Personally I don't mind breaks such as a-rhasi.
>
> Well, it's not only a matter of taste: in that case, it looked
> incorrect to Dominik, to the point that he thought something was wrong
> with his installation; which is somewhat problematic.

I'll correct that. Please remember those patterns for transliteration are only tentative and the last message Dominik sent shows there's still a lot of work to do.

>> I know many prefer ar-hasi, but there are some books where you would find
>> a-rhasi. On page 189 of Gray's edition of Vāsavadattā (Delhi, 1962), for
>> instance, I can see: ...nirmu-kta..., ...ku-ṭṭimam.
>
> As the author of the pattern file, it's obviously up to you to decide
> which to choose if both solutions are used in books.
>
>> So, for a start, I did exactly what Arthur described, I chose the easy way.
>> But I can add rules allowing a break after the first consonant of a
>> consonant cluster. If there are rules such as:
>> a1
>> ...
>> r3h
>> you should get ar-hasi rather than a-rhasi without having to modify
>> hyphenmins.
>
> The one thing one shouldn't do would be to allow both options at the
> same time. *That* would be bad taste :-) But if you're happy with
> switching, I'm all for it.

Would this be better taste? :)

.a2
a1
...
r1h

Best wishes,

Yves
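The difference between the two proposals quoted in this message can be traced by hand. The sketch below is a hypothetical pattern-file fragment, not the contents of hyph-sa.tex: it contains only the three patterns quoted last, ignores whatever the elided "..." stands for, and its comments apply the usual TeX pattern rules (at each position between two letters the highest digit wins; odd allows a break, even forbids it) to the word arhasi.

```latex
% Hypothetical fragment. \patterns can only be used in iniTeX, when a
% format is built, not inside a document.
\patterns{
.a2   % forbid a break after a word-initial a
a1    % allow a break after any a
r1h   % allow a break between r and h
}
% Tracing ".arhasi." by hand, one inter-letter position at a time:
%   a|r : .a2 gives 2, a1 gives 1  -> max 2, even -> no break
%   r|h : r1h gives 1              -> odd         -> break
%   h|a : no pattern matches       -> 0           -> no break
%   a|s : a1 gives 1               -> odd         -> break
%   s|i : no pattern matches       -> 0           -> no break
% Result: ar-ha-si (still subject to \lefthyphenmin and \righthyphenmin).
%
% With the earlier pair "a1" and "r3h" taken on their own, a|r scores 1
% and r|h scores 3. Both are odd, so a-rhasi and ar-hasi would both be
% permitted.
```

Read this way, the initial `.a2` is what removes the a-rhasi break; raising the digit in `r3h` does not suppress a break at a neighbouring position.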
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
> If Indic scripts hyphenate in the same way in all the languages that > use the script I've seen no evidence to let me think that they do, but I'm happy about any input. Santhosh, since you obviously used Yves' hyphenation patterns for Sanskrit as a basis for your files, can you tell us a bit more about that? I'm curious in particular about the rule "do not break before a final consonant", which you stripped. Arthur -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
Hello, > I'll also add the missing characters, ṁ, ẖ, ḫ and the sign for anudātta > (I think that's all, as far as Sanskrit is concerned). I'll wait for your update :-) > Arthur and Mojca are better qualified than I to answer those questions. What > comes to mind is that such a "total" hyphenation file might rapidly become > difficult to maintain, all the more so as it would require several > maintainers. That's indeed another point, maybe even more important. As Mojca mentioned, all the patterns for Modern Indic scripts come from OpenOffice, and are in fact written by a single person. I believe it is really better if we can keep them in sync with OpenOffice, and reflect modifications that would be made there; and they need to have different files anyway. > Besides, some languages might require special rules, exceptions for instance, > which could be unwanted in another language using the same script. Absolutely. > Arthur and Mojca, what do you think? It's a waste of time. Arthur -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
> 2) That might be a stupid question, but aren't hyphennation patterns
> for most Abugida-scripts more or less the same?

Yes, more or less. If you check the actual files you'll see that there are some differences between languages that use the same script. There's not much you can do about that, since TeX can only read one list of patterns per language. In particular, it is not possible, from within a TeX document, to create a modified hyphenation trie by deleting from or inserting into an existing trie. You need a different language. (And you need to load the patterns in ini mode anyway.)

You could also imagine having a master file for each Indic script that would contain the patterns needed for all the languages written in that script, and a separate file with additional patterns for each individual language; but that hardly seems worth the effort, for the reason below.

> Lots
> of hyphennation patterns have to be duplicated, if they are ordered by
> language. While one could have a hyphen-indic.tex instead.

You will need a separate file for Sanskrit anyway, since it can be written in many different scripts, and there is not yet a mechanism to switch patterns when switching scripts (it's tied to a language). Hence, you're left with the modern Indic languages. Among those for which we have patterns, there happen to be only two pairs that are written in the same script: Hindi and Marathi (in Devanagari), and Bengali and Assamese (in Bengali), each containing fewer than 100 patterns. It does not seem worth the trouble (although those two pairs are actually exactly identical, so that we could have the same file, thereby saving almost 4 kilobytes in TeX distributions; but I wouldn't know how to name the two common files anyway...)

In fact, since the pattern files we have for the different Indic languages basically list all the Unicode characters relevant for their script, plus a few consonant clusters, they all contain about 100 patterns and take up less than 2 kilobytes; apart of course from Sanskrit, for which we have patterns in half a dozen Indic scripts, plus transliteration in Latin (~800 patterns, < 10kb). Compare that with the three different files for German (reformed spelling, old spelling, old spelling in Switzerland) that each have 14000+ patterns and weigh almost 100kb; Norwegian (27000 patterns, ~200kb); and finally Hungarian (>60000 patterns, >500kb); and you'll see why I'm not eager to develop a complicated scheme in order to share information between hyphenation patterns that are "more or less" the same.

Arthur
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
> Debatable, I'm not sure :) Gustibus et coloribus non est disputandum. > Personally I don't mind breaks such as a-rhasi. Well, it's not only a matter of taste: in that case, it looked incorrect to Dominik, to the point that he thought something was wrong with his installation; which is somewhat problematic. > I know many prefer ar-hasi, but there are some books where you would find > a-rhasi. On page 189 of Gray's edition of Vāsavadattā (Delhi, 1962), for > instance, I can see: ...nirmu-kta..., ...ku-ṭṭimam. As the author of the pattern file, it's obviously up to you to decide which to choose if both solutions are used in books. > So, for a start, I did exactly what Arthur described, I chose the easy way. > But I can add rules allowing a break after the first consonant of a consonant > cluster. If there are rules such as: > a1 > ... > r3h > you should get ar-hasi rather than a-rhasi without having to modify > hyphenmins. The one thing one shouldn't do would be to allow both options at the same time. *That* would be bad taste :-) But if you're happy with switching, I'm all for it. Arthur -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
On 21 November 2010 10:12, Yves Codet wrote:

> Debatable, I'm not sure :) Gustibus et coloribus non est disputandum.
> Personally I don't mind breaks such as a-rhasi. I know many prefer ar-hasi,
> but there are some books where you would find a-rhasi. On page 189 of Gray's
> edition of Vāsavadattā (Delhi, 1962), for instance, I can see:
> ...nirmu-kta..., ...ku-ṭṭimam.
>
> So, for a start, I did exactly what Arthur described, I chose the easy way.
> But I can add rules allowing a break after the first consonant of a
> consonant cluster. If there are rules such as:
> a1
> ...
> r3h
> you should get ar-hasi rather than a-rhasi without having to modify
> hyphenmins.

I cannot think of cases where a line-final single-letter hyphenation like a-rhasi would look good. Even examples with alpha-privative, like a-bheda, which are at least etymologically justified, don't look good.

The trouble here is that of good precedent. We need some roman-script Sanskrit with lots of hyphens that has been typeset by knowledgeable typesetters and looks beautiful. I don't think that exists, or at least, it's not known to me. The biggest romanised corpus I can think of immediately is the Pali Text Society volumes, but of course that's Pali not Sanskrit. And I don't know how good the hyphenation is.

I would expect the Clay Sanskrit Library to have good hyphenation; again it's hard to tell, and I don't have all vols. to hand. But Dezső's *Much Ado About Religion* has a pṛ-cchāmaḥ (p. 110) which is pretty ugly, I think, though not impossibly so. The cardinal sin of hyphenating a digraph aspirated consonant is avoided (budd-ha), as far as I can see. I don't have the prose *Daśakumāracarita* which, being prose, should offer more hyphenation cases than verse works.

I think we're breaking new ground here, and I think it may take a while for a nice set of hyphenation patterns to settle down. The guidelines surely must include consideration of:

1. etymology - word breaks within compounds (sārva-bhaumas)
2. etymology - prefix, suffix, infix breaks within words (bhav-a-ti bud-dha adhi-kṛtam)
3. euphony - lines shouldn't begin with non-existent initials like rh- or mh- (a-rhasi). (Okay, since Pingree's CESS A4, we know there's an author Mhālugi, but how many other words begin with mh-?)

Dominik
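Until the patterns settle, individual words can be pinned by hand with explicit exceptions. The following is a minimal sketch, assuming the standard \hyphenation command is issued after \begin{document}, by which point the main language is in effect, so that the exceptions are stored for Sanskrit. The two words and their break points are only illustrations taken from this thread, and the font setup copies the test file posted in the thread.

```latex
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{Charis SIL}

\begin{document}
% Exceptions apply to the language current at this point.
% A listed word may break only at the hyphens shown.
\hyphenation{ar-hasi}   % ar-hasi allowed, a-rhasi not
\hyphenation{abheda}    % no hyphen given: never broken

\noindent
arhasi abheda
\end{document}
```

Exceptions of this kind only cover the exact word forms listed, so for an inflected language they are a stopgap for particular cases rather than a substitute for better patterns.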
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
It works. Thanks!

I tried \sanskritfont yesterday myself, and it didn't work, but my file was pretty cluttered by that time and who knows what else was in the way.

Dominik

On 21 November 2010 13:42, Yves Codet wrote:

> On 21 Nov. 2010 at 10:12, Yves Codet wrote:
>
> > Dominik, I think you can write \sanskritfont, can’t you?
>
> I just tried this:
>
> \documentclass{article}
> \usepackage{fontspec}
> \usepackage{polyglossia}
> \setdefaultlanguage{sanskrit}
> \newfontfamily\sanskritfont{Charis SIL}
> \textwidth=0.5cm
>
> \begin{document}
>
> \noindent
> manum ekāgram āsīnam abhigamya maharṣayaḥ |
>
> \end{document}
>
> It worked for me, with Polyglossia v1.2.0a.
>
> Best wishes,
>
> Yves
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
On Sun, Nov 21, 2010 at 22:34, Yves Codet wrote: > > Le 21 nov. 2010 à 10:22, Manuel B. a écrit : > >> But I don't know how far one can go here. While IAST is meant >> exclusivly for Sanskrit-transliteration (I know that it's used for >> Pali also, but in a slightly different way), ISO 15919 contains far >> more diacritics, than are needed for the transliteration of Sanskrit. >> It's rather meant as a transliteration of many or most Indian >> languages. Should it be duplicated then in every hyphenation pattern >> of every language in question? >> >> 2) That might be a stupid question, but aren't hyphennation patterns >> for most Abugida-scripts more or less the same? That means the >> hyphennation is rather script dependend, than language dependend. Lots >> of hyphennation patterns have to be duplicated, if they are ordered by >> language. While one could have a hyphen-indic.tex instead. > > Arthur and Mojca are better qualified than I to answer those questions. What > comes to mind is that such a "total" hyphenation file might rapidly become > difficult to maintain, all the more so as it would require several > maintainers. Besides, some languages might require special rules, exceptions > for instance, which could be unwanted in another language using the same > script. > > Arthur and Mojca, what do you think? Hello, Exactly at this point we are discussing whether we should use one-pattern-per-language or one-pattern-per-script for Ethiopic script that has been requested recently on the XeTeX mailing list, but for Ethiopic scripts we have made the first version of patterns by ourselves, so at least I know exactly what is there (which is not the case for Indic scripts). In case of Indic scripts, all I did was fetch the scripts from OpenOffice and repackaged them for use in TeX. There might be a reason for language-dependent ordering in OpenOffice since it applies patterns based on language. 
Having a single file for patterns in OOo would mean duplicating that same file ten times, I guess. In TeX one can reuse the same file for multiple languages more easily.

From my perspective we are the coordinators & collectors of hyphenation patterns. We are not specialists for every language maintained in our repository, which means that we still need someone to create the patterns for the language he/she masters.

If Indic scripts hyphenate in the same way in all the languages that use the script, then in principle I have nothing against having a single file that would cover them all, but only if that really brings some benefit, and in that case probably somebody else should do it.

Does anyone require a language that is not present in the repository, but would be covered by "generic Indic script" hyphenation rules? If (for example) the author of the OpenOffice files would prepare and maintain the file and thus guarantee compatible behaviour with OOo, that would be the best option.

But first of all the question: what would be the biggest benefit? New languages? The rest of the thread was talking about Sanskrit.

Mojca

PS: if any other language specialist could offer some more answers about Ethiopic scripts, feel free to reply to me and Arthur off-list.
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
Hello. Le 21 nov. 2010 à 10:22, Manuel B. a écrit : > While I was checking hyphen-sa.tex, I wondered two things (which are > irrelevant to Dominik's problem): > > 1) I saw that that all diacritics used for IAST appear in the pattern, > while some of them (for example ṛ and ṝ) are marked as "non standart > transliteration". That is OK, insofar as IAST is not a standart in the > official sense. But IAST is most commonly used and the "standart" > transliteration of vocalic r in IAST is ṛ, not r̥. > > The latter belongs to the international standart transliteration of > Indic scripts, defined as ISO 15919. So if ISO 15919 has to be taken > into concern for the Sanskrit hyphenation pattern, it should be done > so completly. Which means, that for example ṁ should also be added, > and ṃ marked as "non standart transliteration", and so on. I agree with you on both points. The comments you mention were merely notes to myself (what we call in French a "pense-bête" :), but since they can be read by other people they should be clearer, and I'll use IAST or ISO 15919 instead of "non standard" and (implicitly) "standard". I'll also add the missing characters, ṁ, ẖ, ḫ and the sign for anudātta (I think that's all, as far as Sanskrit is concerned). > But I don't know how far one can go here. While IAST is meant > exclusivly for Sanskrit-transliteration (I know that it's used for > Pali also, but in a slightly different way), ISO 15919 contains far > more diacritics, than are needed for the transliteration of Sanskrit. > It's rather meant as a transliteration of many or most Indian > languages. Should it be duplicated then in every hyphenation pattern > of every language in question? > > 2) That might be a stupid question, but aren't hyphennation patterns > for most Abugida-scripts more or less the same? That means the > hyphennation is rather script dependend, than language dependend. Lots > of hyphennation patterns have to be duplicated, if they are ordered by > language. 
While one could have a hyphen-indic.tex instead. Arthur and Mojca are better qualified than I to answer those questions. What comes to mind is that such a "total" hyphenation file might rapidly become difficult to maintain, all the more so as it would require several maintainers. Besides, some languages might require special rules, exceptions for instance, which could be unwanted in another language using the same script. Arthur and Mojca, what do you think? Regards, Yves -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
That's extremely helpful! Thank you, Arthur. I've upped the first argument of hyphenmins to 2, which helps a lot for romanisation, but may make the Nagari breaks more difficult. I suppose it's not reasonable to assume that hyphenation parameters will be the same across different scripts. Best, Dominik On 20 November 2010 22:12, Arthur Reutenauer < arthur.reutena...@normalesup.org> wrote: > > I'm really not sure what I'm getting as a result. It looks as if it's > roman > > script being hyphenated as if it were Devanagari. The initial a- of > several > > words, like arhasi, gets separated (a-rhasi), which might just about look > > okay in Nagari, but not in romanisation. Am I actually getting the right > > thing > > You're indeed getting what the patterns say. From what I read in > hyph-sa.tex, the patterns allow breaks after any vowel (but not inside > diphthongs), and forbids them before final consonants or consonant > clusters; and that's about it. It's certainly a debatable choice, but > it does seem like the patterns really aim at mimicking the way (say) > Sanskrit written using Devanagari is hyphenated. You would have to take > this up with Yves. > > > Why do I have to pretend that this is Devanagari (\devanagarifont)? > > This is by design in polyglossia (see gloss-sanskrit.ldf). You would > have to take this up with François. (And I'm the one responsible for > integrating hyph-sa.tex into hyph-utf8. Why does it seem like there is > a French mafia around Sanskrit support in XeTeX? ;-) > >Arthur > > > -- > Subscriptions, Archive, and List information, etc.: > http://tug.org/mailman/listinfo/xetex > -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
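For anyone wanting to reproduce the effect of raising the first hyphenmins value, here is a minimal sketch. It assumes the TeX primitive \lefthyphenmin is set directly after \begin{document}, which is not necessarily how the change was made in the message above, and that the setting lasts only until the next language switch resets it. The rest of the file follows the test document posted elsewhere in this thread.

```latex
\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{Charis SIL}
\textwidth=0.5cm  % very narrow measure, to force hyphenation

\begin{document}
% Require at least two letters before the first break in a word.
% This rules out a-rhasi; ar-hasi is still possible, but only if
% the loaded patterns allow a break at that position.
\lefthyphenmin=2

\noindent
manum ekāgram āsīnam abhigamya maharṣayaḥ |
\end{document}
```

Because the same value then applies to every script typeset under that language, this is also where the concern about Nagari breaks comes from: a minimum tuned for romanisation is not automatically right for Devanagari.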
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
Hello. Le 20 nov. 2010 à 22:12, Arthur Reutenauer a écrit : >> I'm really not sure what I'm getting as a result. It looks as if it's roman >> script being hyphenated as if it were Devanagari. The initial a- of several >> words, like arhasi, gets separated (a-rhasi), which might just about look >> okay in Nagari, but not in romanisation. Am I actually getting the right >> thing > > You're indeed getting what the patterns say. From what I read in > hyph-sa.tex, the patterns allow breaks after any vowel (but not inside > diphthongs), and forbids them before final consonants or consonant > clusters; and that's about it. It's certainly a debatable choice, but > it does seem like the patterns really aim at mimicking the way (say) > Sanskrit written using Devanagari is hyphenated. You would have to take > this up with Yves. Debatable, I'm not sure :) Gustibus et coloribus non est disputandum. Personally I don't mind breaks such as a-rhasi. I know many prefer ar-hasi, but there are some books where you would find a-rhasi. On page 189 of Gray's edition of Vāsavadattā (Delhi, 1962), for instance, I can see: ...nirmu-kta..., ...ku-ṭṭimam. So, for a start, I did exactly what Arthur described, I chose the easy way. But I can add rules allowing a break after the first consonant of a consonant cluster. If there are rules such as: a1 ... r3h you should get ar-hasi rather than a-rhasi without having to modify hyphenmins. >> Why do I have to pretend that this is Devanagari (\devanagarifont)? > > This is by design in polyglossia (see gloss-sanskrit.ldf). You would > have to take this up with François. (And I'm the one responsible for > integrating hyph-sa.tex into hyph-utf8. Why does it seem like there is > a French mafia around Sanskrit support in XeTeX? ;-) :) Dominik, I think you can write \sanskritfont, can’t you? Best wishes, Yves -- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
On 21 Nov 2010, at 10:12, Yves Codet wrote:

> Dominik, I think you can write \sanskritfont, can’t you?

I just tried this:

\documentclass{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{Charis SIL}
\textwidth=0.5cm
\begin{document}
\noindent manum ekāgram āsīnam abhigamya maharṣayaḥ |
\end{document}

It worked for me, with Polyglossia v1.2.0a.

Best wishes,
Yves
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
I'm glad to hear that there is finally some implementation of roman transliteration in the Sanskrit hyphenation patterns. Keep up the good work!

While I was checking hyph-sa.tex, I wondered about two things (which are irrelevant to Dominik's problem):

1) I saw that all the diacritics used for IAST appear in the patterns, while some of them (for example ṛ and ṝ) are marked as "non-standard transliteration". That is OK, insofar as IAST is not a standard in the official sense. But IAST is the most commonly used scheme, and the "standard" transliteration of vocalic r in IAST is ṛ, not r̥. The latter belongs to the international standard transliteration of Indic scripts, defined as ISO 15919. So if ISO 15919 is to be taken into account for the Sanskrit hyphenation patterns, it should be done completely. That means, for example, that ṁ should also be added, and ṃ marked as "non-standard transliteration", and so on. But I don't know how far one can go here. While IAST is meant exclusively for Sanskrit transliteration (I know that it's used for Pali as well, but in a slightly different way), ISO 15919 contains far more diacritics than are needed for the transliteration of Sanskrit. It's rather meant as a transliteration for many or most Indian languages. Should it then be duplicated in the hyphenation patterns of every language in question?

2) This might be a stupid question, but aren't hyphenation patterns for most abugida scripts more or less the same? That would mean that hyphenation is script-dependent rather than language-dependent. Lots of hyphenation patterns have to be duplicated if they are organised by language, whereas one could have a single hyphen-indic.tex instead.

Have a nice weekend!
Manuel

2010/11/21 Dominik Wujastyk:

> That's extremely helpful! Thank you, Arthur.
>
> I've upped the first argument of hyphenmins to 2, which helps a lot for
> romanisation, but may make the Nagari breaks more difficult. I suppose it's
> not reasonable to assume that hyphenation parameters will be the same across
> different scripts.
>
> Best,
> Dominik
>
> On 20 November 2010 22:12, Arthur Reutenauer wrote:
>>
>> > I'm really not sure what I'm getting as a result. It looks as if it's
>> > roman script being hyphenated as if it were Devanagari. The initial a- of
>> > several words, like arhasi, gets separated (a-rhasi), which might just
>> > about look okay in Nagari, but not in romanisation. Am I actually getting
>> > the right thing
>>
>> You're indeed getting what the patterns say. From what I read in
>> hyph-sa.tex, the patterns allow breaks after any vowel (but not inside
>> diphthongs), and forbid them before final consonants or consonant
>> clusters; and that's about it. It's certainly a debatable choice, but
>> it does seem like the patterns really aim at mimicking the way (say)
>> Sanskrit written using Devanagari is hyphenated. You would have to take
>> this up with Yves.
>>
>> > Why do I have to pretend that this is Devanagari (\devanagarifont)?
>>
>> This is by design in polyglossia (see gloss-sanskrit.ldf). You would
>> have to take this up with François. (And I'm the one responsible for
>> integrating hyph-sa.tex into hyph-utf8. Why does it seem like there is
>> a French mafia around Sanskrit support in XeTeX? ;-)
>>
>> Arthur
Re: [XeTeX] Hyphenated, transliterated Sanskrit.
> I'm really not sure what I'm getting as a result. It looks as if it's roman
> script being hyphenated as if it were Devanagari. The initial a- of several
> words, like arhasi, gets separated (a-rhasi), which might just about look
> okay in Nagari, but not in romanisation. Am I actually getting the right
> thing

You're indeed getting what the patterns say. From what I read in hyph-sa.tex, the patterns allow breaks after any vowel (but not inside diphthongs), and forbid them before final consonants or consonant clusters; and that's about it. It's certainly a debatable choice, but it does seem like the patterns really aim at mimicking the way (say) Sanskrit written using Devanagari is hyphenated. You would have to take this up with Yves.

> Why do I have to pretend that this is Devanagari (\devanagarifont)?

This is by design in polyglossia (see gloss-sanskrit.ldf). You would have to take this up with François. (And I'm the one responsible for integrating hyph-sa.tex into hyph-utf8. Why does it seem like there is a French mafia around Sanskrit support in XeTeX? ;-)

Arthur