Re: [XeTeX] xetex and the unicode bidirectional algorithm.
On Wed, Dec 04, 2013 at 11:50:05PM +0100, Zdenek Wagner wrote:
> 2013/12/4 C. Scott Ananian :
> > 3) Arabic comma instead of English comma in citation [23]. (in both
> > web and XeLaTeX output)
> >
> The engine cannot recognize the context if the language is not tagged;
> the comma will always be displayed using the default language.

The engine does nothing special with the comma; it is simply a different character and was input as such.

Regards,
Khaled

--
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex
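Khaled's point is that U+060C ARABIC COMMA and U+002C COMMA are distinct code points, so the output faithfully shows whatever was typed. A quick way to check what a source string really contains is to list its comma-like code points. The following is a minimal Python sketch; the function name `report_commas` and the sample string are illustrative only.

```python
import unicodedata

# The two code points that look like a comma in this thread.
COMMAS = {"\u002c", "\u060c"}  # COMMA, ARABIC COMMA


def report_commas(text: str) -> list:
    """Return (index, code point, Unicode name) for each comma-like character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in COMMAS
    ]


if __name__ == "__main__":
    sample = "Istanbul\u060c Turkey, 2010"
    for index, code_point, name in report_commas(sample):
        # Prints "8 U+060C ARABIC COMMA", then "16 U+002C COMMA"
        print(index, code_point, name)
```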
Re: [XeTeX] xetex and the unicode bidirectional algorithm.
On Wed, Dec 04, 2013 at 12:31:58AM -0500, C. Scott Ananian wrote:
> On Tue, Dec 3, 2013 at 5:33 PM, Khaled Hosny wrote:
> > On Tue, Dec 03, 2013 at 01:42:21PM -0500, C. Scott Ananian wrote:
> >> Does XeLaTeX implement the Unicode BiDi algorithm?
> >
> > Short answer: no.
> >
> > I think sample documents (minimal working example) are needed for any
> > useful suggestion.
>
> Attached are the first 23 references from
> https://ar.wikipedia.org/wiki/%D8%A5%D8%B3%D8%B7%D9%86%D8%A8%D9%88%D9%84#.D9.85.D8.B5.D8.A7.D8.AF.D8.B1
> (the Arabic Wikipedia article on Istanbul), as generated by my XeLaTeX
> formatter.
>
> Things to notice:
> 1) Unicode BiDi algorithm at work in web version in places like
> citation [1], "Statistics of the 2010 Turkey census". XeLaTeX renders
> this backwards.

You need to explicitly mark up LTR and RTL text, e.g. using polyglossia or bidi. You need this to enable hyphenation as well; for example, citations 12 and 13 have very bad spacing because no hyphenation was enabled. I guess the tool that generates the TeX file will have to do that.

> 2) Broken italic for arabic in GNU freefont in citation [2].
> (straightforward to fix)

Please, please, please, never ever use GNU FreeFont for Arabic; it is the most hideous, crappy and useless un-Arabic font ever created, and my blood boils every time I see it in use.

> 3) Arabic comma instead of English comma in citation [23]. (in both
> web and XeLaTeX output)

Bad input: the input has an Arabic comma; no tricks are done here.

Regards,
Khaled
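Since the tool that generates the TeX file has to add the direction markup, here is a minimal Python sketch of one way it might do so, assuming the bidi package's `\LR{...}` command and input that is already TeX-escaped. It wraps each run that begins with a strong left-to-right character (as reported by Python's `unicodedata.bidirectional`) and keeps trailing spaces outside the braces. `wrap_ltr_runs` is a hypothetical helper, not part of any package; in this simplified version, digits that come before the first Latin letter of a run stay outside the braces. Note that `\LR` only fixes the direction; it does not select English hyphenation patterns.

```python
import unicodedata

STRONG_RTL = {"R", "AL"}  # strong right-to-left classes (R, e.g. Hebrew; AL, Arabic)


def wrap_ltr_runs(text: str, macro: str = r"\LR") -> str:
    """Wrap each left-to-right run of text in macro{...}.

    A run starts at a strong L character and ends at the last L character or
    European digit before the next strong R/AL character, so spaces between an
    LTR run and the following Arabic text stay outside the braces.
    Input is assumed to be TeX-escaped already.
    """
    out = []
    i, n = 0, len(text)
    while i < n:
        if unicodedata.bidirectional(text[i]) != "L":
            out.append(text[i])  # Arabic letters, spaces, leading digits, ...
            i += 1
            continue
        last_ltr = i
        j = i
        while j < n:
            bidi_class = unicodedata.bidirectional(text[j])
            if bidi_class in STRONG_RTL:
                break
            if bidi_class in ("L", "EN"):
                last_ltr = j
            j += 1
        out.append(f"{macro}{{{text[i:last_ltr + 1]}}}")
        i = last_ltr + 1  # neutrals after the run are copied one by one
    return "".join(out)


if __name__ == "__main__":
    arabic_istanbul = "\u0625\u0633\u0637\u0646\u0628\u0648\u0644"
    line = arabic_istanbul + " Statistics of the 2010 Turkey census"
    # Prints the Arabic word, a space, then \LR{Statistics of the 2010 Turkey census}
    print(wrap_ltr_runs(line))
```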
Re: [XeTeX] xetex and the unicode bidirectional algorithm.
Well, the first step is implementing the bidi algorithm, including its changes in Unicode 6.3, and providing ways of using it, especially so as to leverage bidi isolation.

Andrew

On 4 December 2013 20:07, Keith J. Schultz wrote:
> Hi Scott,
>
> On 03.12.2013 at 19:42, C. Scott Ananian wrote:
>
> > But in the XeLaTeX/polyglossia/bidi output, the "soft space" weak
> > directionality of the Unicode BiDi algorithm doesn't seem to be
> > honored (or implemented?) and so the English article titles appear
> > with the individual words in RTL order, which is a mess. Manually
> > tagging the language of the article title is probably the Right thing,
> > but infeasible for the entire Wikipedia.
> Well, without proper tagging you cannot expect any system to
> work properly or as expected!
> For most entries a simple script should do the trick to add the
> language tags to the article titles.
>
> Hope this helps
> regards
> Keith.

--
Andrew Cunningham
Project Manager, Research and Development
(Social and Digital Inclusion)
Public Libraries and Community Engagement
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Australia

Ph: +61-3-8664-7430
Mobile: 0459 806 589
Email: acunning...@slv.vic.gov.au
lang.supp...@gmail.com

http://www.openroad.net.au/
http://www.mylanguage.gov.au/
http://www.slv.vic.gov.au/
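The isolation mechanism added in Unicode 6.3 consists of the controls LRI (U+2066), RLI (U+2067), FSI (U+2068) and PDI (U+2069). An FSI ... PDI pair takes its direction from the first strong character inside it, and the surrounding text treats the whole isolate like a single neutral character. The Python sketch below shows the wrapping and a simplified first-strong test (rules P2/P3, without skipping nested isolates). It only illustrates what a UBA-conforming renderer such as a browser works with; it makes no claim that XeTeX honors these characters, and the function names are illustrative.

```python
import unicodedata
from typing import Optional

# Directional isolate controls introduced in Unicode 6.3.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def first_strong_direction(text: str) -> Optional[str]:
    """Return 'L' or 'R' for the first strong character, or None if there is none.

    Simplified version of rules P2/P3: it does not skip text inside nested isolates.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L"
        if bidi_class in ("R", "AL"):
            return "R"
    return None


def isolate(text: str) -> str:
    """Wrap text in FSI ... PDI so that its direction comes from its own content."""
    return f"{FSI}{text}{PDI}"


if __name__ == "__main__":
    title = "Statistics of the 2010 Turkey census"
    print(first_strong_direction(title))  # L
    print(repr(isolate(title)))
```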
Re: [XeTeX] xetex and the unicode bidirectional algorithm.
2013/12/4 C. Scott Ananian :
> The goal is to match the Unicode bidi algorithm, because that is how the web
> page displays and thus how the original author saw the text as they wrote.
> Guessing the proper language tag to use is likely infeasible; note that the
> example given contains titles in Turkish as well as English. The safest
> option is probably to treat embedded LTR text in an RTL context as 'exotic'
> and not to attempt hyphenation.
>
> I've heard it said that LuaTeX has "better bidi support". What does that
> mean, exactly? Should I be considering switching?
> --scott
>

LuaTeX offers various features for the Arabic script, but support for Indic scripts is missing. If you wish to typeset texts from the Hindi Wikipedia, LuaTeX cannot be used.

> On Dec 4, 2013 4:08 AM, "Keith J. Schultz" wrote:
>>
>> Hi Scott,
>>
>> On 03.12.2013 at 19:42, C. Scott Ananian wrote:
>>
>> > But in the XeLaTeX/polyglossia/bidi output, the "soft space" weak
>> > directionality of the Unicode BiDi algorithm doesn't seem to be
>> > honored (or implemented?) and so the English article titles appear
>> > with the individual words in RTL order, which is a mess. Manually
>> > tagging the language of the article title is probably the Right thing,
>> > but infeasible for the entire Wikipedia.
>> Well, without proper tagging you cannot expect any system to
>> work properly or as expected!
>> For most entries a simple script should do the trick to add the
>> language tags to the article titles.
>>
>> Hope this helps
>> regards
>> Keith.

--
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz
Re: [XeTeX] xetex and the unicode bidirectional algorithm.
2013/12/4 C. Scott Ananian :
> On Tue, Dec 3, 2013 at 5:33 PM, Khaled Hosny wrote:
>> On Tue, Dec 03, 2013 at 01:42:21PM -0500, C. Scott Ananian wrote:
>>> Does XeLaTeX implement the Unicode BiDi algorithm?
>>
>> Short answer: no.
>>
>> I think sample documents (minimal working example) are needed for any
>> useful suggestion.
>
> Attached are the first 23 references from
> https://ar.wikipedia.org/wiki/%D8%A5%D8%B3%D8%B7%D9%86%D8%A8%D9%88%D9%84#.D9.85.D8.B5.D8.A7.D8.AF.D8.B1
> (the Arabic Wikipedia article on Istanbul), as generated by my XeLaTeX
> formatter.
>
> Things to notice:
> 1) Unicode BiDi algorithm at work in web version in places like
> citation [1], "Statistics of the 2010 Turkey census". XeLaTeX renders
> this backwards.
> 2) Broken italic for arabic in GNU freefont in citation [2].
> (straightforward to fix)

GNU FreeSerif does not contain Arabic in the italic shape; only regular and bold are supported.

> 3) Arabic comma instead of English comma in citation [23]. (in both
> web and XeLaTeX output)
>

The engine cannot recognize the context if the language is not tagged; the comma will always be displayed using the default language.

> Item #1 is the one I'd really appreciate suggestions for fixing.
> --scott
>
> --
> ( http://cscott.net/ )

--
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz
Re: [XeTeX] xetex and the unicode bidirectional algorithm.
The goal is to match the Unicode bidi algorithm, because that is how the web page displays and thus how the original author saw the text as they wrote. Guessing the proper language tag to use is likely infeasible; note that the example given contains titles in Turkish as well as English. The safest option is probably to treat embedded LTR text in an RTL context as 'exotic' and not to attempt hyphenation.

I've heard it said that LuaTeX has "better bidi support". What does that mean, exactly? Should I be considering switching?
--scott

On Dec 4, 2013 4:08 AM, "Keith J. Schultz" wrote:
> Hi Scott,
>
> On 03.12.2013 at 19:42, C. Scott Ananian wrote:
>
> > But in the XeLaTeX/polyglossia/bidi output, the "soft space" weak
> > directionality of the Unicode BiDi algorithm doesn't seem to be
> > honored (or implemented?) and so the English article titles appear
> > with the individual words in RTL order, which is a mess. Manually
> > tagging the language of the article title is probably the Right thing,
> > but infeasible for the entire Wikipedia.
> Well, without proper tagging you cannot expect any system to
> work properly or as expected!
> For most entries a simple script should do the trick to add the
> language tags to the article titles.
>
> Hope this helps
> regards
> Keith.
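The "soft space" behavior Scott wants to match comes mainly from two groups of Unicode bidi rules: W7, under which European digits that follow Latin text count as left-to-right, and N1/N2, under which a run of neutrals (spaces, punctuation) takes the direction of its neighbors when both sides agree and the paragraph direction otherwise. The Python sketch below is a deliberately simplified illustration of just those rules for one paragraph with no explicit embeddings or isolates; it treats every class other than strong letters and European digits as neutral, and it is not a conforming implementation (ICU's ubidi is an example of one). The function names are illustrative.

```python
import unicodedata


def _as_strong(t: str) -> str:
    # For N1, European digits count as R when they influence neutrals.
    return "R" if t == "EN" else t


def resolve_simplified(text: str, base: str = "R") -> list:
    """Return 'L', 'R' or 'EN' per character (simplified W7 + N1/N2 only)."""
    # Step 1: classify each character and apply W7.
    types = []
    last_strong = base  # start of paragraph (sos) counts as the paragraph direction
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            t = last_strong = "L"
        elif bidi_class in ("R", "AL"):
            t = last_strong = "R"
        elif bidi_class == "EN":
            # W7: European digits preceded by L text become L.
            t = "L" if last_strong == "L" else "EN"
        else:
            t = "N"  # everything else is treated as neutral in this sketch
        types.append(t)

    # Step 2: resolve each maximal run of neutrals with N1/N2.
    resolved = list(types)
    i = 0
    while i < len(types):
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < len(types) and types[j] == "N":
            j += 1
        before = _as_strong(types[i - 1]) if i > 0 else base    # sos
        after = _as_strong(types[j]) if j < len(types) else base  # eos
        fill = before if before == after else base  # N1 if they agree, else N2
        resolved[i:j] = [fill] * (j - i)
        i = j
    return resolved
```

For example, the spaces inside "Statistics of the 2010 Turkey census" resolve to L because Latin letters (or digits turned into L by W7) sit on both sides, while a space between that title and Arabic text falls back to the RTL paragraph direction.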
Re: [XeTeX] xetex and the unicode bidirectional algorithm.
On Tue, Dec 3, 2013 at 5:33 PM, Khaled Hosny wrote:
> On Tue, Dec 03, 2013 at 01:42:21PM -0500, C. Scott Ananian wrote:
>> Does XeLaTeX implement the Unicode BiDi algorithm?
>
> Short answer: no.
>
> I think sample documents (minimal working example) are needed for any
> useful suggestion.

Attached are the first 23 references from
https://ar.wikipedia.org/wiki/%D8%A5%D8%B3%D8%B7%D9%86%D8%A8%D9%88%D9%84#.D9.85.D8.B5.D8.A7.D8.AF.D8.B1
(the Arabic Wikipedia article on Istanbul), as generated by my XeLaTeX formatter.

Things to notice:
1) Unicode BiDi algorithm at work in the web version in places like citation [1], "Statistics of the 2010 Turkey census". XeLaTeX renders this backwards.
2) Broken italic for Arabic in GNU FreeFont in citation [2]. (straightforward to fix)
3) Arabic comma instead of English comma in citation [23]. (in both web and XeLaTeX output)

Item #1 is the one I'd really appreciate suggestions for fixing.
--scott

--
( http://cscott.net/ )

arabic-sm.tex
Description: TeX document
Re: [XeTeX] xetex and the unicode bidirectional algorithm.
Hi Scott,

On 03.12.2013 at 19:42, C. Scott Ananian wrote:
>
> But in the XeLaTeX/polyglossia/bidi output, the "soft space" weak
> directionality of the Unicode BiDi algorithm doesn't seem to be
> honored (or implemented?) and so the English article titles appear
> with the individual words in RTL order, which is a mess. Manually
> tagging the language of the article title is probably the Right thing,
> but infeasible for the entire Wikipedia.

Well, without proper tagging you cannot expect any system to work properly or as expected! For most entries, a simple script should do the trick to add the language tags to the article titles.

Hope this helps,
regards
Keith.
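Keith's "simple script" could look like the Python sketch below, under the hypothetical assumption that each citation already carries a language code (for example from a cite template's language field). Titles with a known code are wrapped in polyglossia's `\text<language>{...}` commands, such as `\textenglish`, which require each language to be loaded with `\setotherlanguage`; titles without a usable code fall back to the bidi package's `\LR{...}`, which fixes the direction but not the hyphenation. The mapping and function names are illustrative, not part of either package.

```python
from typing import Optional

# Hypothetical mapping from a citation's language code to a polyglossia
# language name; each name must also be declared with \setotherlanguage.
POLYGLOSSIA_NAMES = {"en": "english", "tr": "turkish", "fr": "french"}

# Characters that are special to TeX and must be escaped in titles.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}", "{": r"\{", "}": r"\}", "&": r"\&",
    "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "^": r"\textasciicircum{}", "~": r"\textasciitilde{}",
}


def escape_tex(text: str) -> str:
    """Escape TeX special characters one character at a time."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def tag_title(title: str, lang_code: Optional[str]) -> str:
    """Wrap a citation title in language markup, or at least direction markup."""
    body = escape_tex(title)
    name = POLYGLOSSIA_NAMES.get(lang_code or "")
    if name is None:
        # Unknown language: bidi's \LR fixes the direction only.
        return rf"\LR{{{body}}}"
    # polyglossia provides \textenglish{...}, \textturkish{...}, etc.
    return rf"\text{name}{{{body}}}"


if __name__ == "__main__":
    print(tag_title("Statistics of the 2010 Turkey census", "en"))
```

Mapping codes explicitly, instead of guessing a language from the title text, sidesteps the problem Scott raises about mixed Turkish and English titles: when no code is available, the title still gets correct direction, just without language-specific hyphenation.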