date:20131204

Re: [XeTeX] xetex and the unicode bidirectional algorithm.

2013-12-04 Thread Khaled Hosny

On Wed, Dec 04, 2013 at 11:50:05PM +0100, Zdenek Wagner wrote:
> 2013/12/4 C. Scott Ananian :
> > 3) Arabic comma instead of English comma in citation [23]. (in both
> > web and XeLaTeX output)
> >
> The engine cannot recognize the context if the language is not tagged;
> the comma will always be displayed using the default language.

The engine does nothing special with the comma; it is simply a different
character (U+060C ARABIC COMMA, not U+002C COMMA) and was input as such.
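
As a minimal illustration of that point, assuming Python 3 and its
standard unicodedata module (the helper name describe is made up for
this sketch), the two commas can be inspected as distinct code points:

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, bidi class) for one character."""
    return (
        "U+%04X" % ord(ch),
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
    )

# The ASCII comma and the Arabic comma are unrelated code points, so a
# TeX engine simply typesets whichever one is present in the input.
for ch in (",", "\u060C"):
    print(describe(ch))
```

If the source should use the ASCII comma in a given citation, the fix
belongs in the input text, not in the engine.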

Regards,
Khaled


--
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex

Re: [XeTeX] xetex and the unicode bidirectional algorithm.

2013-12-04 Thread Khaled Hosny

On Wed, Dec 04, 2013 at 12:31:58AM -0500, C. Scott Ananian wrote:
> On Tue, Dec 3, 2013 at 5:33 PM, Khaled Hosny  wrote:
> > On Tue, Dec 03, 2013 at 01:42:21PM -0500, C. Scott Ananian wrote:
> >> Does XeLaTeX implement the Unicode BiDi algorithm?
> >
> > Short answer: no.
> >
> > I think sample documents (minimal working example) are needed for any
> > useful suggestion.
> 
> 
> Attached are the first 23 references from
> https://ar.wikipedia.org/wiki/%D8%A5%D8%B3%D8%B7%D9%86%D8%A8%D9%88%D9%84#.D9.85.D8.B5.D8.A7.D8.AF.D8.B1
> (the Arabic Wikipedia article on Istanbul), as generated by my XeLaTeX
> formatter.
> 
> Things to notice:
> 1) Unicode BiDi algorithm at work in web version in places like
> citation [1], "Statistics of the 2010 Turkey census".  XeLaTeX renders
> this backwards.

You need to mark up LTR and RTL text explicitly, e.g. using polyglossia
or bidi. You need this to enable hyphenation as well; citations 12 and
13, for example, have very bad spacing because no hyphenation was
enabled. I guess the tool that generates the TeX file will have to do
that.
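
A rough sketch of what such a generating tool might do first, assuming
Python 3: split each string into directional runs before emitting any
markup. This is a deliberate simplification and not an implementation of
the Unicode bidi algorithm (UAX #9); the function names and the rule
used for neutral characters are assumptions made for illustration.

```python
import unicodedata

def strong_direction(ch):
    """Return 'L' or 'R' for strongly directional characters, else None."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):  # Hebrew-type and Arabic-type letters
        return "R"
    return None  # spaces, punctuation, digits: treated as neutral here

def directional_runs(text, base="R"):
    """Split text into (direction, substring) runs.

    A stretch of neutral characters takes the direction of its
    neighbours when both sides agree; otherwise it takes the base
    direction of the paragraph.
    """
    dirs = [strong_direction(ch) for ch in text]
    n = len(text)
    i = 0
    while i < n:
        if dirs[i] is not None:
            i += 1
            continue
        j = i
        while j < n and dirs[j] is None:
            j += 1
        before = dirs[i - 1] if i > 0 else base
        after = dirs[j] if j < n else base
        fill = before if before == after else base
        for k in range(i, j):
            dirs[k] = fill
        i = j
    runs = []
    for ch, d in zip(text, dirs):
        if runs and runs[-1][0] == d:
            runs[-1][1] += ch
        else:
            runs.append([d, ch])
    return [(d, s) for d, s in runs]

print(directional_runs("\u0645\u0635 Turkey census \u0645"))
```

The space inside "Turkey census" stays in the LTR run, while the spaces
that border Arabic text fall back to the RTL base direction. Each LTR
run could then be wrapped in whatever markup the chosen package expects.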

> 2) Broken italic for Arabic in GNU FreeFont in citation [2].
> (straightforward to fix)

Please, please, please, never ever use GNU FreeFont for Arabic; it is
the most hideous, crappy and useless un-Arabic font ever created. My
blood boils every time I see it in use.

> 3) Arabic comma instead of English comma in citation [23]. (in both
> web and XeLaTeX output)

Bad input: the source text itself contains an Arabic comma; no tricks
are done here.

Regards,
Khaled


Re: [XeTeX] xetex and the unicode bidirectional algorithm.

2013-12-04 Thread Andrew Cunningham

Well, the first step is implementing the bidi algorithm, including its
changes in Unicode 6.3, and providing ways of using it, especially being
able to leverage bidi isolation.
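
For reference, the isolation mechanism added in Unicode 6.3 consists of
four formatting characters. The sketch below, assuming Python 3, only
shows how a string can be wrapped in them (the helper name isolate is
hypothetical). It has an effect only in renderers that implement the
bidi algorithm, such as web browsers; it does not by itself make XeTeX
reorder text.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate(text, direction=None):
    """Wrap text in a directional isolate.

    direction is 'ltr', 'rtl', or None; None uses FSI, which lets the
    renderer pick the direction from the first strong character.
    """
    opener = {"ltr": LRI, "rtl": RLI, None: FSI}[direction]
    return opener + text + PDI

title = isolate("Statistics of the 2010 Turkey census", "ltr")
print([hex(ord(c)) for c in (title[0], title[-1])])
```

Isolating an embedded title keeps it from affecting the ordering of the
surrounding Arabic text, which is the property being asked for here.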

Andrew


On 4 December 2013 20:07, Keith J. Schultz  wrote:

> Hi Scott,
>
> Am 03.12.2013 um 19:42 schrieb C. Scott Ananian :
>
> >
> > But in the XeLaTeX/polyglossia/bidi output, the "soft space" weak
> > directionality of the Unicode BiDi algorithm doesn't seem to be
> > honored (or implemented?) and so the English article titles appear
> > with the individual words in RTL order, which is a mess.  Manually
> > tagging the language of the article title is probably the Right thing,
> > but infeasible for the entire wikipedia.
> Well, without proper tagging you cannot expect any system to
> work properly or as expected!
> For most entries a simple script should do the trick to add the
> language tags to the article titles.
>
> Hope this helps
> regards
> Keith.



-- 
Andrew Cunningham
Project Manager, Research and Development
(Social and Digital Inclusion)
Public Libraries and Community Engagement
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Australia

Ph: +61-3-8664-7430
Mobile: 0459 806 589
Email: acunning...@slv.vic.gov.au
  lang.supp...@gmail.com

http://www.openroad.net.au/
http://www.mylanguage.gov.au/
http://www.slv.vic.gov.au/



Re: [XeTeX] xetex and the unicode bidirectional algorithm.

2013-12-04 Thread Zdenek Wagner

2013/12/4 C. Scott Ananian :
> The goal is to match the Unicode bidi algorithm, because that is how the web
> page displays and thus how the original author saw the text as they wrote.
> Guessing the proper language tag to use is likely infeasible; note that the
> example given contains titles in Turkish as well as English.  The safest
> option is probably to treat embedded LTR text in an RTL context as 'exotic'
> and not to attempt hyphenation.
>
> I've heard it said that LuaTeX has "better bidi support".  What does that
> mean, exactly? Should I be considering switching?
>   --scott
>
LuaTeX offers various features for the Arabic script, but support for
Indic scripts is missing. If you wish to typeset texts from the Hindi
Wikipedia, LuaTeX cannot be used.




-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz




Re: [XeTeX] xetex and the unicode bidirectional algorithm.

2013-12-04 Thread Zdenek Wagner

2013/12/4 C. Scott Ananian :
> On Tue, Dec 3, 2013 at 5:33 PM, Khaled Hosny  wrote:
>> On Tue, Dec 03, 2013 at 01:42:21PM -0500, C. Scott Ananian wrote:
>>> Does XeLaTeX implement the Unicode BiDi algorithm?
>>
>> Short answer: no.
>>
>> I think sample documents (minimal working example) are needed for any
>> useful suggestion.
>
>
> Attached are the first 23 references from
> https://ar.wikipedia.org/wiki/%D8%A5%D8%B3%D8%B7%D9%86%D8%A8%D9%88%D9%84#.D9.85.D8.B5.D8.A7.D8.AF.D8.B1
> (the Arabic Wikipedia article on Istanbul), as generated by my XeLaTeX
> formatter.
>
> Things to notice:
> 1) Unicode BiDi algorithm at work in web version in places like
> citation [1], "Statistics of the 2010 Turkey census".  XeLaTeX renders
> this backwards.
> 2) Broken italic for Arabic in GNU FreeFont in citation [2].
> (straightforward to fix)

GNU FreeSerif does not contain Arabic in the italic shape; only
regular and bold are supported.

> 3) Arabic comma instead of English comma in citation [23]. (in both
> web and XeLaTeX output)
>
The engine cannot recognize the context if the language is not tagged;
the comma will always be displayed using the default language.

> Item #1 is the one I'd really appreciate suggestions for fixing.
>   --scott



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz




Re: [XeTeX] xetex and the unicode bidirectional algorithm.

2013-12-04 Thread C. Scott Ananian

The goal is to match the Unicode bidi algorithm, because that is how the
web page displays and thus how the original author saw the text as they
wrote.  Guessing the proper language tag to use is likely infeasible; note
that the example given contains titles in Turkish as well as English.  The
safest option is probably to treat embedded LTR text in an RTL context as
'exotic' and not to attempt hyphenation.
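
One way this "direction only, no language guess" option might look in
the generator, as a hedged sketch assuming Python 3 and assuming the
output document loads the bidi package, whose \LR macro sets
left-to-right text without selecting a hyphenation language. The regular
expression and the function name wrap_ltr are made up for illustration.

```python
import re

# A run of Latin-script words (Basic Latin through Latin Extended-B, so
# Turkish letters are covered), joined by spaces or light punctuation.
LATIN_RUN = re.compile(
    r"[A-Za-z\u00C0-\u024F][A-Za-z\u00C0-\u024F0-9]*"
    r"(?:[ ,.:'\-]+[A-Za-z\u00C0-\u024F0-9]+)*"
)

def wrap_ltr(text, macro="LR"):
    """Wrap each Latin-script run in \\macro{...}, leaving the rest as is."""
    return LATIN_RUN.sub(
        lambda m: "\\%s{%s}" % (macro, m.group(0)), text
    )

print(wrap_ltr("\u0645\u0635 Turkey census 2010 \u0645"))
```

Trailing spaces and punctuation next to Arabic text stay outside the
wrapper. Characters that are special to TeX (such as %, & and _) would
still need escaping in a separate step, and a run that begins with a
digit is not picked up by this pattern.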

I've heard it said that LuaTeX has "better bidi support".  What does that
mean, exactly? Should I be considering switching?
  --scott
On Dec 4, 2013 4:08 AM, "Keith J. Schultz"  wrote:

> Hi Scott,
>
> Am 03.12.2013 um 19:42 schrieb C. Scott Ananian :
>
> >
> > But in the XeLaTeX/polyglossia/bidi output, the "soft space" weak
> > directionality of the Unicode BiDi algorithm doesn't seem to be
> > honored (or implemented?) and so the English article titles appear
> > with the individual words in RTL order, which is a mess.  Manually
> > tagging the language of the article title is probably the Right thing,
> > but infeasible for the entire wikipedia.
> Well, without proper tagging you cannot expect any system to
> work properly or as expected!
> For most entries a simple script should do the trick to add the
> language tags to the article titles.
>
> Hope this helps
> regards
> Keith.



Re: [XeTeX] xetex and the unicode bidirectional algorithm.

2013-12-04 Thread C. Scott Ananian

On Tue, Dec 3, 2013 at 5:33 PM, Khaled Hosny  wrote:
> On Tue, Dec 03, 2013 at 01:42:21PM -0500, C. Scott Ananian wrote:
>> Does XeLaTeX implement the Unicode BiDi algorithm?
>
> Short answer: no.
>
> I think sample documents (minimal working example) are needed for any
> useful suggestion.

Attached are the first 23 references from
https://ar.wikipedia.org/wiki/%D8%A5%D8%B3%D8%B7%D9%86%D8%A8%D9%88%D9%84#.D9.85.D8.B5.D8.A7.D8.AF.D8.B1
(the Arabic Wikipedia article on Istanbul), as generated by my XeLaTeX
formatter.

Things to notice:
1) Unicode BiDi algorithm at work in web version in places like
citation [1], "Statistics of the 2010 Turkey census".  XeLaTeX renders
this backwards.
2) Broken italic for Arabic in GNU FreeFont in citation [2].
(straightforward to fix)
3) Arabic comma instead of English comma in citation [23]. (in both
web and XeLaTeX output)

Item #1 is the one I'd really appreciate suggestions for fixing.
  --scott

-- 
 ( http://cscott.net/ )

arabic-sm.tex
Description: TeX document


Re: [XeTeX] xetex and the unicode bidirectional algorithm.

2013-12-04 Thread Keith J. Schultz

Hi Scott,

Am 03.12.2013 um 19:42 schrieb C. Scott Ananian :

> 
> But in the XeLaTeX/polyglossia/bidi output, the "soft space" weak
> directionality of the Unicode BiDi algorithm doesn't seem to be
> honored (or implemented?) and so the English article titles appear
> with the individual words in RTL order, which is a mess.  Manually
> tagging the language of the article title is probably the Right thing,
> but infeasible for the entire wikipedia.
Well, without proper tagging you cannot expect any system to
work properly or as expected!
For most entries a simple script should do the trick of adding the
language tags to the article titles.

Hope this helps
regards
Keith.


