[XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-06 Thread Andrew Goldstone
Hello: I am attempting to assist a colleague, who is new to TeX, in
typesetting a text which includes many passages in which Burmese and Latin
scripts are closely intermixed. I wanted to make it possible for my
colleague to enter his text fairly naturally, as he is used to doing in
Word, by simply mixing the scripts, rather than having to type a macro to switch
languages/fonts at nearly every word. On tex.stackexchange I found a
suggestion to use XeTeX's interchar mechanism for this purpose and adapted
the code example to my own purposes.

Though this works fine on its own, it leads to problems, and sometimes
crashes, in conjunction with two other desirable XeTeX features, namely its
linebreak-locale and interword space-shaping mechanisms. The example below
my signature demonstrates the following three-way interaction:

(A) \XeTeXlinebreaklocale "my"
(B) \XeTeXinterwordspaceshaping=2
(C) \XeTeXinterchartokenstate=1 (plus the accompanying character-class definitions)

Settings  Result
A         some ligatures render incorrectly, e.g. lla renders as လ္ + လ
B         ok, but must use explicit \selectlanguage{burmese}
C         ok, but Burmese lines are broken only at spaces (unidiomatic)
A+B       ok, but must use explicit \selectlanguage{burmese}
A+C       ligature renders incorrectly
B+C       segfault if there is more than one switch to Burmese
A+B+C     segfault if there is more than one switch to Burmese

My system is macOS 13.5 on Apple M1 Pro, XeTeX 3.141592653-2.6-0.95
(TeX Live 2023).

I can certainly help my colleague work around the crashing bug by
postprocessing his source with a script to insert \selectlanguage{} next to
the appropriate Unicode range, but the crash is frustrating. I believe this
is the same issue as was raised on StackExchange in 2019

https://tex.stackexchange.com/questions/503498/trouble-with-stacked-consonants-burmese-script

but I couldn't find any further discussion of a fix for the crash.
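For reference, post-processed source along those lines might look like the sketch below. Each Burmese run is wrapped in an explicit language switch, so no interchar tokens are involved. The thread mentions \selectlanguage; the sketch uses babel's \foreignlanguage, which is babel's command for short inline runs. Per the table above, option B with explicit switching was reported to work, but this particular file is an untested sketch.

```latex
% Sketch of post-processed source: every Burmese run is wrapped explicitly,
% so \XeTeXinterchartokenstate stays at its default (0). Compile with xelatex.
\documentclass[12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
% babel switches to this font whenever Burmese is selected.
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}
\XeTeXinterwordspaceshaping=2 % (B), reported ok with explicit switching
\begin{document}
\foreignlanguage{burmese}{ထက်လုလ္လ} thak·lulla
\foreignlanguage{burmese}{သည် ၊} saññ·|
\end{document}
```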

Many thanks for any help: perhaps I've come at this all wrong. My own XeTeX
experience has almost all been in the Latin alphabet. Best,
Andrew Goldstone

PS my example script--forgive the verbosity. The two Burmese words are just
taken at random from my colleague's sample text, with the first repeated to
fill out a line.

\documentclass[draft,12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}

\XeTeXlinebreaklocale "my" % (A)
\XeTeXinterwordspaceshaping=2  % (B)

% (C)...

\newXeTeXintercharclass\burmesesub
\newcount\myCount
\myCount="1000            % U+1000: first code point of the Myanmar block
\loop\ifnum\myCount<"10A0 % stop after U+109F, the block's last code point
  \XeTeXcharclass\myCount=\burmesesub
  \advance\myCount by 1
\repeat

\XeTeXinterchartoks 0 \burmesesub = {\begingroup\selectlanguage{burmese}}
\XeTeXinterchartoks 4095 \burmesesub = {\begingroup\selectlanguage{burmese}}
\XeTeXinterchartoks \burmesesub 0 = {\endgroup}
\XeTeXinterchartoks \burmesesub 4095 = {\endgroup}

\XeTeXinterchartokenstate=1

% ...(C)

\begin{document}


ထက်လုလ္လ
thak·lulla
ထက်လုလ္လ
thak·lulla
ထက်လုလ္လ
thak·lulla
ထက်လုလ္လ
thak·lulla

သည် ၊ saññ·|

\end{document}


Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-06 Thread Shree Devi Kumar
You can try https://github.com/Pomax/ucharclasses

I have used it in the past with Devanagari, Tamil and Gujarati scripts mixed with English.
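A minimal sketch of how ucharclasses could replace the hand-written \XeTeXinterchartoks lines from the original example. It assumes the package's \setTransitionsFor{<block>}{<code on entry>}{<code on exit>} command and "Myanmar" as the Unicode block name; both should be checked against the ucharclasses manual. It deliberately leaves out \XeTeXinterwordspaceshaping=2.

```latex
% Sketch only: compile with xelatex. \setTransitionsFor and the block name
% "Myanmar" are assumptions to verify in the ucharclasses documentation.
\documentclass[12pt]{article}
\usepackage[english]{babel}
\babelprovide[import]{burmese}
\babelfont[burmese]{rm}{Noto Serif Myanmar Regular}
\usepackage{ucharclasses}
% Open a group and select Burmese when a Myanmar-block character follows
% a non-Myanmar one; close the group when the Myanmar run ends.
\setTransitionsFor{Myanmar}{\begingroup\selectlanguage{burmese}}{\endgroup}
\begin{document}
ထက်လုလ္လ thak·lulla
သည် ၊ saññ·|
\end{document}
```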



Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-06 Thread Andrew Goldstone
Thank you for the hint about ucharclasses! That saves my writing the
\XeTeXinterchartoks lines myself and does (rather mysteriously?) seem to
avoid the segfault in conjunction with \XeTeXinterwordspaceshaping=2. The
\XeTeXlinebreaklocale "my" still looks wrong--it breaks a ligature (i.e. a
conjunct consonant) apart at a line break--but this is much closer to what
my colleague needs. Thanks again. Hoping someone may be able to add more
about the Burmese-specific aspect of all this. All best,
Andrew



Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-06 Thread Werner LEMBERG


> You can try https://github.com/Pomax/ucharclasses

No need to use the version from GitHub: the one in TeX Live is up to date.


Werner


Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-09 Thread Ulrike Fischer
On Wed, 6 Sep 2023 10:40:16 -0400, Andrew Goldstone wrote:

> I believe this is the same issue as was raised on StackExchange in 2019
>
> https://tex.stackexchange.com/questions/503498/trouble-with-stacked-consonants-burmese-script
>
> but I couldn't find any further discussion of a fix for the crash.

I don't think there is a fix, and XeTeX development is
rather stale. Personally, I would try LuaLaTeX.


-- 
Ulrike Fischer 
http://www.troubleshooting-tex.de/



Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-09 Thread Andrew Goldstone
Thanks--it turns out that xelatex still segfaults if I attempt to combine
ucharclasses and \XeTeXinterwordspaceshaping=2 in a longer document. I do
think this is a bona fide XeTeX bug, but I don't know the XeTeX source well
enough to trace it further.

As for lualatex, it seemed to have more trouble than xelatex with the
complex ligatures in Burmese. The lineation issues are a lower priority for
my colleague than simply being able to typeset his mixed-script text, so
I'll help him with a workaround in XeTeX, if no other suggestions for fixes
are forthcoming.

All best,
Andrew



Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-09 Thread Mike Maxwell
Back in 2018, I was trying to use LuaTeX to typeset multiple scripts. 
(We needed its ability to report where bounding boxes were on the 
page.)  LuaTeX worked OK for some scripts, but failed, for example, for 
Tamil, where glyphs don't always appear on the page in the same order as 
their underlying characters.  This sort of issue arises with many 
Indic scripts, and something similar would probably happen with Burmese, 
which in some ways is an even more complex script than other Indic ones.


At the time, I recall the LuaTeX developers saying they were not 
interested in solving this issue, and that instead script-specific 
libraries should be developed.  (I'm going by memory here, I don't have 
links to that discussion, although see here: 
https://tex.stackexchange.com/questions/.)


Since that time, Khaled Hosny has conducted an "experiment" (his term) 
in using HarfBuzz in LuaTeX 
(https://tug.org/TUGboat/tb40-1/tb124hosny-harfbuzz.pdf, as reported in 
2019), and Kai Eigner also did similar work 
(https://github.com/tatzetwerk/luatex-harfbuzz).  The LuaTeX Wikipedia 
page says LuaTeX "includes" the HarfBuzz engine (and links to the above 
two reports).


I haven't tried LuaTeX in recent years, but it sounds like if you ran 
Burmese through it and used the HarfBuzz shaper instead of the 
default(?) shaper, it might work for Burmese.


I'll be interested to hear what you find.

Mike Maxwell



Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-09 Thread Ulrike Fischer
On Sat, 9 Sep 2023 15:39:51 -0400, Mike Maxwell wrote:

> I haven't tried LuaTeX in recent years, but it sounds like if you ran 
> Burmese through it and used the HarfBuzz shaper instead of the 
> default(?) shaper, it might work for Burmese.

Yes, that should work fine. luahbtex has been the default engine for
lualatex since TeX Live 2020, and HarfBuzz can be used with lualatex +
fontspec via the option Renderer=Harfbuzz.

see e.g. https://tex.stackexchange.com/a/515934/2388
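A minimal LuaLaTeX sketch of that suggestion, applied to the Burmese words from the original example. The font name is assumed to be installed, and the \mytextburmese helper is hypothetical, introduced only for illustration. It covers shaping only, not Burmese line breaking.

```latex
% Sketch only: compile with lualatex (luahbtex since TeX Live 2020).
% Assumes the font "Noto Serif Myanmar" is installed.
\documentclass[12pt]{article}
\usepackage{fontspec}
% Shape this family with HarfBuzz instead of luaotfload's default node mode.
\newfontfamily\burmesefont{Noto Serif Myanmar}[Renderer=Harfbuzz]
% Hypothetical helper for short inline Burmese runs (font switch only).
\newcommand\mytextburmese[1]{{\burmesefont #1}}
\begin{document}
\mytextburmese{ထက်လုလ္လ} thak·lulla
\mytextburmese{သည် ၊} saññ·|
\end{document}
```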

-- 
Ulrike Fischer 
http://www.troubleshooting-tex.de/



Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-09 Thread Mike Maxwell

Thank you, that is good news!

At this point, what are the advantages of Xe(La)TeX over Lua(La)TeX, 
setting aside the crash Andrew reported and the staleness of XeTeX's 
development?  Or, to put it differently, are there any situations 
that require XeTeX?


   Mike Maxwell



Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-09 Thread Zdenek Wagner
On Sat, 9 Sep 2023 at 23:28, Mike Maxwell wrote:
>
> At this point, what are the advantages of Xe(La)TeX vs. Lua(La)TeX?
> Apart from the crash Andrew reported, and the staleness of XeTeX's
> development.  Or maybe putting it differently, are there any situations
> that require XeTeX?
>
According to the documentation, ucharclasses seems to work only
with XeLaTeX. For longer texts I prefer polyglossia, because I
need hyphenation. However, if I just want to insert single words, it
is simpler with ucharclasses.

> Mike Maxwell
>

Zdeněk Wagner
https://www.zdenek-wagner.eu/



Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-10 Thread Javier Bezos




> According to documentation it seems to me that ucharclasses work only
> with XeLaTeX.

But with babel and lualatex you can switch the font depending
on the script, even with RTL ones, which, if things haven’t
changed, isn’t possible with ucharclasses. See the examples
on p. 44 of the babel manual.

Javier





Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-10 Thread Zdenek Wagner
On Sun, 10 Sep 2023 at 15:21, Javier Bezos wrote:
>
>
> > According to documentation it seems to me that ucharclasses work only
> > with XeLaTeX.
>
> But with babel and lualatex you can switch the font depending
> on the script, even with RTL ones, which, if things haven’t
> changed, isn’t possible with ucharclasses. See the examples
> in p. 44 of the babel manual.
>
I can do the same with polyglossia, with both xelatex and lualatex. But
imagine that I am writing a document in Hindi that occasionally contains
a single word in English, Russian, Urdu or Gujarati, possibly loaded
from another file. I just do not want to write \textenglish,
\textrussian, \texturdu, \textgujarati. For RTL inside LTR,
ucharclasses works for single words; with two or more words the result
is wrong.

> Javier
>

Zdeněk Wagner
https://www.zdenek-wagner.eu/



Re: [XeTeX] xetex crash: interaction between interchar and linebreaklocale mechanisms

2023-09-10 Thread Javier Bezos




> I can do the same with polyglossia both with xelatex and lualatex but
> imagine that I am writing a document in Hindi and from time to time it
> contains a single word in English, Russian, Urdu, Gujarati and it may
> be loaded from another file. I just do not want to write \textenglish,
> \textrussian, \texturdu, \textgujarati. With RTL inside LTR
> ucharclasses works for single words, with two or more words it will be
> wrong.

My point is that with babel + LuaTeX you don’t need any macro
to switch the font, the text direction or the line-breaking
rules, which is what you want. Commands like \textenglish,
\textrussian, \texturdu, \textgujarati, etc., are not
necessary for a few words or short texts, even if the
script is RTL. The example in the babel manual is:


\documentclass{book}

\usepackage[english, bidi=basic]{babel}

\babelprovide[onchar=ids fonts]{arabic}

\babelfont{rm}{Crimson} % Main font
\babelfont[*arabic]{rm}{FreeSerif} % Font for the Arabic script

\begin{document}

Most Arabic speakers consider the two varieties to be two registers
of one language, although the two registers can be referred to in
Arabic as فصحى العصر \textit{fuṣḥā l-ʻaṣr} (MSA) and
فصحى التراث \textit{fuṣḥā t-turāth} (CA).

\end{document}
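Adapted to the Burmese case from the start of the thread, the same approach might look like the sketch below. It is untested and assumes that babel's Burmese ini supplies the script data that onchar needs, and that the named fonts are installed. It must be compiled with lualatex, since onchar is a LuaTeX-only babel feature.

```latex
% Sketch only: compile with lualatex; onchar works only with LuaTeX.
\documentclass{article}
\usepackage[english]{babel}
% ids: select Burmese automatically on Myanmar characters;
% fonts: switch to the Burmese \babelfont at the same time.
\babelprovide[import, onchar=ids fonts]{burmese}
\babelfont{rm}{Latin Modern Roman}  % main font (assumed installed)
% HarfBuzz shaping for the complex Myanmar script.
\babelfont[burmese]{rm}[Renderer=Harfbuzz]{Noto Serif Myanmar}
\begin{document}
ထက်လုလ္လ thak·lulla ထက်လုလ္လ thak·lulla
သည် ၊ saññ·|
\end{document}
```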


Javier