Re: [XeTeX] I want to commit to xetex, how to make a pull request?

2022-12-13 Thread Joseph Wright

Ah, right.

This looks like cross-engine work, which mainly happens in the TL repo. 
It will need to be ported back to the SourceForge repo at some stage I 
guess.


Personally, I'd just use the TL repo and leave that sort of admin to 
Karl or the people with write access on the SourceForge repo.


Joseph

On 13/12/2022 11:18, Tuff Contender wrote:

Ummm stupid of me. The correct url is
https://www.tug.org/svn/texlive/trunk/Build/source/texk/web2c/xetexdir/xetex.ch?r1=55885=57724
, a commit on 2021-02-13.

On Tue, Dec 13, 2022 at 7:06 PM Joseph Wright <
joseph.wri...@morningstar2.co.uk> wrote:


On 13/12/2022 11:03, Tuff Contender wrote:

On Sat, Dec 10, 2022 at 6:22 PM Joseph Wright <
joseph.wri...@morningstar2.co.uk> wrote:


On 09/12/2022 19:07, Tuff Contender wrote:


On 08/12/2022 09:22, Tuff Contender wrote:

The code on XeTeX - Unicode-based TeX / Code / [bc89c7] (sourceforge.net)
<https://sourceforge.net/p/xetex/code/ci/master/tree/> seems not up to
date, since the last modification is on 2020-01-20.
Where should I submit a merge request to?


What makes you think it's not up-to-date? (Other than some TL version
strings, I imagine this is the same code as in TL, etc.) That said, I'd
likely look to send a patch in the first instance to TL, as it tends to
be the place that any changes actually happen.

I viewed the page https://sourceforge.net/p/xetex/code/ci/master/tree/
and found the last change is on 2020-01-20, here's the snapshot
https://tug.org/pipermail/xetex/attachments/20221208/db8bad7f/attachment-0001.png


Sure: I only meant that as far as I know, there have been no changes in
XeTeX since then. I was wondering why you thought there might be.

Joseph

Sorry for not getting the idea.


The last significant commit in `xetex.ch` is
https://www.tug.org/svn/texlive/trunk/Build/source/texk/web2c/tex.ch?r1=63916=64547
on 2022-09-29, which is much later than the one on sf.




That's tex.ch, not xetex?

Joseph








Re: [XeTeX] I want to commit to xetex, how to make a pull request?

2022-12-13 Thread Joseph Wright

On 13/12/2022 11:03, Tuff Contender wrote:

On Sat, Dec 10, 2022 at 6:22 PM Joseph Wright <
joseph.wri...@morningstar2.co.uk> wrote:


On 09/12/2022 19:07, Tuff Contender wrote:


On 08/12/2022 09:22, Tuff Contender wrote:

The code on XeTeX - Unicode-based TeX / Code / [bc89c7] (

sourceforge.net)

<https://sourceforge.net/p/xetex/code/ci/master/tree/> seems not up to
date, since the last modification is on 2020-01-20.
Where should I submit a merge request to?


What makes you think it's not up-to-date? (Other than some TL version
strings, I imagine this is the same code as in TL, etc.) That said, I'd
likely look to send a patch in the first instance to TL, as it tends to
be the place that any changes actually happen.

I viewed the page https://sourceforge.net/p/xetex/code/ci/master/tree/
and found the last change is on 2020-01-20, here's the snapshot


https://tug.org/pipermail/xetex/attachments/20221208/db8bad7f/attachment-0001.png

Sure: I only meant that as far as I know, there have been no changes in
XeTeX since then. I was wondering why you thought there might be.

Joseph

Sorry for not getting the idea.


The last significant commit in `xetex.ch` is
https://www.tug.org/svn/texlive/trunk/Build/source/texk/web2c/tex.ch?r1=63916=64547
on 2022-09-29, which is much later than the one on sf.




That's tex.ch, not xetex?

Joseph


Re: [XeTeX] I want to commit to xetex, how to make a pull request?

2022-12-10 Thread Joseph Wright

On 09/12/2022 19:07, Tuff Contender wrote:


On 08/12/2022 09:22, Tuff Contender wrote:

The code on XeTeX - Unicode-based TeX / Code / [bc89c7] (sourceforge.net)
 seems not up to
date, since the last modification is on 2020-01-20.
Where should I submit a merge request to?


What makes you think it's not up-to-date? (Other than some TL version
strings, I imagine this is the same code as in TL, etc.) That said, I'd
likely look to send a patch in the first instance to TL, as it tends to
be the place that any changes actually happen.

I viewed the page https://sourceforge.net/p/xetex/code/ci/master/tree/
and found the last change is on 2020-01-20, here's the snapshot
https://tug.org/pipermail/xetex/attachments/20221208/db8bad7f/attachment-0001.png


Sure: I only meant that as far as I know, there have been no changes in
XeTeX since then. I was wondering why you thought there might be.


Joseph



Re: [XeTeX] I want to commit to xetex, how to make a pull request?

2022-12-08 Thread Joseph Wright

On 08/12/2022 09:22, Tuff Contender wrote:

The code on XeTeX - Unicode-based TeX / Code / [bc89c7] (sourceforge.net)
 seems not up to
date, since the last modification is on 2020-01-20.
Where should I submit a merge request to?


What makes you think it's not up-to-date? (Other than some TL version 
strings, I imagine this is the same code as in TL, etc.) That said, I'd 
likely look to send a patch in the first instance to TL, as it tends to 
be the place that any changes actually happen.



Still there are some other problems related to commits to the project.
1. It is advised to make patches in the `xetex.ch` file, not `xetex.web`.
But I can find changes in both `*tex.web` and `*tex.ch`. So which one am I
supposed to modify?


I think both, to keep them in line with each other.


2. In `.ch` files, what is the text after "@x" and "@y"? Are they comments? How
do I write them?


Yes, they are there for humans to understand what's going on.

What changes are you considering suggesting?

Joseph


Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread Joseph Wright

On 30/04/2022 23:52, David Carlisle wrote:

Something like this, I think.


\documentclass{article}
\usepackage{polyglossia}

\setdefaultlanguage{armenian}
\setmainfont{DejaVu Sans}
\ExplSyntaxOn
\let\tuppercase\text_uppercase:n
\ExplSyntaxOff
\pagestyle{empty}
\begin{document}
Երևան $\rightarrow$ \uppercase{Երևան}

Երևան $\rightarrow$ \tuppercase{Երևան}

\end{document}

David


The next expl3 release will include hy-x-yiwn as a language setting,
allowing

   \newcommand\tuppercase{\text_uppercase:nn{hy-x-yiwn}}

in David's example - this variant will use the alternative mapping.

Joseph
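
As a rough illustration of the language-aware form mentioned above, David's
example could be adapted along the following lines. This is only a sketch: it
assumes an expl3 release that already knows the hy-x-yiwn setting, and the
wrapper names \tuppercase and \tuppercaseyiwn are made up for this example.

```latex
% Compile with XeLaTeX or LuaLaTeX.
% Assumption: the installed expl3 supports the hy-x-yiwn language setting.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{DejaVu Sans}% any font with Armenian coverage will do
\ExplSyntaxOn
% Hypothetical wrapper names, chosen for this example only.
\NewDocumentCommand \tuppercase     { m } { \text_uppercase:n  {#1} }
\NewDocumentCommand \tuppercaseyiwn { m } { \text_uppercase:nn { hy-x-yiwn } {#1} }
\ExplSyntaxOff
\pagestyle{empty}
\begin{document}
% Default Unicode case mapping
Երևան $\rightarrow$ \tuppercase{Երևան}

% Alternative mapping for the ech-yiwn ligature
Երևան $\rightarrow$ \tuppercaseyiwn{Երևան}
\end{document}
```

Using a two-argument wrapper keeps the language tag out of the document body,
so the mapping can be switched in one place.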


Re: [XeTeX] Uppercase in Armenian

2022-05-01 Thread Joseph Wright

On 01/05/2022 13:10, Jonathan Kew wrote:

Hi Zdeněk,

Checking the Unicode character database[1], U+0587 is listed as having a 
*compatibility* decomposition to <0565,0582> (not 0587):


0587;ARMENIAN SMALL LIGATURE ECH YIWN;Ll;0;L;<compat> 0565 0582;;;;N;;;;;

Likewise, the SpecialCasing.txt file[2] that defines case mappings other 
than simple 1:1 substitutions shows the same decomposition for the 
uppercase form:


0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN

So if I understand correctly, what \text_uppercase:n is doing is simply 
implementing what the Unicode standard defines.


If this isn't the appropriate behavior, at least for some locales, I 
believe that will need custom programming at some level, but I don't 
know enough about it to get into any details.


Indeed: we will add support for alternative casing for Armenian to
\text_uppercase:nn shortly.


Joseph


Re: [XeTeX] Colour specials for XeTeX

2020-08-07 Thread Joseph Wright

On 07/08/2020 16:43, morris roger wrote:

There are much simpler ways of adding colour; see
https://ctan.org/pkg/do-it-yourself-tex
where I include examples using opmac
Roger H-F,
Ottawa


That's still wrappers around the same specials.

Joseph



Re: [XeTeX] A LaTeX Unicode initialization desire/question/suggestion

2020-01-13 Thread Joseph Wright

On 13/01/2020 03:41, Doug McKenna wrote:

| load-unicode-data handles some of the reading, but there is additional
| reading (see l3unicode.dtx) that is in expl3.sty (in current xelatex
| formats) but will be preloaded in future releases and in the current
| xelatex-dev release as noted above.


I tried looking at, e.g., l3unicode.dtx, and it's still using TeX (or 
impenetrable LaTeX3 kernel language built on top) to parse the official Unicode 
data files.


For performance reasons, we had to make that part a bit more complex 
than it was originally: at present, it's run during every LuaTeX/XeTeX 
run, and that is a bit of an issue. It's one of the reasons we want to 
pre-load expl3 and dump it into the format.



It's hard for me to imagine how any of that isn't at least an order of magnitude slower 
than scanning through a mere 20K block of bytes with a machine pointer in C, and 
installing into all pertinent character mapping tables every piece of information that 
XeTeX says it's interested in on a per character or per character range basis.  When I 
use the term "preloaded" I'm not talking about parsing anything inside TeX's 
virtual machine using the TeX language (or whatever's built on top of it).


It's not absolutely as fast as it can be in TeX, but it's close. (For 
LuaTeX, a Lua reader would of course be possible and likely faster, but 
then we'd have two code paths to worry about.)


David's point was that the Unicode data is needed not only for the TeX
internal tables for \uccode, \lccode, \catcode (possibly others). It's
also needed to cover other Unicode concepts that TeX doesn't know, and
which therefore have to be coded at the macro level. For example, Unicode case
changing is not a one-to-one operation. For the majority of codepoints, one
can use the TeX \lccode/\uccode values (and avoid needing to hold them 
in TeX macros). Most of this information is in the relatively small file 
SpecialCasing.txt, but there is also the information one needs from 
UnicodeData.txt to cover titlecasing. We did consider 'pre-extracting' 
that data, but it made relatively little difference during a normal TeX 
run, and leaves open the risk of mismatched files. A 'bigger' data set 
required is NFD mappings: they are needed to handle for example Greek 
case changing. TeX doesn't know about NFD, so again one needs some data, 
which again comes from UnicodeData.txt, and again needs to be stored 
somewhere that's not 'pre-defined'.
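
To make the "not one-to-one" point concrete, here is a small sketch (an
editorial illustration, not code from the thread). SpecialCasing.txt lists the
uppercase of U+00DF (ß) as the two code points 0053 0053, which a single
\uccode entry cannot express. The wrapper name \tuppercase is hypothetical.

```latex
% Compile with XeLaTeX or LuaLaTeX.
\documentclass{article}
\usepackage{fontspec}
\ExplSyntaxOn
% Hypothetical wrapper name, used only in this example.
\NewDocumentCommand \tuppercase { m } { \text_uppercase:n {#1} }
\ExplSyntaxOff
\begin{document}
% \uppercase consults the \uccode table: one code point in, one code point out.
\uppercase{straße}

% \text_uppercase:n works at the macro level and can apply the
% one-to-many mappings recorded in SpecialCasing.txt.
\tuppercase{straße}
\end{document}
```

Comparing the two output lines shows where the primitive table alone falls
short and macro-level data storage becomes necessary.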



| A tex primitive that controls a macro set seems to be reversing the
| natural layering, you could test for \jsboxversion (or whatever you
| have) or test that the lccode of some character is already non zero
| or... several other possibilities without introducing a primitive
| here.


The point is that it *isn't* a TeX primitive.  The idea is that it would be a 
primitive specific only to those engines that initialize their character 
mapping tables (\catcode, \lccode, \uccode, etc.) when the interpreter is 
created/launched/whatever, before it ever executes any TeX source code as a 
virtual machine.  My point is that testing for the existence of \Umathcode is 
an inappropriate test for that condition.


Er, it's a primitive, no? Or would it be set up as a macro that was
pre-defined by the engine?



But when your engine is just a library linked into another program that lives for a 
long time, perhaps measured in days, and when the user is running multiple jobs from 
the same program, then there ought to be a way to load the format from its source 
code >once<, and have it live in the engine's memory even while job after job 
is executing on top, with a clean-up after each job ends.  This is, after all, 
completely conformant with everyday use of TeX (edit...run job...edit...run job...), 
not to mention every other computer language.  I'm pretty sure that I've architected 
my code to allow this, although it's untested for now.  One step at a time.


Years ago, Jonathan Fine wrote a TeX daemon that could stay running, 
relying on the fact that DVI files don't need to be closed (unlike PDF 
ones). That requires avoiding \end, and he could only support plain TeX 
as that means disabling \csname, so no environments. I assume you are 
not thinking of a 'permanently running TeX job' in that sense?



| As noted above, with latex-dev releases you are still going to need
| the unicode data files to be read using tex macros.


Are these files read more than once, and if so, why?  If not, I don't 
understand why I'm still going to need to read them.


l3unicode reads each one once, as noted above to populate macro data 
storage. Presumably you are not worried about LuaTeX, so don't have to 
think about font loaders (which also need Unicode info, and which is 
handled by LuaTeX in Lua code).



| To be in the core tex macros we would need to have the engine
| incorporated into texlive so that it could be tested as part of our
| test suite and continuous integration tests.


That doesn't make sense to me. 

Re: [XeTeX] [EXT] A LaTeX Unicode initialization desire/question/suggestion

2020-01-13 Thread Joseph Wright

On 13/01/2020 03:41, Doug McKenna wrote:

Phil Taylor wrote:


| So because JSBox is required/designed to incorporate all of XeTeX's
| features, it must (by definition) implement/provide \Umathcode.


Just to be clear, JSBox can eventually incorporate all of XeTeX's features 
(primitives), but does not do so now. It doesn't even incorporate pdfTeX's 
features, but it is set up to. I'm merely adding XeTeX features as necessary to 
get the LaTeX macro library installed and then typeset a LaTeX document 
containing no Unicode at all. The problem is that somewhere in the LaTeX format 
initialization the ability to recognize a Unicode character (as opposed to a 
UTF-8 byte sequence) is equated with the assumption that it's being run under 
XeTeX, and that therefore at least some of XeTeX's features are there and can 
be relied upon at format initialization time.


At present, there are two engines that implement \Umathcode, etc., 'in 
the wild', XeTeX and LuaTeX, and they have (over time) come to an agreed 
position on what core features are available at the macro level. (For 
example, originally XeTeX called its new primitives \XeTeX... but they
got renamed to \U... to match LuaTeX.)


They have quite a lot of differences too, but a core subset of features 
is available with both, and that comes about as they offer \Umathcode. 
Almost all of the tests in LaTeX look for the relevant primitive, so for 
example when we want \Uchar we look for it. However, there are as you 
note a few places where finding \Umathcode is by far the easiest marker.


It's quite possible to add additional tests to the core code, provided 
there is a spec or at least some notes on what's available. (For 
example, (u)pTeX for a long time had no docs in English, so things were 
tricky. But there is now a basic manual there to allow those of us who 
do not know Japanese to offer at least some basic support.)



| But could not JSbox perform (or simulate) the following :



| \let \Umathschar = \Umathchar % use British spelling as synonym
| \let \Umathchar = \undefined % inhibit "load-unicode-data.tex"'s special treatment of engines that implement \Umathchar
| \input load-unicode-data % since it would seem that you cannot simply skip this step
| \let \Umathchar = \Umathschar % restore canonical meaning of \Umathchar


It could, but it's not my code that's issuing "\input load-unicode-data". The reading of 
"load-unicode-data.tex" is embedded within my version of LaTeX's own initialization code, 
and there's no guarantee that elsewhere in that code there isn't some dependence on \Umathchar that 
such a re-definition might interfere with. LaTeX's code has several tests that rely on whether 
|\Umathchar| is defined or not, and even in the latest versions, it is declared that \Umathchar 
existence is the official way to test. Indeed, the latest official comments, as David Carlisle 
brought to my attention in this thread, declare that \Umathchar existence testing is the current 
way to go in all sorts of places.


I think you mean \Umathcode :)

Each place that uses Unicode features does test for this primitive; if
it exists, we have to date been able to assume a few additional
primitives are also available (e-TeX, \Uchar, \Umathchardef), but mainly
it tells us that we can allocate \lccode and \uccode beyond 255.
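
The kind of test described here can be sketched in plain TeX as follows. This
is an editorial illustration under simple assumptions, not the actual LaTeX
kernel code: it relies on \undefined really being undefined, and the messages
are invented for the example.

```tex
% Plain TeX sketch; intended to run under any engine.
\ifx\Umathcode\undefined
  \message{No \string\Umathcode: treating this as an 8-bit engine}
\else
  % Unicode engine: case mappings may be set for code points above 255.
  \lccode"0100="0101 % LATIN CAPITAL LETTER A WITH MACRON -> small letter
  \uccode"0101="0100 % and the reverse mapping
  \message{\string\Umathcode\space found: set \string\lccode\space beyond 255}
\fi
\bye
```

In an 8-bit engine the \else branch is skipped without being executed, so the
out-of-range \lccode assignments never raise an error there.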



Here is perhaps a slightly better hack:

If it's acceptable as the very first executable line in latex.ltx (or other format source 
files) to test the catcode value of `{ to determine whether a format has already been 
loaded or not, then it should be acceptable within "load-unicode-data.tex" (or 
the like) to include a similar test to determine whether to proceed with the TeX parse of 
the Unicode data, or to bail because it's presumable that the tables are already 
initialized. For example, the first non-8-bit Unicode character is:

0100;LATIN CAPITAL LETTER A WITH MACRON;Lu;0;L;0041 0304;;;;N;LATIN CAPITAL LETTER A MACRON;;;0101;

It is safe, I think, to assume that this Unicode character will forever be 
classified as an uppercase letter (with a lowercase mapping value of U+0101).


The test at the start of latex.ltx is about making sure we are in IniTeX 
mode: I'm not sure I'd choose to do that today, but the test is 
long-standing. For load-unicode-data, the idea was partly that there was 
really no issue about checking: unlike formats, that might have hidden 
stuff, here all we are trying to do is get to a known position. That 
links to the second reason I'm slightly wary of a test. As-written, 
load-unicode-data ensures that the \lccode, \uccode and \catcode tables 
are in a state *known to the macro layer*. I know it's slightly strange 
to you, but as a macro programmer I can't 'know' what different engine 
devs might do/change, and I certainly don't know exactly what version of 
UnicodeData.txt you are working from. By doing initialisation without 
checking, I can be sure that we are on a known Unicode version.
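
For reference, the guard Doug proposes might look roughly like the sketch
below. This is an assumption about how such a test could be written, not code
from load-unicode-data.tex, and it is only meaningful in an engine that has
\lccode entries beyond 255.

```tex
% Sketch of the proposed guard (XeTeX, LuaTeX or a similar Unicode engine).
% U+0100 has the lowercase mapping U+0101 in UnicodeData.txt.
\ifnum\lccode"0100="0101
  \message{Case tables look initialised: skipping the TeX-level parse}
\else
  \input load-unicode-data
\fi
```

The trade-off raised above still applies: a single probe like this says
nothing about which UnicodeData.txt version the engine used, whereas loading
unconditionally leaves the tables in a state known to the macro layer.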


To 

Re: [XeTeX] How much time to build LaTeX format for XeTeX

2019-12-05 Thread Joseph Wright

On 06/12/2019 03:15, Doug McKenna wrote:

Given all the parsing of the Unicode character data files during INITEX, and 
all the inputting and creation of the hyphenation trees, how much CPU time 
elapses while building the XeTeX format file for LaTeX?  I'm going to assume 
that the writing out of the format at the final \dump command is negligible, 
though I don't really know.

- Doug McKenna



'Not very long': of the order of seconds. On my i5 Dell XPS, using an 
Ubuntu VM on top of Windows 10 "time fmtutil-sys -byfmt xelatex" gives


real   0m2.154s
user   0m1.314s
sys    0m0.094s

My native system is slower: this is a known issue to do with file access 
on Windows. LuaTeX is about 0.5s faster, I guess largely as it does not 
load hyphenation patterns.


The LaTeX dev formats, which pre-load more data, take a little longer: 
for XeLaTeX-dev


real   0m3.186s
user   0m2.244s
sys    0m0.090s

Joseph


Re: [XeTeX] Math class initialization in Unicde-aware engine

2019-12-02 Thread Joseph Wright

On 02/12/2019 17:52, Doug McKenna wrote:

Joseph -

A similar ambiguity occurs later in the README.md file.  It says

- \Umathcode for all letters as TeX class 7 (var)

Does "letters" mean those code points on the TeX side with \catcode 11, or 
those Unicode code points labeled with 'L' in UnicodeData.txt?

If the former, then combining marks (Unicode 'M') should be entered into 
\Umathcode as TeX class 7; if the latter, then presumably not, though it's not 
clear why a math variable name can't have a combining mark.

- Doug McKenna



The former: I've clarified.

Joseph


Re: [XeTeX] Math class initialization in Unicde-aware engine

2019-12-01 Thread Joseph Wright

On 02/12/2019 05:56, Doug McKenna wrote:

- \lccode and/or \uccode for non-letter code points
   for which an upper or lower case mapping is given

The problem with this is that earlier, it is stated that all combining mark 
code points (class code starting with 'M' in the UnicodeData.txt file) are to 
be considered letters (\catcode set to 11).  So there's an ambiguity here that 
needs clearing up.  Does the above apply to combining mark code points or not?


You've read something in that is not in the README ;)

The file says

  - `\catcode` 11 for all combining marks (Unicode class "M")

where I've very deliberately kept the TeX 'side' as what *actually 
happens* (catcode-11), not said they are 'treated as letters', or similar.


I will clarify that 'letter' here means a codepoint with Unicode 
character class "L", and is not linked to the TeX catcode.



It may be that none of the combining marks in the data file have any case 
mappings, but there's no guarantee that is true.  So the question is, if a 
combining mark has an uppercase or lowercase mapping, does that get installed 
in \lccode and/or \uccode?


Yes, or at least that would be the case in principle: all code points with
upper/lower/title properties are set up.



Also, there's a confusing typo ("can"?) in

- \lccode and \uccode for all of class "Lt" (title
   case letters) to the lower can upper case mappings
   (or if not given to the code point itself)

Should "can" be "and/or"?


It is 'and': you need to set lccode and uccode for these code points.

Joseph
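
The README rules discussed in this message amount to assignments of the
following kind. The snippet is an editorial sketch with hand-picked code
points, not an extract from load-unicode-data.tex; it assumes a Unicode engine
whose tables accept code points above 255.

```tex
% Illustrative assignments for a Unicode engine in IniTeX mode.

% U+0301 COMBINING ACUTE ACCENT (Unicode class "M"):
% the README only promises catcode 11, nothing about case.
\catcode"0301=11

% U+01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON (class "Lt").
% UnicodeData.txt gives upper 01C4 and lower 01C6, so both tables are set.
\lccode"01C5="01C6
\uccode"01C5="01C4
```

Keeping the catcode rule separate from the case rules mirrors the distinction
drawn above between what actually happens (catcode 11) and being a "letter"
in the Unicode sense.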



Re: [XeTeX] Math class initialization in Unicde-aware engine

2019-11-27 Thread Joseph Wright

On 28/11/2019 00:16, Ross Moore wrote:

If by ignoring you mean removing the character entirely, then that is surely 
not best at all.

Most  N Class (Normal) characters would be simply of the default  \mathord  
class.


That is already the case: it's where IniTeX starts off, chars are 
mathord. So 'nothing to do here'. Also note that some of this 
information is already set from the main Unicode file: it tells us which 
chars are letters.



I’d expect others to be mapped instead into a macro that corresponds to 
something that TeX does support.
e.g.
  space characters for  thinspace, 2-em space, etc.  in  U+2000 – U+200A
can expand into things like:   \, \; \> \quad \qquad  etc.  ( even to 
constructions like  \mskip1mu )


That's not a generic IniTeX thing, I'm afraid. The Unicode data loaders 
are explicitly about setting up the basic data in Unicode TeX engines 
that's held in (primitive) tables. Creating macros is the job of the 
'rest' of the format. Here, presumably you are thinking of making chars 
math-active: that's well out-of-scope for the loader.



After all, this is essentially what happens when pdfTeX reads raw Unicode input.


pdfTeX reads bytes, there's not really much comparison. In IniTeX mode,
there is not much happening with UTF-8 and pdfTeX: perhaps you are
thinking of pdfTeX with LaTeX?


Joseph


Re: [XeTeX] Math class initialization in Unicde-aware engine

2019-11-27 Thread Joseph Wright

On 28/11/2019 01:26, Doug McKenna wrote:

Ross wrote:


| If by ignoring you mean removing the character entirely, then that is surely 
not best at all.
|
| Most N Class (Normal) characters would be simply of the default \mathord 
class.


The parsing code in load-unicode-math-classes.tex installs values in the 
\Umathcode table that comport with some rule, which without too much of a close 
look seems to me to be whether the character code math class read from 
MathClass.txt is one of the eight possibilities that parsing code pays 
attention to, out of the 15 possible ones in the file. Therefore it appears to 
me that all entries in MathClass.txt that are marked with, for instance, 'N', 
are ignored with respect to installing any entry in the \Umathcode table.

It may be that such characters in MathClass.txt marked with 'N' take on the
\mathord attribute by default when TeX finds them within math mode, I'm not
sure without looking at its code.

Doug McKenna


The loader is intended for use in IniTeX mode and so relies on the 
defaults. As you say, characters are already \mathord unless actively 
set to something else.


Joseph



Re: [XeTeX] Math class initialization in Unicde-aware engine

2019-11-27 Thread Joseph Wright

On 27/11/2019 23:20, Doug McKenna wrote:

Another question about Unicode-aware TeX engine (e.g., XeTeX) initialization 
files.

The Unicode Consortium provides a file, MathClass.txt, e.g.,

./texmf-dist/tex/generic/unicode-data/MathClass.txt

It contains a list of lines (and comments).  Field 0 of an entry line is a 
Unicode code point or a range of code points, and field 1 is a single ASCII 
character that declares the Unicode math class to which the code point or range 
of code points belongs.

Comments in that file say that there are (currently) 15 different Unicode math 
class codes:

#   N - Normal - includes all digits and symbols requiring only one form
#   A - Alphabetic
#   B - Binary
#   C - Closing - usually paired with opening delimiter
#   D - Diacritic
#   F - Fence - unpaired delimiter (often used as opening or closing)
#   G - Glyph_Part - piece of large operator
#   L - Large - n-ary or large operator, often takes limits
#   O - Opening - usually paired with closing delimiter
#   P - Punctuation
#   R - Relation - includes arrows
#   S - Space
#   U - Unary - operators that are only unary
#   V - Vary - operators that can be unary or binary depending on context
#   X - Special - characters not covered by other classes

During XeTeX format initialization, the file load-unicode-math-classes.tex in 
that same directory is executed, in order to declare to the engine which 
Unicode code points belong to which TeX math classes.  The comments in that 
file say that the classes it pays attention to are those with the following 
Unicode math codes:

% This file parses MathClass.txt, provided by the Unicode Consortium, and sets
% up the following mapping between Unicode classes and TeX math types
% - "L" (large)       \mathop
% - "B" (binary)      \mathbin
% - "V" (vary)        \mathbin
% - "R" (relation)    \mathrel
% - "O" (opening)     \mathopen
% - "C" (closing)     \mathclose
% - "P" (punctuation) \mathpunct
% - "A" (alphabetic)  \mathalpha

That means that there are 7 other Unicode math classes that are unaccounted for.

Unfortunately, the documentation/comments don't say what happens to entries 
having these other Unicode math codes (N, D, F, G, S, U, and X).  Are they 
completely ignored, or are they mapped to one of the other eight codes that 
matches what TeX is interested in or only capable of handling?

I can imagine that the space character, given Unicode math class 'S' in MathClass.txt, is 
ignored during this parse.  But what happens to the '¬' character (U+00AC) ("NOT 
SIGN"), which is assigned 'U' (Unary Operator).  Surely the logical not sign is not 
being ignored during initialization of a Unicode-aware engine, yet the comments in 
load-unicode-math-classes.tex don't say one way or the other, and it appears to me that 
the parsing code is ignoring it.

The ReadMe.md file is also deficient in answering this question.

TIA,


Er, I thought the README was reasonably clear, ah well!

The other Unicode math classes don't really map directly to TeX ones, so 
they are currently ignored. Suggestions for improvements here are of 
course welcome.


Joseph
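
In effect, the mapping described in this thread produces assignments like the
ones sketched below. This is an editorial illustration, not the loader's own
code: the code points were picked by hand, and the family number 0 is an
assumption made for the example.

```tex
% Sketch for XeTeX or LuaTeX: \Umathcode <char> = <class> <family> <slot>
% Assumption: family "0 is used here purely for illustration.

% U+2211 N-ARY SUMMATION: MathClass.txt class "L" -> \mathop (TeX class 1)
\Umathcode"2211="1 "0 "2211

% U+2264 LESS-THAN OR EQUAL TO: class "R" -> \mathrel (TeX class 3)
\Umathcode"2264="3 "0 "2264

% U+00AC NOT SIGN: class "U" is not in the loader's list, so no
% assignment is made and the character keeps the IniTeX default (mathord).
```

A format that wants different behaviour for the unmapped classes can issue its
own \Umathcode assignments after the loader has run.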


Re: [XeTeX] Lowercase Unicode code points in hyphenation patterns

2019-11-24 Thread Joseph Wright

On 24/11/2019 19:42, Joseph Wright wrote:
This has of course come up before, and I'd like to add to the expl3 case 
changers. However, I've not been able to track down any formal statement 
on the case mappings: are they in the UCD, some official publication, ...?


Joseph


Found the appropriate .xml files in the CLDR: see attached.

I plan to make some revisions to the expl3 case changer over the next 
month or two: I'll likely incorporate this information.


Joseph

# Copyright (C) 2011-2013, Apple Inc. and others. All Rights Reserved.
# Remove \0301 following Greek, with possible intervening 0308 marks.
::NFD();
# For uppercasing (not titlecasing!) remove all greek accents from greek letters.
# This is done in two groups, to account for canonical ordering.
[:Greek:] [^[:ccc=Not_Reordered:][:ccc=Above:]]*? { [\u0313\u0314\u0301\u0300\u0306\u0342\u0308\u0304] → ;
[:Greek:] [^[:ccc=Not_Reordered:][:ccc=Iota_Subscript:]]*? { \u0345 → ;
::NFC();
::Any-Upper();

# Special case for final form of sigma.
::NFD();
# C is preceded by a sequence consisting of a cased letter and then zero or more case-ignorable characters,
# and C is not followed by a sequence consisting of zero or more case-ignorable characters and then a cased letter.
# 03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA
# With translit rules, easiest is to handle the negative condition first, mapping in that case to the regular sigma.
Σ } [:case-ignorable:]* [:cased:] → σ;
[:cased:] [:case-ignorable:]* { Σ → ς;
::Any-Lower;
::NFC();


Re: [XeTeX] Lowercase Unicode code points in hyphenation patterns

2019-11-24 Thread Joseph Wright

On 24/11/2019 18:40, Apostolos Syropoulos via XeTeX wrote:


On Sunday, November 24, 2019, 4:21:32 AM GMT+2, David Carlisle wrote:

> the lccode tables are set by the macro layer not the engine code, it
> reads in The Unicode consortium data file
> tex/generic/unicode-data/UnicodeData.txt
> and sets the lccode values and catcode values according to the data there.
> see
> tex/generic/unicode-data/load-unicode-data.tex


Of course these tables are all wrong but this is another problem. For example,
this table specifies that the capital form of έ is Έ, which is wrong because
uppercase letters do not get accents, except when they start a sentence or the
name of a person (e.g., Έλενα). Since the Unicode consortium is not going to
change this, I have added the correct \uccodes and \lccodes in xgreek.sty.
Regards,
A.S.


This has of course come up before, and I'd like to add to the expl3 case 
changers. However, I've not been able to track down any formal statement 
on the case mappings: are they in the UCD, some official publication, ...?


Joseph
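
For readers who want to try the language-aware route discussed here, a minimal
sketch follows. It assumes an expl3 release in which \text_uppercase:nn
accepts the el language tag (support of this kind was only planned at the time
of the thread), and the command name \greekupper is hypothetical.

```latex
% Compile with XeLaTeX or LuaLaTeX.
% Assumption: the installed expl3 supports Greek-specific case changing
% through the "el" language tag.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{DejaVu Sans}% any font with Greek coverage will do
\ExplSyntaxOn
% Hypothetical command name, used only in this example.
\NewDocumentCommand \greekupper { m } { \text_uppercase:nn { el } {#1} }
\ExplSyntaxOff
\begin{document}
% Primitive case change driven by the \uccode table
\uppercase{έλενα}

% Macro-level case change with Greek rules applied
\greekupper{έλενα}
\end{document}
```

Handling this in the macro layer leaves the \uccode and \lccode tables in
their Unicode-derived state, so other packages still see the standard data.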


Re: [XeTeX] [tex-live] Primitive parity, \expanded and \Ucharcat

2018-06-18 Thread Joseph Wright

On 13/05/2018 13:36, Jonathan Kew wrote:
> On 13/05/2018 13:15, Joseph Wright wrote:
>> On 13/05/2018 12:23, Jonathan Kew wrote:
>>> On 13/05/2018 10:57, Joseph Wright wrote:
>>>> Hello all,
>>>>
>>>> Modulo any issues that show up in testing, all of the above is now
>>>> done and on my GitHub fork
>>>> (https://github.com/josephwright/texlive-source/tree/Ucharcat: this
>>>> branch has 'all the stuff' on it).
>>>>
>>>> I know that https://github.com/texjporg/tex-jp-build already has a
>>>> branch for \expanded. What's the best way to request 'officially'
>>>> that the changes go into pdfTeX/XeTeX? I can send a .diff to the
>>>> pdfTeX dev list, and put in a pull request on SourceForge for XeTeX,
>>>> if that's best.
>>>
>>> Thanks for working on these things, Joseph.
>>>
>>> For xetex, a pull request would be the best approach, I think; or if
>>> it's feasible to do separate PRs for each feature, that would
>>> probably make reviewing and tracking the changes easier.
>>>
>>> Is there documentation of the added features available somewhere, so
>>> we can more accurately understand what we're thinking of adding? Thanks!
>>>
>>> JK
>>
>> Excellent: I'll start on putting something together later today.
>>
>> Do you want all PRs against master or can they be 'chained'? Adding
>> primitives, it's easier if you are working knowing which others have
>> been created.
>
> "Chained" should be fine, I expect; I doubt there'd be any reason we'll
> want to take a later one but decide against an earlier one.
>
> At which point perhaps a single PR is just as good, as long as each feature
> is a separate commit so that it comes in manageable chunks. From a quick
> glance at your fork, it looks like that's how it would naturally appear.
> So, feel free to do whichever seems easiest for you.
>
> JK


Hello Jonathan,

Have you been able to look at my merge requests? We are now moving to 
using \expanded for other engines: it's going into pdfTeX and (u)pTeX, 
and is already in LuaTeX. Ideally, we'd like to avoid XeTeX being 'left 
behind'.


Joseph


--
Subscriptions, Archive, and List information, etc.:
 http://tug.org/mailman/listinfo/xetex


Re: [XeTeX] [tex-live] Primitive parity, \expanded and \Ucharcat

2018-05-13 Thread Joseph Wright

On 13/05/2018 12:23, Jonathan Kew wrote:
> On 13/05/2018 10:57, Joseph Wright wrote:
>> Hello all,
>>
>> Modulo any issues that show up in testing, all of the above is now
>> done and on my GitHub fork
>> (https://github.com/josephwright/texlive-source/tree/Ucharcat: this
>> branch has 'all the stuff' on it).
>>
>> I know that https://github.com/texjporg/tex-jp-build already has a
>> branch for \expanded. What's the best way to request 'officially' that
>> the changes go into pdfTeX/XeTeX? I can send a .diff to the pdfTeX dev
>> list, and put in a pull request on SourceForge for XeTeX, if that's best.
>
> Thanks for working on these things, Joseph.
>
> For xetex, a pull request would be the best approach, I think; or if
> it's feasible to do separate PRs for each feature, that would probably
> make reviewing and tracking the changes easier.
>
> Is there documentation of the added features available somewhere, so we
> can more accurately understand what we're thinking of adding? Thanks!
>
> JK


Excellent: I'll start on putting something together later today.

Do you want all PRs against master or can they be 'chained'? Adding 
primitives, it's easier if you are working knowing which others have 
been created.


Joseph




Re: [XeTeX] Primitive parity, \expanded and \Ucharcat

2018-05-13 Thread Joseph Wright

On 03/05/2018 22:38, Joseph Wright wrote:
> Hello all,
>
> In adding features to expl3, the LaTeX team have been making use of a
> variety of 'new' (post-e-TeX) 'utility' primitives in various engines.
> Almost always these originate in pdfTeX and have migrated to other
> engines, but are not in any way tied to PDF output, etc. Depending on
> the exact engine in use, some or all of these primitives may be
> unavailable, and that then limits macro-level features.
>
> It seems sensible long-term to have cross-engine features stay 'in sync'
> with each other. In particular, (u)pTeX has picked up a number of pdfTeX
> features, meaning that XeTeX often is the most 'limited' engine. The
> team would like, if possible, to have a common feature set in all
> engines in this regard. At the same time, there are a few 'bits and
> pieces' that make sense to raise at the same time. I'll lay out the
> various areas below.
>
> Doing the work here is non-trivial, but luckily there is an automated
> build system available via GitHub which is allowing us (me/David
> Carlisle) to do some testing. I'm building up patches in various
> branches at https://github.com/josephwright/texlive-source: assuming
> these look good, I'll merge them as required and send diff files to
> where/whoever is best. The branches on GitHub should hopefully have
> clear names for what they address.
>
> The areas we are keen to look at are as follows:
>
> - 'pdfutils': (u)pTeX has picked up a number of pdfTeX primitives, and
>    a subset have made their way into XeTeX too. However, XeTeX is still
>    missing several, most notably an expandable RNG. We are part-way
>    through working out patches to add the rest to XeTeX (RNG is done,
>    file data and timer to do)
>
> - banners: pdfTeX and LuaTeX have \banner, other engines lack
>    that. The banner includes TeX version and details of the TeX system,
>    so is potentially useful. Adding this to (u)pTeX/XeTeX looks
>    straightforward: still to-do.
>
> - \expanded: This was slated for pdfTeX 1.50 but that has never
>    appeared, but the primitive is useful as it allows 'function-like'
>    expandable macros. We can see this being very useful for simplifying
>    some macro code, and in many ways it feels like an e-TeX primitive.
>    The GitHub expanded branch adds it to pdfTeX/XeTeX/(u)pTeX
>
> - Allowing \Ucharcat (XeTeX) to make \active tokens: this was raised
>    recently on the XeTeX list, but fits here as we've put a branch
>    together to show it works
>
> It's likely I'll finish the outstanding patches by the weekend. Note
> that at present each feature addition is in a separate Git branch, so to
> add all of them I'll have to do a little tidying up: that will happen
> once I know which of these suggestions are useful.
>
> Feedback most welcome.
>
> Joseph


Hello all,

Modulo any issues that show up in testing, all of the above is now done 
and on my GitHub fork 
(https://github.com/josephwright/texlive-source/tree/Ucharcat: this 
branch has 'all the stuff' on it).


I know that https://github.com/texjporg/tex-jp-build already has a 
branch for \expanded. What's the best way to request 'officially' that 
the changes go into pdfTeX/XeTeX? I can send a .diff to the pdfTeX dev 
list, and put in a pull request on SourceForge for XeTeX, if that's best.


Regards,

Joseph




Re: [XeTeX] Primitive parity, \expanded and \Ucharcat

2018-05-04 Thread Joseph Wright

On 03/05/2018 22:38, Joseph Wright wrote:
> - 'pdfutils': (u)pTeX has picked up a number of pdfTeX primitives, and
>    a subset have made their way into XeTeX too. However, XeTeX is still
>    missing several, most notably an expandable RNG. We are part-way
>    through working out patches to add the rest to XeTeX (RNG is done,
>    file data and timer to do)


To be clear, the full set of primitives here is

- \pdfrandomseed
- \pdfsetrandomseed
- \pdfuniformdeviate
- \pdfnormaldeviate

- \pdfresettimer
- \pdfelapsedtime

- \pdffilesize
- \pdffilemoddate
- \pdffiledump
- \pdfcreationdate

of which the first set is done (a working branch for XeTeX).
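As a rough illustration of the timer and file-data groups in the list above, the following plain pdfTeX sketch uses the pdfTeX names exactly as listed. It assumes the source file has a .tex extension (so that \jobname.tex names it); nothing here says which names another engine would use for the same features.

```tex
% Plain pdfTeX sketch using the pdfTeX primitive names listed above.
% Assumption: this file is called <jobname>.tex.
\pdfresettimer
\message{size in bytes: \pdffilesize{\jobname.tex}}
\message{last modified: \pdffilemoddate{\jobname.tex}}
\message{elapsed time in scaled seconds: \the\pdfelapsedtime}
\bye
```

The file primitives expand directly to their result, whereas \pdfelapsedtime is a read-only integer and so needs \the (or \number) to be printed.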

Joseph




Re: [XeTeX] [tex-live] Primitive parity, \expanded and \Ucharcat

2018-05-04 Thread Joseph Wright

Hello Norbert,


>> I'll merge them as required and send diff files to where/whoever is best.
>
> For (u)ptex a pull request at https://github.com/texjporg/tex-jp-build
> might be useful. This is the main development area for all Japanese TeX
> engine stuff.


I see that the \expanded code has already been picked up in a branch 
there: https://github.com/texjporg/tex-jp-build/tree/expanded. So I'm 
guessing that is likely to happen.


At the moment what I'm aiming to do is get everything in one place so it 
can be reviewed, etc., and commented on. Particularly in the case of 
\expanded, the wider plan only works if there is general 
(pdfTeX/XeTeX/(u)pTeX) agreement on taking the patch.


Once we have that agreement, putting in pull requests, diffs, etc. 
against the 'right' places should be easy enough (at least in the sense 
I'm happy to sort it).



>> them I'll have to do a little tidying up: that will happen once I know which
>> of these suggestions are useful.
>
> I think all are fine.


I don't imagine there is anything particularly controversial, but there 
is also the technical business (I'm no WEB expert).


Joseph




[XeTeX] Primitive parity, \expanded and \Ucharcat

2018-05-03 Thread Joseph Wright

Hello all,

In adding features to expl3, the LaTeX team have been making use of a 
variety of 'new' (post-e-TeX) 'utility' primitives in various engines. 
Almost always these originate in pdfTeX and have migrated to other 
engines, but are not in any way tied to PDF output, etc. Depending on 
the exact engine in use, some or all of these primitives may be 
unavailable, and that then limits macro-level features.


It seems sensible long-term to have cross-engine features stay 'in sync' 
with each other. In particular, (u)pTeX has picked up a number of pdfTeX 
features, meaning that XeTeX often is the most 'limited' engine. The 
team would like, if possible, to have a common feature set in all 
engines in this regard. At the same time, there are a few 'bits and 
pieces' that make sense to raise at the same time. I'll lay out the 
various areas below.


Doing the work here is non-trivial, but luckily there is an automated 
build system available via GitHub which is allowing us (me/David 
Carlisle) to do some testing. I'm building up patches in various 
branches at https://github.com/josephwright/texlive-source: assuming 
these look good, I'll merge them as required and send diff files to 
where/whoever is best. The branches on GitHub should hopefully have 
clear names for what they address.


The areas we are keen to look at are as follows:

- 'pdfutils': (u)pTeX has picked up a number of pdfTeX primitives, and
  a subset have made their way into XeTeX too. However, XeTeX is still
  missing several, most notably an expandable RNG. We are part-way
  through working out patches to add the rest to XeTeX (RNG is done,
  file data and timer to do)

- banners: pdfTeX and LuaTeX have \banner, other engines lack
  that. The banner includes TeX version and details of the TeX system,
  so is potentially useful. Adding this to (u)pTeX/XeTeX looks
  straightforward: still to-do.

- \expanded: This was slated for pdfTeX 1.50 but that has never
  appeared, but the primitive is useful as it allows 'function-like'
  expandable macros. We can see this being very useful for simplifying
  some macro code, and in many ways it feels like an e-TeX primitive.
  The GitHub expanded branch adds it to pdfTeX/XeTeX/(u)pTeX

- Allowing \Ucharcat (XeTeX) to make \active tokens: this was raised
  recently on the XeTeX list, but fits here as we've put a branch
  together to show it works
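To make the 'function-like' point about \expanded concrete, here is a minimal sketch. It assumes an engine that already provides \expanded together with e-TeX's \detokenize; the macro names \first, \second and \report are hypothetical and exist only for the illustration.

```tex
% Sketch: needs an engine providing \expanded and e-TeX's \detokenize.
\def\first{one}
\def\second{\first, two}
\def\report#1{\message{[\detokenize{#1}]}}
\report{\second}% the argument arrives unexpanded
\expandafter\report\expanded{{\second}}% the argument arrives fully expanded
\bye
```

The second call behaves like an \edef followed by a use of the result, but without an assignment, which is what makes it usable inside other expansion-only contexts.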

It's likely I'll finish the outstanding patches by the weekend. Note 
that at present each feature addition is in a separate Git branch, so to 
add all of them I'll have to do a little tidying up: that will happen 
once I know which of these suggestions are useful.


Feedback most welcome.

Joseph




Re: [XeTeX] Allowing Ucharcat to produce active characters

2018-04-18 Thread Joseph Wright

On 18/04/2018 16:08, Bruno Le Floch wrote:
> Hello,
>
> I suggest allowing \Ucharcat to produce active characters.  See
> three-line patch attached.  This would allow active characters to be
> produced expandably in all engines (pdfTeX, luaTeX, XeTeX, pTeX,
> upTeX).  My code makes
>
>  \Ucharcat `~ 13
>  \expandafter\show\Ucharcat `~ 13
>  \edef\foo{\expandafter\noexpand\Ucharcat `~ 13 }
>
> run the code of the active ~ as if it had been typed directly, then show
> its meaning, then do the equivalent of \def\foo{~}.
>
> Bruno


In case anyone wonders: only XeTeX has \Ucharcat. In LuaTeX we can make 
char tokens from the 'Lua side', so are unrestricted in terms of 
catcode. In pdfTeX and (u)pTeX, assuming we are only dealing with the 
8-bit range (upTeX) it's feasible to pre-generate all combinations and 
use expandable macros to output the tokens.
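For readers unfamiliar with the primitive, the sketch below shows the existing (non-active) use of \Ucharcat in plain XeTeX: producing a character token with a chosen category code purely by expansion. The macro names \otherA and \letterA are hypothetical.

```tex
% Plain XeTeX sketch: \Ucharcat <char code> <category code>, used expandably.
\edef\otherA{\Ucharcat `a 12 }% "a" with category code 12 (other)
\edef\letterA{a}% ordinary "a" with category code 11 (letter)
\ifx\otherA\letterA
  \message{same tokens}
\else
  \message{different category codes}
\fi
\bye
```

The \ifx comparison takes the false branch because the two macros hold tokens with the same character code but different category codes.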


Joseph





Re: [XeTeX] Problem involving \includegraphics after texlive update on Debian Tesing

2017-07-07 Thread Joseph Wright
On 07/07/2017 12:54, Johann Spies wrote:
> After a recent upgrade of texlive to 2017.20170629-1 on Debian I am
> experiencing a problem compiling a longstanding document and I can
> replicate the problem with the following code:
> 
> \documentclass[12pt,a4paper]{article}
> \usepackage{fontspec} % Gebruik met xelatex
> \usepackage{graphicx} % Gebruik met xelatex
> \usepackage[hyperindex=true,colorlinks=true,bookmarks]{hyperref}
> \usepackage{colortbl}
> \setmainfont[Ligatures=TeX,Mapping=tex-text]{Linux Libertine O}
> \begin{document}
> 
> 
> 
> \includegraphics{fruits.jpg}
> 
> \end{document}
> 
> %%% Local Variables: 
> %%% mode: latex
> %%% TeX-engine: xetex
> %%% TeX-master: t
> %%% End: 
> 
> 
> results in
> 
> ERROR: Undefined control sequence.
> 
> --- TeX said ---
> \Ginclude@bmp #1->\Gin@log 
>                            {<#1>}\bgroup \def \@tempa {!}\special {pdf:image...
> l.11 \includegraphics{fruits.jpg}
>  
> --- HELP ---
> TeX encountered an unknown command name. You probably misspelled the
> name. If this message occurs when a LaTeX command is being processed,
> the command is probably in the wrong place---for example, the error
> can be produced by an \item command that's not inside a list-making
> environment. The error can also be caused by a missing \documentclass
> command.
> 
> You can replace "fruits.jpg" with any jpg to replicate the problem.
> 
> It will help me if someone on this list can identify what is causing
> this.
> 
> Regards
> Johann

Could you add \listfiles to your input and post the resulting *File
list* from the .log?
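As a sketch of what is being asked for, a cut-down version of the example with \listfiles added could look like this. Placing \listfiles at the very top is the usual recommendation; fruits.jpg is the file name from the report and any JPEG would do.

```tex
\listfiles % conventionally placed first, before \documentclass
\documentclass[12pt,a4paper]{article}
\usepackage{graphicx}
\begin{document}
\includegraphics{fruits.jpg}
\end{document}
```

After the run, the .log contains a block headed *File List* giving the name, date and version of each file loaded, which is the part to post.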

Joseph





Re: [XeTeX] Using tikz with plain XeTeX

2017-05-13 Thread Joseph Wright
On 13/05/2017 12:49, Philip Taylor wrote:
> 
> 
> John Was wrote:
>> Even if PS-Tricks and Tikz do clash, it doesn't seem to be PS-Tricks 
>> specifically that's causing this issue (I've tried commenting it out) - 
>> suspicion currently falls on Edmac, which I use for cropmarks and sometimes 
>> for other purposes (e.g. automatic line numbering of texts when required). I 
>> don't really mind doing without tikz (at least for now), but it would be 
>> good to know the cause of the weird behaviour!
> Then I think you will have to strip down your fault-provoking code to 
> something manageable, John; "necessary and sufficient" is the key -- you have 
> provided the necessary, now it is surely incumbent on you to strip it down to 
> the necessary if others are to be able to help you in finite time.
> 
> Philip Taylor

A minimal example is

\input ulem.sty
\input tikz

\tikzpicture
  \path[draw=red] (0,0) -- (1,1) -- (2,1) circle (10pt);
\endtikzpicture
\bye

with the first piece of text pointing to ifpdf: the issue is not limited
to TikZ. (It doesn't help though that TikZ's emulation of a minimal
LaTeX set up isn't 'self-contained': the load order cannot be reversed
here.)

This allows us to isolate the issue: ulem.sty does

\expandafter\ifx\csname ProvidesPackage\endcsname \relax

which leaves \ProvidesPackage as \relax in plain (there is no grouping).
That's an issue for any code that tests 'quickly' for \ProvidesPackage,
for example in ifpdf.sty

\ifx\ProvidesPackage\undefined

The most obvious solution is to get rid of the problematic definition:

\input ulem.sty
\let\ProvidesPackage\undefined
\input tikz

\tikzpicture
  \path[draw=red] (0,0) -- (1,1) -- (2,1) circle (10pt);
\endtikzpicture
\bye
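The mechanism described above can be checked in isolation. The following plain TeX sketch uses hypothetical names \foo and \baz (both assumed undefined beforehand) and, for the second test, assumes an e-TeX-enabled engine such as XeTeX.

```tex
% Plain TeX sketch; \foo and \baz are hypothetical, previously undefined names.
\expandafter\ifx\csname foo\endcsname\relax \fi % side effect: \foo is now \relax
\ifx\foo\undefined
  \message{foo still undefined}
\else
  \message{foo is now \meaning\foo}
\fi
% e-TeX's \ifcsname tests for a definition without creating one.
\ifcsname baz\endcsname
  \message{baz defined}
\else
  \message{baz undefined}
\fi
\ifx\baz\undefined \message{baz still undefined}\fi
\bye
```

The first test takes the \else branch, which is exactly what trips up a later \ifx\ProvidesPackage\undefined check; the \ifcsname form leaves the name untouched.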





Joseph





Re: [XeTeX] Using tikz with plain XeTeX

2017-05-13 Thread Joseph Wright
On 13/05/2017 11:53, John Was wrote:
> Dear All
> 
> Apologies if this is the wrong list (but I’ve always found participants here 
> very helpful!).
> 
> I have been sent some tikz code for diagrams to be included in a forthcoming 
> article.  The author uses a version of LaTeX but tikz should work OK in plain 
> (Xe)TeX, I think – though I haven’t tried it for a number of years.  Oddly 
> enough, when I invoke tikz with:
> 
> \input tikz
> 
> the package does load, and a simple drawing works:
> 
> \tikzpicture
> \path[draw=red] (0,0) -- (1,1) -- (2,1) circle (10pt);
> \endtikzpicture
> 
> (pasted from a stackexchange discussion of a different matter).
> 
> BUT, before the drawing I get six lines of info in the output (the sort of 
> thing I’d expect in the log), viz.:
> 
> pgfrcs[2010/10/25 v2.10 (rcs-revision 1.24)]
> pgf[2008/01/15 (rcs-revision 1.10)]
> pgfsys[2010/06/30 v2.10 (rcs-revision 1.37)]
> pgfcore[2010/04/11 v2.10 (rcs-revision 1.7)]
> pgffor[2010/03/23 v2.10 (rcs-revision 1.18)]
> tikz[2010/10/13 v2.10 (rcs-revision 1.76)]
> 
> It also messes up my crop marks and running headlines in subsequent pages, 
> but I suspect that could be rectified by invoking other \inputs in a 
> different order (I include edmac and pstricks at the start).  I can manage 
> without tikz if necessary (the worst-case scenario would be redrawing with 
> pstricks), but it would be good to know at least that I can use tikz in 
> future without these unwanted half-dozen lines coming into the output.  It’s 
> a powerful package that I’ve always meant to learn.
> 
> Best
> 
> 
> John

TikZ is certainly loadable with plain. Could you give more details of
your TeX system or perhaps a log for the simple file

\input tikz
\tikzpicture
\path[draw=red] (0,0) -- (1,1) -- (2,1) circle (10pt);
\endtikzpicture
\bye

I get the 'expected' output with both TL'16 final and TL'17 pretesting.

Joseph





Re: [XeTeX] [tex-live] xetex.def

2017-05-10 Thread Joseph Wright
On 09/05/2017 23:13, Karl Berry wrote:
> what is the reason for having two .def files here.
> 
> I can't imagine an insurmountable technical reason for having two
> independent .def files these days. Indeed, it would seem highly
> desirable to me to merge them, with conditional parts as needed. It sure
> was a pain to be applying changes to both independently, when I was
> the one doing that.

'Yes'

> One of the reasons I was so happy to turn them over
> to you guys :).

Well the ideas originate from the team ... you'll see for expl3 we've
gone back to 'one definite source' as the number of drivers is nowadays
small and predictable.

> As I expect you know, they currently exist separately because of their
> historical development. xetex.def was based on dvipdfmx.def at the time
> of creating XeTeX. And that was reasonable during active XeTeX
> development. And so it has continued to the present day.  But nowadays,
> when dvipdfmx and xdvipdfmx themselves have been (sort of) merged
> (thanks always to Khaled ...), merging the .def files too seems good.

OK, I'll probably work on this but not before TL'17 release: somewhat
risky and not something I'd want to put on the DVD. (I will send the
latest update to CTAN to fix the issue concerning scaling of links.)

Joseph





Re: [XeTeX] [tex-live] xetex.def

2017-05-09 Thread Joseph Wright
On 09/05/2017 14:27, Joseph Wright wrote:
> So the question is what are the _essentials_ of the difference: really
> it's about 'how much difference do we need to look after in the .def files'.

I should add that the question arose as dvipdfmx.def and xetex.def
currently use different approaches to colour. The reasons are I think
historical: xdv2pdf didn't support the dvipdfmx approach, but xdvipdfmx
does. For maintenance *today* it would be clearer if we had a simple way
of knowing which parts have to be different between the dvipdfmx and
xetex drivers.

Joseph





Re: [XeTeX] [tex-live] xetex.def

2017-05-09 Thread Joseph Wright
On 09/05/2017 14:19, Akira Kakuto wrote:
> Dear Joseph,
> 
>> Following a bug report for (x)dvipdfmx box scaling, we are taking a
>> look at xetex.def and dvipdfmx.def to fix that and related issues. This
>> raises a question: what is the reason for having two .def files here? A
>> quick test suggests that XeTeX (xdvipdfmx) can happily use dvipdfmx.def
>> with the exception of a few lines at the end of the file: those could
>> easily be made conditional.
> 
> I'm not familiar with the drivers, but I think that the independent
> xetex.def is definitely needed.
> 
> I think that images png, jpg, pdf, are efficiently embedded in XeTeX,
> probably by using primitives, while dvipdfmx requires an external
> program extractbb to obtain sizes of the images.
> 
> For example,
> 
> %
> % xelatex test.tex(xetex.def)
> %
> \documentclass[12pt]{article}
> \usepackage{graphicx}
> \usepackage{pdfpages}
> \begin{document}
> \includepdf[pages={1-9}]{xtst.pdf}
> \end{document}
> 
> is far faster than
> 
> %
> % xelatex test.tex   (dvipdfmx.def)
> %
> \documentclass[12pt,dvipdfmx]{article}
> \usepackage{graphicx}
> \usepackage{pdfpages}
> \begin{document}
> \includepdf[pages={1-9}]{xtst.pdf}
> \end{document}
> 
> Best,
> Akira

Image inclusion is one I'd looked at, and certainly there is some
benefit from using the primitive for bounding-box lookup. However, that
doesn't mean that the entire .def files have to be different: for
example, we might pull them 'back together' in a .dtx and have only the
image inclusion bit varying. Other operations (scaling, rotation, colour
support, ...) seem to be addressable using common code, and indeed final
image inclusion (as opposed to BB extraction) could be done using a
common path for the shared data types (probably though as the BB lookup
is separate one should stick to the primitives in XeTeX).

So the question is what are the _essentials_ of the difference: really
it's about 'how much difference do we need to look after in the .def files'.
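One way the 'only the image inclusion bit varying' idea could be expressed is an engine test inside a single shared file. The fragment below is only a sketch: \exampleBBmethod is a hypothetical name invented for this illustration and does not appear in either .def file, and \ifdefined assumes an e-TeX-enabled engine.

```tex
% Sketch of one file serving both drivers; \exampleBBmethod is a
% hypothetical name used only for this illustration.
\ifdefined\XeTeXversion
  \def\exampleBBmethod{engine primitives}% XeTeX: bounding box from the engine
\else
  \def\exampleBBmethod{extractbb}% dvipdfmx: external bounding-box lookup
\fi
\message{bounding boxes via \exampleBBmethod}
```

Everything outside such a conditional (scaling, rotation, colour support) would then be common code, which is the maintenance benefit being discussed.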

Joseph






[XeTeX] xetex.def

2017-05-09 Thread Joseph Wright
Hello all,

Following a bug report for (x)dvipdfmx box scaling, we are taking a
look at xetex.def and dvipdfmx.def to fix that and related issues. This
raises a question: what is the reason for having two .def files here? A
quick test suggests that XeTeX (xdvipdfmx) can happily use dvipdfmx.def
with the exception of a few lines at the end of the file: those could
easily be made conditional.

Reading over the comments, I see some about the older non-xdvipdfmx
drivers for XeTeX, but these are as far as I know no longer in use
(particularly for anyone likely to use an updated .def file). Are there
any particular reasons that XeTeX needs a separate driver today?

Joseph




Re: [XeTeX] TeX--XeT and OpenType fonts

2017-03-01 Thread Joseph Wright
On 01/03/2017 12:14, Jonathan Kew wrote:
> On 01/03/2017 11:59, Joseph Wright wrote:
>> Hello all,
>>
>> With example
>>
>> \font\OTtenrm="[lmroman10-regular.otf]/OT"
>> \OTtenrm
>> \TeXXeTstate=1
>> \beginR
>> abc
>> \endR
>> \bye
>>
>> the output is LTR with TL'16. Is this a known issue?
> 
> Yes, this is expected behavior. The TeX--XeT direction controls (\beginR
> etc) control the ordering of words within a line, etc. (slightly more
> accurately, the direction in which nodes in an hlist progress), but do
> not override the inherent directionality of Unicode characters, so "abc"
> is still a sequence of three strong-LTR letters and they stay in their
> left-to-right order.
> 
> (However, if you try
> 
>   \beginR
>   abc def
>   \endR
> 
> I'd expect you to get output that reads "def abc" because the two words
> are ordered RTL, even though each of them remains LTR internally.)
> 
> This is why it is possible -- for better or worse -- to do something like
> 
> ...english text {\arabfont العربي} more english
> 
> in a xetex document and have the isolated Arabic word appear with
> correct (internal) RTL directionality, without having to explicitly
> surround it with \beginR...\endR (although for a multi-word Arabic
> phrase that would be necessary); the RTL-ness of the characters controls
> their behavior within the word, despite the TeX direction remaining LTR.
> 
> Currently, there isn't an option to make the TeX-level direction
> override the Unicode character directionality (comparable to the CSS
> property "unicode-bidi:bidi-override;"). Perhaps that would occasionally
> be useful, though people haven't exactly been clamouring for it AFAIK.
> 
> JK

Thanks: all clear.

My guess is for the rare 'override' case one would probably do something
at the macro level in any case (kerning is all wrong to start with).
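A macro-level override could be as simple as reversing the input tokens before they are typeset. The sketch below is one possible approach for plain XeTeX, with all macro names hypothetical; it handles only a run of single character tokens (spaces and brace groups are not dealt with), and it does nothing about kerning.

```tex
% Plain XeTeX sketch; all macro names here are hypothetical.
\def\firstoftwo#1#2{#1}
\def\secondoftwo#1#2{#2}
\let\reversestop\relax
\def\reverseword#1{\reverseloop{}#1\reversestop}
\def\reverseloop#1#2{%
  \ifx\reversestop#2%
    \expandafter\firstoftwo
  \else
    \expandafter\secondoftwo
  \fi
  {#1}{\reverseloop{#2#1}}}
\reverseword{abc}% typesets "cba"
\bye
```

The loop prepends each character to an accumulator until it meets the \reversestop marker, so the work is done entirely by expansion.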

Joseph





Re: [XeTeX] TeX--XeT and OpenType fonts

2017-03-01 Thread Joseph Wright
On 01/03/2017 12:08, Joseph Wright wrote:
> On 01/03/2017 12:06, Philip Taylor wrote:
>> What happens if you replace the \endR with a blank line to cause the 
>> paragraph to end, Joseph [1] ?
>> Philip Taylor
> 
> Box ends up starting on the right but text itself is still LTR.

BTW, that's the lack of an explicit \endR not the \par: if you simply
force a paragraph nothing alters. Not that this impacts on the question ...

Joseph





Re: [XeTeX] TeX--XeT and OpenType fonts

2017-03-01 Thread Joseph Wright
On 01/03/2017 12:06, Philip Taylor wrote:
> What happens if you replace the \endR with a blank line to cause the 
> paragraph to end, Joseph [1] ?
> Philip Taylor

Box ends up starting on the right but text itself is still LTR.

I suspect this is not normally noticed as my guess is HarfBuzz 'does its
own thing' with placing the glyphs based on their codepoint (and thus
'natural' LTR/RTL properties), and thus entirely ignores TeX--XeT.
However, it would be useful to know that is correct.

Joseph





[XeTeX] TeX--XeT and OpenType fonts

2017-03-01 Thread Joseph Wright
Hello all,

With example

\font\OTtenrm="[lmroman10-regular.otf]/OT"
\OTtenrm
\TeXXeTstate=1
\beginR
abc
\endR
\bye

the output is LTR with TL'16. Is this a known issue?

Joseph




Re: [XeTeX] [tex-live] Random number primitives

2016-11-14 Thread Joseph Wright
On 14/11/2016 07:39, Akira Kakuto wrote:
> Dear Joseph,
> 
>> - \(pdf)uniformdeviate
>> - \(pdf)normaldeviate
>> - \(pdf)randomseed
>> - \(pdf)setrandomseed
>> (LuaTeX drops the 'pdf' part of the names.)
> 
> H. Kitagawa, the author of eptex, has added
> \pdfuniformdeviate
> \pdfnormaldeviate
> \pdfrandomseed
> \pdfsetrandomseed
> in eptex and euptex (r42506 in TL).
> 
> Best,
> Akira

Hi Akira,

Thanks for letting me know: we'll certainly add the random functionality
now!

Joseph





Re: [XeTeX] Random number primitives

2016-11-13 Thread Joseph Wright
On 13/11/2016 15:49, Karljürgen Feuerherm wrote:
> Well… does it have to be either/or, anyhow? Taking Apostolos’ last point, why 
> not have a switch?
> 
> It makes sense to allow for cross-platform compatibility, but there’s no 
> reason to think that *nobody* would appreciate an improved algorithm….
> 
> K

I'm not *against* an improved approach, but the work needed for 'a new
implementation in pdfTeX, LuaTeX, XeTeX, ...' is greater than 'add
more-or-less the current implementation from pdfTeX and LuaTeX to XeTeX,
...'.

Joseph






Re: [XeTeX] Random number primitives

2016-11-13 Thread Joseph Wright
On 13/11/2016 13:04, Apostolos Syropoulos wrote:
>> to track the seed). The usefulness of pseudo-random numbers has come up
>> a few times recently, and so we'd like to address this. (Expandable
>> floating point evaluation is pretty handy as an end user!)
>  
> 
> Can you please elaborate on the usefulness of pseudo-random numbers in
> a typesetting engine? I think it is a good thing to add features but
> those features should be added for some good reason.

Indeed.

I've seen a variety of use cases for pseudo-random values, in particular
two which come up reasonably often. The first is selecting entries from
a larger 'pool' of values, for example for creating test papers. ('Use 5
out of the 20 questions I've written, picked at random'.) The second,
probably more common, case is in creating figures (for example using
TikZ). Depending on what one is representing, an element of
(pseudo)randomness is useful.

These use cases are beyond what one might call the 'classical' idea of what
TeX is for, in the sense one could do them (and other uses) by hand or
using another tool and import into TeX. However, the programmable nature
of TeX attracts use in these ways. One can generate pseudo-random
numbers at the macro level, most obviously in the pgf package, and this
facility is used by many people. (My own use case for random values, in
creating some figures, uses the pgf implementation.) However, the need
to track the seed value to allow a pseudo-random sequence of values
means that any macro-based implementation is necessarily non-expandable.
The experience of the team with \fp_eval:n, the expandable FPU of the
expl3 bundle, suggests that expandable calculations are useful. In that
context, it would be nice to be able to offer some random value
abilities. (As noted at the start of this thread, that is already
possible in pdfTeX and LuaTeX and we will likely add something to the
FPU which currently will work with those engines only.)
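For the 'pick questions at random' case, a sketch using the existing pdfTeX primitives might look as follows. It runs under plain pdfTeX only; the seed value 12345 and the macro name \pick are arbitrary choices for the illustration.

```tex
% Plain pdfTeX sketch using the pdfTeX primitive names.
\pdfsetrandomseed 12345 % fix the seed so the sequence is repeatable
\edef\pick{\pdfuniformdeviate 20 }% expandable: an integer n with 0 <= n < 20
\message{seed \the\pdfrandomseed, picked question
  \the\numexpr\pick+1\relax\space of 20}
\bye
```

Because \pdfuniformdeviate works by expansion, it can sit inside an \edef or a \numexpr, which a macro-level generator that has to update a stored seed cannot do.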

Joseph



--
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex


Re: [XeTeX] [tex-live] Random number primitives

2016-11-13 Thread Joseph Wright
On 12/11/2016 23:50, Nelson H. F. Beebe wrote:
> Joseph Wright <joseph.wri...@morningstar2.co.uk> writes on
> Sat, 12 Nov 2016 21:45:37 +:
> 
>>> Both pdfTeX and LuaTeX include a series of primitives that expose a
>>> lower-level pseudo-random number generator (I assume from C: there is
>>> very little actual code in the pdfTeX WEB source to implement these).
> 
> Since all *TeX engines on Unix(-like) systems these days are built on
> C code originally translated from the Pascal sources, it should be
> possible to supply an interface in all such engines to a C-library
> random-number generator.
> 
> Because the historical rand() is often platform-dependent, and poor,
> I'd recommend the POSIX drand48() family, which is available on all
> systems these days (and I can supply portable code if you feel the
> need for it).  It repairs some of the defects of 32-bit linear
> congruential generators, and should be able to deliver the same stream
> of random numbers from a given starting seed on all platforms.
> 
> It is imperative that users be able to supply an initial seed, because
> otherwise, it is not possible to generate independent streams of
> random numbers of successive runs, such as might be needed for
> multiple simulations.
> 
> However, the default seed should always be a constant, rather than one
> dependent on time, process-id, or other internal data; that way,
> successive runs are reproducible.  Unreproducible output may make
> debugging impossible.
> 
> If you want to improve the quality of the generator beyond what
> drand48() produces, contact me offlist for details of a simple
> extension that costs almost nothing extra, yet dramatically lengthens
> the period and reduces correlations between the output random numbers.

pdfTeX and LuaTeX *already have primitives* that generate random
numbers: I'm no C/WEB/... programmer but from pdftex.web I think it's
likely to be using rand() ultimately. Getting the same result from
pdfTeX and other engines on the same platform is what seems to me to be
important.

Note that the reason for asking about this is to allow pseudo-random
numbers for the type of thing that does make sense from inside a TeX
run. Examples are adding 'interest' in graphics, picking 'm from n
questions' in a list, etc. Being able to set the seed is useful (and
indeed implemented in pdfTeX/LuaTeX), but highly statistically
satisfying randomness is not. (I've never tested the pgf macro-based
implementation, but it's likely to be a balance between having some
'randomness' and being reasonable in TeX macros.)
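
To make the seed point concrete, here is a minimal sketch in Python (not TeX,
and not a description of what pdfTeX or LuaTeX do internally) of a 48-bit
linear congruential generator using the multiplier, increment and seeding
convention that POSIX specifies for the drand48() family. The class and
method names are invented for this example.

```python
class Lcg48:
    """Toy 48-bit LCG using the drand48() constants from POSIX."""

    A = 0x5DEECE66D          # multiplier
    C = 0xB                  # increment
    MASK = (1 << 48) - 1     # arithmetic is modulo 2**48

    def __init__(self, seed=0):
        # srand48() convention: the seed fills the high 32 bits,
        # the low 16 bits are set to 0x330E.
        self.state = ((seed & 0xFFFFFFFF) << 16) | 0x330E

    def uniform(self):
        """Advance the state and return a float in [0, 1)."""
        self.state = (self.A * self.state + self.C) & self.MASK
        return self.state / (1 << 48)

    def uniform_int(self, n):
        # Integer in [0, n), in the spirit of \(pdf)uniformdeviate.
        return int(self.uniform() * n)
```

Two generators created with the same seed produce the same stream, which is
the reproducibility property under discussion; a different seed gives a
different stream.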

Joseph





[XeTeX] Random number primitives

2016-11-12 Thread Joseph Wright
Hello all,

As many people will know, the LaTeX team have developed an expandable
FPU as part of expl3. That gets quite a bit of use, but one area we
can't currently address is random numbers. The pgf bundle has a
pseudo-random number generator, but that can't be expandable (you need
to track the seed). The usefulness of pseudo-random numbers has come up
a few times recently, and so we'd like to address this. (Expandable
floating point evaluation is pretty handy as an end user!)

Both pdfTeX and LuaTeX include a series of primitives that expose a
lower-level pseudo-random number generator (I assume from C: there is
very little actual code in the pdfTeX WEB source to implement these).
That gives us a way of providing expandable random numbers at the macro
level, but at present will be limited to those engines. As this is
something of an 'extra' (not core functionality), we will at present
accept that it's not doable in XeTeX/e-(u)pTeX (and issue an error if
requested), but it would be handy if the functionality could find its
way into those engines.

For reference, the 'full set' of primitives in this area is

- \(pdf)uniformdeviate
- \(pdf)normaldeviate
- \(pdf)randomseed
- \(pdf)setrandomseed

(LuaTeX drops the 'pdf' part of the names.)

Joseph




Re: [XeTeX] not enough \XeTeXcharclass registers

2016-02-01 Thread Joseph Wright
On 01/02/2016 09:00, Philip Taylor wrote:
> 
> 
> Akira Kakuto wrote:
> 
>> You can test the new experimental XeTeX on win32 by
>> http://members2.jcom.home.ne.jp/wt1357ak/xetex-exp-w32.zip
> 
> /Domo arigato gozaimasu/, Akira-san.  Downloaded and installed, just
> re-building formats before commencing testing.
> 
> Philip Taylor

Indeed: all working here, e.g.

\XeTeXcharclass6=16384 %

with

This is XeTeX, Version 3.14159265-2.6-0.3 (TeX Live 2016/W32TeX/dev)

Thanks Akira for this and the LuaTeX builds: very useful.

Joseph





Re: [XeTeX] not enough \XeTeXcharclass registers

2016-02-01 Thread Joseph Wright
On 01/02/2016 09:37, Philip Taylor wrote:
> 
> 
> Akira Kakuto wrote:
> 
>>> Some interesting (and new, and unexpected) diagnostics, Akira-san; as
>>> far as I can tell, no PDF was produced :
>>
>> You have to replace "all" included binaries, by saving the old ones.
>> Note that size of xdvipdfmx.exe,  which is a wrapper of dvipdfmx.dll, 
>> is 1536 bytes.
> 
> Akira-san :  I did just that.  I copied the entire "...\bin\win32"
> directory to ...\bin\win32-old, then overwrote all files in
> ...\bin\win32 with the corresponding files from your ZIP file.
> 
> I will repeat the process just to ensure that I made no errors and then
> report back.
> 
> Philip Taylor

Works for me replacing xetex.exe, dvipdfmx.dll and adding icudt56.dll
(not present in stock TL2015). (System TL2015, updated this morning,
those files + luatex.dll from the W32TeX dev version.)

Joseph





Re: [XeTeX] not enough \XeTeXcharclass registers

2016-01-31 Thread Joseph Wright
On 31/01/2016 18:07, Philip Taylor wrote:
> 
> 
> Jonathan Kew wrote:
> 
>> Before this gets merged to the master source, though, some testing would
>> be appreciated -- obviously, this will currently require rebuilding
>> xetex from the git source branch.
> 
> I use XeTeX on a daily basis, Jonathan ('tho XeTeXcharclass far less
> frequently) and would be happy to test your version on my production
> suites, but in order to do so I would require a Win64 (or Win32) build.
> 
> Philip Taylor

Hopefully Akira Kakuto will do that for W32TeX: he's done LuaTeX
v0.85/0.87/0.88 binaries.

Joseph





Re: [XeTeX] not enough \XeTeXcharclass registers

2016-01-31 Thread Joseph Wright
On 31/01/2016 18:31, Philip Taylor wrote:
> Just as TeX has \maxdimen, it would be useful if derivatives of TeX such
> as XeTeX could add analogous environmental enquiries such as
> \maxXeTeXcharclass (or, less uglily but also less meaningfully,
> \XeTeXmaxcharclass).

\maxdimen isn't a primitive (though it's in the plain format).

Joseph





Re: [XeTeX] not enough \XeTeXcharclass registers

2015-12-13 Thread Joseph Wright
On 13/12/2015 07:04, Werner LEMBERG wrote:
> 
> [XeTeX 3.14159265-2.6-0.2 (TeX Live 2015)]
> 
> 
> Folks,
> 
> 
> I'm updating the `ucharclasses.sty' to completely cover Unicode.  This
> style file maps Unicode character blocks to character classes, and
> I've hit the 256 entry limit of \XeTeXcharclass...
> 
> Any chance to extend it to 16 bits?
> 
> 
> Werner

I've been looking at Unicode classes recently :-) Exactly what
sub-division are you going for? There are several Unicode values that
seem to be important for 'full classification'.

Joseph





Re: [XeTeX] \(pdf)mdfivesum

2015-07-10 Thread Joseph Wright
On 10/07/2015 10:37, Akira Kakuto wrote:
> Dear Joseph,
>
>> I have a request for a new primitive in XeTeX, not directly related to
>> typesetting but I think useful. To understand why I'm asking, a bit of
>> background would be useful.
>
> The XeTeX in the latest TeX Live repository has
> a new primitive \pdfmdfivesum imported from pdfTeX.
> However the name and the implementation itself, are
> still volatile.
>
> Best regards,
> Akira

Thanks: hope it was not too much effort.

I'll have to get on with what I was thinking it was useful for now!
--
Joseph Wright





Re: [XeTeX] \(pdf)mdfivesum

2015-07-01 Thread Joseph Wright
On 01/07/2015 19:39, Apostolos Syropoulos wrote:

>> We can happily generate that file using pdfTeX (\pdfmdfivesum primitive)
>> or LuaTeX (using Lua code), but not using XeTeX. That's not a big issue
>> but the need for an MD5 sum gives me an idea which would need support in
>> XeTeX.
>
> The (Xe)TeX language was not designed for system programming and I
> wonder why people would like to make it a system programming language.
> A better idea would be to use Perl or Python or even Ruby, which are
> widely available.

TeX systems are essentially self-contained: on a Windows system with TeX
Live or MiKTeX installed one can nowadays assume Lua as well as TeX but
nothing else. Moreover, I'm thinking specifically of a use case linked
to the process of document preparation itself.
--
Joseph Wright





[XeTeX] \(pdf)mdfivesum

2015-07-01 Thread Joseph Wright
Hello all,

I have a request for a new primitive in XeTeX, not directly related to
typesetting but I think useful. To understand why I'm asking, a bit of
background would be useful.

The LaTeX team have recently taken over looking after catcode/charcode
info for the Unicode engines from the previous rather diffuse situation.
As part of that, we were asked to ensure that the derived data was
traceable and so have included the MD5 sum of the source files in the
new unicode-letters.def file.

We can happily generate that file using pdfTeX (\pdfmdfivesum primitive)
or LuaTeX (using Lua code), but not using XeTeX. That's not a big issue
but the need for an MD5 sum gives me an idea which would need support in
XeTeX.

LaTeX offers \listfiles to help us track down package version issues but
this fails if files have been locally modified or don't have
date/version info. It would therefore be useful to have a system that
can ensure that files match, which is where MD5 sums come in. One can
imagine arranging that every file \input (or \read) has the MD5 sum
calculated as part of document typesetting: this is not LaTeX-specific.
This data could then be available as an additional file listing to help
track problems. However, to be truly useful this would need to work with
all three major engines, and currently XeTeX is out. I'd therefore like
to ask that \pdfmdfivesum (or perhaps just \mdfivesum) is added to XeTeX.
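
As a rough illustration of the kind of listing meant here, the following
Python sketch computes an MD5 sum for each file in a list and formats one
line per file. It is written outside TeX purely to show the idea; the
function names and the output layout are assumptions made for this example,
not a proposed format.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_listing(paths):
    """Return 'name  md5' lines, one per file, in the order given."""
    return ["{}  {}".format(path, md5_of_file(path)) for path in paths]
```

Two copies of a file that differ in any byte give different lines, which is
what makes the listing useful when date/version information is missing or
a file has been modified locally.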



There are a small number of other 'utility' primitives in pdfTeX/LuaTeX
(some in the latter as Lua code emulation) that might also be looked at
at the same time (see
http://chat.stackexchange.com/transcript/message/22496265#22496265):

 - \pdfcreationdate
 - \pdfescapestring
 - \pdfescapename
 - \pdfescapehex
 - \pdfunescapehex
 - \pdfuniformdeviate
 - \pdfnormaldeviate
 - \pdffilemoddate
 - \pdffilesize
 - \pdffiledump
 - \pdfrandomseed
 - \pdfsetrandomseed

most of which are not related to PDF output and which may have good use
cases. I am specifically *not* asking for any of these to be added here
but note this list as it *may* be that the work may be closely related.
--
Joseph Wright




Re: [XeTeX] Σχετ: Re: Assignment of codes (particularly \catcode) based on Unicode data

2015-05-07 Thread Joseph Wright
On 07/05/2015 10:56, Jonathan Kew wrote:
> On 7/5/15 09:34, Philip Taylor wrote:
>> Apostolos Syropoulos wrote:
>>
>>> The only mark that remains when making all capitals is the dieresis
>>> (dialytika). All others vanish. This is common knowledge for people who
>>> speak and write Greek.
>>
>> Well, this is not the opinion of (for example) Dr Charalambos Dendrinos,
>> a native Greek speaker and Director of the Hellenic Institute.  This is
>> why I asked whether it was a universally-agreed truism or simply a
>> matter of opinion, and in view of the fact that both Dr Dendrinos (in
>> private correspondence) and Julian Bradfield (on this list) have offered
>> the alternative perspective to your own, it would seem to be a matter of
>> opinion rather than one of fact.  If you look at the opening folio of
>> George Etheridge's Encomium on Henry VIII, addressed to Elizabeth I :
>>
>> http://hellenic-institute.rhul.ac.uk/research/Etheridge/Electronic-Edition/
>>
>> you will see a number of Greek majuscules with either psilí or daseîa,
>> including the very combination under discussion (GREEK CAPITAL LETTER
>> EPSILON WITH PSILI, on line 2), suggesting that the combination of
>> breathing and majuscule was common at that time.
>
> I think there may be some confusion as to exactly what this discussion
> is about. Certainly, the combination of breathing and majuscule occurs
> in mixed-case polytonic text, as shown in your example. However,
> Apostolos is (I think) addressing the case of all-uppercase text, in
> which case the usual practice is to drop all marks except dieresis.
>
> See, for example, http://unicode.org/udhr/d/udhr_ell_polytonic.html;
> note the presence of breathing marks on initial capitals within the
> text, but note also their complete absence in the ALL-CAPS title.
>
> So if a lower-to-uppercase mapping is used just to Capitalize Initial
> Letters, it perhaps should not discard breathing marks; but if it is
> used to turn a passage of text into ALL UPPERCASE, then it probably
> should discard them.
>
> But things are actually trickier than that. AIUI, the most correct
> polytonic UPPERCASE transform for μάιος would be ΜΑΪΟΣ -- not only
> is the accent on ά gone, but ι has acquired a dieresis and become Ϊ.
>
> The \uccode/\lccode tables in (Xe)TeX cannot fully capture this, no
> matter what code assignments are chosen; neither can the per-character
> properties in Unicode. It requires a more powerful approach to case
> transforms.
>
> So I still maintain that the default code values assigned in formats
> such as xe(la)tex should be based directly on the Unicode properties. It
> would be great to have a Greek package that implements proper Greek
> uppercasing, but this level of language- and orthography-specific
> behavior does not belong in the base format.

Indeed, whilst not what I was after here (which as you say is about
defaults for the formats), in the expl3 code I've written for case
changing the idea of positional dependence is built in. There's no
question that the TeX 1-1 mapping for case changing is not applicable to
many situations, not just the case of Greek text. I'll ask a separate
question about Greek case mapping for the expl3 context later on as it
seems to have people's attention.
--
Joseph Wright






Re: [XeTeX] Case changing for Greek

2015-05-07 Thread Joseph Wright
On 07/05/2015 13:40, Philip Taylor wrote:
 
 
> Joseph Wright wrote:
>
>> For performance reasons that code has been set up to assume that a
>> sigma is final if it is followed by a space, a control sequence or a
>> character from the list
>>
>> ) ] } . : ; , ! ? '
>
> The inclusion of a control sequence worries me; may I ask why you do
> not propose to expand the control sequence (if expandable) or ascertain
> its equivalence (if \let, for example) in order to predicate the
> assessment as to whether or not the sigma is final on the expansion /
> equivalence of the control sequences as would seem at first sight to be
> required.

As the code here is for expl3, there is an assumption that such input is
either fully-expandable or engine protected. As such, application of
\edef in a preceding step will do the same without having to put in the
rather complex loops one needs otherwise (the code is expandable so
cannot itself include assignments). Note also that the code we have is
explicitly intended for 'text': it seems unlikely that real user text will
intermix stored characters and literal ones and indeed defining the
logic either way could be questionable.

If it becomes clear that the current approach does not work then
alternatives can be considered. We don't have much in the way of
use-cases at present beyond ones we've thought of ourselves.
--
Joseph Wright





Re: [XeTeX] Case changing for Greek

2015-05-07 Thread Joseph Wright
On 07/05/2015 14:23, Nikos Platis wrote:
> 2015-05-07 15:22 GMT+03:00 Joseph Wright joseph.wri...@morningstar2.co.uk:
>
>> For performance reasons that code has been set up to assume that a
>> sigma is final if it is followed by a space, a control sequence or a
>> character from the list
>>
>> ) ] } . : ; , ! ? '
>
> I would add to this list the dashes, anoteleia, the Greek closing quote ».
> On the contrary, the English question mark ? would not belong to a Greek
> text.

Thanks for the additions. Per Unicode, the final sigma rule applies to
all text using Greek chars, not just text in Greek, so a sentence in
English finishing with a Greek word should presumably apply the rule to
that word, hence having ? in my list (I am aware that Greek uses ;
to indicate a question).
--
Joseph Wright






Re: [XeTeX] Case changing for Greek

2015-05-07 Thread Joseph Wright
On 07/05/2015 14:26, Jonathan Kew wrote:
> FWIW, we've done some work on this in Mozilla in the past few years, to
> provide language-appropriate behavior for CSS features like
> text-transform:uppercase and font-variant:small-caps. You might like to
> review the discussion in bug reports such as
>
>   https://bugzilla.mozilla.org/show_bug.cgi?id=231162
>   https://bugzilla.mozilla.org/show_bug.cgi?id=307039
>   https://bugzilla.mozilla.org/show_bug.cgi?id=740120
>   https://bugzilla.mozilla.org/show_bug.cgi?id=740477
>
> In particular, bug 307039 has a lot to say about uppercasing Greek. The
> details of actual code patches will obviously not be relevant, but the
> comments describing desired/implemented behavior may be helpful.

Thanks for that: looks useful. Will need to read properly and digest.

I am left wondering why this is not addressed in SpecialCasing.txt!
--
Joseph Wright





Re: [XeTeX] Case changing for Greek

2015-05-07 Thread Joseph Wright
On 07/05/2015 15:02, Jonathan Kew wrote:
> On 7/5/15 13:22, Joseph Wright wrote:
>> Included in that 'standard' set up is the final sigma rule for Greek
>> text. For performance reasons that code has been set up to assume that a
>> sigma is final if it is followed by a space, a control sequence or a
>> character from the list
>>
>>  ) ] } . : ; , ! ? '
>
> Would it be feasible to define this negatively instead -- something like
> "a sigma is final if it is NOT followed by another letter"?

Possibly yes: I guess in the TeX context a catcode-based test would work
reasonably well. I'll explore that.

> A possible refinement is that a lone sigma, neither preceded nor
> followed by another letter, should probably be lowercased as σ rather
> than ς.

One that needs input from a Greek speaker!

> To see the result of what we implemented for Firefox, you can try
> loading a testcase such as
>
>   data:text/html;charset=utf-8,<p style="text-transform:lowercase">ΣΑΒ ΑΣΒ ΑΒΣ Σ ΣΣΣ (Σ)
>
> in the browser, which displays it as "σαβ ασβ αβς σ σσς (σ)". (And I
> notice Chrome and Safari have the same behavior, too.)

Much the same as we do.
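
The "not followed by another letter" formulation, together with the
lone-sigma refinement, can be sketched as follows. This is Python, used only
to pin down the logic; it is not the expl3 implementation, and using
str.isalpha() as the "is a letter" test is an assumption standing in for a
catcode-based test.

```python
def lower_with_final_sigma(text):
    """Lower-case `text`, choosing between medial and final sigma."""
    out = []
    for i, ch in enumerate(text):
        if ch == "Σ":
            prev_is_letter = i > 0 and text[i - 1].isalpha()
            next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
            # Final form only when the sigma ends a word of two or more letters;
            # a lone sigma stays in the medial form.
            final = prev_is_letter and not next_is_letter
            out.append("ς" if final else "σ")
        else:
            out.append(ch.lower())
    return "".join(out)
```

Applied to the test string from the message, ΣΑΒ ΑΣΒ ΑΒΣ Σ ΣΣΣ (Σ), this
gives σαβ ασβ αβς σ σσς (σ), matching the browser behaviour described.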
--
Joseph Wright






[XeTeX] Case changing for Greek

2015-05-07 Thread Joseph Wright
Hello all,

The question of case changing in Greek has come up in another thread.
Whilst the details here aren't XeTeX (or even TeX) specific, given the
interest by members of the list I hope I can take advantage to ask about
the area.

For work on LaTeX3/expl3 we've put together an approach to case changing
in XeTeX (and LuaTeX) that is not tied to a 1-1 mapping.

One of the design ideas behind the code was to allow a way to tackle
context- and language-dependent changes. At the same time, to date we
have used the Unicode docs to define case mappings. Thus the 'standard'
mappings follow those in UnicodeData.txt (1-1 lower/title/upper) and
SpecialCasing.txt (more complex cases).

Included in that 'standard' set up is the final sigma rule for Greek
text. For performance reasons that code has been set up to assume that a
sigma is final if it is followed by a space, a control sequence or a
character from the list

) ] } . : ; , ! ? ' 

Other potential additions are welcome as is testing of what we have
done. (There seem to be a lot of edge cases. For example, what happens
if a sigma is immediately followed by a number, say in a computational
identifier.)
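
For testing purposes, the list-based rule above can be written down as a
small predicate. The sketch below is in Python and only models the
"space or listed character" part; control sequences have no counterpart in
a plain string and are left out, and treating the end of the input as
"final" is an assumption, since the rule as stated does not cover it.

```python
# Characters after which a sigma is treated as final, taken from the list above.
FINAL_SIGMA_FOLLOWERS = set(" )]}.:;,!?'")


def sigma_is_final(following):
    """`following` is the next character, or None at the end of the input."""
    # Assumption: nothing after the sigma counts as final.
    return following is None or following in FINAL_SIGMA_FOLLOWERS
```

Under this rule a sigma followed by a digit is not final, which is exactly
the sort of edge case mentioned above.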

What has not been covered at all to date is any special handling of
accents. As indicated in the other thread, it seems that the handling of
accents in Greek is non-trivial. Notably, we have an implementation
which separates out title case from upper case and have the idea of
language-dependent mappings. Thus it would be perfectly possible to have
logic 'Retain accents on the first letter of a word when title casing;
remove them when upper casing'. Similarly, I wonder if there are
differences in practice related to the nature of the text: modern
writing vs. historical text, etc. Again, this can be added if there is a
clear set of rules to follow.
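
A minimal sketch of the 'retain when title casing, remove when upper casing'
logic, in Python rather than expl3 and under strong simplifying assumptions:
every combining mark except the dialytika is dropped when upper casing
(which would also strip marks from non-Greek text), and no attempt is made
to add a dialytika where one would be needed. The function names are
invented for this example.

```python
import unicodedata

DIALYTIKA = "\u0308"  # combining diaeresis, the one mark kept in all-caps text


def greek_upper(text):
    """Upper-case `text`, dropping every combining mark except the dialytika."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(
        ch for ch in decomposed
        if unicodedata.combining(ch) == 0 or ch == DIALYTIKA
    )
    return unicodedata.normalize("NFC", kept.upper())


def greek_title(word):
    """Upper-case only the first letter, leaving its accent in place."""
    return word[:1].upper() + word[1:]
```

With these definitions άλφα becomes ΑΛΦΑ when upper cased and Άλφα when
title cased, while ϊ keeps its dialytika and becomes Ϊ.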

Detailed information is most welcome.
--
Joseph Wright




[XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Joseph Wright
Hello all,

As some people will have seen, the LaTeX team have recently integrated
setting of codes (\catcode, \lccode, etc.) for the entire Unicode range
into the kernel when XeTeX/LuaTeX are in use. This is not a functional
change for end users but does mean that the team now have some control
over these important settings. Notably, the new data file we have
created (unicode-letters.def) is compatible with plain TeX and works
with both XeTeX and LuaTeX. We are therefore hopeful that it will
prove useful not only to LaTeX users but also to those using
plain-based formats.

For the initial pass we have adopted the settings applied by
unicode-letters.tex (XeTeX)/luatex-unicode-letters.tex (LuaTeX) as-is.
We have constructed a new (TeX) script to generate this data from the
raw Unicode data files.

Most of the settings are straightforward and shared between XeTeX and
LuaTeX. For example, characters marked by Unicode as letters have
\catcode 11, \lccode and \uccode are set up based on case relationships,
etc. However, we would like to raise one area that may need revision.

Based on the current files, we have a block to set \XeTeXcharclass,
which only applies to XeTeX. The logic followed in that code is that
characters in the file LineBreak.txt which have class ID (ideographs)
not only set the \XeTeXcharclass class to 1 but also set the \catcode of
the code point to 11. That leads to a difference between the two Unicode
engines. My current feeling is that the data file should split this
process such that the category code change applies to both XeTeX and
LuaTeX, with the XeTeX-specific code separate. Does this make sense and
indeed does the current assignment make sense?
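
The proposed split can be sketched as follows. This is Python, not the TeX
script mentioned above, and the sample lines are only illustrative of the
LineBreak.txt format rather than a verbatim excerpt from any Unicode
version. The function names and the tuple layout are assumptions made for
this example.

```python
SAMPLE = """\
# Illustrative lines in the LineBreak.txt format (not a verbatim excerpt)
0041..005A;AL
3400..4DBF;ID
4E00..9FFF;ID
3005;NS
"""


def ranges_with_class(text, wanted):
    """Return (first, last) code point pairs whose line-break class is `wanted`."""
    found = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        codes, cls = [field.strip() for field in line.split(";")[:2]]
        if cls != wanted:
            continue
        first, _, last = codes.partition("..")
        found.append((int(first, 16), int(last or first, 16)))
    return found


def assignments(text):
    """Split the class-ID settings into a shared part and a XeTeX-only part."""
    ideographs = ranges_with_class(text, "ID")
    shared = [("catcode", first, last, 11) for first, last in ideographs]
    xetex_only = [("XeTeXcharclass", first, last, 1) for first, last in ideographs]
    return shared, xetex_only
```

The first list would be applied for both engines and the second only when
XeTeX is in use, so the two engines end up with the same category codes.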

We are very keen to hear about any other logic changes that may be
required in the data file. This is a complex area and we have at present
done little other than copy the current logic.
--
Joseph Wright




Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Joseph Wright
On 06/05/2015 21:06, David Carlisle wrote:
> On 6 May 2015 at 20:15, Philip Taylor p.tay...@rhul.ac.uk wrote:
>>
>> Apostolos Syropoulos wrote:
>>
>>> It seems to me that most people have no idea what Unicode is and what is
>>> really involved.
>>
>> OK, so if we restrict the Universe of Discourse to the set of native
>> Hellenic speakers who know what Unicode is, know the importance of being
>> able to use it to identify the correct upper case of (for example)
>> 'GREEK SMALL LETTER EPSILON WITH PSILI', and hold an informed opinion on
>> the matter, would you expect that 100% of these would agree that the
>> uppercase is 'GREEK LETTER EPSILON' and not 'GREEK LETTER EPSILON WITH
>> PSILI', or would you expect that some percentage (perhaps small) would
>> hold the opposite point of view ?
>>
>> ** Phil.
>
> I don't think that's the right question. Even if everyone, including
> the Unicode technical committee, agreed some properties are incorrect
> for some characters, it isn't clear we should change them at this level.
>
> I think that unicode-letters.def makes most sense as a
> fully automated representation of the UCD data files in TeX syntax.
>
> That way everyone knows what data is in there.
>
> Individual language packages have far fewer characters to worry about
> and can over-ride the base settings where appropriate.

Indeed: provided hyphenation is correct then we are OK. (LuaTeX of
course is rather more flexible there than XeTeX.)
--
Joseph Wright





Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Joseph Wright
On 06/05/2015 15:09, Jonathan Kew wrote:
> On 6/5/15 14:14, Joseph Wright wrote:
>
>> Based on the current files, we have a block to set \XeTeXcharclass,
>> which only applies to XeTeX. The logic followed in that code is that
>> characters in the file LineBreak.txt which have class ID (ideographs)
>> not only set the \XeTeXcharclass class to 1 but also set the \catcode of
>> the code point to 11. That leads to a difference between the two Unicode
>> engines. My current feeling is that the data file should split this
>> process such that the category code change applies to both XeTeX and
>> LuaTeX, with the XeTeX-specific code separate. Does this make sense and
>> indeed does the current assignment make sense?
>
> ISTM that the most appropriate (default) \catcode for characters with
> class ID is clearly letter (11), and would suggest that LuaTeX should
> follow XeTeX in this.

Well for LaTeX at least the team get to make the call here and I think
we will pull everything into line.

> So yes, splitting out the XeTeX-specific code and having LuaTeX share
> the catcode assignments makes sense.

OK, if there are no objections I have a plan on this (I'll actually keep
all of the data, I think, and alter the assignment code).

> After all, if users can write control sequences such as
>
>   \hello
>   \halló
>   \Здравствуйте
>   \ሰላም
>   \सलाम
>
> they should equally well be able to write
>
>   \你好
>   \こんにちわ
>
> and have each of these treated as single control sequences, too. This
> will not work if class ID characters are given catcode 12.

Entirely reasonable.

> If you're making improvements to unicode-letters.def, I would suggest
> also adding a section that assigns catcode 15 (invalid) to the code
> values D800 - DFFF (i.e. the UTF-16 surrogates, which should never be
> used in isolation as characters).

Noted: easy enough to add.
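
As a rough sketch of what such a section amounts to, the Python below
generates one \catcode assignment per surrogate code point and checks the
range against the Unicode general category. It is illustrative only: the
function names are invented, and a real data file would more likely use a
loop in TeX than 2048 separate lines.

```python
import unicodedata

SURROGATES = range(0xD800, 0xDFFF + 1)


def surrogate_catcode_lines():
    """One TeX assignment per surrogate code point, written in hex notation."""
    return ['\\catcode"{:04X}=15'.format(cp) for cp in SURROGATES]


def are_all_surrogates():
    """True if every code point in the range has general category Cs."""
    return all(unicodedata.category(chr(cp)) == "Cs" for cp in SURROGATES)
```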
--
Joseph Wright






Re: [XeTeX] Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Joseph Wright
On 06/05/2015 16:04, Apostolos Syropoulos wrote:
> Hello,
>
> I checked the file a bit and I have noticed that
>
> \L 1F10 1F18 1F10 %
>
> while xgreek.sty defines
>
> \global\lccode"1F10="1F10 \global\uccode"1F10="0395
>
> You see the uppercase of 'GREEK SMALL LETTER EPSILON WITH PSILI'
> is 'GREEK LETTER EPSILON' and not 'GREEK LETTER EPSILON WITH PSILI'.
>
> Some time ago I reported this to the Unicode people and they told me
> something like "we cannot change it now" (I do not remember the exact
> wording but the essence remains the same.) Naturally, all \lccodes and
> \uccodes for Greek letters are wrong and I suspect many more are wrong.

This is slightly at a tangent from my original question (whether we are
processing the Unicode data in the right way), but is worth
consideration. It also has some impact on expl3 code related to case
changing (which does not use \lccode/\uccode).

I guess one could imagine deviating from the Unicode data but there are
issues. First, the current position is at least easy to explain. Second,
the current approach is the same position taken by I guess many other
pieces of software, so is cross-compatible with other stuff. Third, as a
non-Greek I can't comment on the technical correctness of what you say!
Is there some place I could see this discussed in detail? (I'm a bit
confused as to what 'GREEK CAPITAL LETTER EPSILON WITH PSILI' represents
if it's not the upper case of 'GREEK SMALL LETTER EPSILON WITH PSILI': I
notice in xgreek you map U+1F18 to U+0395 for upper casing and U+1F10
for lower casing.)
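
The difference between the two mappings can be checked against the Unicode
Character Database as shipped with Python. This is only a lookup sketch:
the variable and function names are invented here, and the xgreek value is
the one quoted in the message above, not read from xgreek.sty.

```python
import unicodedata

SMALL = "\u1f10"                # GREEK SMALL LETTER EPSILON WITH PSILI
unicode_upper = SMALL.upper()   # upper case according to the Unicode data
xgreek_upper = "\u0395"         # upper case quoted from xgreek.sty above


def describe(char):
    """Return the code point and Unicode name of a single character."""
    return "U+{:04X} {}".format(ord(char), unicodedata.name(char))
```

Calling describe() on the three values shows that the Unicode data pair
U+1F10 with U+1F18 (capital epsilon with psili), whereas xgreek maps it to
the plain capital epsilon U+0395.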
--
Joseph Wright




Re: [XeTeX] XeTeX maintenance

2015-04-28 Thread Joseph Wright
On 28/04/2015 00:48, Douglas McKenna wrote:
>> That isn't at all clear, I don't see any evidence that the equivalent
>> &#...; notation in XML is getting less used. For runs of natural language
>> text then clearly using character data directly makes more sense
>> but to get specific symbols accessing by code point often makes sense.
>>
>> To get a math bold A, it is much easier to tell someone to enter ^1d400
>> than to tell them how to enter 𝐀 in whatever system they are using.
>
> Of course, every Unicode reference on the web would be referring to it as
> U+1D400.  Sigh.
>
> Anyway, duly noted.  Except in the future, whatever system they are using
> will likely be able to handle the UTF-8 character as direct input, just like
> it appears (in my email reader) above as a math bold A.

Well yes and no. Whilst editors, viewers, etc. can be expanded to cover
the entire Unicode range, no one font will cover the entire spectrum. At
the same time, most keyboards are only ever going to have ~100 keys. So
whilst for the main language of a document Unicode makes sense, saying
that you have to find the correct Unicode code point for everything is
not so convenient. (One can arrange different key binds to flip between
which could be used to do for example the math-mode bold A business, but
that may or may not be easier than just typing \mathbf{A} or whatever,
depending on the use case.)
--
Joseph Wright






Re: [XeTeX] XeTeX maintenance

2015-04-27 Thread Joseph Wright
On 27/04/2015 00:22, Ross Moore wrote:
>> But of course that doesn't address the problem for LaTeX users until
>> someone writes a suitable/comparable package (maybe someone did
>> already, I didn't try to follow).
>
> I have coding for much of what is needed, using the modified pdfTeX.
> But there is a lot that still needs to be added; e.g. PDF’s table model,
> References, footnotes, etc.

Somewhat away from the original topic, but it strikes me that building a
tagged PDF is going to be much more problematic at the macro layer than
at the engine level: is that fair? Deciding what elements of a document
are 'structure' is hard, and in 'real' documents it's not unusual to see
a lot of input that's more about appearance than structure. That of
course isn't limited to TeX: I suspect anyone trying to generate tagged
output has the same concern (users do odd things).
--
Joseph Wright






Re: [XeTeX] XeTeX maintenance

2015-04-27 Thread Joseph Wright
On 27/04/2015 08:43, Ross Moore wrote:
> Hi Joseph,
>
> On 27/04/2015, at 4:19 PM, Joseph Wright wrote:
>
>> On 27/04/2015 00:22, Ross Moore wrote:
>>>> But of course that doesn't address the problem for LaTeX users until
>>>> someone writes a suitable/comparable package (maybe someone did
>>>> already, I didn't try to follow).
>>>
>>> I have coding for much of what is needed, using the modified pdfTeX.
>>> But there is a lot that still needs to be added; e.g. PDF’s table model,
>>> References, footnotes, etc.
>>
>> Somewhat away from the original topic, but it strikes me that building a
>> tagged PDF is going to be much more problematic at the macro layer than
>> at the engine level: is that fair?
>
> Certainly one needs help at the engine level, to build the tree
> structures: what is a parent/child of what else.

Yes, I didn't mean that engine support isn't required, but that some of
the more complex concepts are probably at the macro layer. You know a
lot more about this than I do, but I assume that there is more to tagged
PDFs than sectioning (which is relatively easy to define). For example,
as a chemist I'd guess one has to worry about chemical formulae and
about reference numbers to compounds. (We tend to give the latter in
bold and they commonly refer to graphics representing the structures.
That looks very tricky to me to express in a tagged form!)

 But macros are needed to determine where new structure starts
 and finishes.
 Think  \section  and friends, list environments, \item  etc.

Yes, those elements seem relatively clear. As I've noted in another
reply, ConTeXt MkIV has moved to a more XML-like \startitem ...
\stopitem construct as the preferred way to deal with (here) items, I
guess in part as that makes such things easier. As a user that's
slightly more tricky: I'd say that the idea that one item is ended by
the start of the next is pretty clear :-)

 Indicators must go in at a high level, before these are decomposed
 into the content:  letters, font-switches, etc.

Again, understood and I think reasonably clear for the macro level.

 In short, determining where structure is to be found is *much* harder
 at the engine level; but doing the book-keeping to preserve that
 structure, once known, is definitely easier when done at that level.

Makes sense. One can imagine constructing a tree at the macro level but
as there still needs to be some tagging I guess it doesn't help. (Can
the latter be done using \specials?)

 Philip Taylor is correct in thinking that such things can be
 better controlled in XML. But there the author has to put in
 the extra verbose markup for themselves --- hopefully with help
 from some kind of interface.
 However, that can involve a pretty steep learning curve anyway.

 Word has had styles for decades, but how many authors actually
 make proper use of them?  e.g. linking one style to another,
 setting space before and after, rather than just using newlines,
 and inserting space runs instead of setting tabs.
 How many even know of the difference between return and
 Shift-return (or is it Option-return)?

:-)

 The point of (La)TeX is surely to allow the human author
 to not worry too much about detailed structure, but still allow
 sufficient hints (via the choice of environments and macros used)
 that most things should be able to be worked out.

That's the plan, I guess.
--
Joseph Wright






Re: [XeTeX] XeTeX maintenance

2015-04-27 Thread Joseph Wright
On 27/04/2015 07:35, Philip Taylor wrote:
 Going even further off-topic, but pursuing this one aspect of the
 thread, is there not only one real problem : the need to educate users
 to cease marking up their documents in raw (La)TeX syntax, and instead
 to express them in well-formed XML ?  I have just finished typesetting
 (using [plain] XeTeX) a 544pp book marked up entirely in XML, and whilst
 I have made no efforts to generate PDF/UA, I am convinced that the task
 of so doing (assuming that the necessary primitives are or were
 available in XeTeX) would have been 1/1000 of the effort needed to do so
 had the book been marked up in traditional (La)TeX syntax with its usual
 accompanying conflation of form and content.

As Ross says in a parallel message, XML raises different issues and is
not a panacea. For a start, we can ask if XML is a particularly good
format, not only here but for anything (there's a blog post by Linus
Torvalds suggesting the answer is 'no'!). Assuming XML is at some level
a good plan, that still doesn't make it a good plan for the end user nor
ensure that the end user will stick to logical structures. There's also
the business that TeX is useful because sometimes we do need some visual
adjustment or programming element.

LaTeX2e is already not bad for structure if used in the right way, and
ConTeXt MkIV has gone further along an XML-like road without using this
as the native syntax (\startsection/\stopsection for example), and of
course plain users can define similar structures (indeed without the
constraints that LaTeX has of needing not to break things).
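
As a minimal sketch of that last point (the names \startitem and
\stopitem are borrowed from ConTeXt; the definitions are only an
illustrative assumption, not taken from any format), a plain user
might write

  % Hypothetical plain TeX definitions: explicit start/stop markers
  % make the extent of each item unambiguous.
  \def\startitem{\par\begingroup\noindent
    \llap{$\bullet$\enspace}\ignorespaces}
  \def\stopitem{\par\endgroup}

  \startitem First item\stopitem
  \startitem Second item\stopitem
  \bye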
--
Joseph Wright





Re: [XeTeX] XeTeX maintenance

2015-04-27 Thread Joseph Wright
On 27/04/2015 01:05, Douglas McKenna wrote:
 Joseph Wright wrote:
 
 \def\"{0}\expandafter\def\csname^^^^^00022\endcsname{1}
 \ifnum\"=0 \message{tex82}\else\message{newstuff}\fi
 
 When I implemented a Unicode escape sequence extension using double-caret 
 notation in the JSBox TeX-language interpreter I've been working on (which is 
 all 21-bit Unicode internally, all the time, but can be configured at 
 run-time to be 8-bit input only), I was unaware of what XeTeX had 
 implemented, so I just used
 
 ^^uxxxx (for 16-bit, BMP codes)
 ^^Uxxxxxx (for all 21-bit Unicode code points)
 
 Seemed straightforward enough.

XeTeX conventions have been picked up by LuaTeX on this, and there's
been some 'feedback' from LuaTeX to XeTeX to give us some
standardisation for Unicode primitives/syntax (admittedly with bugs, but
that's a different point). I'd hope that any future Unicode TeX-like
systems would also pick up on the model used by XeTeX/LuaTeX.

 Given that the number of TeX input files using ^^u is likely minuscule, and 
 the number of those that follow the ^^u or ^^U with four or six hex digits is 
 even smaller, it seemed like a worthwhile benefit vs. cost, 
 compatibility-wise.  Maybe there's something I've not thought out well.

I didn't mean that there would be many real-world docs with this issue.
I was trying to point out that it's almost impossible to imagine that a
Unicode TeX-like engine could be used as a drop-in replacement for the
current 8-bit ones (pdfTeX most obviously), so when we talk about 'the
future' we have to mean 'for documents written assuming Unicode' rather
than 'for all existing TeX documents'. (For mathematicians the latter
point is very important.)

 This discussion I just found is both pertinent and frightening, I suppose:
 
 http://stackroulette.com/tex/62725/the-notation-in-various-engines

That's a (questionable) reuse of the info from

http://tex.stackexchange.com/questions/62725/the-notation-in-various-engines

Note that the discussion is editable (wiki-like) and to my knowledge is
still correct as-is. There are some tricky issues in XeTeX, particularly
related to non-BMP chars, partly because working out what should happen
here has been a work-in-progress.
--
Joseph Wright




Re: [XeTeX] XeTeX maintenance

2015-04-26 Thread Joseph Wright
On 26/04/2015 11:47, Philip Taylor wrote:
 To my mind, XeTeX /is/ the future of TeX.  The days of entering
 français as fran\c cais are surely numbered, and it has never been
 possible to enter العربية, ελληνικά or עברית (etc) in an analogous
 way.  Therefore, is it not time to petition the TUG Board to adopt XeTeX
 as a formal TUG project, and to allocate adequate funding to ensure not
 only its continued existence but its continued development, at least
 until such time as a clearly superior alternative not only emerges but
 becomes adopted as the /de facto/ replacement for TeX ?
 
 Philip Taylor

The problem as always is not so much money as people. [Also, you do know
about LuaTeX, yes? ;-) More seriously, XeTeX isn't a drop-in replacement
for TeX90/pdfTeX.]
--
Joseph Wright






Re: [XeTeX] XeTeX maintenance

2015-04-26 Thread Joseph Wright
On 26/04/2015 12:16, Philip Taylor wrote:
 
 
 Joseph Wright wrote:
 
 See for example details in
 http://tex.stackexchange.com/questions/86/what-are-the-incompatibilities-of-pdftex-xetex-and-luatex
 for places where there are edge cases. The most obvious would be that
 XeTeX requires the xdvipdfmx back-end (so differences at the \special
 level), 
 
 Yes, I accept that, but to the user (as I have argued elsewhere), XeTeX
 subsumes 'xdvipdfmx' -- the fact that they are, historically, two
 separate pieces of software and are separately maintained is a sad fact
 of life but not one that the user of XeTeX should be required to consider.

Still requires changes in a document, particularly one written for
pdfTeX in PDF mode (certainly for plain: for LaTeX of course this is
more transparent).

 but a simple piece of code

 \def\"{0}\expandafter\def\csname^^^^^00022\endcsname{1}
 \ifnum\"=0 \message{tex82}\else\message{newstuff}\fi

 (ConTeXt wiki) gives different results with TeX90 and XeTeX due to
 different treatment of more than two ^^ (catcode 7) in a row.
 
 OK, agreed: by adding support for wider characters, some breakages will,
 almost of necessity occur, but I would respectfully argue that these are
 pathological cases that will not impact real-world documents.

My point though is that neither XeTeX nor indeed any other Unicode
TeX-like engine can be used as a direct replacement for an 8-bit engine:
contrast the fact that the standard engine for TeX Live is nowadays
pdfTeX used as a direct drop-in replacement for TeX90 (with the
exception of using tex, which is Knuth's TeX unaltered). As such,
whilst new documents may be written using a Unicode engine, pdfTeX will
remain vital.

All that said, I am keen that some way is found to continue to work on
XeTeX. The problem is that WEB code is *hard* to work with!
--
Joseph Wright





Re: [XeTeX] XeTeX maintenance

2015-04-26 Thread Joseph Wright
On 26/04/2015 12:00, Philip Taylor wrote:
 
 
 Joseph Wright wrote:
 
 The problem as always is not so much money as people.
 
 Yes, I do appreciate that, but sometimes money is also an obstacle (do
 I work on X, which will help keep a roof over my head, or on XeTeX,
 which may bring me fame but which may also result in my eviction ?).
 
 [Also, you do know about LuaTeX, yes? ;-)
 
 Yes, of course, but I see it as an evolutionary dead-end, much as I
 would wish to see it otherwise.
 
 More seriously, XeTeX isn't a drop-in replacement for TeX90/pdfTeX.]
 
 I have yet to find a legacy document which behaves differently (legacy
 Plain TeX, that is, not legacy LaTeX); if you can point me at one, I
 should be interested to experience the differences for myself.
 
 ** Phil.

See for example details in
http://tex.stackexchange.com/questions/86/what-are-the-incompatibilities-of-pdftex-xetex-and-luatex
for places where there are edge cases. The most obvious would be that
XeTeX requires the xdvipdfmx back-end (so differences at the \special
level), but a simple piece of code

\def\"{0}\expandafter\def\csname^^^^^00022\endcsname{1}
\ifnum\"=0 \message{tex82}\else\message{newstuff}\fi

(ConTeXt wiki) gives different results with TeX90 and XeTeX due to
different treatment of more than two ^^ (catcode 7) in a row.
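
A more conventional check, sketched here on the assumption that
only XeTeX defines the \XeTeXversion primitive, works even where
e-TeX's \ifdefined is unavailable:

  % \csname...\endcsname yields \relax for an undefined name
  \expandafter\ifx\csname XeTeXversion\endcsname\relax
    \message{not XeTeX}%
  \else
    \message{XeTeX \the\XeTeXversion\XeTeXrevision}%
  \fi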
--
Joseph Wright





Re: [XeTeX] printing of characters above FFFF with \string \meaning (and potentially \Uchar)

2015-04-23 Thread Joseph Wright
On 23/04/2015 14:07, David Carlisle wrote:
 Last year I asked about the possibility of adding \Uchar copied from luatex.
 
 http://tug.org/pipermail/xetex/2014-May/025260.html
 
 Bruno suggested a possible implementation, and I finally got round to
 trying that
 adjusted for the sources as in the texlive 2015 pretest tree (diff attached)
 
 This seems to work fine for characters below FFFF
 but fails for non-BMP characters above that.
 
 See the attached xetexuchar.tex file and the log produced by
 luatex and (patched) xetex.
 
 It just uses the same print_char routine as \string so I thought I'd test
 that.
 See the file nonbmp.tex (which can be used with a non-patched xetex)
 
 As can be seen with the attached logs this works with luatex with
 \string on U+1D538 producing a single character, but with xetex it produces
 two (presumably the UTF-16 surrogate pair, although I didn't check that).
 
 Is my reading of this file correct and \string and \meaning are turning
 U+1D538  into two characters, and if so does anyone have a suggestion
 of the best place this should be attacked in the source?
 
 
 David

Obviously the non-BMP issue needs to be tackled, but I wonder if \Uchar
could be added in any case. It would bring functionality in this area
closer to LuaTeX and presumably the high chars business can be viewed as
a separate issue.
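
For illustration only, assuming an engine that provides \Uchar with
the LuaTeX semantics (expandable, yielding a character token for the
given code point), usage would be along the lines of

  % U+1D538 is the non-BMP code point from David's test
  \edef\test{\Uchar"1D538 }% the space ends the number
  \show\test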
--
Joseph Wright





Re: [XeTeX] Bug with color \specials?

2014-06-24 Thread Joseph Wright
On 24/06/2014 14:18, Ulrike Fischer wrote:
 Am Tue, 24 Jun 2014 21:53:26 +0900 schrieb Akira Kakuto:
 
 Thus problems are in pgfsys-dvipdfmx.def
 and pgfsys-xetex.def.
 
 I can avoid the problem by redefining these two commands of a current
 pgfsys-dvipdfmx.def:

[snip]

 But the binary (x)dvipdfmx must be involved somehow too: Even if I
 use all the files from texlive2014 with miktex I don't get the error
 there. 

Which files are we talking about here: just pgf-related ones or also
xetex.def/dvipdfmx.def/dvipdfmx.cfg/...?
-- 
Joseph Wright





Re: [XeTeX] XeTeX/xdvipdfmx or the driver bug with eps images

2014-05-28 Thread Joseph Wright
On 28/05/2014 05:24, Vafa Karen-Pahlav wrote:
 Hi list
 
 It seems that xelatex has problems including eps images; please see the
 attached example and the provided eps image.
 
 latex+dvips+pstopdf: image is included inside the \fbox
 
 xetex recent versions: image is outside of \fbox
 
 xetex old versions: (the one coded by Jonathan Kew): ok, image is included
 inside \fbox.
 
 w.eps is taken from LaTeX graphics companion examples; therefore I do not
 think there is anything wrong with the image itself.
 
 What is wrong?
 
 I also have experienced some strange problems with recent versions of
 xetex; include an image in a document on Windows and the result is
 perfectly fine but you try to compile the same document on a different
 operating system, then images are placed strangely (i.e. the image width
 exceeds the textwidth and is placed on the right or left hand side). This
 issue is very annoying and has existed for a few years now. I will try to send some
 minimal example for this later today.

Initial analysis: http://tex.stackexchange.com/questions/180766. It
looks to me like this is not XeTeX-specific but is linked to a change in
(x)dvipdfmx and related config files:
http://tug.org/svn/texlive?view=revision&revision=30175 (versions of
xetex.def before this do not give the issue).

As noted by Herbert Voss, w.eps is really a PS file. However, it's
treated differently by dvips than (x)dvipdfmx, and that's not expected!
-- 
Joseph Wright





Re: [XeTeX] XeTeX/xdvipdfmx or the driver bug with eps images

2014-05-28 Thread Joseph Wright
On 28/05/2014 16:14, Akira Kakuto wrote:
 Dear Vafa Karen-Pahlav
 
 w.eps is taken from LaTeX graphics companion examples;
 therefore I do not think there is anything wrong with the image itself.

 What is wrong?
 
 It is sufficient to change the header of w.eps
 from
 %!PS-Adobe-2.0
 to
 %!PS-Adobe-2.0 EPSF-2.0
 in order to tell Ghostscript that w.eps is an
 eps file.
 
 Please try, then you will obtain an expected pdf.
 
 Thanks,
 Akira

All true, but both latex + dvips and pdflatex produce the expected
output, as do latex + dvipdfmx or xelatex with the older driver set up.
-- 
Joseph Wright





[XeTeX] ^^J in the plain XeTeX format

2014-05-05 Thread Joseph Wright
Hello all,

Doing some experiments on writing to the log, I find that the XeTeX
format shows different behaviour from other formats with respect to ^^J.
Trying

\immediate\write-1{Hello^^Jworld}
\bye

with pdfTeX or LuaTeX gives two lines in the log

Hello
world

but with XeTeX gives

Hello^^Jworld

LaTeX and ConTeXt (MkII) show identical behaviour for XeTeX and pdfTeX
(and LuaTeX in the LaTeX case).

Anyone know why this is, and if it's deliberate?
-- 
Joseph Wright





[XeTeX] Extended ^^ notation and \scantokens

2014-05-05 Thread Joseph Wright
Hello all,

Experimenting with \scantokens for generating characters from the
charcodes, the following issue comes up in XeTeX. For the test file

\show ^^^^^10400
\show ^^^^^^010400
\def\gobble#1{}
\showtokens\expandafter{%
  \romannumeral-`\q\expandafter\expandafter\expandafter\gobble
    \expandafter\string\csname
      \scantokens{^^^^^10400\noexpand}\endcsname%
}
\showtokens\expandafter{%
  \romannumeral-`\q\expandafter\expandafter\expandafter\gobble
    \expandafter\string\csname
      \scantokens{^^^^^^010400\noexpand}\endcsname%
}

the \show statements work fine but the \scantokens versions don't. This
is not limited to the rather odd setup above (used so \showtokens is
applicable): with \everyeof{\noexpand} and a suitable set of \write
statements you see the same in a temporary file.

It seems that something is up once you get to five hexadecimal digits:
bug in XeTeX?
--
Joseph Wright





Re: [XeTeX] ^^J in the plain XeTeX format

2014-05-05 Thread Joseph Wright
On 05/05/2014 08:05, Joseph Wright wrote:
 Hello all,
 
 Doing some experiments on writing to the log, I find that the XeTeX
 format shows different behaviour from other formats with respect to ^^J.
 Trying
 
 \immediate\write-1{Hello^^Jworld}
 \bye
 
 with pdfTeX or LuaTeX gives two lines in the log
 
 Hello
 world
 
 but with XeTeX gives
 
 Hello^^Jworld
 
 LaTeX and ConTeXt (MkII) show identical behaviour for XeTeX and pdfTeX
 (and LuaTeX in the LaTeX case).
 
 Anyone know why this is, and if it's deliberate?

Further to this, I note that the plain format doesn't set \newlinechar
^^J, so Knuth's TeX gives the same behaviour as XeTeX. Arguably
therefore an issue with pdfTeX/LuaTeX: will raise in appropriate place(s).
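
A minimal sketch of making the setting explicit in a plain document
(done by hand here, as the LaTeX format does for itself):

  % Make character 10 (^^J) start a new line in \write and \message
  \newlinechar=`\^^J
  \immediate\write-1{Hello^^Jworld}
  \bye

With this in place the two-line form should no longer depend on how
each engine's plain format was built.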
-- 
Joseph Wright





Re: [XeTeX] Extended ^^ notation and \scantokens

2014-05-05 Thread Joseph Wright
On 05/05/2014 10:55, Qing Lee wrote:
 It seems to be related to the following tickets:
 
 http://sourceforge.net/p/xetex/bugs/79/
 http://sourceforge.net/p/xetex/bugs/80/
 http://sourceforge.net/p/xetex/bugs/88/
 
 Qing Lee

OK, so issue(s) are known: I'd been trying to get a short demo together
for that last one!

Guess for the moment I'll just have to put up!
-- 
Joseph Wright





Re: [XeTeX] The arcs package

2013-08-25 Thread Joseph Wright
On 25/08/2013 15:42, Arash Zeini wrote:
 Hello,
 
 Since the upgrade to TeX Live 2013, the arcs package behaves strangely. It
 draws the desired arc under the respective characters, but a string like
 5.0pt will always precede the characters with the arc. Has anyone else
 noticed this problem?
 
 I have tried this MWE with two different fonts on two computers running
 Debian unstable and a vanilla TL 2013:
 
 \documentclass[a4paper,12pt]{article}
 
 \usepackage{xltxtra}
 
 \setromanfont[Mapping=tex-text]{Junicode}
 \usepackage{arcs}
 
 \begin{document}
 An underarc: \underarc{ab}. And now an overarc: \overarc{ab}.
 
 \end{document}
 
 Best wishes,
 Arash

Nothing to do with XeTeX: it's due to relsize and shows up with a demo
for pdfTeX. Inside arcs.sty you find

\let \rs@size@warning = \@gobbletwo
\relsize{-10}%

but in the latest relsize (dated 2013-03-29) \rs@size@warning takes
three arguments. Thus arcs needs adjusting (probably should just leave
the relsize code alone): I've CC'd the arcs author.
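
A sketch of one possible adjustment inside arcs.sty (where @ is
already a letter), assuming only what is noted above, namely that
\rs@size@warning now takes three arguments; the helper name
\arcs@gobblethree is made up for illustration:

  % Swallow all three arguments instead of two
  \def\arcs@gobblethree#1#2#3{}
  \let \rs@size@warning = \arcs@gobblethree
  \relsize{-10}%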
-- 
Joseph Wright




Re: [XeTeX] The arcs package

2013-08-25 Thread Joseph Wright
On 25/08/2013 16:00, Joseph Wright wrote:
 Nothing to do with XeTeX: it's due to relsize and shows up with a demo
 for pdfTeX. Inside arcs.sty you find
 
 \let \rs@size@warning = \@gobbletwo
 \relsize{-10}%
 
 but in the latest relsize (dated 2013-03-29) \rs@size@warning takes
 three arguments. Thus arcs needs adjusting (probably should just leave
 the relsize code alone): I've CC'd the arcs author.

Message to the arcs author bounced. I'll raise this on c.t.t.: the
package is LPPL so a change can be sorted out if he can't be found.
-- 
Joseph Wright




Re: [XeTeX] Error of xeCJK and fonspec (Bokov Gleb - First Message)

2013-05-31 Thread Joseph Wright
On 31/05/2013 13:52, Peter Dyballa wrote:
 
 Am 31.05.2013 um 14:10 schrieb Bruno Le Floch:
 
 This is caused by fontspec, I'd say, due to somewhat recent changes in
 the expl3 supporting package: \c_keys_code_root_tl was renamed
 \c__keys_code_root_tl at some point, to reflect its internal nature,
 and fontspec should not be using it.
 
 Fontspec [2013/03/16 v2.3a Font selection for XeLaTeX and LuaLaTeX] requires 
 {expl3}[2011/09/05] and seems to work alright for me with expl3.sty
 2013/03/14 v4469. These files don't make use of \c__keys_code_root_tl or 
 \c_keys_code_root_tl, on my system it's only l3keys.sty 2013/02/24 v4461, 
 that has \c__keys_code_root_tl. A final update might be useful… (since TeX 
 Live has become stable a month ago)

All correct: there was a change, it was a while ago and provided you
have up-to-date fontspec and expl3 packages everything should work.
-- 
Joseph Wright




Re: [XeTeX] First message: Xelatex, pstricks and Mountain Lion

2013-01-14 Thread Joseph Wright
On 14/01/2013 15:10, François Boone wrote:
 Hi,
 
 I am on Macbook pro, 2012, with Mountain Lion, 10.8.2.
 My texlive 2012 is up to date : I update it with TeX Live Utility.
 
 I have a problem:
 This is my document:
 
 \listfiles
 \documentclass{minimal}
 \usepackage{pstricks}
 \begin{document}
 \begin{pspicture}(0,0)(10cm,2cm)
 \psline[linewidth=2pt,linecolor=red](0,0)(10,2)
 \end{pspicture}
 \end{document}
 
 When I xelatex this simple example, I obtain a one blank page pdf file.
 
 In console, I have this message:
 (./ecm.aux) [1] (./ecm.aux)gs requires X11.  Please visit 
 http://support.apple.com/kb/HT5293 for more information.
 
 ** WARNING ** Filtering file via command --rungs -q -dNOPAUSE -dBATCH 
 -dNOSAFER -sPAPERSIZE=a0 -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 
 -dAutoFilterGrayImages=false -dGrayImageFilter=/FlateEncode 
 -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode 
 -sOutputFile='/var/folders/tx/jz3ygk150jldyb_xjg26tbr4gn/T//dvipdfmx.yQDiuc2o'
  '/var/folders/tx/jz3ygk150jldyb_xjg26tbr4gn/T//dvipdfmx.Y8pRsQwx' -c 
 quit-- failed.
 ** WARNING ** Image format conversion for PSTricks failed.
 ** WARNING ** Interpreting special command pst: (ps:) failed.
 ** WARNING **  at page=1 position=(91.9253, 663.307) (in PDF)
 ** WARNING **  xxx pst:  tx@Dict begin STP newpath /ArrowA { moveto } def 
 /ArrowB 
 ** WARNING ** 5 memory objects still allocated
 You may want to report this to te...@tug.org
 
 I had a long talk with Herbert Voss this last WE and we didn't find a 
 solution. Herbert works on PC, so it is difficult to find the problem on Mac.
 If someone can help me...
 
 Thank-you
 François

Have you tried installing XQuartz? (Apple no longer include X11 with ML,
so you have to get your own from the community spin-out.)
-- 
Joseph Wright




Re: [XeTeX] bidi.sty for plain XeTeX

2012-12-22 Thread Joseph Wright
On 22/12/2012 08:48, Philip TAYLOR wrote:
 
 
 heer wrote:
 Joseph,

 I have here Vafa Khalighi's documentation for his bidi package. It has a 
 section on using his bidi package with plain TeX.  He says the bidi package 
 is loaded with the command \input bidi, but that command doesn't work (at 
 least with my 2009 version of TeX Live) because xetex can't find the bidi 
 file.  If I use the command \input bidi.sty xetex finds the bidi.sty file 
 for xelatex not xetex. That's why I was wondering whether there wasn't 
 another bidi.sty file available for plain xetex.
 Perhaps there are other bidi packages available that I do not know of.
 
 Probably this one, Nicholas :
 
 d:/TeX/Live/2011/texmf-dist/tex/latex/bidi/bidi.tex
 
 Philip Taylor

Indeed,

  \input bidi
  Some text
  \bye

works on my system using XeTeX.
-- 
Joseph Wright




Re: [XeTeX] bidi.sty for plain XeTeX

2012-12-22 Thread Joseph Wright
On 21/12/2012 22:24, John Was wrote:
 I'm not an Arabist but have occasionally had to typeset articles in
 plain XeTeX using Arabic, and all I have in my file header is:
 
 \TeXXeTstate=1 % this turns e-TeX's bidi functionality on
 \def\intextarab#1{{\arabic {\beginR #1\endR}}}
 
 I define \arabic as a call to my Arabic font (the definition of \arabic
 changes according to whether  I'm in main text, footnote text, or
 extract text).  To achieve Arabic I just give \intextarab{ARABIC TEXT
 HERE}. That works fine for bits of Arabic embedded in English (or other
 left-to-right) text in the same paragraph.  For separate Arabic
 paragraphs you really just need
 \beginR
 
 and at the end
 
 \endR
 
 
 There are no doubt slicker ways of doing things, but that gave me good
 output first time round so I stuck with it!
 
 
 John

For an entire document, you need to worry about \everypar and
\parindent, for example

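  \TeXXeTstate = 1 % enable the TeX--XeT direction primitives
  \newbox\indentbox
  \everypar{%
    \setbox\indentbox=\lastbox % take off the indent box just added
    \beginR                    % start right-to-left text
    \box\indentbox             % put the indent back inside the RTL text
  }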

as \beginR is an hmode command.
-- 
Joseph Wright




Re: [XeTeX] bidi.sty for plain XeTeX

2012-12-21 Thread Joseph Wright
On 21/12/2012 21:52, heer wrote:
 
 Is there a bidi.sty file for plain XeTeX or only for XeLateX? I'd
 like to be able to use Arabic script in plain XeTeX.
 
 Nicholas

The bidi docs include instructions for using it with plain XeTeX
(although I'd imagine plain users would tend to 'roll their own').
-- 
Joseph Wright




Re: [XeTeX] [luatex] Info on direction primitives/implementation

2012-12-05 Thread Joseph Wright
On 05/12/2012 08:54, Vafa Khalighi wrote:
 

 At the moment, I'm looking specifically at what we need to worry about
 at a low level. For example, the current expl3 code does not take any
 notice of direction, which is probably right for something like \hbox:n
 (follow whatever is going on around it), but should be documented and
 deliberate, not just something we've ignored. So what's important at
 this stage is much more the concepts than trying to write any code,
 although any thoughts on what is required for RTL support at the 'base
 level' are of course welcome.
 
 
 For the boxes in luatex you can change directions: \hbox dir TRT{...}

I was thinking more at the level of something like \hboxR/\hboxL as
defined by bidi, plus perhaps some form of test similar to \ifmmode.
Then again, I have no idea what is needed beyond certain small contexts
(for example ensuring LTR for units).
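
A rough sketch of box commands of that kind, using only the TeX--XeT
primitives (illustrative definitions, not the ones bidi actually
uses):

  \TeXXeTstate=1
  % Hypothetical definitions: force a direction inside the box
  \def\hboxR#1{\hbox{\beginR #1\endR}}
  \def\hboxL#1{\hbox{\beginL #1\endL}}
  % e.g. keep a quantity left-to-right inside RTL text
  \beginR halb halb \hboxL{10 kg} halb\endR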

 For pdfTeX that's not an issue: I doubt very many people use pdfTeX for
 RTL. 
 
 Well, there are two groups of people. The first group use ArabTeX which
 does not make any use of TeX--XeT and it works with Knuth TeX too. The
 second group also are Hebrew and Arab users; some of them still use babel.
 
 XeTeX is a bit more 'interesting': I guess the existence of bidi
 means that people are using XeTeX for 'real life' RTL work, despite
 limitations.
 
 Considering bidi has improved the situations and made things cleaner and 
 simpler, yes.

Right, so some more thinking required here. The question is what is
sensible for new content: from what you say about TeX--XeT and bidi,
using pdfTeX/XeTeX is not currently to be recommended for RTL work
(although I guess this would change if XeTeX switches to the Omega
approach).
-- 
Joseph Wright




Re: [XeTeX] [luatex] Info on direction primitives/implementation

2012-12-05 Thread Joseph Wright
On 05/12/2012 12:14, Jonathan Kew wrote:
 On 5/12/12 11:56, Joseph Wright wrote:
 
 Right, so some more thinking required here. The question is what is
 sensible for new content: from what you say about TeX--XeT and bidi,
 using pdfTeX/XeTeX is not currently to be recommended for RTL work
 
 IMO, this is an overly general and somewhat misleading blanket
 statement. The TeX--XeT model has its limitations, certainly, but there
 are plenty of RTL documents for which it is perfectly adequate. Not
 every document requires mixed-direction math, or \special-based colour
 and hyperlinks wrapping across lines.
 
 Back when I was doing typesetting on behalf of authors and publishers in
 Pakistan and elsewhere, we used (TeX--XeT-based) XeTeX with considerable
 success and without feeling at all constrained by its bidi shortcomings.
 But then, we were producing traditional printed books full of text,
 not colourful, hyperlinked PDFs full of math.
 
 JK

Hello Jonathan,

It was not my intention to criticise XeTeX: I'm trying to get a feel for
the issues in RTL work and to understand what people who use the TeX
tools in this area find works (or does not). I'm also trying to work out
how the different engines work when you abstract to the package layer,
i.e. to what extent you can create macros which hide the complexities or
which do not enforce particular engine requirements. All of the input is
very useful.
--
Joseph Wright




Re: [XeTeX] [luatex] Info on direction primitives/implementation

2012-12-05 Thread Joseph Wright
On 05/12/2012 12:35, Zdenek Wagner wrote:
 The book is written in Czech but contains words and sometimes even
 sentences in Hindi and Urdu. The lines are often broken within the
 Urdu text.

So mainly constructs of the form

  blah blah blah \beginR halb halb halb\endR blah blah blah

or similar?

 Since the book contains more than 300 pictures, almost all
 paragraphs require \parshape. 

Paragraph shape is a somewhat separate issue :-) (In the sense that I
have other worries that come up there without bringing in RTL!)
-- 
Joseph Wright




Re: [XeTeX] [luatex] Info on direction primitives/implementation

2012-12-04 Thread Joseph Wright
[Moved from the LuaTeX list as this moves me to XeTeX!]

On 04/12/2012 14:33, Khaled Hosny wrote:
 On Tue, Dec 04, 2012 at 01:52:19PM +0000, Joseph Wright wrote:
 Very useful, thanks. Point about detail understood: at the moment I'm
 trying to get my head around the entire area, and to see how the LuaTeX
 version contrasts with the pdfTeX/XeTeX approach.
 
 In the last few days I have been fancying the idea of scrapping TeX--XeT
 from XeTeX and replacing it with code from Aleph (with LuaTeX
 modifications backported to it), but no work has been done so far (and
 probably never will, I always underestimate how hard things are).
 
 Regards,
 Khaled

Hello Khaled,

That suggests that TeX--XeT is not providing the tools required to do a
decent job in XeTeX: is that a fair reading? (I'd guess that there were
reasons for the Omega/Aleph/LuaTeX move from TeX--XeT to other
approaches, but as a non-expert was not sure how to read it.)

A slightly wider question which this leads me to: do I take it that
getting some (minor) additions to XeTeX might be possible? There was
some discussion last week about a few pdfTeX primitives that might be
useful.
-- 
Joseph Wright




Re: [XeTeX] [luatex] Info on direction primitives/implementation

2012-12-04 Thread Joseph Wright
On 04/12/2012 15:55, Khaled Hosny wrote:
 IMO TeXXeT and all its incarnations have always been a hack to get RTL
 with as little modification as possible. Two main limitations I'm
 concerned about are the broken handling of specials (you can not get
 coloring or hyperlinks in RTL text without macro hacks with limited
 functionality) and lack of RTL math. On the other hand Omega's
 directionality code is more sophisticated and requires less adaptations
 at macro side (check the size of bidi package on how things can go wild
 with TeX--XeT), and it should allow for proper vertical typesetting in
 XeTeX as bonus.

I'd picked up the math mode point (although I'm not 100% sure on how
this works out with numerals, which don't always seem to be reversed
compared to the latin script order). The point about specials is one I
guess I'll look at by reading the bidi code and doing some tests.

As you might guess, my interest here stems from some LaTeX3
discussions, and one issue I'm trying to understand is whether the
TeX--XeT approach is really one that is sensible to try to support,
given the fact that the Omega approach exists. Certainly if you move
XeTeX from TeX--XeT to the Omega approach (with the LuaTeX fixes, I
guess), then this will be an easier thing to think about!

 A slightly wider question which this leads me to: do I take it that
 getting some (minor) additions to XeTeX might be possible? There was
 some discussion last week about a few pdfTeX primitives that might be
 useful.
 
 Patches are always welcomed of course :) Right now work is mainly on
 layout side (replacing the abandoned ICU LayoutEngine with HarfBuzz,
 there is even a chance of replacing ATS/ATSU with Core Text, and old
 SilGraphite with Graphite2 engine) and further polishing of OpenType
 math.

Jonathan K. will tell you that my ability to write XeTeX patches is very
limited :-) I was thinking much less complicated than layout engines
(which all sounds very good!). I'll see what makes sense and perhaps be
in contact again.
--
Joseph Wright





Re: [XeTeX] [luatex] Info on direction primitives/implementation

2012-12-04 Thread Joseph Wright
On 05/12/2012 00:45, Vafa Khalighi wrote:
 Yes, the limitations of TeX--XeT are:
 
   * Only four primitives \beginR \endR, \beginL \endL are provided which makes
 typesetting RTL documents very hard and complicated.
   * The primitives above only work in horizontal mode.
   * No way to typeset RTL tabular, the only approach is to put tabular inside an
 RTL box which itself introduces lots of problems.
   * \special does not work properly in RTL mode.
   * There is no way to change the direction of boxes and even if you do by
 trick, the order of TOC or anything that has to do with \write at shipout
 time gets wrong.
   * left/right skips do not get reversed in RTL, so you have to replace them
 with each other and this is not always the case, e.g. \vbox inside \hbox
   * \parshape is not reversed in RTL mode so you have to do some macro
 programming and this is not always the case, e.g, \vbox inside \hbox
   * No tool for controlling equation number; it only provides
 \predisplaydirection which is buggy in RTL.
   * 

Very useful list :-) (I knew some of these, but it's nice to have them
collected up.)
-- 
Joseph Wright




Re: [XeTeX] [luatex] Info on direction primitives/implementation

2012-12-04 Thread Joseph Wright
On 05/12/2012 00:53, Vafa Khalighi wrote:
 The point about specials is one I
 guess I'll look at by reading the bidi code and doing some tests.
 
 The bidi package does not patch \special. It only makes changes in packages
 such as color, xcolor and hyperref, and this makes them work only in very
 limited cases. For example, if you use the color package in RTL with the bidi
 package, color works correctly only if your colored text stays on one line.

OK, does sound a bit limited.

 As you might guess, my interest here stems from some LaTeX3
 discussions, and one issue I'm trying to understand is whether the
 TeX--XeT approach is really one that is sensible to try to support,
 given the fact that the Omega approach exists. 
 
 
 My advice: do not waste time on TeX--XeT; it's useless. I have spent four
 years developing the bidi package using TeX--XeT and I can tell you that it
 has many bugs/limitations.

At the moment, I'm looking specifically at what we need to worry about
at a low level. For example, the current expl3 code does not take any
notice of direction, which is probably right for something like \hbox:n
(follow whatever is going on around it), but should be documented and
deliberate, not just something we've ignored. So what's important at
this stage is much more the concepts than trying to write any code,
although any thoughts on what is required for RTL support at the 'base
level' are of course welcome.

What you say fits in with what I'd already suspected: for RTL work,
we'd be better only supporting one set of primitives, the Omega ones.
For pdfTeX that's not an issue: I doubt very many people use pdfTeX for
RTL. XeTeX is a bit more 'interesting': I guess the existence of bidi
means that people are using XeTeX for 'real life' RTL work, despite
limitations.
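
To make the contrast concrete, here is a sketch of an inline RTL run using
the Omega-style primitives as found in LuaTeX. It assumes a LuaLaTeX format
in which \textdir is available under its unprefixed name, and it is only an
illustration of the primitive, not a proposal for how expl3 should wrap it.

```latex
% Compile with LuaLaTeX
\documentclass{article}
\begin{document}
% TRT selects a right-to-left text direction for the group
blah blah blah {\textdir TRT halb halb halb} blah blah blah
\end{document}
```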
-- 
Joseph Wright




Re: [XeTeX] troubles with \setcounter in Tex 2012

2012-12-03 Thread Joseph Wright
On 02/12/2012 18:31, Andrey Klebanov wrote:
 Dear all, 
 
 since my very recent update to the new tex distribution (tex 2012) I'm facing 
 a problem with setting the value of a counter using arithmetic expressions such as 2-1.
 A minimal example looks like this:
 
 \documentclass{memoir}
 \begin{document}
 \newcounter{b}
 \setcounter{b}{2-1}
 \theb
 \end{document}
 
 which in my case sets the \value{b} to 2 and prints -1 (see the attached 
 file), instead of setting the \value{b} to 1. (I need it in order to set a 
 connection between 2 counters like \setcounter{b}{\thea-2} etc.)
 I would be extremely thankful for any support, since this issue completely 
 destroys my workflow.
 
 thanks in advance 
 Andrey

You need to load the calc package: perhaps memoir loaded it in earlier versions?
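
A minimal sketch of the fix, assuming nothing beyond the calc package itself.
The counter a is added here only to illustrate linking two counters, and
\value{a} is used rather than \thea so that the expression does not depend
on how the counter is printed.

```latex
\documentclass{memoir}
\usepackage{calc} % lets \setcounter evaluate infix expressions such as 2-1
\begin{document}
\newcounter{a}
\setcounter{a}{5}
\newcounter{b}
\setcounter{b}{\value{a}-2} % evaluated by calc as 5 - 2
\theb
\end{document}
```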
-- 
Joseph Wright




Re: [XeTeX] xelatex,polyglossia,biblatex conflict

2012-08-24 Thread Joseph Wright
On 24/08/2012 12:37, Haines Brown wrote:
 I'm migrating from LaTeX to XeLaTeX with TL 2011 on a Linux system.
 
   \documentclass[12pt]{article}
   \usepackage{xltxtra}
   \usepackage[backend=biber,style=authoryear,sorting=nyt]{biblatex} 
   \usepackage[style=authoryear,sorting=nyt]{biblatex} 
   \usepackage{xunicode}
   \usepackage{fontspec}
   \usepackage{csquotes}
   %  \usepackage{polyglossia}
   \addbibresource{bib}
   ...
 
 I would like to use the polyglossia package, but if I add it, I get 
 an Undefined control sequence error.
 
 Haines Brown

Without a bit more detail it's hard to say, but I would strongly suggest
using TL2012. Development of both biblatex and polyglossia is active,
and so issues can and do get fixed over time.
-- 
Joseph Wright




Re: [XeTeX] XeLaTeX and SIunitx

2012-06-12 Thread Joseph Wright
On 12/06/2012 15:10, Philip TAYLOR wrote:
 
 Tobias Schoel wrote:
 What does normalise mean with angstrom and ohm?
 
 Perhaps as per
 http://en.wikipedia.org/wiki/Unicode_equivalence#Normalization
 Philip Taylor

Indeed: normalization is a way of dealing with differences in logical
meaning where the symbols used are identical. For siunitx, I have to
balance meaning with the likelihood of the symbol appearing in the
output at all. Using the normalised characters means that you have
the best chance of getting the visually correct output, while still
being able to search using the UTF-8 characters correctly.
--
Joseph Wright




Re: [XeTeX] XeLaTeX and SIunitx

2012-06-11 Thread Joseph Wright
Hello all,

Taking a look back over the code, I already have some auto-detection in
for picking up UTF-8 symbols when the correct engine is in use.

I've revised this a bit for the next release (v2.5d, on CTAN tomorrow),
so that all of the 'problematic' symbols are covered in what seems to be
the best way possible. Nothing happens unless appropriate support
(fontspec/unicode-math) is loaded. If it is, then you get the following
symbols:

 - Ångström        u+00c5 (u+212b normalises here)
 - Degree Celsius  u+00b0 + C (u+2103 is a compatibility character)
 - Micro           u+00b5 (u+03bc is wrong)
 - Ohm             u+03a9 (u+2126 normalises here)

 - Degree          u+00b0
 - Arc minute      u+2032 (requires unicode-math)
 - Arc second      u+2033 (requires unicode-math)
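
For anyone wanting to set these by hand rather than rely on the
auto-detection, a rough sketch follows. It assumes the siunitx v2 option
names (text-angstrom, text-celsius, text-micro, text-ohm, following the
pattern of the text-celsius option used elsewhere in this thread) and a font
that actually contains the code points listed above; it is not the package's
internal code.

```latex
% Compile with XeLaTeX or LuaLaTeX; assumes siunitx v2 option names
\documentclass{article}
\usepackage{fontspec}
\usepackage{siunitx}
\sisetup{
  mode          = text,
  text-angstrom = Å,   % U+00C5
  text-celsius  = °C,  % U+00B0 followed by C
  text-micro    = µ,   % U+00B5
  text-ohm      = Ω    % U+03A9
}
\begin{document}
\SI{1.5}{\angstrom}, \SI{25}{\celsius},
\SI{10}{\micro\metre}, \SI{47}{\ohm}
\end{document}
```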
-- 
Joseph Wright




Re: [XeTeX] Minimalist TeX?

2012-05-16 Thread Joseph Wright
On 16/05/2012 05:38, C Y wrote:
 I have compiled xetex from the latest Git sources on sourceforge, and the 
 build appears to have been successful.
 
 Does the sourceforge Git repo of xetex produce a working (albeit minimal) TeX 
 once compilation is complete?  (It didn't seem to in my quick test, but it's 
 quite possible I didn't do something right environment wise...)  If not, is 
 there documentation anywhere of what constitutes the minimal set of files 
 that will allow an average LaTeX document to be typeset?
 
 My interest is in building a Minimalist subset of TeX in situations where a 
 system installation isn't present, but I've not had much luck locating 
 documentation describing what constitutes a minimal-yet-functional subset of 
 the TeX Live distribution.  Has anybody documented such a subset?
 
 Thanks,
 CY

TeX is more than one program, so it's not as simple as grabbing 'source
for the binary <name>TeX' and compiling it.

For a minimal compilable set up, maybe take a look at KerTeX:
http://www.kergis.com/en/kertex.html
-- 
Joseph Wright




Re: [XeTeX] Minimalist TeX?

2012-05-16 Thread Joseph Wright
On 16/05/2012 17:18, Khaled Hosny wrote:
 TeX is more than one program, so it's not as simple as grabbing 'source
 for the binary <name>TeX' and compiling it.

 For a minimal compilable set up, maybe take a look at KerTeX:
 http://www.kergis.com/en/kertex.html
 
 Which does not include XeTeX :)

Oh yes, license and library issues: I forgot :-)
-- 
Joseph Wright




Re: [XeTeX] XeLaTeX and SIunitx

2012-05-15 Thread Joseph Wright
On 15/05/2012 17:21, Tobias Schoel wrote:
 But what you said seems to indicate to me, that it would be more
 sensible to create my own package xesiunitx, which solves the problem
 for my situation. As I only use open fonts, there aren't so many
 possibilities, and even for arbitrary fonts, one might only check for
 the best solutions and else uses siunitx' fallbacks.

I'm keen to avoid package proliferation where possible, especially
where we are essentially looking at settings for another package.

I've created a new issue for siunitx:
https://bitbucket.org/josephwright/siunitx/issue/199/improve-default-symbols-when-using-utf-8#comment-1423028.
I will take a look at this over the next few days: there will need to be
some non-trivial testing. As it's not tied to XeTeX (the same applies to
LuaTeX) I'd suggest anyone wanting to discuss this particular case does
so via the comments on the BitBucket site :-)
-- 
Joseph Wright




Re: [XeTeX] XeLaTeX and SIunitx

2012-05-14 Thread Joseph Wright
On 14/05/2012 20:38, Tobias Schoel wrote:
 If I understand you correctly, there are two ways, in which this
 could/should be solved on package level:
 
 1. siunitx gets an option / command whatever, which does approximately:
 \ifxetex\input{other file which can include suitable unicode symbols}\fi
 
 2. a new package xesiunitx is created, which does approximately:
 \usepackage{siunitx}
 \sisetup{definitions using suitable unicode symbols, depending on
 package option}
 
 or
 
 \usepackage{siunitx}
 \testiffonthassymbols
 \sisetup{definitions using suitable unicode symbols}
 \else
 \sisetup{some other helpful definitions}
 \fi

As I've tried to explain, there are simply too many possible
combinations to cover things for XeTeX and LuaTeX users without them
actually checking the settings they use. The best that I can do even
with pdfTeX is provide some sensible defaults, and even there there are
failure cases (for a start, any 'non-standard' font packages may well
fail to give good output).

My current approach is to be honest with XeTeX/LuaTeX users and say
'look, you are going to have to check that the font you've chosen to use
has the correct symbols available'. I am happy to consider changes, but
what I don't want to do is give the impression that it's possible to do
all of this automatically: that is not what I've found.
--
Joseph Wright




Re: [XeTeX] Babel

2012-05-04 Thread Joseph Wright
On 04/05/2012 17:48, Javier Bezos wrote:
 As to the mailing list, I'm not sure. There is the latex-l
 list, but it's intended mainly for LaTeX3, and babel is a
 LaTeX2e (and Plain) thing, but after cleaning up babel there
 will be very likely further work on a new multilingual core
 for LaTeX3, and I presume discussing babel will be ok.

My understanding is that LaTeX-L is for 'LaTeX core' discussion, which
covers LaTeX2e, LaTeX3, 'required', 'tools', etc. The fact that LaTeX2e
is not changing means that there is not much to say, but there is
occasionally something. As you say, babel material would be useful for
thinking about the issues for LaTeX3 too.
--
Joseph Wright




Re: [XeTeX] Detect, whether a font contains a certain character

2011-11-28 Thread Joseph Wright
On 28/11/2011 12:34, Tobias Schoel wrote:
 So this seems to work:
 
 \documentclass{minimal}
 \usepackage{fontspec}
 \usepackage{unicode-math}
 \setmathfont{XITS Math}
 
 \usepackage{siunitx}
 \sisetup{%
 detect-all,
 }
 
 \newcounter{works}
 \setcounter{works}{0}
 
 \usepackage{pgffor}
 
 \begin{document}
 
 \newcommand{\ifavailablethenelse}[4]{%#1=font,#2=charcode,#3=then-clause,#4=else-clause
 
 \setcounter{works}{0}
 \bgroup
 \font\test="#1" \test
 \ifnum\XeTeXfonttype\font>0
  \ifnum\XeTeXcharglyph"#2>0
   \setcounter{works}{1}
  \fi
 \fi
 \egroup
 \ifnum\theworks=1
  #3
 \else
  #4
 \fi
 }
 
 \sisetup{math-celsius=foo}
 
 \foreach \phont in {Asana Math,XITS Math, STIXGeneral, Neo Euler}
 {
 \ifavailablethenelse{\phont}{2103}{\sisetup{math-celsius=℃}}{%
  \ifavailablethenelse{\phont}{00B0}{\sisetup{math-celsius=°C}}{
   \sisetup{math-celsius=nix}
  }
 }
 \setmathfont{\phont}
 \(\SI{123}{\celsius}\)
 }
 
 
 \sisetup{text-celsius=bar}
 
 \foreach \phont in {DejaVu Serif, Linux Libertine O, TeX Gyre Pagella,
 Arial}
 {
 \ifavailablethenelse{\phont}{2103}{\sisetup{text-celsius=℃}}{%
  \ifavailablethenelse{\phont}{00B0}{\sisetup{text-celsius=°C}}{
   \sisetup{text-celsius=niente}
  }
 }
 \setmainfont{\phont}
 \SI{123}{\celsius}
 }
 \end{document}
 
 Albeit, it seems to be slow.

I hope you can see why siunitx does not attempt to auto-detect every
available case! (I may slightly alter the XeTeX/LuaTeX defaults, but am
currently testing that this is indeed okay.)
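
For a single symbol, a lighter check is possible with the e-TeX primitive
\iffontchar, which tests the current font directly. The sketch below is a
different technique from the \XeTeXcharglyph loop quoted above, and
\CelsiusSymbol is a hypothetical helper name, not part of siunitx. The font
is just an example taken from the list in the quoted message.

```latex
% Compile with XeLaTeX or LuaLaTeX
\documentclass{article}
\usepackage{fontspec}
\setmainfont{TeX Gyre Pagella}
% \CelsiusSymbol is a hypothetical helper name, not part of siunitx.
% It tests the font that is current at the point of use.
\newcommand{\CelsiusSymbol}{%
  \iffontchar\font"2103 % does the current font have U+2103?
    \char"2103 %
  \else
    \char"00B0 C% fall back to the degree sign followed by C
  \fi
}
\begin{document}
123\,\CelsiusSymbol
\end{document}
```

This only covers one font and one symbol at a time, which is the point:
doing it for every symbol and every font combination is where it gets out of
hand.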
-- 
Joseph Wright




Re: [XeTeX] Scaling fonts in fontspec changes the relative gap in 100 °C when using siunitx

2011-11-24 Thread Joseph Wright
On 24/11/2011 13:56, Tobias Schoel wrote:
 Hi,
 
 consider this minimal example:
 
 \documentclass{minimal}
 
 \newcommand{\phont}{Asana Math}
 \newcommand{\Phont}{TeX Gyre Pagella}
 
 \usepackage{fontspec}
 \setmainfont{\Phont}
 \usepackage{unicode-math}
 \setmathfont{\phont}
 
 \usepackage{siunitx}
 \sisetup{%
 math-celsius=℃,
 text-celsius=℃,
 detect-all
 }
 
 \begin{document}
 \setmainfont[Scale=1]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=2]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=3]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=4]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=5]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=6]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=7]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=8]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=9]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=10]{\Phont}
 \SI{100}{\celsius}
 
 \clearpage
 \renewcommand{\Phont}{TeX Gyre Pagella Bold}
 
 \setmainfont[Scale=1]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=2]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=3]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=4]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=5]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=6]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=7]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=8]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=9]{\Phont}
 \SI{100}{\celsius}
 
 \setmainfont[Scale=10]{\Phont}
 \SI{100}{\celsius}
 
 \clearpage
 
 \setmathfont[Scale=1]{\phont}
 \(\SI{100}{\celsius}\)
 
 \setmathfont[Scale=2]{\phont}
 \(\SI{100}{\celsius}\)
 
 \setmathfont[Scale=3]{\phont}
 \(\SI{100}{\celsius}\)
 
 \setmathfont[Scale=4]{\phont}
 \(\SI{100}{\celsius}\)
 
 \setmathfont[Scale=5]{\phont}
 \(\SI{100}{\celsius}\)
 
 \setmathfont[Scale=6]{\phont}
 \(\SI{100}{\celsius}\)
 
 \setmathfont[Scale=7]{\phont}
 \(\SI{100}{\celsius}\)
 
 \setmathfont[Scale=8]{\phont}
 \(\SI{100}{\celsius}\)
 
 \setmathfont[Scale=9]{\phont}
 \(\SI{100}{\celsius}\)
 
 \setmathfont[Scale=10]{\phont}
 \(\SI{100}{\celsius}\)
 \end{document}
 
 When scaling the fonts, it seems to me that the gap between 100 and °C
 (actually it's ℃ = U+2103, which some fonts don't have but TeX Gyre Pagella
 and Asana Math do) gets smaller in relation to the overall size.
 
 What is the reason for this?
 
 bye
 
 Toscho
 

The 'gap' here is the product marker for multiplication of the value by
the unit. This is by default a thin space, and is always set in math
mode using the current siunitx approach. You can force the use of the
text mode font with

  \sisetup{number-unit-product = \text{\,}}

Now, the reason that this is in math mode is because I was aiming at the
case where products are actually shown as such, using \cdot or \times.
It seemed (when I initially wrote siunitx) that forcing math mode here
was the most sensible approach.

In siunitx v2.4 (current release), I dropped several of these 'always
math' ideas, and am currently seeing what feedback I get on this. *If*
this seems acceptable to users, I may alter the behaviour of the
'product-like' options to also require \ensuremath to guarantee math
mode for symbols. This would be a breaking change, and so I feel it is
best to first see how the changes in v2.4 work in practice.
-- 
Joseph Wright



