Re: [XeTeX] in XeTeX
On 13.11.2011 at 23:14, Ross Moore wrote:

> Is there a EUR 0,01 coin? :-)

Yes, 1 ¢ and 2 ¢ coins exist.

--
With peaceful greetings,
Pete

When Richard Stallman goes to the loo, he core dumps.

--
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] in XeTeX
Hi all,

On 14/11/2011, at 7:55 AM, Zdenek Wagner wrote:

> Before typing a document one should think what will be the purpose of
> it. If the only purpose is to have it typeset by (La)TeX, I would just
> use well known macros and control symbols (~, $, &, %, ^, _). If the
> text should be stored in a generic database, I cannot use ~ because I
> do not know whether it will be processed by TeX. I cannot use &nbsp;
> because I do not know whether it will be processed by HTML-aware
> tools. I cannot even use a character entity because the tool used for
> processing the exported data may not understand entities at all. In
> such a case I must use U+00a0 and make sure that the tool used for
> processing the data knows how to handle it, or I should plug in a
> preprocessor.

This is exactly correct. Text will be entered into whatever tools are used for storing data. Such text may well contain characters (rightly or wrongly) that have not traditionally been used in (La)TeX typesetting.

Thus the problem is: "what should be the default (Xe)TeX behaviour when encountering such characters in the input stream?"

Currently there is no part of building the XeTeX format that handles these, apart from "00A0 (= U+00A0) being set to have \catcode 12; see the coding of xetex.ini. Nothing sets any properties of characters in the ranges

    U+2000 --> U+200F ,  U+2028 --> U+202F

apart from perhaps in bidi.sty, which needs the RTL and LTR marks, ZWNJ and maybe some others. But bidi.sty is optionally loaded by the user, so does not count here as the *default* behaviour for XeTeX-based formats.

The result is that these characters just pass through to the output, as part of a character string within the PDF, *provided* the font supports them. However, the traditional .tfm-based TeX fonts just treat these as missing characters, contributing zero to the metric width. There'll be a message in the .log file (the characters in question are invisible, so nothing shows between "no" and "in"):

>>> Missing character: There is no in font cmr10!
>>> Missing character: There is no in font cmr10!
>>> Missing character: There is no in font cmr10!
>>> Missing character: There is no in font cmr10!

This seems like a reasonable default behaviour, especially in light of the lack of consensus to do anything else. One slight problem is that those "Missing character" messages do not go to the "console" window, but only to the .log file. Many users will not notice this. Although this just follows TeX's design, and was quite sensible when TeX was using only its own CMR fonts, I think that XeTeX should have directed such warning messages also to the console. XeTeX has stepped out of the tightly controlled environment of traditional TeX jobs, so it should also have re-thought what counts as "errors", "warnings" and extra technical information, and how relevant these are to users/authors.

The point here is that users might simply not notice that some of the characters in their input may not have been processed in the best possible way. This would be particularly the case for characters that have no visible rendering, but just insert extra space, as are being discussed in this thread.

>>> Where would such a default take place:
>>> - XeTeX engine
>>> - XeLaTeX format
>>> - some package (xunicode, fontspec, some new package)

xunicode doesn't handle the meaning of non-ASCII input. It is designed primarily for mapping legacy ASCII-style input (via macro names) to the best-possible Unicode code point(s). fontspec isn't right either, as we are talking about spacing, not actual printed characters from a font.

>>> - my own package/preamble template
>>
>> None of these? In a stand-alone file that can be \input
>> by Plain XeTeX users, by XeLaTeX users, and by XeLaTeX
>> package authors.

I think that this counts as a "package", just using a .tex (or other) suffix, rather than necessarily .sty.

A TECkit mapping file is another place where these characters can be processed; e.g. removed, if there is no need for them to be part of the final PDF output. However, this inhibits the possibility of earlier logic being easily applied, to test the context of the role of these special characters and act accordingly.

>> In a future XeTeX variant (if such a thing comes to exist),
>> the functionality could be built into the engine.

Certainly some default behaviour could be included. But what is best? Assigning a \catcode of 10 would be appropriate in some situations, for some characters. Making some characters active, then giving an expansion, would be appropriate in other situations. Packages could be written for these situations. But then, as always, it is up to the users to recognise the issues, for their own particular data and their own output requirements, then choose packages accordingly.

>> My EUR 0,02 (while we still have one).
>> ** Phil.

Is there a EUR 0,01 coin? :-)
We lost our AUD 0.01 and 0.02 coins long ago. There is even talk now of dropping the 0.05 one.
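One remark on the "Missing character" warnings discussed above: this has since become configurable. The sketch below assumes a modern engine (TeX Live 2021 or later, where XeTeX and the other common engines extended \tracinglostchars); it was not available when this thread was written:

```latex
% Echo "Missing character" warnings to the terminal as well as the
% .log file; a value of 3 or more turns each lost character into a
% full error instead of a silent log entry.
\tracinglostchars=2
```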
Re: [XeTeX] in XeTeX
2011/11/13 Philip TAYLOR:
> Tobias Schoel wrote:
>> Now, that the practicability is cleared, let's come back to the
>> philosophical part:
>
> Actually, I think this is the practical/pragmatic part,
> but let's carry on none the less ...
>
>> Should U+00a0 be active and treated as ~ by default? Just like
>> u202f and u2009 should be active and treated as \, and \,\hspace{0pt}?
>
> Well: a macro-based solution is certainly the best place
> to start (and to experiment) but the particular expansions
> that you have chosen are not entirely generic: \hspace,
> for example, is unknown in Plain TeX, and is therefore
> better replaced with \hskip. Whether \hskip would then
> work happily with LaTeX, I have no idea, but it is by
> no means unreasonable to think that there might be format-
> specific definitions for each of these characters.

In LaTeX \hskip does exactly the same as in plain, but the question is when this replacement should occur. It may seem that a TECkit map can be used, but that is applied after all macros have been expanded and the horizontal list is being created. If you replace U+00a0 with \hskip at that time, "\hskip" will be printed in the current font. In order to insert \hskip as a token, the replacement has to occur in TeX's mouth. The size of the skip, with its stretchability and shrinkability, is taken from the \fontdimen registers of the current font, but TeX's mouth does not know what font will be current when the replaced U+00a0 is processed by TeX's stomach. The mouth cannot simply replace it with ~ because it does not know what its meaning will be when it is processed in the stomach.

Before typing a document one should think what will be the purpose of it. If the only purpose is to have it typeset by (La)TeX, I would just use well known macros and control symbols (~, $, &, %, ^, _). If the text should be stored in a generic database, I cannot use ~ because I do not know whether it will be processed by TeX. I cannot use &nbsp; because I do not know whether it will be processed by HTML-aware tools. I cannot even use a character entity because the tool used for processing the exported data may not understand entities at all. In such a case I must use U+00a0 and make sure that the tool used for processing the data knows how to handle it, or I should plug in a preprocessor. And I must prepare a suitable input method by which the users will enter U+00a0. I have it on my keyboard, but I am not sure whether such a key is a common feature. If a user has to enter it using a weird combination, he or she will not do it. Remember that a user may work remotely via ssh or telnet with no graphics. (Even then my keyboard contains U+00a0.)

>> Where would such a default take place:
>> - XeTeX engine
>> - XeLaTeX format
>> - some package (xunicode, fontspec, some new package)
>> - my own package/preamble template
>
> None of these? In a stand-alone file that can be \input
> by Plain XeTeX users, by XeLaTeX users, and by XeLaTeX
> package authors.
>
> In a future XeTeX variant (if such a thing comes to exist),
> the functionality could be built into the engine.
>
> My EUR 0,02 (while we still have one).
> ** Phil.

--
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz
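To illustrate the TECkit route mentioned above, and its limitation, a mapping in the style of XeTeX's tex-text.map might look like the sketch below (the file name and LHSName are made up for the example). Note that this discards the no-break semantics entirely, which is exactly the point: by the time the mapping runs, it is too late to insert tokens such as \hskip or ~.

```
; Hypothetical TECkit mapping "space-cleanup.map", in the same rule
; syntax as tex-text.map; compile with teckit_compile and attach to a
; font with  \font\x="SomeFont:mapping=space-cleanup" .
LHSName "SpaceCleanup"
RHSName "UNICODE"

pass(Unicode)

U+00A0  <>  U+0020   ; NO-BREAK SPACE        -> ordinary space
U+2009  <>  U+0020   ; THIN SPACE            -> ordinary space
U+202F  <>  U+0020   ; NARROW NO-BREAK SPACE -> ordinary space
```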
Re: [XeTeX] in XeTeX
Tobias Schoel wrote:

> Now, that the practicability is cleared, let's come back to the
> philosophical part:

Actually, I think this is the practical/pragmatic part, but let's carry on none the less ...

> Should U+00a0 be active and treated as ~ by default? Just like
> u202f and u2009 should be active and treated as \, and \,\hspace{0pt}?

Well: a macro-based solution is certainly the best place to start (and to experiment), but the particular expansions that you have chosen are not entirely generic: \hspace, for example, is unknown in Plain TeX, and is therefore better replaced with \hskip. Whether \hskip would then work happily with LaTeX, I have no idea, but it is by no means unreasonable to think that there might be format-specific definitions for each of these characters.

> Where would such a default take place:
> - XeTeX engine
> - XeLaTeX format
> - some package (xunicode, fontspec, some new package)
> - my own package/preamble template

None of these? In a stand-alone file that can be \input by Plain XeTeX users, by XeLaTeX users, and by XeLaTeX package authors.

In a future XeTeX variant (if such a thing comes to exist), the functionality could be built into the engine.

My EUR 0,02 (while we still have one).
** Phil.
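The \hspace-versus-\hskip point above can be made concrete. \hspace is LaTeX-only, while \hskip is a primitive and so exists in every format; a format-agnostic expansion for a breakable thin space might therefore be sketched as follows (the macro name is illustrative, not an established convention):

```latex
% A breakable thin space that works in Plain TeX and LaTeX alike:
% glue of the usual \thinspace width (.16667em), with no penalty,
% so a line break is permitted here.
\def\breakablethinspace{\hskip 0.16667em \relax}
```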
Re: [XeTeX] in XeTeX
Now, that the practicability is cleared, let's come back to the philosophical part:

Should U+00a0 be active and treated as ~ by default? Just like u202f and u2009 should be active and treated as \, and \,\hspace{0pt}?

Where would such a default take place:
- XeTeX engine
- XeLaTeX format
- some package (xunicode, fontspec, some new package)
- my own package/preamble template

As was discussed in the thread "Space characters and whitespace", using these characters without any treatment contradicts TeX's spacing algorithms. So it seems one should either not use these characters and blame Unicode, OR treat these characters specially.

bye

Toscho

On 13.11.2011 21:36, Mike Maxwell wrote:
> On 11/13/2011 11:09 AM, Tobias Schoel wrote:
>> How much text flow control should be done by non-ASCII characters?
>> Unicode has different codepoints for signs with the same meaning but
>> different text flow control (space vs. non-break space). So text flow
>> could be controlled via Unicode codepoints. But should it? Or should
>> text flow be controlled via commands and active characters?
>>
>> How would you visually differentiate between all
>> the white space characters (space vs. non-break space, thin space
>> (u2009) vs. narrow no-break space (u202f), … ) such that the text
>> remains readable?
>
> Of course, there's precedent for this kind of problem: tab characters.
> For that matter, many text editors display Unicode combining
> diacritics over or under the base character that they go with, which
> is already getting away from a straightforward display of the
> underlying characters. At any rate, there are lots of ways non-ASCII
> space characters could be distinguished; Philip Taylor mentions color
> coding, which is certainly possible. Another would be to display some
> kind of code for non-ASCII spaces. There's one font which displays all
> characters as nothing but their Unicode code points (in hex) inside
> some kind of box. A tex(t) editor could certainly be programmed to
> display control characters (which these space characters essentially
> are) differently from the "regular" characters (which would continue
> to be displayed with an ordinary font). The editor I use, jEdit,
> provides yet another option: a command (bindable to a keystroke) that
> tells me the Unicode code point of any character, on the editor's
> status line.
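The proposal above, making U+00a0 behave like ~ and the narrow no-break space unbreakable, can be sketched as a small stand-alone file for XeTeX-based engines. The \lccode/\lowercase trick is the standard idiom for defining an active character; the particular expansions are the ones suggested in this thread, not established defaults:

```latex
% U+00A0 NO-BREAK SPACE: behave like ~ (plain.tex defines ~ as
% \penalty10000 followed by an ordinary interword space).
\catcode"00A0=\active
\begingroup
  \lccode`\~="00A0
  \lowercase{\endgroup
    \def~{\penalty10000\ }}

% U+202F NARROW NO-BREAK SPACE: an unbreakable thin space.
\catcode"202F=\active
\begingroup
  \lccode`\~="202F
  \lowercase{\endgroup
    \def~{\penalty10000\thinspace}}
```

Since \active, \thinspace and ~ are defined both by plain.tex and by LaTeX, a file like this could indeed be \input from Plain XeTeX as well as XeLaTeX, as suggested elsewhere in the thread.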
Re: [XeTeX] in XeTeX
On 11/13/2011 11:09 AM, Tobias Schoel wrote:
> How much text flow control should be done by non-ASCII characters?
> Unicode has different codepoints for signs with the same meaning but
> different text flow control (space vs. non-break space). So text flow
> could be controlled via Unicode codepoints. But should it? Or should
> text flow be controlled via commands and active characters?
>
> One opinion says that using (La)TeX is programming. Consequently, each
> character used should be visually well distinguishable. This is not
> the case with all the Unicode white space characters.
>
> One opinion says that using (La)TeX is transforming plain text (like
> .txt) into well formatted text. Consequently, the plain text may
> contain as much (meta-)information as possible, and this information
> should be used when transforming it to well formatted text. So Unicode
> white space characters are allowed and should be valued by their
> specific meaning.
>
> And on the third hand, XeTeX could allow both.
>
> How would you visually differentiate between all
> the white space characters (space vs. non-break space, thin space
> (u2009) vs. narrow no-break space (u202f), … ) such that the text
> remains readable?

Of course, there's precedent for this kind of problem: tab characters. For that matter, many text editors display Unicode combining diacritics over or under the base character that they go with, which is already getting away from a straightforward display of the underlying characters.

At any rate, there are lots of ways non-ASCII space characters could be distinguished; Philip Taylor mentions color coding, which is certainly possible. Another would be to display some kind of code for non-ASCII spaces. There's one font which displays all characters as nothing but their Unicode code points (in hex) inside some kind of box. A tex(t) editor could certainly be programmed to display control characters (which these space characters essentially are) differently from the "regular" characters (which would continue to be displayed with an ordinary font). The editor I use, jEdit, provides yet another option: a command (bindable to a keystroke) that tells me the Unicode code point of any character, on the editor's status line.

--
Mike Maxwell
maxw...@umiacs.umd.edu
"My definition of an interesting universe is
one that has the capacity to study itself."
--Stephen Eastmond
Re: [XeTeX] in XeTeX
On 13.11.2011 20:25, Zdenek Wagner wrote:
> 2011/11/13 Tobias Schoel:
>> On 13.11.2011 12:35, Zdenek Wagner wrote:
>>> 2011/11/13:
>>>> On Sun, 13 Nov 2011, Petr Tomasek wrote:
>>>>> make ~ not active when writing my own macros because it contradicts
>>>>> the Unicode standard...)
>>>>
>>>> Isn't it just as much a "contradiction" of the "standard" for \ to
>>>> do what \ does? I don't think that is a good way to decide what
>>>> TeX's input format should be.
>>>
>>> And how about math and tables in TeX? And I would like to know a
>>> good text editor that visually displays U+00a0 in such a way that I
>>> can easily distinguish it from U+0020. If I cannot see the
>>> difference, I can never be sure. And I definitely do not want to use
>>> hexedit for my TeX files.
>>
>> That is a good question. It's close to a question I asked earlier on
>> this list: how much text flow control should be done by non-ASCII
>> characters? [...]
>>
>> One opinion says that using (La)TeX is transforming plain text (like
>> .txt) into well formatted text. Consequently, the plain text may
>> contain as much (meta-)information as possible, and this information
>> should be used when transforming it to well formatted text. So
>> Unicode white space characters are allowed and should be valued by
>> their specific meaning.
>
> (La)TeX source file is not plain text. Every LaTeX document nowadays
> starts with \documentclass, but such text is not present in the
> output.

Of course, the preamble isn't plain text, but mostly macros. I thought of the body of the document. I think it's common practice for larger documents to have a main LaTeX file, which reads

    \documentclass …
    \begin{document}
    \input{first_chapter}
    \input{second_chapter}
    …
    \end{document}

In these cases, the input documents are more or less plain text (depending on the subject).

> Even XML is not plain text; you can use entities such as &nbsp;,
> &apos; and many more. Of course, if (La)TeX is used for automatic
> processing of data extracted from a database that can contain a wide
> variety of Unicode characters, it is a valid question how to handle
> such input.

Or if the content is copy-pasted, from let's say HTML. But who would do that …
Re: [XeTeX] in XeTeX
2011/11/13 Tobias Schoel:
> On 13.11.2011 12:35, Zdenek Wagner wrote:
>> 2011/11/13:
>>> On Sun, 13 Nov 2011, Petr Tomasek wrote:
>>>> make ~ not active when writing my own macros because it contradicts
>>>> the Unicode standard...)
>>>
>>> Isn't it just as much a "contradiction" of the "standard" for \ to
>>> do what \ does? I don't think that is a good way to decide what
>>> TeX's input format should be.
>>
>> And how about math and tables in TeX? And I would like to know a good
>> text editor that visually displays U+00a0 in such a way that I can
>> easily distinguish it from U+0020. If I cannot see the difference, I
>> can never be sure. And I definitely do not want to use hexedit for my
>> TeX files.
>
> That is a good question. It's close to a question I asked earlier on
> this list:
>
> How much text flow control should be done by non-ASCII characters?
> Unicode has different codepoints for signs with the same meaning but
> different text flow control (space vs. non-break space). So text flow
> could be controlled via Unicode codepoints. But should it? Or should
> text flow be controlled via commands and active characters?
>
> One opinion says that using (La)TeX is programming. Consequently, each
> character used should be visually well distinguishable. This is not
> the case with all the Unicode white space characters.
>
> One opinion says that using (La)TeX is transforming plain text (like
> .txt) into well formatted text. Consequently, the plain text may
> contain as much (meta-)information as possible, and this information
> should be used when transforming it to well formatted text. So Unicode
> white space characters are allowed and should be valued by their
> specific meaning.

(La)TeX source file is not plain text. Every LaTeX document nowadays starts with \documentclass, but such text is not present in the output. Even XML is not plain text; you can use entities such as &nbsp;, &apos; and many more. Of course, if (La)TeX is used for automatic processing of data extracted from a database that can contain a wide variety of Unicode characters, it is a valid question how to handle such input.

--
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz
Re: [XeTeX] in XeTeX
One option would be to colour-code them, but I was more interested in the philosophy than the implementation.

** Phil.

> Not in every case. How would you visually differentiate between all
> the white space characters (space vs. non-break space, thin space
> (u2009) vs. narrow no-break space (u202f), … ) such that the text
> remains readable?
Re: [XeTeX] in XeTeX
On 13.11.2011 18:16, Philip TAYLOR wrote:
> Tobias Schoel wrote:
>> One opinion says that using (La)TeX is programming. Consequently,
>> each character used should be visually well distinguishable. This is
>> not the case with all the Unicode white space characters.
>
> Is that not a function of the editor used? Is it not valid for an
> editor to display different Unicode spaces differently, such that the
> user can visually differentiate between them?
>
> Philip Taylor

Not in every case. How would you visually differentiate between all the white space characters (space vs. non-break space, thin space (u2009) vs. narrow no-break space (u202f), … ) such that the text remains readable?

Toscho
Re: [XeTeX] in XeTeX
Tobias Schoel wrote:
> One opinion says that using (La)TeX is programming. Consequently, each
> character used should be visually well distinguishable. This is not
> the case with all the Unicode white space characters.

Is that not a function of the editor used? Is it not valid for an editor to display different Unicode spaces differently, such that the user can visually differentiate between them?

Philip Taylor
Re: [XeTeX] in XeTeX
On 13.11.2011 12:35, Zdenek Wagner wrote:
> 2011/11/13:
>> On Sun, 13 Nov 2011, Petr Tomasek wrote:
>>> make ~ not active when writing my own macros because it contradicts
>>> the Unicode standard...)
>>
>> Isn't it just as much a "contradiction" of the "standard" for \ to do
>> what \ does? I don't think that is a good way to decide what TeX's
>> input format should be.
>
> And how about math and tables in TeX? And I would like to know a good
> text editor that visually displays U+00a0 in such a way that I can
> easily distinguish it from U+0020. If I cannot see the difference, I
> can never be sure. And I definitely do not want to use hexedit for my
> TeX files.

That is a good question. It's close to a question I asked earlier on this list:

How much text flow control should be done by non-ASCII characters? Unicode has different codepoints for signs with the same meaning but different text flow control (space vs. non-break space). So text flow could be controlled via Unicode codepoints. But should it? Or should text flow be controlled via commands and active characters?

One opinion says that using (La)TeX is programming. Consequently, each character used should be visually well distinguishable. This is not the case with all the Unicode white space characters.

One opinion says that using (La)TeX is transforming plain text (like .txt) into well formatted text. Consequently, the plain text may contain as much (meta-)information as possible, and this information should be used when transforming it to well formatted text. So Unicode white space characters are allowed and should be valued by their specific meaning.
Re: [XeTeX] in XeTeX
2011/11/13:
> On Sun, 13 Nov 2011, Petr Tomasek wrote:
>> make ~ not active when writing my own macros because it contradicts
>> the Unicode standard...)
>
> Isn't it just as much a "contradiction" of the "standard" for \ to do
> what \ does? I don't think that is a good way to decide what TeX's
> input format should be.

And how about math and tables in TeX? And I would like to know a good text editor that visually displays U+00a0 in such a way that I can easily distinguish it from U+0020. If I cannot see the difference, I can never be sure. And I definitely do not want to use hexedit for my TeX files.

> Matthew Skala
> msk...@ansuz.sooke.bc.ca People before principles.
> http://ansuz.sooke.bc.ca/

--
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz
Re: [XeTeX] in XeTeX
On Sun, 13 Nov 2011, Petr Tomasek wrote:
> make ~ not active when writing my own macros because it contradicts
> the Unicode standard...)

Isn't it just as much a "contradiction" of the "standard" for \ to do what \ does? I don't think that is a good way to decide what TeX's input format should be.

--
Matthew Skala
msk...@ansuz.sooke.bc.ca People before principles.
http://ansuz.sooke.bc.ca/