Re: [XeTeX]   in XeTeX

2011-11-14 Thread Andrew Moschou
We could also have a switch that, when turned on, displays the various
whitespace characters using particular glyphs. MS Word does this and displays
an ordinary space with ·, a non-breaking space with °, a tab with →, a line
break with ↲ and a paragraph break with ¶.
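
A minimal sketch of such a switch, written in Python purely for illustration. The names `GLYPHS` and `show_whitespace`, and the rule that a blank line counts as a paragraph break, are assumptions; nothing here is a feature of XeTeX or of any existing editor. The glyphs are the ones listed above.

```python
# Hypothetical glyph table, following the MS Word conventions described above.
GLYPHS = {
    "\u0020": "\u00b7",    # ordinary space  -> ·
    "\u00a0": "\u00b0",    # no-break space  -> °
    "\u0009": "\u2192",    # tab             -> →
    "\n": "\u21b2\n",      # line break      -> ↲ (the newline itself is kept)
}
PARAGRAPH_MARK = "\u00b6"  # ¶


def show_whitespace(text: str) -> str:
    """Return text with whitespace characters replaced by visible glyphs."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("\n\n", i):
            # Assumption: an empty line marks a paragraph break.
            out.append(PARAGRAPH_MARK + "\n\n")
            i += 2
        else:
            ch = text[i]
            out.append(GLYPHS.get(ch, ch))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(show_whitespace("abcd\u00a0efg and\tmore\n\nnext paragraph"))
```

For example, `show_whitespace("a b\u00a0c")` gives `a·b°c`, so the two kinds of space can be told apart at a glance.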


On 15 November 2011 09:13, Mike Maxwell  wrote:

>
> A number of alternatives to a hex editor have been pointed out:
> 1) color coding
> 2) using a font that has a representation of these code points
> 3) using any text editor that allows you to see the Unicode code point of
> a character (I use jEdit this way, I'm sure many other editors offer this
> support)
>


--
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex


Re: [XeTeX]   in XeTeX

2011-11-14 Thread Ross Moore
Hi Chris, Zdenek, and others

On 15/11/2011, at 10:09 AM, Chris Travers wrote:

> On Mon, Nov 14, 2011 at 2:43 PM, Mike Maxwell  wrote:
>> On 11/14/2011 4:56 PM, Zdenek Wagner wrote:
> 
> 
> 
>> But in fact, the last time I tried this, the NBSP character was interpreted
>> in the same way as an ASCII space, which is not what I want.  What I want
>> (repeating myself again) is for such characters to--

That is not what happened to me. (see attached image)
The space in the middle of:  abcd efg  is an  "A0  not a "20 .
I checked it inside the PDF.



[Attachment: link-dest-test.pdf (Adobe PDF document)]

>>>> have their Unicode-defined semantics, to the extent that
>>>> makes sense in XeTeX.
>> --just the same as I would expect XeTeX (or xdvipdfmx) to correctly handle
>> the visual re-ordering behavior of U+09C7 through U+09CC, or U+093F
>> (Devanagari vowel sign I).


Hope this helps,

Ross


Ross Moore   ross.mo...@mq.edu.au 
Mathematics Department   office: E7A-419  
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia  2109  fax: +61 (0)2 9850 8114








Re: [XeTeX]   in XeTeX

2011-11-14 Thread Chris Travers
On Mon, Nov 14, 2011 at 2:43 PM, Mike Maxwell  wrote:
> On 11/14/2011 4:56 PM, Zdenek Wagner wrote:



> But in fact, the last time I tried this, the NBSP character was interpreted
> in the same way as an ASCII space, which is not what I want.  What I want
> (repeating myself again) is for such characters to--
>>> have their Unicode-defined semantics, to the extent that
>>> makes sense in XeTeX.
> --just the same as I would expect XeTeX (or xdvipdfmx) to correctly handle
> the visual re-ordering behavior of U+09C7 through U+09CC, or U+093F
> (Devanagari vowel sign I).

Would you be opposed to an on-switch that has to be enabled before
Unicode whitespace characters acquire special meaning?  The nice thing
about an on-switch is that one can comment it out for debugging
purposes.

>
>> However, I would not like to have to wonder why I have
>> overfull/underfull boxes and open a hex editor to see what kind of
>> space is written between words.
>
> A number of alternatives to a hex editor have been pointed out:
> 1) color coding

Most color coding on text editors affects things other than
whitespace.  I think color coding whitespace will be visually
problematic.

> 2) using a font that has a representation of these code points

Ok, so we'd have to use a special font to display things *instead of whitespace*.

> 3) using any text editor that allows you to see the Unicode code point of a
> character (I use jEdit this way, I'm sure many other editors offer this
> support)

But you are still hunting here.
>
> Again, this is not about _forcing_ anyone to use NBSP etc., it is about
> _allowing_ their use *with the expected Unicode behavior.*

Hence my proposal to require enabling it as an optional feature,
rather than making it the default behavior.

Best Wishes,
Chris Travers





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Mike Maxwell

On 11/14/2011 4:56 PM, Zdenek Wagner wrote:
> 2011/11/14 Mike Maxwell:
>> We are not (at least I am not) suggesting that everyone must use
>> the Unicode non-breaking space character, or etc.  What we *are*
>> suggesting is that in Xe(La)TeX, we be *allowed* to use those
>> characters, and that they have their
>
> You are allowed to use them, nothing prevents you.


At least one participant in this thread (or actually the related thread 
"Whitespace in input"--the person in question is 
msk...@ansuz.sooke.bc.ca) has said:

> U+00A0 is an invalid character for TeX input

That sounds pretty much like prevention (although maybe you don't agree 
with him).


But in fact, the last time I tried this, the NBSP character was 
interpreted in the same way as an ASCII space, which is not what I want. 
 What I want (repeating myself again) is for such characters to--

>> have their Unicode-defined semantics, to the extent that
>> makes sense in XeTeX.
--just the same as I would expect XeTeX (or xdvipdfmx) to correctly 
handle the visual re-ordering behavior of U+09C7 through U+09CC, or 
U+093F (Devanagari vowel sign I).


> However, I would not like to have to wonder why I have
> overfull/underfull boxes and open a hex editor to see what kind of
> space is written between words.

A number of alternatives to a hex editor have been pointed out:
1) color coding
2) using a font that has a representation of these code points
3) using any text editor that allows you to see the Unicode code point 
of a character (I use jEdit this way, I'm sure many other editors offer 
this support)
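
Where an editor cannot show code points, a small script can stand in for the hex editor. The sketch below is only an illustration: `find_unusual_spaces` is a hypothetical helper, and it uses nothing beyond Python's standard `unicodedata` module. It reports every separator character other than the ASCII space, with its position and Unicode name.

```python
import unicodedata

# General categories for space, line and paragraph separators.
SEPARATOR_CATEGORIES = ("Zs", "Zl", "Zp")


def find_unusual_spaces(text: str):
    """List (line, column, code point, name) for each non-ASCII separator."""
    hits = []
    # Split on "\n" only, so that U+2028/U+2029 stay inside a line and get reported.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch != " " and unicodedata.category(ch) in SEPARATOR_CATEGORIES:
                hits.append((lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    sample = "abcd\u00a0efg\nthin\u2009space and narrow\u202fno-break space"
    for lineno, col, code, name in find_unusual_spaces(sample):
        print(f"line {lineno}, column {col}: {code} {name}")
```

Run over a source file, this answers the "what kind of space is this?" question without opening a hex editor, and it works over ssh.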


Again, this is not about _forcing_ anyone to use NBSP etc., it is about 
_allowing_ their use *with the expected Unicode behavior.*

--
Mike Maxwell
maxw...@umiacs.umd.edu
"My definition of an interesting universe is
one that has the capacity to study itself."
--Stephen Eastmond




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Zdenek Wagner
2011/11/14 Mike Maxwell :

>
> I'm going to repeat myself, or maybe if I shout I'll be heard?
>
> We are not (at least I am not) suggesting that everyone must use the Unicode
> non-breaking space character, or etc.  What we *are* suggesting is that in
> Xe(La)Tex, we be *allowed* to use those characters, and that they have their

You are allowed to use them, nothing prevents you. I use them even in
normal 8bit LaTeX. As I wrote, I sometimes process data coming from
databases or converted from MS Word via OpenOffice where these
characters may appear.

> Unicode-defined semantics, to the extent that makes sense in XeTeX.  If
> because of your editor you prefer to use a '~' in your XeTeX files, that's
> fine, we won't stop you.
>
> If some day you decide to edit my XeLaTeX files, you're welcome to do so,
> just beware of the U+00A0 NBSP characters...not to mention the Arabic block
> characters (including the ones used for Urdu and Pashto), and the Bengali
> block characters, and the Thaana block, and Latin supplement blocks, and
> IPA, and maybe the Devanagari block characters, and...  All of which will
> show up as squares or something in your editor, if you don't have a suitable
> font; and all of which--control characters or not--*could* be represented in
> 8-bit or even 7-bit encodings, using macros or some such.  The reason for
> using XeTeX is so I don't *have* to use macros or some funky abbreviation to
> represent them.
>
If I know the language and script, I have the font. I could edit
Hindi, Sanskrit, Marathi, Nepali (all using Devanagari) and Urdu,
maybe even Arabic and Persian but I would not try to edit Malayalam,
Tamil, Kannada, Telugu, Panjabi, Gujarati although they display well
on my computer. However, I would not like to have to wonder why I have
overfull/underfull boxes and open a hex editor to see what kind of
space is written between words.

> Summary: if XeTeX supports Unicode, then let it support Unicode.
> --
>        Mike Maxwell
>        maxw...@umiacs.umd.edu
>        "My definition of an interesting universe is
>        one that has the capacity to study itself."
>        --Stephen Eastmond
>



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-14 Thread Karljurgen Feuerherm
I didn't say anything about U+00A0 one way or the other

Keeping in mind that the purpose of this software is to get work done,
and not to fulfil anyone's philosophical notions of software, my general
feeling is that:

* Xe(La)TeX should support plain text characters--for *my* present
purpose, meaning characters which are printable, pure and simple,
regardless of where in the Unicode space they are; as far as I know,
this is the case now (and my case in point was more or less just aimed
at this issue);

* it should support whatever other characters are necessary to complex
rendering, if it doesn't already;

* optionally it can/could support whatever else, as the in-the-flesh
maintainers of the package have time and leisure to implement.

I said 'feel', because it seems to me all very well for the rest of us
to debate philosophy back and forth, but unless we're doing the actual
work

As someone has already pointed out, lots of what is in Unicode is there
because it is UNI-code. It may very well have outlived its usefulness,
at least in the context of Xe(La)TeX doing the work one would like it to
do. Just because something is in Unicode doesn't mean one has to want to
use it. In fact, the more unnecessary things one implements, the better
the chance of instability.

There are no doubt multiple ways to achieve this pragmatically stated
goal. I don't feel any vested interest in dictating to anyone the
preference for how to go about it.

K

>>> On Mon, Nov 14, 2011 at 2:15 PM, msk...@ansuz.sooke.bc.ca wrote:
> On Mon, 14 Nov 2011, Karljurgen Feuerherm wrote:
>> I use U+12000 and above regularly, as a case in point...
>
> Do you think that basic formatting control functions should be bound to
> code points in that range, as the preferred way of accessing those
> functions?  Let's not lose track of what this discussion is about.
>
> XeTeX can *with appropriate font support* accept nearly any Unicode point
> in its input.  But very few Unicode points are treated specially by XeTeX
> as such, and I don't think U+00A0 should be one of them.
> --
> Matthew Skala
> msk...@ansuz.sooke.bc.ca People before principles.
> http://ansuz.sooke.bc.ca/
>
>





Re: [XeTeX] Whitespace in input

2011-11-14 Thread mskala
On Mon, 14 Nov 2011, Karljurgen Feuerherm wrote:
> I use U+12000 and above regularly, as a case in point...

Do you think that basic formatting control functions should be bound to
code points in that range, as the preferred way of accessing those
functions?  Let's not lose track of what this discussion is about.

XeTeX can *with appropriate font support* accept nearly any Unicode point
in its input.  But very few Unicode points are treated specially by XeTeX
as such, and I don't think U+00A0 should be one of them.
-- 
Matthew Skala
msk...@ansuz.sooke.bc.ca People before principles.
http://ansuz.sooke.bc.ca/




Re: [XeTeX] Whitespace in input

2011-11-14 Thread Karljurgen Feuerherm
>>> On Mon, Nov 14, 2011 at 12:15 PM, in message
<4ec14cb5.7000...@rhul.ac.uk>, Philip TAYLOR wrote:

>> XeTeX is a TeX engine.  Obviously, it is free to define its own input
>> format, and that format already differs from other TeX engines by (for
>> instance) allowing some Unicode code points outside the 7-bit range.
>
> I think (with respect) that "some Unicode code points outside the 7-bit
> range" is a gross understatement.  As far as I am aware, XeTeX permits a
> very considerable subset of Unicode (perhaps even all of it; I do not
> know) as input.

I use U+12000 and above regularly, as a case in point...

K




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Mike Maxwell

On 11/14/2011 5:38 AM, Zdenek Wagner wrote:
> 2011/11/14 Petr Tomasek:
>> On Sun, Nov 13, 2011 at 06:25:08PM +0200, Tobias Schoel wrote:
>>> On 13.11.2011 18:16, Philip TAYLOR wrote:
>>> Not in every case. How would you visually differentiate between all the
>>> white space characters (space vs. non-break space, thin space (U+2009)
>>> vs. narrow no-break space (U+202F), ... ) such that the text remains
>>> readable?
>>
>> Using different color.
>
> You live in a perfect world where you can do everything with a single
> editor using nice GUI. The world is not yet that perfect.


I'm going to repeat myself, or maybe if I shout I'll be heard?

We are not (at least I am not) suggesting that everyone must use the 
Unicode non-breaking space character, or etc.  What we *are* suggesting 
is that in Xe(La)Tex, we be *allowed* to use those characters, and that 
they have their Unicode-defined semantics, to the extent that makes 
sense in XeTeX.  If because of your editor you prefer to use a '~' in 
your XeTeX files, that's fine, we won't stop you.


If some day you decide to edit my XeLaTeX files, you're welcome to do 
so, just beware of the U+00A0 NBSP characters...not to mention the 
Arabic block characters (including the ones used for Urdu and Pashto), 
and the Bengali block characters, and the Thaana block, and Latin 
supplement blocks, and IPA, and maybe the Devanagari block characters, 
and...  All of which will show up as squares or something in your 
editor, if you don't have a suitable font; and all of which--control 
characters or not--*could* be represented in 8-bit or even 7-bit 
encodings, using macros or some such.  The reason for using XeTeX is so 
I don't *have* to use macros or some funky abbreviation to represent them.


Summary: if XeTeX supports Unicode, then let it support Unicode.
--
Mike Maxwell
maxw...@umiacs.umd.edu
"My definition of an interesting universe is
one that has the capacity to study itself."
--Stephen Eastmond




Re: [XeTeX] Whitespace in input

2011-11-14 Thread Philip TAYLOR



msk...@ansuz.sooke.bc.ca wrote:

> On Mon, 14 Nov 2011, Philip TAYLOR wrote:
>> I think (with respect) that "some Unicode code points outside the 7-bit
>> range" is a gross understatement.  As far as I am aware, XeTeX permits a
>> very considerable subset of Unicode (perhaps even all of it; I do not
>> know) as input.
>
> My point is that it shouldn't treat U+00A0 as equivalent to U+007E, or
> as valid at all, just because it supports "Unicode."  That is not what
> supporting Unicode means.


I agree with your opinion that it should not
treat U+00A0 as equivalent to U+007E -- indeed,
the Unicode standard specifies as its compatibility
decomposition :

 <noBreak> SPACE (U+0020)

However, I cannot agree that it should not be
treated as valid; that is just the thin end of
the wedge, and I would sooner there were no
wedge at all.  XeTeX's primary strength is that
it supports Unicode; we should not weaken that
strength by requiring that it supports some parts
of Unicode and not others.
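
The decomposition recorded for U+00A0 can be checked with Python's standard `unicodedata` module. This is a small illustrative sketch (the variable names are arbitrary). The mapping carries the `<noBreak>` tag, which marks it as a compatibility mapping, so canonical normalization leaves the character alone and only compatibility normalization folds it to U+0020.

```python
import unicodedata

NBSP = "\u00a0"

# Decomposition mapping recorded for U+00A0 in the Unicode Character Database.
mapping = unicodedata.decomposition(NBSP)

# Canonical normalization (NFC) leaves the character unchanged ...
nfc_unchanged = unicodedata.normalize("NFC", NBSP) == NBSP

# ... while compatibility normalization (NFKC) folds it to an ordinary space.
nfkc_is_space = unicodedata.normalize("NFKC", NBSP) == " "

print(mapping, nfc_unchanged, nfkc_is_space)
```

In other words, U+00A0 is a space with a no-break property attached; it has no relationship to U+007E in the standard.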

My EUR 0,02.
** Phil.




Re: [XeTeX] Whitespace in input

2011-11-14 Thread Tobias Schoel



On 14.11.2011 18:30, msk...@ansuz.sooke.bc.ca wrote:

> 1.  No.  That is not what Unicode is for.  Unicode's goal is to subsume
> all reasonable pre-existing encodings.

Unicode is even more. Look at all the Annexes to Unicode 6.0.

> Some reasonable pre-existing
> encodings include a non-breaking space character, so Unicode includes one.
> That does not mean Unicode says you should actually use it!  There are
> many precedents of Unicode providing multiple ways of representing
> things, as a result of including characters from other systems, without
> it being reasonable to demand that all Unicode-compatible systems must
> support all of them.  For instance, most of the U+FFxx range is devoted
> to different kinds of hacks for handling partial-width characters in
> Asian-language typesetting; the preferred way to do that nowadays is via
> OpenType features, but the code points remain in the standard.  The U+0000
> to U+001F range is basically control characters for Teletype machines;
> some of those, like U+000A and U+000D, are widely used in modern documents
> (but in varying ways by different systems!) and others, like U+001D, are
> virtually unheard-of.  Unicode does NOT say everybody has to support them
> all, let alone all in the same way.

Hmm, I have difficulty understanding exactly the conformance chapter
of Unicode 6.0 ( http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf ),
but it seems to me that claiming Unicode support is a very strong
statement.




> The U+00A0 code point is not explicitly deprecated in Unicode, but it was
> never a principle of Unicode that all implementations have to support all
> defined control characters regardless of appropriateness to the particular
> purpose.  "Non-breaking space" is, from TeX's point of view, not really a
> character at all, but a formatting command; and TeX already has a way of
> dealing with formatting commands in general and this one in particular.
> It is appropriate to say that the preferred way of handling non-breaking
> spaces in TeX input is the existing TeX way; and saying that in NO WAY AT
> ALL contradicts anything in Unicode.  Unicode is servant, not master.

I think it's more like math being servant _and_ master of natural sciences.


> 2. Inevitably, people will include invalid characters in TeX input; and
> U+00A0 is an invalid character for TeX input.  The best way to deal with
> it is to treat it like any other invalid character and generate an error
> message.  A reasonable alternative would be to say "it is whitespace; it
> will be treated like other whitespace."  That would mean ignoring its
> breaking/non-breaking-ness, as we have for a long time similarly ignored
> the special properties of U+0009 (tab).  Of course, if users want to
> define a special meaning for U+00A0 in their own input, they can do so
> with the existing mechanisms for redefining the meanings of input
> characters; but "U+00A0 is equivalent to U+007E (~)," for instance, should
> never be the default and (because of trouble displaying it) shouldn't be
> encouraged.

Now we come to the trouble of Unicode specifying a line-breaking
algorithm ( http://www.unicode.org/reports/tr14/tr14-26.html ), which
probably isn't exactly TeX's. I'm not into these algorithms, so I can't
compare. But I would ask some Master of this Art to speak up about this
conflict.




> 3. No.  Better to keep everything visible and backward compatible.  U+007E
> (~) should remain the preferred way of doing non-breaking space.

Should and is … (see other posts).


> 4. Not applicable because of the answer to #3.  Users who do insist on
> putting U+00A0 in their input presumably have *already* got their own
> reasons to think that it's more convenient for them, including solutions
> satisfactory to themselves for how to type it on keyboards and see it on
> screens, so that's their business and not a problem we need to solve.

I'm personally trying hard to find a correct way. As of now, I have
found a very simple solution to input special whitespace characters.
(Using Linux, doing this is easy business with ibus.) Alas, I haven't
found any editor suited better to my TeX needs than Kile, but I haven't
yet managed to highlight these special whitespace characters properly.
=> Some experts can do all these things. That doesn't mean everyone
else should stick to "stupid old" ASCII-7.


bye

Toscho




Re: [XeTeX] Whitespace in input

2011-11-14 Thread mskala
On Mon, 14 Nov 2011, Philip TAYLOR wrote:
> I think (with respect) that "some Unicode code points outside the 7-bit
> range" is a gross understatement.  As far as I am aware, XeTeX permits a
> very considerable subset of Unicode (perhaps even all of it; I do not
> know) as input.

My point is that it shouldn't treat U+00A0 as equivalent to U+007E, or
as valid at all, just because it supports "Unicode."  That is not what
supporting Unicode means.
-- 
Matthew Skala
msk...@ansuz.sooke.bc.ca People before principles.
http://ansuz.sooke.bc.ca/




Re: [XeTeX] Whitespace in input

2011-11-14 Thread Philip TAYLOR



msk...@ansuz.sooke.bc.ca wrote:

> XeTeX is a TeX engine.  Obviously, it is free to define its own input
> format, and that format already differs from other TeX engines by (for
> instance) allowing some Unicode code points outside the 7-bit range.

I think (with respect) that "some Unicode code points outside the 7-bit
range" is a gross understatement.  As far as I am aware, XeTeX permits a
very considerable subset of Unicode (perhaps even all of it; I do not
know) as input.

>> if we were discussing TeX, on what basis do you claim that
>> U+00A0 is invalid ?  And if you assert that it is, /a priori/,
>
> It's invalid if XeTeX says it is invalid, and I think XeTeX should say
> it is invalid.


That is a very different statement, and as that is your
personal position, I respect it as such.  Of course,
I disagree :-)

** Phil.




Re: [XeTeX] Whitespace in input

2011-11-14 Thread mskala
On Mon, 14 Nov 2011, Philip TAYLOR wrote:
> > 2. Inevitably, people will include invalid characters in TeX input; and
> > U+00A0 is an invalid character for TeX input.
>
> Firstly (as is clear from the list on which we are discussing
> this), we are not discussing TeX but XeTeX.  Secondly, even

XeTeX is a TeX engine.  Obviously, it is free to define its own input
format, and that format already differs from other TeX engines by (for
instance) allowing some Unicode code points outside the 7-bit range.  But
I still see XeTeX as a version of TeX, not something completely different,
and it's appropriate for expectations we might have about TeX - for
instance, the expectation that formatting commands are visible and the
"non-breaking space" formatting command is ~ - to also apply to XeTeX
where they are appropriate.

> if we were discussing TeX, on what basis do you claim that
> U+00A0 is invalid ?  And if you assert that it is, /a priori/,

It's invalid if XeTeX says it is invalid, and I think XeTeX should say
it is invalid.

-- 
Matthew Skala
msk...@ansuz.sooke.bc.ca People before principles.
http://ansuz.sooke.bc.ca/





Re: [XeTeX] Whitespace in input

2011-11-14 Thread Zdenek Wagner
2011/11/14 Philip TAYLOR :
>
>
> msk...@ansuz.sooke.bc.ca wrote:
>
>> 2. Inevitably, people will include invalid characters in TeX input; and
>> U+00A0 is an invalid character for TeX input.
>
> Firstly (as is clear from the list on which we are discussing
> this), we are not discussing TeX but XeTeX.  Secondly, even
> if we were discussing TeX, on what basis do you claim that
> U+00A0 is invalid ?  And if you assert that it is, /a priori/,
> invalid for TeX, and if your reasons for that assertion are
> sound, do they also support the assertion that it is, /a priori/,
> invalid for XeTeX ?
>
> Remainder snipped, so that we can debate one point at a time.
>
I agree with Phil there is nothing in TeX that makes a character
invalid a priori. It is made invalid by \catcode.

There are two aspects:

A. We are preparing a document to be typeset by TeX. Why on earth
should we use only U+00A0 and not ~, which is clearly visible in any
editor and has been used for a nonbreakable space for years? Why do we
use & in \halign or \begin{tabular} and not U+0009?

B. TeX is used to typeset data extracted from a database (or similar
source) that was not TeX-aware in the first place. Such data can
contain not only U+00A0 but even texts such as "Tweedledum & Tweedledee",
"12 $", "15 %", "#1", whatever. In such a case we must be aware that
the input may contain arbitrary characters, even those playing special
roles in TeX. We have to handle them properly.
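
Aspect B can be sketched as a preprocessing step applied to the extracted data before it reaches TeX. This is only an illustration. `TEX_ESCAPES` and `tex_escape` are hypothetical names, the replacements assume LaTeX (where `\textbackslash`, `\textasciicircum` and `\textasciitilde` are defined), and mapping U+00A0 to `~` is one possible choice rather than anything XeTeX does by itself.

```python
# Hypothetical escape table for text that comes from a non-TeX-aware source.
TEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "%": r"\%",
    "#": r"\#",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
    "\u00a0": "~",  # assumption: turn a no-break space into a TeX tie
}


def tex_escape(text: str) -> str:
    """Escape characters that play special roles in TeX, one character at a time."""
    # Working character by character avoids escaping an escape twice.
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in text)


if __name__ == "__main__":
    for raw in ["Tweedledum & Tweedledee", "12 $", "15\u00a0%", "#1"]:
        print(tex_escape(raw))
```

The same table could just as well map U+00A0 to an error or to an ordinary space; the point is that the decision is made once, at the boundary between the database and the TeX source.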

> Philip Taylor
>
>



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX] Whitespace in input

2011-11-14 Thread Philip TAYLOR



msk...@ansuz.sooke.bc.ca wrote:



> 2. Inevitably, people will include invalid characters in TeX input; and
> U+00A0 is an invalid character for TeX input.

Firstly (as is clear from the list on which we are discussing
this), we are not discussing TeX but XeTeX.  Secondly, even
if we were discussing TeX, on what basis do you claim that
U+00A0 is invalid ?  And if you assert that it is, /a priori/,
invalid for TeX, and if your reasons for that assertion are
sound, do they also support the assertion that it is, /a priori/,
invalid for XeTeX ?

Remainder snipped, so that we can debate one point at a time.

Philip Taylor




[XeTeX] Whitespace in input

2011-11-14 Thread mskala
I think this discussion is bogging down because several different
questions are getting mixed together.  Here's what I see as the major
issues:

1. Does Unicode specify a single correct way of representing white space?

2. If an input file to XeTeX contains currently less common Unicode
whitespace code points, such as U+00A0, what should XeTeX do?

3. Should users be encouraged, or even required, to include those code
points in input to XeTeX, in order to achieve typesetting goals that in
older TeX engines were achieved by other means?

4. Since many editing environments make it inconvenient to process
currently less common Unicode whitespace code points, what should users do
if the answer to #3 is "yes"?

Now, separate from identifying what the questions are, here's what I think
are reasonable answers to the questions:

1.  No.  That is not what Unicode is for.  Unicode's goal is to subsume
all reasonable pre-existing encodings.  Some reasonable pre-existing
encodings include a non-breaking space character, so Unicode includes one.
That does not mean Unicode says you should actually use it!  There are
many precedents of Unicode providing multiple ways of representing
things, as a result of including characters from other systems, without
it being reasonable to demand that all Unicode-compatible systems must
support all of them.  For instance, most of the U+FFxx range is devoted
to different kinds of hacks for handling partial-width characters in
Asian-language typesetting; the preferred way to do that nowadays is via
OpenType features, but the code points remain in the standard.  The U+0000
to U+001F range is basically control characters for Teletype machines;
some of those, like U+000A and U+000D, are widely used in modern documents
(but in varying ways by different systems!) and others, like U+001D, are
virtually unheard-of.  Unicode does NOT say everybody has to support them
all, let alone all in the same way.

The U+00A0 code point is not explicitly deprecated in Unicode, but it was
never a principle of Unicode that all implementations have to support all
defined control characters regardless of appropriateness to the particular
purpose.  "Non-breaking space" is, from TeX's point of view, not really a
character at all, but a formatting command; and TeX already has a way of
dealing with formatting commands in general and this one in particular.
It is appropriate to say that the preferred way of handling non-breaking
spaces in TeX input is the existing TeX way; and saying that in NO WAY AT
ALL contradicts anything in Unicode.  Unicode is servant, not master.

2. Inevitably, people will include invalid characters in TeX input; and
U+00A0 is an invalid character for TeX input.  The best way to deal with
it is to treat it like any other invalid character and generate an error
message.  A reasonable alternative would be to say "it is whitespace; it
will be treated like other whitespace."  That would mean ignoring its
breaking/non-breaking-ness, as we have for a long time similarly ignored
the special properties of U+0009 (tab).  Of course, if users want to
define a special meaning for U+00A0 in their own input, they can do so
with the existing mechanisms for redefining the meanings of input
characters; but "U+00A0 is equivalent to U+007E (~)," for instance, should
never be the default and (because of trouble displaying it) shouldn't be
encouraged.

3. No.  Better to keep everything visible and backward compatible.  U+007E
(~) should remain the preferred way of doing non-breaking space.

4. Not applicable because of the answer to #3.  Users who do insist on
putting U+00A0 in their input presumably have *already* got their own
reasons to think that it's more convenient for them, including solutions
satisfactory to themselves for how to type it on keyboards and see it on
screens, so that's their business and not a problem we need to solve.
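
The alternatives named in answer 2 (raise an error, treat the character as ordinary whitespace, or let the user opt in to a special meaning) can be illustrated with a small preprocessor. This is a sketch under stated assumptions: `apply_nbsp_policy` and its policy names are hypothetical, and the function runs on the source text before XeTeX sees it. It does not describe how XeTeX itself behaves.

```python
NBSP = "\u00a0"


def apply_nbsp_policy(line: str, policy: str = "error") -> str:
    """Handle U+00A0 in one line of input according to the chosen policy."""
    if policy == "error":
        # Treat U+00A0 like any other invalid character.
        if NBSP in line:
            column = line.index(NBSP) + 1
            raise ValueError(f"U+00A0 found at column {column}")
        return line
    if policy == "space":
        # Treat it like other whitespace, ignoring its non-breaking property.
        return line.replace(NBSP, " ")
    if policy == "tie":
        # Opt-in behaviour: rewrite it as the visible TeX tie.
        return line.replace(NBSP, "~")
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    sample = "Mr.\u00a0Smith"
    print(apply_nbsp_policy(sample, "space"))
    print(apply_nbsp_policy(sample, "tie"))
```

Making the policy an explicit argument keeps the default strict while still allowing users to opt in to another behaviour for their own input.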

-- 
Matthew Skala
msk...@ansuz.sooke.bc.ca People before principles.
http://ansuz.sooke.bc.ca/




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Zdenek Wagner
2011/11/14 Keith J. Schultz :
> Hi Zdenek,
>
> I am not suggesting that one be forced to use any particular editor.
>
> But, if we want a unified/consistent editor across all platforms,

No, I need a unified graphical representation across editors. One of my
customers was the Czech National Bank. For security reasons users do not
know the administrator password and are not allowed to install software
at will; someone in the bank decides, after some testing, what can be
installed on users' computers. I cannot tell them "install TeXworks"
because they are not allowed to do it.

I write not only TeX files but also Java, XML, XSLT, Perl, bash, PHP,
httpd.conf files, /etc/motd messages. I do not want to have a separate
editor for each purpose and learn how to use it. Moreover I often use
a text editor over ssh where a GUI is not available. Color usually is
but need not be. I need one good editor that can be used in all such
cases and I must be able to use whatever editor other people have if
it is necessary to edit the file immediately on someone else's
computer.

If nice features are added to a particular editor, it is of course
good but the TeX source file readability must not be bound to a
particular editor with particular features.

> I would consider TeXWorks as a viable candidate as it is already cross platform.
> It should be easy enough to add a feature that could make the different forms of
> white space visible.
>
> I do not use TeXworks so I can not say if it works via telnet or ssh.
>
> Personally, I think when working with unicode you should use a graphics
> capable terminal. But, that is just my position.
>
> regards
>        Keith.
>
> On 14.11.2011, at 15:16, Zdenek Wagner wrote:
>
>> 2011/11/14 Keith J. Schultz :
>>> Well, Zdenek,
>>>
>>> I guess that is where TeXWorks comes to mind. It could give a unified
>>> GUI for TeX with unicode.
>>>
>> Does it mean I will be forced to use TeXWorks and nothing else? And
>> will it work over telnet or ssh without graphics? I have other unicode
>> capable editors if proper fonts are installed but none of them
>> displays nonbreakable space in a way clearly distinguishable from
>> normal space.
>
>
>
>
>



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Keith J. Schultz
Hi Zdenek,

I am not suggesting that one be forced to use any particular editor.

But, if we want a unified/consistent editor across all platforms,
I would consider TeXWorks as a viable candidate as it is already cross platform.
It should be easy enough to add a feature that could make the different forms of
white space visible. 

I do not use TeXworks so I can not say if it works via telnet or ssh. 

Personally, I think when working with unicode you should use a graphics
capable terminal. But, that is just my position.

regards
Keith.

On 14.11.2011, at 15:16, Zdenek Wagner wrote:

> 2011/11/14 Keith J. Schultz :
>> Well, Zdenek,
>> 
>> I guess that is where TeXWorks comes to mind. It could give a unified
>> GUI for TeX with unicode.
>> 
> Does it mean I will be forced to use TeXWorks and nothing else? And
> will it work over telnet or ssh without graphics? I have other unicode
> capable editors if proper fonts are installed but none of them
> displays nonbreakable space in a way clearly distinguishable from
> normal space.






Re: [XeTeX]   in XeTeX

2011-11-14 Thread Keith J. Schultz
Hi Herbert,

You are absolutely right in your assessment. True plain text files are/were
traditionally 7-bit.

Though, I have to tell you that nowadays even 8-bit files are considered
plain text.

The verdict is still out on how far Unicode text files are plain text files,
as Unicode is, well, Unicode, and its encoding goes a little further than
what would be considered "plain text". Yet, if you consider Unicode just as
an encoding of text, then Unicode is plain text.

On the other hand, in computer science there are, as you said, only bits and
bytes. It is how we interpret them that makes it text, or plain text, or
binary code.

regards
Keith.

On 14.11.2011, at 14:48, Herbert Schulz wrote:

> 
> Howdy,
> 
> Gosh, I hate to get into the middle of this but here's my interpretation of 
> what a plain text file is and why.
> 
> All files are, in fact, just a series of bytes (or even bits) and how these 
> bytes are to be interpreted determines if the file is a plain text file or 
> not. Traditional TeX used the 7-bit ASCII set of bytes. Most extensions of 
> that set have those same byte values representing the same characters so 
> 7-bit ASCII is usually a sub-set of those extensions (also known as 
> encodings). A plain text file uses only the common 7-bit ASCII byte set and 
> virtually any application that can read that file interprets the meanings of 
> the bytes correctly. The moment you use an extension of that 7-bit ASCII set 
> an additional piece of information must be given; which encoding is being 
> used. (There are some heuristics for determining this on the fly but none are 
> 100% accurate.) Because that extra information must be given before an 
> application can display the meaning of the file (i.e., replace the bytes by 
> the characters) I don't consider those files as being plain text. Maybe text
> because the interpretation of the bytes is characters of some sort but not
> plain text.
> 
> Notice that how those characters are interpreted by other applications has 
> nothing to do with whether the file is plain text or other text. A Text 
> Editor interprets the bytes simply as characters and displays them in some 
> way while pdflatex interprets byte strings as combinations of commands and 
> text; same file, different interpretations.
> 
> This is as far as I'm going in this since I really want to stay out of the 
> argument. It's just my 0.0001 cents.
> 
> Good Luck,
> 
> Herb Schulz
> (herbs at wideopenwest dot com)
> 
> 
> 
> 
> 
> 






Re: [XeTeX]   in XeTeX

2011-11-14 Thread Zdenek Wagner
2011/11/14 Keith J. Schultz :
> Well, Zdenek,
>
> I guess that is where TeXWorks comes to mind. It could give a unified
> GUI for TeX with unicode.
>
Does it mean I will be forced to use TeXWorks and nothing else? And
will it work over telnet or ssh without graphics? I have other unicode
capable editors if proper fonts are installed but none of them
displays nonbreakable space in a way clearly distinguishable from
normal space.

> regards
>        Keith.
>
> On 14.11.2011, at 11:38, Zdenek Wagner wrote:
>
>> You live in a perfect world where you can do everything with a single
>> editor using nice GUI. The world is not yet that perfect. How do I use
>> color when editing a file using ssh and a colorless terminal? What is
>> the Unicode standard color of NBSP? I do not edit files just on my
>> computer, I have to support customers, I have to cooperate with
>> colleagues. They use different platforms, different editors. If they
>> all use TeX, I know that ~ denotes nonbreakable space. What is a
>> world-wide platform independent and color independent visible
>> representation of a nonbreakable space that is clearly distinct from a
>> normal space?
>
>
>
>



-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Herbert Schulz

On Nov 14, 2011, at 7:11 AM, Philip TAYLOR wrote:

> 
> 
> Karljurgen Feuerherm wrote:
> 
>> It depends on who is reading them. Their markup is markup only from the
>> point of view of their interpreters, i.e. *TeX, etc. From the point of
>> view of something else, they are plain.
> 
> Yes, the Universe of Discourse (and/or the pragmatics
> of discourse) do have an input here : but then, from
> your perspective, what text file is /not/ a plain
> text file "from the point of view of something else" ?
> 
> ** Phil.


Howdy,

Gosh, I hate to get into the middle of this but here's my interpretation of 
what a plain text file is and why.

All files are, in fact, just a series of bytes (or even bits) and how these 
bytes are to be interpreted determines if the file is a plain text file or not. 
Traditional TeX used the 7-bit ASCII set of bytes. Most extensions of that set 
have those same byte values representing the same characters so 7-bit ASCII is 
usually a sub-set of those extensions (also known as encodings). A plain text 
file uses only the common 7-bit ASCII byte set and virtually any application 
that can read that file interprets the meanings of the bytes correctly. The 
moment you use an extension of that 7-bit ASCII set an additional piece of 
information must be given; which encoding is being used. (There are some 
heuristics for determining this on the fly but none are 100% accurate.) Because 
that extra information must be given before an application can display the 
meaning of the file (i.e., replace the bytes by the characters) I don't 
consider those files as being plain text. Maybe text because the
interpretation of the bytes is characters of some sort but not plain text.

Notice that how those characters are interpreted by other applications has 
nothing to do with whether the file is plain text or other text. A Text Editor 
interprets the bytes simply as characters and displays them in some way while 
pdflatex interprets byte strings as combinations of commands and text; same 
file, different interpretations.

This is as far as I'm going in this since I really want to stay out of the 
argument. It's just my 0.0001 cents.

Good Luck,

Herb Schulz
(herbs at wideopenwest dot com)








Re: [XeTeX]   in XeTeX

2011-11-14 Thread Ulrike Fischer
On Mon, 14 Nov 2011 04:05:58 -0800, Chris Travers wrote:

> I think one of the key strengths of TeX is that it can be edited
> gracefully by ANY basic text editor.  I would hate for that to be
> lost.

Well, pdflatex can already handle UTF-8 documents that contain CJK
or Greek, which are quite difficult to edit with a basic (non-UTF-8)
editor.

And here you are on the *xetex* list - an engine which explicitly is
meant to offer unicode support. 

Also, even if you restrict yourself to pure ASCII input, the problem
with the various spaces doesn't disappear. Unicode chars can be
input through the ^^-notation. E.g., with xetex you can input the
non-breaking space U+00A0 like this (and the example shows that the
standard Latin Modern font handles the space correctly):

\documentclass{article}
\usepackage{fontspec}
\textwidth=3cm
\begin{document}
ab^^a0cd^^a0cd^^a0cd^^a0cd^^a0cd^^a0cd^^a0cd^^a0cd^^a0

ab cd cd cd cd cd cd cd cd
\end{document}
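
For comparison, here is a minimal sketch of other ASCII-only ways to write
the same character, so that it stays visible in any editor. It assumes
xelatex with fontspec and its default Latin Modern font. The four-caret form
is an engine extension that XeTeX accepts, and the hex digits after the
carets must be lowercase.

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage{fontspec}
\begin{document}
a^^a0b        % two-digit ^^ notation (lowercase hex digits)

a^^^^00a0b    % four-digit ^^^^ notation, an extension available in XeTeX

a\char"00A0 b % explicit character code; the space ends the number
\end{document}
```

All three lines should typeset the same thing: a, the font's U+00A0 glyph,
then b.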
  
-- 
Ulrike Fischer 





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Philip TAYLOR



Karljurgen Feuerherm wrote:


> It depends on who is reading them. Their markup is markup only from the
> point of view of their interpreters, i.e. *TeX, etc. From the point of
> view of something else, they are plain.


Yes, the Universe of Discourse (and/or the pragmatics
of discourse) do have an input here : but then, from
your perspective, what text file is /not/ a plain
text file "from the point of view of something else" ?

** Phil.




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Philip TAYLOR





> When you are willing to come back to a serious discussion we can talk.


I am participating in a serious discussion, Keith,
but I am more than happy to ignore your own inane
babble if it will make you any happier.

Philip Taylor

Keith J Schultz wrote :


> Hi Humpty Dumpty,
>
> Go read the standards and cry without kissing the girls.
> Evidently, you are not trained in computer science, or you would
> know what a real plain text file is.
>
> Also, in computer science we do not use the definitions of lay persons nor
> common language use.
>
> I assume you know all about academia and the use of language.
> Or that the language of law for example is quite different from "normal"
> language.
>
> When you are willing to come back to a serious discussion we can talk.
> But, troll if you wish.
>
> regards
> Keith.





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Karljurgen Feuerherm
>> Now, for the youngsters XML, TeX, HTML are per definition plain text
>> files.
>
> No, they are text files, not /plain/ text files.  Look
> at some mime types :
>
>   text/plain (for plain text)
>   text/html (for HTML)

It depends on who is reading them. Their markup is markup only from the
point of view of their interpreters, i.e. *TeX, etc. From the point of
view of something else, they are plain.




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Keith J. Schultz
Hi Humpty Dumpty,

Go read the standards and cry without kissing the girls. 
Evidently, you are not trained in computer science, or you would
know what a real plain text file is.

Also, in computer science we do not use the definitions of lay persons nor
common language use.

I assume you know all about academia and the use of language.
Or that the language of law for example is quite different from "normal"
language.

When you are willing to come back to a serious discussion we can talk.
But, troll if you wish.

regards
Keith.

On 14.11.2011, at 12:08, Philip TAYLOR wrote:

> Humpty Dumpty might have approved ("When I use a word,"
> Humpty Dumpty said in rather a scornful tone, "it means
> just what I choose it to mean -- neither more nor less.")
> 
> but I am afraid I cannot.  The definition is /your/ definition,
> not the definition of the general community.  Plain text is
> plain text, as I wrote long ago in this thread -- it contains
> letters, digits, punctuation, special symbols, white space
> and ends of line.  By definition (the generally accepted
> definition, that is, not a personal idiosyncratic one), none
> of those letters, digits, punctuation, special symbols,
> white space or ends of line have any special significance,
> and certainly no greater significance than they would
> have were they to appear (say) printed on a sheet of paper.
> 
> As soon as you define any one of those things to have special
> significance (as do Runoff, GML, SGML, HTML, XML, TeX, ...),
> the document ceases to be plain text and becomes structured
> text.





Re: [XeTeX]   in XeTeX

2011-11-14 Thread mskala
On Mon, 14 Nov 2011, Petr Tomasek wrote:
> > Not in every case. How would you visually differentiate between all the
> > white space characters (space vs. non-break space, thin space (u2009)

> Using different color.

About 8% of men have some form of colour blindness (the prevalence is much
lower, but still nontrivial, in women), and it's a basic rule of
interface design that although colour is valuable as an adjunct to other
ways of presenting information, important information must never be
conveyed *only* by colour.
-- 
Matthew Skala
msk...@ansuz.sooke.bc.ca People before principles.
http://ansuz.sooke.bc.ca/




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Chris Travers
On Mon, Nov 14, 2011 at 4:47 AM, Keith J. Schultz  wrote:
> Hi Chris,
>
> I agree with you that one should be able to see the differences in an editor,
> but this should be a feature that can be turned off and on.

Absolutely.  If it requires an on switch to take effect, I have no complaints.
>
> The question is what is an ordinary editor.
>
> Also, most prefer to use their pet editors.
>
Ordinary editors should include not only VIM and EMACS (in a very
ordinary mode, and without custom scripting) but also things like
gedit, etc.

In other words, I am not comfortable with the idea of a feature which
is on by default, and requires color highlighting of whitespace in an
editor to debug.  If it is off by default, then when you see the on
switch, at least you know where problems might be.

Best Wishes,
Chris Travers




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Keith J. Schultz
Hi Chris,

I agree with you that one should be able to see the differences in an editor,
but this should be a feature that can be turned off and on.

The question is what is an ordinary editor.

Also, most prefer to use their pet editors. 

regards
Keith.

> I get worried when reserved characters are not visually differentiated
> in an ordinary text editor from non-reserved ones.
> 
> I think it's far better if one can have packages which enable or
> disable these specific characters for those who want them.  However,
> don't make me open a hex editor to see why a space is breaking or not.





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Keith J. Schultz
Well, Zdenek, 

I guess that is where TeXWorks comes to mind. It could give a unified
GUI for TeX with unicode.

regards
Keith.

On 14.11.2011, at 11:38, Zdenek Wagner wrote:

> You live in a perfect world where you can do everything with a single
> editor using nice GUI. The world is not yet that perfect. How do I use
> color when editing a file using ssh and a colorless terminal? What is
> the Unicode standard color of NBSP? I do not edit files just on my
> computer, I have to support customers, I have to cooperate with
> colleagues. They use different platforms, different editors. If they
> all use TeX, I know that ~ denotes nonbreakable space. What is a
> world-wide platform independent and color independent visible
> representation of a nonbreakable space that is clearly distinct from a
> normal space?





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Keith J. Schultz
Well, XeTeX users are already restricted in their choice of editors. They
must/should support Unicode at least minimally. Of course you can enter the
characters/glyphs in a cryptic manner.
Have fun reading a text with true Unicode!

Also, remember when you had to use ALT-XXX in Word to enter characters your
keyboard did not have! I know/knew quite a few Windows users who envy me
entering mixed-language texts on my Apples!

regards
Keith.

On 14.11.2011, at 11:27, Chris Travers wrote:

> On Mon, Nov 14, 2011 at 2:24 AM, Petr Tomasek  wrote:
> 
>> Using different color.
>> 
> Do we really want to tie XeTeX users to a small number of editors?
> 
> Chris Travers
> 
> 






Re: [XeTeX]   in XeTeX

2011-11-14 Thread Keith J. Schultz
Hi there,


On 14.11.2011, at 11:20, Chris Travers wrote:

> My $0.02
> 
> In general, I think we are going to get the most mileage by sticking
> with the TeX way of doing things by default.  The nice thing is that ~
> can be turned into a non-active character, and one can set other
> things if they want.  For the record, I think that having non-breaking
> spaces in a plain text document is a bad idea.  I mean, you have
> essentially invisible control characters.  What could possibly go
> wrong?  Hey, it could be worse.  I've seen programs that use "magic
> comments."
> 
> As long as one can make other characters active instead, I see no
> reason to worry about this.
> 
> But the point is that when I am debugging a TeX file I want:
> 1)  To be able to use an editor of my choice and
> 2)  To be able to see clearly what is going on.
> 
> The fact that Python, for example treats whitespace as semantically
> meaningful and hence treats tabs and spaces as semantically different
> is a big strike against that language, for example, from a semantic
> clarity perspective despite the fact that this was ironically a
> decision that was made in order to support semantic clarity.
Well, a tab and several spaces are semantically different!
How much space does a tab character give you?
> 
> TeX files are never simple plain text files, and I don't think we
> should pretend that they are.
Well, it depends if you mean "plain text" file or "plain text file".
What does the definition/standard of TeX say it takes as input?
Things were a lot easier when TeX came to life, and so was the definition
of a plain text file!

regards
Keith.





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Philip TAYLOR



Chris Travers wrote:


> But what's the point of putting non-breaking spaces between a word and
> the end of a line?  or for that matter what if I alternate spaces and
> special unicode spaces? Do I get a word space for each of them?

In (e.g.,) HTML, it is by no means unusual to interweave
spaces and &nbsp;s.  &nbsp; just before end of line is
less common, but I am sure someone has a use for it.

> I think one of the key strengths of TeX is that it can be edited
> gracefully by ANY basic text editor.  I would hate for that to be
> lost.


Indeed, that was one of the great strengths of TeX 2.
Sadly DEK lost the plot in 1989 and allowed those nasty
Europeans to start using /accented characters/ (spit,
vomit) in TeX 3, at which point half the editors in the
world were made useless overnight.  Now JK has gone even
further down the same road and is actually allowing /Unicode/,
so by now perhaps 90% of the existing "basic text editors" are
useless. So we may as well disenfranchise the remaining 9.9%
and allow invisible markup as well.

:-)

** Phil.




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Keith J. Schultz
Hi Zdenek, all,

I was too lazy to list all those encodings.

I will be more precise now for those not reading carefully.

There is a difference between what is considered plain text in the computer
world and what its content is.

Basically, plain text is just that, text, no matter what its content is.
That is, in the computer world you can have PLAIN TEXT FILES whose content is
TeX et al., HTML, XML, SGML, or most programming languages.
That is, the source for these languages is in a more or less human-readable
format --- TEXT.

Whether, due to the syntax of the language represented, a character can be
used directly or needs an "Escape" character to represent it is irrelevant.
Naturally, you have to know the syntax to understand the text. Still, it is
plain text according to the standards that define the content of the files.

As I said, I guess Unicode should be considered as plain text. Yet, Unicode
is special in the same way as ASCII and Extended ASCII (7-bit ASCII and 8-bit
ASCII): depending on the fonts you used, you got different results when
output on the screen or printer.

The problem is that there are no fonts that implement the FULL Unicode set,
which is what is needed.

On the other hand, until we have OSes that truly fully support Unicode,
Unicode cannot truly be considered to be plain text.

As far as TeX is concerned, it was not designed to handle Unicode or even
8-bit. It has, though, been extended in a fragmented way to handle them. That
has come at a cost. It is time to redesign it; refactor, if you will.

regards
Keith.
  
On 14.11.2011, at 11:07, Zdenek Wagner wrote:

> It's not the encoding that determines whether it is a plain text.
> Texts in ISO 8859-1, CP852, UTF-8, UTF-16, BIG-5 can be plain texts.
> LTR/RTL is no problem in modern editors, I can easily combine
> Czech/English/Hindi/Urdu (uses arabic script) in a single document,
> the languages/scripts may even be mixed within a paragraph. What
> determines whether it is or is not a plain text is the presence or
> absence of control characters or commands no matter whether the file
> can be viewed and/or edited in a plain text editor such as vim or
> notepad. If I type < I wish it to mean "less than" but in XML it marks
> the element tag. If I need such a character in XML or SGML, I have to
> write &lt; no matter what editor I use. If it were plain text, &lt;
> would mean ampersand followed by the letters lt and a semicolon. If I
> type & in a plain text, it means "and". If I type it in a TeX file, it
> is a special character for \halign (unless \catcode is changed), in
> XML and SGML it means that all following characters up to the first
> semicolon are an entity name. If I have to insert an ampersand, I have
> to write \& in TeX or &amp; in XML and SGML. There are different
> methods how to enter A, e.g. ^^41 in TeX or &#x41; in XML and SGML. As
> Phil wrote, there is a clearly defined MIME type for a plain text.
> 





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Chris Travers
On Mon, Nov 14, 2011 at 3:26 AM, Philip TAYLOR  wrote:
>
>
> Chris Travers wrote:
>
>> Ok, so why don't we have a similar macro here?  Something like:
>> \obeynbsps
>
> See above : there are /some/ things that TeX does that
> transcend category codes (which are the basis for \obeylines);
> in particular [1] :
>
> "TeX deletes any <space> characters (number 32) that occur
>  at the right end of an input line"
>
> These are ASCII 32s, not \catcode 10s, and XeTeX itself would
> require modification if you also wanted XeTeX to delete any
> U+2009s, U+202fs, ... in the same way that it deletes normal
> spaces.  This is without the aegis of \obeynbsp, which would
> kick in long after this action has been irrevocably completed.
>
But what's the point of putting non-breaking spaces between a word and
the end of a line?  or for that matter what if I alternate spaces and
special unicode spaces? Do I get a word space for each of them?

I think one of the key strengths of TeX is that it can be edited
gracefully by ANY basic text editor.  I would hate for that to be
lost.

Best Wishes,
Chris Travers





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Philip TAYLOR



Chris Travers wrote:


> Ok, so why don't we have a similar macro here?  Something like:
> \obeynbsps

See above : there are /some/ things that TeX does that
transcend category codes (which are the basis for \obeylines);
in particular [1] :

"TeX deletes any <space> characters (number 32) that occur
 at the right end of an input line"

These are ASCII 32s, not \catcode 10s, and XeTeX itself would
require modification if you also wanted XeTeX to delete any
U+2009s, U+202fs, ... in the same way that it deletes normal
spaces.  This is without the aegis of \obeynbsp, which would
kick in long after this action has been irrevocably completed.

But in general, \obeyspecialUnicodespaces /might/ be viable !
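
A minimal sketch of what such a switch could look like for U+00A0 alone,
working purely through category codes. \obeynbsps is the hypothetical name
proposed in this thread, not an existing macro, and the sketch assumes
xelatex with fontspec and a format that otherwise leaves U+00A0 as an
ordinary character. It does nothing about the end-of-line deletion described
above.

```latex
% Compile with xelatex. \obeynbsps is a hypothetical name, not a standard macro.
\documentclass{article}
\usepackage{fontspec}

% Give the *active* U+00A0 a meaning once, inside a group, so that the
% character itself stays ordinary until the switch is used.
\begingroup
  \catcode"A0=\active
  \gdef^^a0{\leavevmode\nobreak\ }% no break, then a normal interword space
\endgroup

% The switch only changes the category code; its scope is the current group.
\newcommand*{\obeynbsps}{\catcode"A0=\active}

\begin{document}
{\obeynbsps ab^^a0cd} % here U+00A0 behaves like ~ (stretchable, unbreakable)

ab^^a0cd % here it is typeset as the font's own U+00A0 glyph
\end{document}
```

Because the switch has to be turned on explicitly and ends with its group,
a reader of the source can see where U+00A0 has a special meaning.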

** Phil.

[1] "The TeXbook", page 46.  Pages 46 and 47 are well worth
studying in the context of this discussion.




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Keith J. Schultz
Hi Peter,

Simple answer: no, I do not use the Emacs editor; I hate it!

I have not looked at Emacs in a very long time, but I assume
that it does not understand Unicode, along with other text encodings.

But, you can edit TeX, HTML, and XML with it!

Please see my responses to Phillip and Zdenek for more insight. 

regards
Keith.

On 14.11.2011, at 11:10, Peter Dyballa wrote:

> 
> On 14.11.2011, at 09:21, Keith J. Schultz wrote:
> 
>> So, Unicode needs an editor to be displayed correctly.
> 
> Use GNU Emacs!
> 





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Philip TAYLOR



Keith J. Schultz wrote:

> Hi Philip,
>
> On 14.11.2011, at 09:36, Philip TAYLOR wrote:
>
>> Keith J. Schultz wrote:
>>
>>> So, Unicode needs an editor to be displayed correctly.
>>
>> Why ?  Not meant to sound aggressive, but seems a very
>> odd assertion, IMHO. Editors are for changing things;
>> why would you need a program intended to change things
>> just to display Unicode ?
>
> Yes, there are other programs for displaying texts. I was thinking about
> a unix command such as cat, less, more, etc. Depending on a few things
> they will not necessarily display a Unicode text file correctly !
>
>>> Now, for the youngsters XML, TeX, HTML are per definition plain text files.
>>
>> No, they are text files, not /plain/ text files.  Look
>> at some mime types :
>>
>>   text/plain (for plain text)
>>   text/html (for HTML)
>
> C'mon Philip! I wrote "per definition" ! That is the file is plain text.
> the plain text "text/html" is for the browser so that it knows a file
> contains html-tags/commands and interpret accordingly during display!


Humpty Dumpty might have approved ("When I use a word,"
Humpty Dumpty said in rather a scornful tone, "it means
just what I choose it to mean -- neither more nor less.")

but I am afraid I cannot.  The definition is /your/ definition,
not the definition of the general community.  Plain text is
plain text, as I wrote long ago in this thread -- it contains
letters, digits, punctuation, special symbols, white space
and ends of line.  By definition (the generally accepted
definition, that is, not a personal idiosyncratic one), none
of those letters, digits, punctuation, special symbols,
white space or ends of line have any special significance,
and certainly no greater significance than they would
have were they to appear (say) printed on a sheet of paper.

As soon as you define any one of those things to have special
significance (as do Runoff, GML, SGML, HTML, XML, TeX, ...),
the document ceases to be plain text and becomes structured
text.

Philip Taylor




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Chris Travers
On Mon, Nov 14, 2011 at 2:56 AM, Philip TAYLOR  wrote:
>
>
> Chris Travers wrote:
>>
>> One other thought occurs to me.
>>
>> Typically in a TeX document, whitespace is not semantic.  In other
>> words, spaces, tabs, and carriage returns are not differentiated.  If
>> we are so keen on supporting a few special whitespace characters, why
>> not also support tabs and make carriage returns, you know, actually
>> break lines?
>
> It is necessary to be a /little/ careful here,
> in that a certain degree of LWSP processing takes
> place /before/ things reach TeX's mouth.  And, of
> course, \obeylines exists in order to provide a
> simple way for the user to handle ends-of-line
> specially.
>
Ok, so why don't we have a similar macro here?  Something like:
\obeynbsps

That would be fine with me.

Best Wishes,
Chris Travers





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Chris Travers
On Mon, Nov 14, 2011 at 2:54 AM, Peter Dyballa  wrote:
>
> On 14.11.2011, at 11:16, Zdenek Wagner wrote:
>
>> Does it display Devanagari, Arabic, Tibetan, Hebrew correctly?
>
> LTR can be improved (it's maintained by a guy who probably, judging by his
> name, can write and read Hebrew), shaping is handled by libotf and libm17n.
> It can also be improved. But the important thing is: it can display NO-BREAK
> SPACE and distinguish it from SPACE.
>
Some of us don't really like to use EMACS.  We find it clunky,
wasteful with system resources, and relatively hard to learn in
comparison to VIM.

Best Wishes,
Chris Travers





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Keith J. Schultz
Hi Philip,


On 14.11.2011, at 09:36, Philip TAYLOR wrote:

> 
> 
> Keith J. Schultz wrote:
> 
>> So, Unicode needs an editor to be displayed correctly.
> 
> Why ?  Not meant to sound aggressive, but seems a very
> odd assertion, IMHO. Editors are for changing things;
> why would you need a program intended to change things
> just to display Unicode ?
Yes, there are other programs for displaying texts. I was thinking about
a unix command such as cat, less, more, etc. Depending on a few things
they will not necessarily display a Unicode text file correctly !
> 
>> Now, for the youngsters XML, TeX, HTML are per definition plain text files.
> 
> No, they are text files, not /plain/ text files.  Look
> at some mime types :
> 
>   text/plain (for plain text)
>   text/html (for HTML)

C'mon Philip! I wrote "per definition"! That is, the file is plain text.
The plain-text label "text/html" is for the browser, so that it knows the
file contains HTML tags/commands and interprets them accordingly during display!

Just as in XML, where the data tag can contain binary data, yet it is
entered in hex! Though I believe that in the newer standards binary can be
entered directly! It has been a long time since I looked at the actual standard.

Also, for most programming languages the source is a plain text file,
even though its content is in a programming language.

regards
Keith.
 






Re: [XeTeX]   in XeTeX

2011-11-14 Thread Philip TAYLOR



Chris Travers wrote:

> One other thought occurs to me.
>
> Typically in a TeX document, whitespace is not semantic.  In other
> words, spaces, tabs, and carriage returns are not differentiated.  If
> we are so keen on supporting a few special whitespace characters, why
> not also support tabs and make carriage returns, you know, actually
> break lines?


It is necessary to be a /little/ careful here,
in that a certain degree of LWSP processing takes
place /before/ things reach TeX's mouth.  And, of
course, \obeylines exists in order to provide a
simple way for the user to handle ends-of-line
specially.
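
As a small plain TeX illustration of that point (a sketch, nothing more): by default an end-of-line is read as a space, and \obeylines changes that only inside the group in which it is used.

```tex
% Plain TeX: by default the end of a source line is read as a space.
These two source lines
are set as one paragraph.

% Inside the group, \obeylines makes every end-of-line act like \par.
{\obeylines
Inside this group
each source line ends a paragraph,
so every line break is kept.
}
\bye
```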

Philip Taylor




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Peter Dyballa

Am 14.11.2011 um 11:16 schrieb Zdenek Wagner:

> Does it display Devanagari, Arabic, Tibetan, Hebrew correctly?

LTR can be improved (it's maintained by a guy who probably, judging by his 
name, can write and read Hebrew), shaping is handled by libotf and libm17n. It 
can also be improved. But the important thing is: it can display NO-BREAK SPACE 
and distinguish it from SPACE.

--
Greetings

  Pete

The human brain operates at only 10% of its capacity. The rest is overhead for 
the operating system.






Re: [XeTeX]   in XeTeX

2011-11-14 Thread Chris Travers
One other thought occurs to me.

Typically in a TeX document, whitespace is not semantic.  In other
words, spaces, tabs, and carriage returns are not differentiated.  If
we are so keen on supporting a few special whitespace characters, why
not also support tabs and make carriage returns, you know, actually
break lines?

Best Wishes,
Chris Travers




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Chris Travers
On Mon, Nov 14, 2011 at 2:35 AM, Philip TAYLOR  wrote:
>
>
> Chris Travers wrote:
>>
>> On Mon, Nov 14, 2011 at 2:24 AM, Petr Tomasek  wrote:
>>
>>> Using different color.
>>>
>> Do we really want to tie XeTeX users to a small number of editors?
>
> No.  But nor do we want to preclude the possibility of
> someone taking UTF-8 containing these "magic" characters
> from somewhere and pasting it into a XeTeX document.
> So it is essential that we support these magic characters
> in some way.  In the long term, enhancements to the
> XeTeX engine could achieve this in a far more elegant
> way by (for example) extending the number of \catcode s
> to 256 (or 65536, or 4294967296, or 2&) to
> accommodate the extended semantics that Unicode encapsulates.
> But in the short term, all that is necessary is to make
> these characters active, and to provide definitions (compatible
> with all major dialects of TeX) that implement their
> semantics.  They are then no more troublesome than
> any of the sixteen or so currently reserved characters
> when it comes to transput : that is why \unexpanded exists.
>
I get worried when reserved characters are not visually differentiated
in an ordinary text editor from non-reserved ones.

I think it's far better if one can have packages which enable or
disable these specific characters for those who want them.  However,
don't make me open a hex editor to see why a space is breaking or not.

Best Wishes,
Chris Travers





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Zdenek Wagner
2011/11/14 Petr Tomasek :
> On Sun, Nov 13, 2011 at 06:25:08PM +0200, Tobias Schoel wrote:
>>
>>
>> Am 13.11.2011 18:16, schrieb Philip TAYLOR:
>> >
>> >
>> >Tobias Schoel wrote:
>> >
>> >>One opinion says, that using (La)TeX is programming. Consequently, each
>> >>character used should be visually well distinguishable. This is not the
>> >>case with all the Unicode white space characters.
>> >
>> >Is that not a function of the editor used ? Is it not valid
>> >for an editor to display different Unicode spaces differently,
>> >such that the user can visually differentiate between them ?
>> >
>> >Philip Taylor
>>
>> Not in every case. How would you visually differentiate between all the
>> white space characters (space vs. non-break space, thin space (u2009)
>> vs. narrow no-break space (u202f), ... ) such that the text remains
>> readable?
>>
>> Toscho
>
> Using different color.
>
You live in a perfect world where you can do everything with a single
editor using nice GUI. The world is not yet that perfect. How do I use
color when editing a file using ssh and colorless terminal? What is
the Unicode standard color of NBSP? I do not edit files just on my
computer, I have to support customers, I have to cooperate with
colleagues. They use different platforms, different editors. If they
all use TeX, I know that ~ denotes nonbreakable space. What is a
world-wide platform independent and color independent visible
representation of a nonbreakable space that is clearly distinct from a
normal space?




-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Philip TAYLOR



Chris Travers wrote:

> On Mon, Nov 14, 2011 at 2:24 AM, Petr Tomasek  wrote:
>
>> Using different color.
>
> Do we really want to tie XeTeX users to a small number of editors?


No.  But nor do we want to preclude the possibility of
someone taking UTF-8 containing these "magic" characters
from somewhere and pasting it into a XeTeX document.
So it is essential that we support these magic characters
in some way.  In the long term, enhancements to the
XeTeX engine could achieve this in a far more elegant
way by (for example) extending the number of \catcode s
to 256 (or 65536, or 4294967296, or 2&) to
accommodate the extended semantics that Unicode encapsulates.
But in the short term, all that is necessary is to make
these characters active, and to provide definitions (compatible
with all major dialects of TeX) that implement their
semantics.  They are then no more troublesome than
any of the sixteen or so currently reserved characters
when it comes to transput : that is why \unexpanded exists.
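
A rough sketch of that short-term approach for plain XeTeX follows. The helper name \defineactivechar and the replacement texts chosen for the two code points are assumptions made for illustration, not agreed definitions.

```tex
% Sketch for plain XeTeX; \defineactivechar is a hypothetical helper name.
% #1 = code point, #2 = replacement text giving the character its semantics.
\def\defineactivechar#1#2{%
  \catcode#1=\active
  \begingroup
    \lccode`\~=#1\relax
    \lowercase{\endgroup\def~}{#2}}

\defineactivechar{"00A0}{\nobreak\ }         % NO-BREAK SPACE: behaves like TeX's ~
\defineactivechar{"202F}{\nobreak\thinspace} % NARROW NO-BREAK SPACE

% ^^^^202f and ^^^^00a0 are XeTeX's ASCII notation for the two characters.
10^^^^202fkm from Dr.^^^^00a0Knuth
\bye
```

The \lowercase trick is only there so that the active character can be defined without typing the invisible character itself in the macro file.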

Philip Taylor




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Chris Travers
On Mon, Nov 14, 2011 at 2:24 AM, Petr Tomasek  wrote:

> Using different color.
>
Do we really want to tie XeTeX users to a small number of editors?

Chris Travers




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Petr Tomasek
On Sun, Nov 13, 2011 at 06:25:08PM +0200, Tobias Schoel wrote:
> 
> 
> Am 13.11.2011 18:16, schrieb Philip TAYLOR:
> >
> >
> >Tobias Schoel wrote:
> >
> >>One opinion says, that using (La)TeX is programming. Consequently, each
> >>character used should be visually well distinguishable. This is not the
> >>case with all the Unicode white space characters.
> >
> >Is that not a function of the editor used ? Is it not valid
> >for an editor to display different Unicode spaces differently,
> >such that the user can visually differentiate between them ?
> >
> >Philip Taylor
> 
> Not in every case. How would you visually differentiate between all the 
> white space characters (space vs. non-break space, thin space (u2009) 
> vs. narrow no-break space (u202f), … ) such that the text remains 
> readable?
> 
> Toscho

Using different color.

-- 
Petr Tomasek 
Jabber: but...@jabbim.cz


EA 355:001  DU DU DU DU
EA 355:002  TU TU TU TU
EA 355:003  NU NU NU NU NU NU NU
EA 355:004  NA NA NA NA NA







Re: [XeTeX]   in XeTeX

2011-11-14 Thread Chris Travers
My $0.02

In general, I think we are going to get the most mileage by sticking
with the TeX way of doing things by default.  The nice thing is that ~
can be turned into a non-active character, and one can set other
things if they want.  For the record, I think that having non-breaking
spaces in a plain text document is a bad idea.  I mean, you have
essentially invisible control characters.  What could possibly go
wrong?  Hey, it could be worse.  I've seen programs that use "magic
comments."

As long as one can make other characters active instead, I see no
reason to worry about this.

But the point is that when I am debugging a TeX file I want:
1)  To be able to use an editor of my choice and
2)  To be able to see clearly what is going on.

The fact that Python, for example, treats whitespace as semantically
meaningful, and hence treats tabs and spaces as semantically different,
is a big strike against that language from a semantic clarity
perspective, even though, ironically, the decision was made in order to
support semantic clarity.

TeX files are never simple plain text files, and I don't think we
should pretend that they are.

Best Wishes,
Chris Travers




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Zdenek Wagner
2011/11/14 Peter Dyballa :
>
> Am 14.11.2011 um 09:21 schrieb Keith J. Schultz:
>
>> So, Unicode needs an editor to be displayed correctly.
>
> Use GNU Emacs!
>
Does it display Devanagari, Arabic, Tibetan, Hebrew correctly?




-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Peter Dyballa

Am 14.11.2011 um 09:21 schrieb Keith J. Schultz:

> So, Unicode needs an editor to be displayed correctly.

Use GNU Emacs!

--
Greetings

  Pete

Hard Disk, n.:
A device that allows users to delete vast quantities of data with 
simple mnemonic commands.






Re: [XeTeX]   in XeTeX

2011-11-14 Thread Zdenek Wagner
2011/11/14 Philip TAYLOR :
>
>
> Keith J. Schultz wrote:
>
>> So, Unicode needs an editor to be displayed correctly.
>
> Why ?  Not meant to sound aggressive, but seems a very
> odd assertion, IMHO. Editors are for changing things;
> why would you need a program intended to change things
> just to display Unicode ?
>
>> Now, for the youngsters XML, TeX, HTML are per definition plain text
>> files.
>
> No, they are text files, not /plain/ text files.  Look
> at some mime types :
>
>        text/plain (for plain text)
>        text/html (for HTML)
>
It's not the encoding that determines whether it is a plain text.
Texts in ISO 8859-1, CP852, UTF-8, UTF-16, BIG-5 can be plain texts.
LTR/RTL is no problem in modern editors, I can easily combine
Czech/English/Hindi/Urdu (uses Arabic script) in a single document,
the languages/scripts may even be mixed within a paragraph. What
determines whether it is or is not a plain text is the presence or
absence of control characters or commands, no matter whether the file
can be viewed and/or edited in a plain text editor such as vim or
notepad. If I type < I wish it to mean "less than" but in XML it marks
the element tag. If I need such a character in XML or SGML, I have to
write &lt; no matter what editor I use. If it were plain text, &lt;
would mean ampersand followed by the letters lt and a semicolon. If I
type & in a plain text, it means "and". If I type it in a TeX file, it
is a special character for \halign (unless \catcode is changed), in
XML and SGML it means that all following characters up to the first
semicolon are an entity name. If I have to insert an ampersand, I have
to write \& in TeX or &amp; in XML and SGML. There are different
methods how to enter A, e.g. ^^41 in TeX or &#x41; in XML and SGML. As
Phil wrote, there is a clearly defined MIME type for a plain text.
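
To make the TeX side of this concrete, here is a small sketch for plain XeTeX. It assumes U+00A0 still has an ordinary (non-active) category code in the format being used.

```tex
% Plain XeTeX sketch of the ^^ notation.
^^41BC is read exactly as if the source said ABC.

% XeTeX also accepts four carets followed by four lowercase hex digits,
% which gives a pure-ASCII, color-independent way to write U+00A0:
\message{NBSP code point: \number`^^^^00a0}
\bye
```

The \message line should report 160, the decimal value of "A0, which shows that the ASCII sequence really is read as the single character U+00A0.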




-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz





Re: [XeTeX]   in XeTeX

2011-11-14 Thread Philip TAYLOR



Keith J. Schultz wrote:


> So, Unicode needs an editor to be displayed correctly.


Why ?  Not meant to sound aggressive, but seems a very
odd assertion, IMHO. Editors are for changing things;
why would you need a program intended to change things
just to display Unicode ?


> Now, for the youngsters XML, TeX, HTML are per definition plain text files.


No, they are text files, not /plain/ text files.  Look
at some mime types :

text/plain (for plain text)
text/html (for HTML)

Philip Taylor




Re: [XeTeX]   in XeTeX

2011-11-14 Thread Keith J. Schultz
Hi Everybody,

Slow down a bit. Sorry if I sound high headed here!


There seems to be a misunderstanding what exactly a
PLAIN TEXT FILE is.

Computing has evolved since I started using computers.
When I started out, a plain text file was a file holding just
7-bit ASCII or EBCDIC, or the like, without control characters except
EOF, CR, or LF! No FF or TAB (though TAB was sometimes allowed), nor the others.

Eventually, files with 8-bit coding became plain text.

I guess we can consider in this modern day and age Unicode plain text.
Though, to be fair Unicode encodes glyphs, and can signal RTL usage.
So, Unicode needs an editor to be displayed correctly. But the question is
philosophical.

Now, for the youngsters XML, TeX, HTML are per definition plain text files.
What they do contain are commands in plain text that describe how the
information inside is to be displayed. Yet a human can still read the text
inside and understand what is going on.

Again, with Unicode coming into the picture things do get somewhat more
complicated, as the glyphs have to be displayed properly so that a human can
properly read them. This is due to the vastness of Unicode.

Now, to the problem of copying and pasting. What does happen?
I will take the HTML case! When you copy text from a browser with
&nbsp;, do you get '&nbsp;', a simple blank, or a true non-breaking space?
Most likely you will get a simple blank, but it depends.
If you do get a true non-breaking space, what happens if you paste it into
a different editor? Chances are good you get a funny character displayed.

So, it boils down to the tools you use. 

That said, we come to how we display all these great glyphs. Most are easy
enough, but white space is very hard for humans to read; it is just that, white.
So the different types of white space should be displayed differently. The
same could be said of characters that are composed rather than represented
by a single glyph.
The problem is how to do it so that it does not look ugly or very confusing!

In other words, we have to live with some compromises between easy
discernibility and ease of readability.


regards
Keith.   


Am 13.11.2011 um 19:46 schrieb Tobias Schoel:

> 
> 
> Am 13.11.2011 20:25, schrieb Zdenek Wagner:
>> (La)TeX source file is not a plain text. Every LaTeX document nowadays
>> starts with \documentclass but such text is not present in the output.
> 
> Of course, the preamble isn't plain text, but mostly macros. I thought of the 
> body of the document. I think, it's common practice for larger documents to 
> have a main latex file, which reads \documentclass … 
> \begin{document}\input{first_chapter}\input{second_chapter}…\end{document}
> In these cases, the input documents are more or less plain text (depending on 
> the subject).
> 
>> Even XML is not plain text, you can use entities as ,' and
>> many more. Of course, if (La)TeX is used for automatic processing of
>> data extracted from a database that can contain a wide variety of
>> Unicode character, it is a valid question how to handle such input.
> Or if the content is copy-pasted, from let's say HTML. But who would do that …



