Marco Cimarosti kindly responded to my post about XML and plane 14 tags.
Here are some initial comments which I hope will set the scene.  I am
studying Marco's examples with interest.  I am already learning a lot from
them.  I shall be interested to observe how other people on the Unicode
mailing list respond to Marco's post.

I do notice, however, that Marco does not implement in XML any of the
examples which I suggested in my post.

Marco wrote.

quote

(Warning: I have probably succeeded in the impossible task of being more
verbose than Mr. Overington. Please start reading only if you have a few
free time... :-)

end quote

Well, I think that a verbose writing style often assists clarity.  Writing a
document is different from having a conversation.  In a conversation if
someone does not follow a line of reasoning then the person putting forward
the reasoning can respond to questions or to body language and make a more
detailed explanation, or perhaps approach the matter from another direction:
in a posting there is a need to try to set out an argument with clarity of
meaning all at once.

[snip]

>My job is to implement software based on written specifications which
>represent my bosses' understanding of the requirements of our customers.
>Unfortunately, the specifications I receive are often verbose and fuzzy like
>Mr. Overington's posts...

Well, my posts are often verbose, yet I like to think, never fuzzy.

>I will be pretending that William is actually "Overington Inc.", one of the
>key customers of the company I work with, and that they are asking me to
>implement a protocol to send text over the famous "Overington Multimedia
>Broadcasting (OMB)", with the following requirements:

So rather than being willing to respond to an individual you pretend to
respond to a non-existent company which cannot vote in elections yet which
may be able to pay to have a vote at a Unicode Technical Committee meeting,
which an individual cannot do!  :-)

Yet this pretend scenario may be the reason why one of the main points
which I am trying to make has been missed.  There is no such thing as OMB
and there will not be, because I want to use the DVB-MHP (Digital Video
Broadcasting - Multimedia Home Platform) system, which can be used by
everybody.  Since 1978 I have been advocating that there should be a
manufacturer-independent standard for telesoftware broadcasts.  There have
been many delays and problems in getting telesoftware implemented on a
lasting basis because so many implementations of telesoftware were based on
proprietary system specifications.  The DVB-MHP system (see
http://www.mhp.org for the details) is a manufacturer-independent system
which uses Java.  My suggested portable object code system, 1456 object
code (in speech, please say "fourteen fifty-six object code"), which dates
back to 1978 in its first format, which I later discarded, was an early
attempt at such portability.
However, although Java is the standard for DVB-MHP programs, and Java does
far more than 1456 object code ever did, 1456 object code did have one
feature which Java does not.  Today 1456 object code has been redesigned to
make use of that one feature, sitting on top of a Java platform.  I like to
think
that 1456 object code today, which sits on top of a Java platform and uses
Java for standardization of its data types, is a system which is useful in
some, though by no means all, situations where a Java quality graphics
effect is wanted on either the web or upon the DVB-MHP platform.  1456
object code is particularly useful where people wish to write relatively
short programs with Java quality graphics yet do not have ready access to a
Java compiler and may not know any Java, for example where a distance
education author wants to produce an interactive illustration.  This is
because 1456 object code can be written directly in ASCII printable
characters using a text editor such as Notepad.  However, for use within a
UTF-16 system, EA00 hexadecimal can be added to each of those characters so
that the 1456 object code can be expressed using (some of the) Private Use
Area characters in the range U+EA00 to U+EA7F.
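
The offset mapping just described can be sketched as follows.  This is only
an illustration in Python, not part of any 1456 engine: the function names
to_pua and from_pua are hypothetical, and the sample string is a placeholder
rather than real 1456 object code.

```python
PUA_OFFSET = 0xEA00  # the offset named in the text above


def to_pua(source: str) -> str:
    """Shift each printable ASCII character up by EA00 hexadecimal."""
    for ch in source:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not a printable ASCII character: {ch!r}")
    return "".join(chr(ord(ch) + PUA_OFFSET) for ch in source)


def from_pua(encoded: str) -> str:
    """Reverse the shift, recovering the original ASCII text."""
    decoded = []
    for ch in encoded:
        code_point = ord(ch) - PUA_OFFSET
        if not 0x20 <= code_point <= 0x7E:
            raise ValueError(f"outside the shifted ASCII range: {ch!r}")
        decoded.append(chr(code_point))
    return "".join(decoded)


if __name__ == "__main__":
    sample = "placeholder text"  # stands in for a 1456 program
    shifted = to_pua(sample)
    # Show the code points so that the range can be checked by eye.
    print(" ".join(f"U+{ord(ch):04X}" for ch in shifted))
    print(from_pua(shifted) == sample)
```

Each shifted character stays within the Basic Multilingual Plane, so it
occupies a single 16-bit code unit in UTF-16.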

http://www.users.globalnet.co.uk/~ngo/14560000.htm

However, 1456 object code is just a format for using within a Java program;
that is, the Java program treats the 1456 object code software program as
being data for a specific Java program which is called a 1456 engine.  So
1456 object code is not a standard system; it is just a programming language
which can be processed by a Java program, so it can be used on the DVB-MHP
platform without in any way needing the DVB-MHP system specification to be
altered to accommodate its use.  The DVB-MHP platform does specify the use
of Java in a very detailed manner.

That essential difference between how the DVB-MHP system relates to Java and
how the DVB-MHP system relates to 1456 object code is important.  It is, in
my opinion, the same sort of difference as between how the Unicode
specification relates to plane 14 tags and how the Unicode specification
relates to element names in an XML file.  I feel that that is the essential
point which I am trying to convey.

> 1. The text MUST be transmitted in UTF-8 (because the CEO of
> Overington Inc. thinks that UTF-8 is cute).

Well, I, as an individual, was thinking in terms of UTF-16.

>2. The transmission protocol MUST implement some form of language
>tagging (the details of the protocol are up to me). Particularly, the
>system needs to distinguish English text from Italian text, because the two
>languages will be displayed in different colors (green and red,
>respectively).

Green for English, red for Italian.  Are you by any chance a fan of the
liveries of motor racing cars of the 1950s?
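
For the language tagging asked for in requirement 2, a minimal sketch of
how plane 14 tags could mark English and Italian runs is given here.  It
assumes the tag characters as defined in the Unicode Standard: U+E0001
LANGUAGE TAG, followed by the language code with E0000 hexadecimal added to
each ASCII character, with U+E007F CANCEL TAG to end the tagging.  The
function names are hypothetical and Python is used only for illustration.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000         # ASCII character + E0000 = tag character


def language_tag(code: str) -> str:
    """Spell a language code such as 'en' in plane 14 tag characters."""
    if not code or not all(0x20 <= ord(ch) <= 0x7E for ch in code):
        raise ValueError(f"language code must be printable ASCII: {code!r}")
    return LANGUAGE_TAG + "".join(chr(ord(ch) + TAG_OFFSET) for ch in code)


def tag_runs(runs):
    """Prefix each (language code, text) run with its language tag."""
    body = "".join(language_tag(code) + text for code, text in runs)
    if not runs:
        return body
    # LANGUAGE TAG followed by CANCEL TAG cancels the language tag.
    return body + LANGUAGE_TAG + CANCEL_TAG


if __name__ == "__main__":
    message = tag_runs([("en", "Hello"), ("it", "Ciao")])
    print(" ".join(f"U+{ord(ch):04X}" for ch in message))
    print(len(message.encode("utf-8")), len(message.encode("utf-16-le")))
```

Each tag character lies outside the Basic Multilingual Plane, so it occupies
four bytes in UTF-8 and four bytes (a surrogate pair) in UTF-16.  The tag
for "en" therefore costs twelve bytes in either encoding, which matters
where bandwidth is limited.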

> 3. The OveringtonHomeBox(tm) can only accept UTF-8 plain text
> interspersed with escape sequences to change color. The escape sequences
> have the form "{{color=1}}", where "1" is the id of a color (blue, in this
> case).

If I were writing a one-off program I would use U+F3E2 for red and U+F3E5
for green.
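
A minimal sketch of such a one-off program follows, assuming that each code
point is simply placed in front of the text which it colours.  The two code
points are the ones named above; the mapping from language codes to colours
and the function name colorize are hypothetical, and Python is used only
for illustration.

```python
RED = "\uF3E2"    # U+F3E2, used here for Italian text
GREEN = "\uF3E5"  # U+F3E5, used here for English text

# Hypothetical mapping from a language code to its colour code point.
COLOR_FOR_LANGUAGE = {"en": GREEN, "it": RED}


def colorize(runs):
    """Prefix each (language code, text) run with its colour code point."""
    return "".join(COLOR_FOR_LANGUAGE[code] + text for code, text in runs)


if __name__ == "__main__":
    output = colorize([("en", "Hello "), ("it", "Ciao")])
    print(" ".join(f"U+{ord(ch):04X}" for ch in output))
```

Each of these Private Use Area code points occupies two bytes in UTF-16 and
three bytes in UTF-8, compared with eleven bytes for an escape sequence of
the form "{{color=1}}".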

http://www.users.globalnet.co.uk/~ngo/court000.htm

http://www.users.globalnet.co.uk/~ngo/courtcol.htm

However, the issue is not, in my opinion, about one-off programs and
proprietary encodings.  The issue is ensuring that plane 14 tags are not
totally deprecated, so that they continue to be available as an option for
use with particular protocols and so that encodings on a rigorous
non-proprietary basis may be used for general computing and for general and
widespread information availability.  Certainly, within certain
multimedia programs which might at some future time run upon the DVB-MHP
platform, codes such as U+F3BC might be particularly useful, yet that is a
matter which an individual programmer needs to consider when writing such a
program: it is not a standard system, though it is not a proprietary system
either in the usual sense of the word as those codes are published with the
hope of being a consistent set which people may use if they so choose.
Please note that, notwithstanding your pretend scenario of a company,
that is not the way I am proceeding with my research.  I invented the
telesoftware concept and am doing what I can to get it used effectively and
to ensure that it can have scope for future development of content.  I
regard the continued availability of plane 14 tags as important, as it means
that content authors can then use codes which do the job by finding them in
an international standard, without having to use what I suggest.  I could
devise all manner of codes using plane 16 if I wish, copying the plane 14
tags across as a start, yet those codes, no matter how fine, no matter how
well publicised in research papers or in a book or whatever, those codes
would never have the provenance of the codes in an international standard.
That is why, although Private Use Area codes do certainly have a use for
research and for concept proving, and also for limited use between two or
more people studying some special topic, Private Use Area codes, and
XML element names made up by a programmer or even by a committee which is
not a standards committee, simply do not come into the same class of
provenance quality as plane 14 tags which are in the Unicode standard.  That
is why I hope that the Unicode Technical Committee will not totally
deprecate tags and will leave open the possibility of considering adding
additional tag types at some time in the future.

> 4. The text files being transmitted MUST be .... small (bandwidth is
> limited!).

Yes, keep the text file size down, bandwidth is limited.

> 5. The processing program must be .... small (on-board memory is
> limited!).

No, for DVB-MHP the on-board memory is fairly large.  The transmission link
is the key issue.

> 6. A working prototypes must be ready by tomorrow.

Well, this is about the way that these things will be done well into the
future.  The idea of Unicode is that it will last, not be swept away within
ten or twenty years because it is outdated for future needs.

I have had a look through the example solutions, but I do need to spend
some more time studying them and hopefully trying out the executable
programs with some other data files.  In the meantime I would be interested
to know any further views of Marco and the views of others on this topic.

Thank you for taking the time to write your post and prepare the programs.
I feel that it is important that this matter be studied thoroughly.

William Overington

21 February 2003




