Hmm, these "once entities" of which you speak, do they require JavaScript? 
Because I'm not sure that what we are looking for here is static documents 
that require a full programming language.




But let's say for a moment that HTML5 can, or could, do the job here. Then to 
make the dream come true that you could just cut and paste text that happened 
to contain a custom character to somewhere else, and nothing untoward would 
happen, would mean that everything in the computing universe should allow 
full-blown HTML. So every Java Swing component, every Apple GUI component, every 
.NET component, every Windows component, every browser, every Android and iOS 
component would allow text entry of HTML entities. OK, so let's say everyone 
agrees with this course of action, and now the universal text format is HTML.




But in this new world where anywhere that previously you could input text, you 
can now input full-blown HTML, does that actually make sense? Does it make 
sense that you can, for example, put full-blown HTML inside an H1 tag in HTML 
itself? That's a lot of recursion going on there. Or in an MS Excel cell? Or 
interspersed in some otherwise fairly regular text in a Word document?




I suppose someone could define a strict, limited subset of HTML to be that 
subset that makes sense in ALL textual situations. That subset would be 
something like just defining things that act like characters, and not like a 
full-blown rendering engine. But who would define that subset? Not the HTML 
groups, because their mandate is to define full-blown rendering engines. It 
would be more likely to be something like the Unicode group.




And also, in this brave new world where HTML5 is the new standard text format, 
what would the binary format of it be? I mean, if I have the string of Unicode 
characters <IMG, would that be an HTML5 image definition that should be 
rendered as such? Or would it be text that happens to contain a less-than 
symbol, I, M and G? It would have to be the former I guess, and thereby there 
would no longer be a Unicode symbol for the mathematical less-than symbol. 
Rather there would be a Unicode symbol for opening an HTML tag, and the text 
code for less than would be &lt;. Never again would a computer store < to mean 
less than. Do we want HTML to be so pervasive? Not sure it deserves that.
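
To make that ambiguity concrete, here is a minimal Python sketch using only 
the standard library's html module. It assumes the hypothetical world 
described above, where every stored string is read as markup, so literal 
angle brackets have to be stored in escaped form. The sample string is my own 
and purely illustrative.

```python
import html

# Hypothetical plain-text input: ordinary comparisons plus something
# that merely looks like the start of a tag.
plain = "if a < b and b > c then <IMG"

# If all text is interpreted as HTML, the literal characters must be
# stored escaped so they are not mistaken for markup.
stored = html.escape(plain, quote=False)
print(stored)

# Getting the original text back requires the reverse step.
restored = html.unescape(stored)
print(restored == plain)
```

Every consumer of the text would have to agree on when to apply which step, 
which is exactly the kind of convention that plain Unicode text does not need.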




And from a programmer's point of view, they want to be able to iterate over an 
array of characters and treat each one the same way, regardless of whether it 
is a custom character or not. Without that kind of programmatic abstraction, 
the whole thing can never gain traction. I don't think full-blown HTML embedded 
in your text can fulfill that. A very strictly defined subset possibly could. 
Sure, HTML5 can RENDER stuff adequately, if the only aim of the game is to 
provide a correct rendering. But to be able to actually treat particular 
embedded images as characters, and have some programming library see that 
abstraction consistently, I'm not convinced that is possible. Not without 
nailing down exactly what HTML elements in what particular circumstances 
constitute a "character".
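
As a rough illustration of the abstraction I mean, here is a small Python 
sketch. It assumes a custom character is represented by a single Private Use 
Area code point (U+E000, chosen arbitrarily), and contrasts that with the same 
character represented as an embedded image tag. The describe helper and the 
file name c.png are made up for the example.

```python
import unicodedata

# U+E000 is a Private Use Area code point, standing in here for a
# hypothetical custom character.
text = "A\uE000B"

def describe(s):
    """Return (code point, general category) for each character in s."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in s]

# Every character, custom or not, goes through the same loop.
for code_point, category in describe(text):
    print(code_point, category)

# The same content with the custom character as markup: a library that
# iterates over this string sees many characters, not three.
markup = 'A<img src="c.png">B'
print(len(text), len(markup))
```

With the code point representation the custom character is just one more 
element in the sequence (general category Co). With markup, the library first 
needs a rule for which runs of characters count as a single "character".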




I guess in summary, yes, we already have the technology to render anything. But 
I don't think the whole standards framework does anything to allow the 
computing universe to actually exchange custom characters as if they were just 
any other text. Someone would actually have to work on a standard to do that, 
not just point to HTML5.








On Saturday, 30 May 2015 at 5:08 am, Philippe Verdy <verd...@wanadoo.fr>, wrote:


2015-05-29 4:37 GMT+02:00 John <idou...@gmail.com>:

"Today the world goes very well with HTML(5), which is now the best markup 
language for documents (including for inserting embedded images that don't 
require any external request)"

If I had a large document that reused a particular character thousands of 
times, would this HTML markup require embedding that character thousands of 
times, or could I define the character once at the beginning of the sequence, 
and then refer back to it in a space efficient way?





HTML(5) allows defining *once* entities for images that can then be reused 
thousands of times without repeating their definition. You can do this as well 
with CSS styles: just define a class for a small element. This element may 
still be an "image", but the semantics are carried by the class you assign to 
it. You are not required to provide an external source URL for that image if 
the CSS style provides the content.




You may also use PUAs for the same purpose (however, I have not seen how CSS 
allows styling individual characters in text elements, as these characters are 
not elements, and there's no defined selector for pseudo-elements matching a 
single character). PUAs are perfectly usable in the situation where you have 
embedded a custom font in your document for assigning glyphs to characters. 
You can still do that with TrueType/OpenType, but I would avoid them for this 
purpose and would instead use the SVG font format, which is valid in CSS, for 
defining a collection of glyphs.




If the document is not restricted to be standalone, of course you can use links 
to an external shared CSS stylesheet and to the SVG font referenced by the 
stylesheet. With such an approach, you don't even need to use classes on 
elements; you use plain text with very compact PUAs. It's up to you to decide 
whether the document must be standalone (embedding everything it needs) or must 
use external references for missing definitions; HTML allows both (and SVG as 
well, when it contains plain-text elements).
