Re: Tag characters and in-line graphics (from Tag characters)

Asmus Freytag Sun, 31 May 2015 04:10:59 -0700

John,

reading this discussion, I agree with your reductio ad absurdum of infinitely nested HTML.

But I think you are onto something with your hypothetical example of the "subset that works in ALL textual situations".

There's clearly a use case for something like it, and I believe many people would intuitively agree on a set of features for it.

What people seem to have in mind is something like "inline" text. Something beyond a mere stream of plain text (with effectively every character rendered visibly), but still limited in important ways by the general behavior of inline text: a string of it, laid out, must wrap and line break, any objects included in it must behave like characters (albeit of custom width, height and appearance), and so on. Paragraph formatting, stacked layout, header levels and all those good things would not be available.

With such a subset clearly defined, many quirky limitations might no longer be necessary; any container that today only takes plain text could be upgraded to take "inline text". I can see some inline containers retaining a nesting limitation, but I could imagine that it is possible to arrive at a consistent definition of such an inline format.

Going further, I can't shake the impression that without a clean definition of an inline text format along those lines, any attempts at making stickers and similar solutions "stick" are doomed to failure.

The interesting thing in defining such a format is not how to represent it in HTML or CSS syntax, but in describing what feature sets it must (minimally) support. Doing it that way would free existing implementations of rich text to map native formats onto that minimally required subset and to add them to their format translators for HTML or whatever else they use for interchange.

Only with a definition can you ever hope to develop a processing model. It won't be as simple as for plain text strings, but it should be able to support common abstractions (like iteration by logical unit). It would have to support the management of external resources: if the inline format allows images, custom fonts, etc., one would need a way to manage references to them in the local context.
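
To make the idea of a processing model a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the names InlineText and InlineObject, the resource table, and the choice to treat one code point as one logical unit (a real model would more likely iterate by grapheme cluster) are assumptions for illustration, not part of any existing standard or library.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Union


@dataclass(frozen=True)
class InlineObject:
    """Hypothetical character-like object with a custom width and height."""
    resource_id: str  # key into the run's local resource table
    width: float
    height: float


# Simplifying assumption: a logical unit is one code point or one inline object.
Unit = Union[str, InlineObject]


@dataclass
class InlineText:
    """Hypothetical run of inline text plus the external resources it refers to."""
    units: List[Unit] = field(default_factory=list)
    resources: Dict[str, bytes] = field(default_factory=dict)

    def append_text(self, text: str) -> None:
        # Each code point becomes its own unit.
        self.units.extend(text)

    def append_object(self, resource_id: str, data: bytes,
                      width: float, height: float) -> None:
        # The resource is stored once; later uses only reference it by id.
        self.resources.setdefault(resource_id, data)
        self.units.append(InlineObject(resource_id, width, height))

    def __iter__(self) -> Iterator[Unit]:
        # Uniform iteration: characters and objects come out of the same loop.
        return iter(self.units)

    def __len__(self) -> int:
        return len(self.units)

    def to_plain_text(self, placeholder: str = "\ufffc") -> str:
        # Fallback for plain-text-only containers: objects become U+FFFC.
        return "".join(u if isinstance(u, str) else placeholder
                       for u in self.units)


run = InlineText()
run.append_text("Hi ")
run.append_object("sticker-1", b"<placeholder image bytes>", 16.0, 16.0)
run.append_text("!")
run.append_object("sticker-1", b"<placeholder image bytes>", 16.0, 16.0)
```

The point of the sketch is only that a caller can loop over the run without caring which units are characters and which are objects, while the referenced image lives once in the local resource table.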

If your skeptical position proves correct and this turns out not to be tractable, then I think you've provided conclusive proof of why stickers won't happen and why encoding emoji was the only sensible decision Unicode could have taken.


A./

On 5/30/2015 7:14 AM, John wrote:

Hmm, these "once entities" of which you speak, do they require JavaScript? Because I'm not sure what we are looking for here is static documents requiring a full programming language.
But let's say for a moment that HTML5 can, or could, do the job here. Then to make the dream come true that you could just cut and paste text that happened to contain a custom character to somewhere else, and nothing untoward would happen, would mean that everything in the computing universe should allow full-blown HTML. So every Java Swing component, every Apple GUI component, every .NET component, every Windows component, every browser, every Android and iOS component would allow text entry of HTML entities. OK, so let's say everyone agrees with this course of action; now the universal text format is HTML.
But in this new world where anywhere that previously you could input text, you can now input full-blown HTML, does that actually make sense? Does it make sense that you can, for example, put full-blown HTML inside an H1 tag in HTML itself? That's a lot of recursion going on there. Or in an MS Excel cell? Or interspersed in some otherwise fairly regular text in a Word document?
I suppose someone could define a strict limited subset of HTML to be that subset that makes sense in ALL textual situations. That subset would be something like just defining things that act like characters, and not like a full-blown rendering engine. But who would define that subset? Not the HTML groups, because their mandate is to define full-blown rendering engines. It would be more likely to be something like the Unicode group.
And also, in this brave new world where HTML5 is the new standard text format, what would the binary format of it be? I mean, if I have the string of Unicode characters <IMG, would that be an HTML5 image definition that should be rendered as such? Or would it be text that happens to contain a less-than sign, I, M and G? It would have to be the former, I guess, and thereby there would no longer be a Unicode symbol for the mathematical less-than sign. Rather, there would be a Unicode symbol for opening an HTML tag, and the text code for less-than would be &lt;. Never again would a computer store < to mean less than. Do we want HTML to be so pervasive? Not sure it deserves that.
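A minimal sketch of how this ambiguity is handled today, assuming Python and its standard html module: the same code points are either literal text or markup depending on the container, and literal text has to be escaped before it can travel inside HTML. The sample string is made up for illustration.

```python
import html

# In a plain-text container, U+003C is just a character followed by I, M, G.
plain = "x <IMG y"

# To carry the same literal text inside HTML, the sign must be escaped,
# otherwise a parser would read it as the start of a tag.
as_html_text = html.escape(plain)

# Unescaping recovers the original plain-text code points.
round_trip = html.unescape(as_html_text)
```

Here as_html_text is 'x &lt;IMG y' and round_trip equals the original string, which is exactly the extra layer every text container would need if HTML were the universal text format.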
And from a programmer's point of view, he wants to be able to iterate over an array of characters and treat each one the same way, regardless of whether it is a custom character or not. Without that kind of programmatic abstraction, the whole thing can never gain traction. I don't think full-blown HTML embedded in your text can fulfill that. A very strictly defined subset possibly could. Sure, HTML5 can RENDER stuff adequately, if the only aim of the game is to provide a correct rendering. But to be able to actually treat particular embedded images as characters, and have some programming library see that abstraction consistently, I'm not sure I'm convinced that is possible. Not without nailing down exactly what HTML elements in what particular circumstances constitute a "character".
I guess in summary, yes, we already have the technology to render anything. But I don't think the whole standards framework does anything to allow the computing universe to actually exchange custom characters as if they were just any other text. Someone would actually have to work on a standard to do that, not just point to HTML5.
On Saturday, 30 May 2015 at 5:08 am, Philippe Verdy wrote:
    2015-05-29 4:37 GMT+02:00 John:

        "Today the world goes very well with HTML(5) which is now the
        best markup language for documents (including for inserting
        embedded images that don’t require any external request)"
        If I had a large document that reused a particular character
        thousands of times, would this HTML markup require embedding
        that character thousands of times, or could I define the
        character once at the beginning of the sequence, and then
        refer back to it in a space efficient way?


    HTML(5) allows defining entities for images *once*; they can then
    be reused thousands of times without repeating their definition.
    You can do this as well with CSS styles: just define a class for a
    small element. This element may still be an "image", but the
    semantics are carried by the class you assign to it. You are not
    required to provide an external source URL for that image if the
    CSS style provides the content.
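
As a rough illustration of the CSS class route only, here is a small Python sketch that builds the same document two ways: with the image embedded at every use, and with the image defined once in a stylesheet and referenced by class. The image bytes, the class name "glyph" and both helper functions are made-up placeholders, and the CSS is one plausible way to do it rather than the only one.

```python
import base64

# Hypothetical placeholder bytes standing in for a small image file.
IMAGE_BYTES = b"\x89PNG placeholder, not a real image" * 20
DATA_URI = ("data:image/png;base64,"
            + base64.b64encode(IMAGE_BYTES).decode("ascii"))


def inline_every_time(count: int) -> str:
    """Embed the full image data at every single use."""
    img = f'<img alt="custom glyph" src="{DATA_URI}">'
    return "<p>" + img * count + "</p>"


def define_once(count: int) -> str:
    """Define the image once in a CSS class, then refer to the class."""
    style = (
        "<style>.glyph{display:inline-block;width:1em;height:1em;"
        f"background:url({DATA_URI}) center/contain no-repeat}}</style>"
    )
    ref = '<span class="glyph"></span>'
    return style + "<p>" + ref * count + "</p>"


repeated = inline_every_time(1000)
shared = define_once(1000)
```

In the second document the image data appears exactly once regardless of how many times the element is used, so the size grows only by the short span per use.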

    You may also use PUAs for the same purpose (however, I have not
    seen how CSS allows styling individual characters in text
    elements, as these characters are not elements and there's no
    defined selector for pseudo-elements matching a single character).
    PUAs are perfectly usable in the situation where you have embedded
    a custom font in your document for assigning glyphs to characters.
    You can still do that, but I would avoid TrueType/OpenType for
    this purpose and instead use the SVG font format, which is valid
    in CSS, for defining a collection of glyphs.

    If the document is not restricted to be standalone, of course you
    can use links to an external shared CSS stylesheet and to this SVG
    font referenced by the stylesheet. With such an approach, you don't
    even need to use classes on elements; you use plain text with very
    compact PUAs. It's up to you to decide whether the document must be
    standalone (embedding everything it needs) or must use external
    references for missing definitions; HTML allows both (and SVG as
    well when it contains plain-text elements).
