Re: [Standards] Proposed XMPP Extension: Character counting in message bodies

Marvin W Thu, 19 Dec 2019 11:50:30 -0800

On 12/19/19 1:59 PM, Andrew Nenakhov wrote:

Is it really any better than escaped XML text?

Yes. Any sane implementation of an XML parser would resolve references as part of the parsing, so you would have to do extra work to find out what references were in the text before.
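To illustrate the parser point, here is a minimal Python sketch using the standard library's xml.etree.ElementTree (my choice for illustration only; nothing in this thread depends on it). The three bodies are made-up examples that differ only in how ">" is written on the wire.

```python
import xml.etree.ElementTree as ET

# Three hypothetical wire forms of the same body; only the escaping differs.
wire_forms = [
    "<body>&gt;&gt;</body>",     # predefined entity reference
    "<body>&#62;&#x3E;</body>",  # decimal and hexadecimal character references
    "<body>>></body>",           # a raw '>' is allowed in XML character data
]

# The parser resolves every reference before the application sees the text,
# so all three forms yield the same string.
texts = [ET.fromstring(form).text for form in wire_forms]
print(texts)
```

Once parsed, the application has no way to tell which of the three forms was sent unless it does extra work below the parser.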

Plus, when doing the web client this means an additional escaping - deescaping routine every time something is sent or displayed, because browsers require their own escaping.

I hope that no web client would use innerHTML or similar techniques to display the message body, but would instead rely on document.createTextNode(), which expects a string without references. Similarly, inputElement.value and element.textContent give you their strings without references. In general, HTML/JS do their best to abstract away from references, because why should an application developer deal with that?

Also, HTML uses a different set of predefined references than XML and has different requirements: &auml; is valid in HTML but not in XML (without it being defined as an entity in a DTD).
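The difference in predefined references can be checked with a short Python sketch. It uses the standard library's html.unescape and xml.etree.ElementTree as stand-ins for an HTML and an XML parser (an assumption for illustration), with the HTML named reference for "ä" as the test case.

```python
import html
import xml.etree.ElementTree as ET

# HTML predefines a large set of named references, including &auml;.
html_text = html.unescape("&auml;")

# XML predefines only &lt; &gt; &amp; &apos; and &quot;. Without a DTD that
# declares it, &auml; is an undefined entity and parsing fails.
try:
    ET.fromstring("<body>&auml;</body>")
    xml_ok = True
except ET.ParseError:
    xml_ok = False

print(repr(html_text), xml_ok)
```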

Why should the standard be concerned about different server implementations converting anything? If a server does some converting for some reason from one way of escaping XML to another, of course it should recalculate all references.

On the XML layer (which is what XMPP builds on) this "conversion" does not change anything (the texts stay the same), which is why it is perfectly valid for a server to do it. The protocol on top of XML (and subsequently XMPP) should not deal with references; they are resolved on the layer below. That's why it is a bad idea to assume specific characters to be represented using certain references: you can't control that (you can only assume things).
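A minimal sketch of such a "conversion", assuming a hypothetical server that simply parses and re-serializes the stanza with Python's xml.etree.ElementTree. The body is a made-up example, and a real server may escape differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical body as sent by a client: '>' written in two different ways.
sent = "<body>&#62;></body>"

# The hypothetical server parses the stanza and writes it out again, using
# whatever escaping its own serializer prefers.
relayed = ET.tostring(ET.fromstring(sent), encoding="unicode")

# The text on the XML layer is identical before and after the round trip,
# even though the characters on the wire are not.
text_before = ET.fromstring(sent).text
text_after = ET.fromstring(relayed).text
print(sent, relayed, text_before == text_after)
```

Any offset counted against the escaped wire form would have to be recalculated after such a round trip, while offsets counted against the parsed text stay valid.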

So I tried with Xabber/xabber.org and either your server or the client (I guess it's the server) seems to fail to properly do what you just said it should. When sending the message


<message type="chat">
  <body>>>>>></body>
  <reference xmlns='urn:xmpp:reference:0' begin='1' end='1' type='markup'><bold/></reference>
  <reference xmlns='urn:xmpp:reference:0' begin='3' end='3' type='markup'><bold/></reference>
</message>

it is displayed as

&gt;>>>>

with g and ; in bold.
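My guess at what happens, as a Python sketch: the offsets are applied to an escaped form of the body instead of the parsed text. The escaped string below is taken from the display above; where exactly the escaping is introduced is an assumption, not something I have verified.

```python
import html

# Assumption: the escaped form the body had when the offsets were applied.
escaped = "&gt;>>>>"

# What an XML parser hands to the application: five '>' characters.
text = html.unescape(escaped)

# In the stanza above begin == end marks a single character (inclusive end);
# the two references point at offsets 1 and 3.
offsets = [1, 3]

bold_in_escaped = [escaped[i] for i in offsets]  # what ended up in bold
bold_in_text = [text[i] for i in offsets]        # what the sender marked

print(bold_in_escaped, bold_in_text)
```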

So far our 'non-standard' way of using references is in fact way more 'standard' than what is currently suggested by this mish-mash of different XEPs.

I guess we have different definitions of a standard. This mish-mash of different XEPs is a publicly viewable standard proposal. I am not aware of any documentation of what Xabber is doing.

Not really cool, right?

What's bad about that? I would say that having "0..0 bold" is pretty weird, because it sounds like an empty range (it starts and ends at the same point, so it must be empty).


    The second integer represents the location of the first non-URL
    character occurring after the URL *(or the end of the string if the
    URL is the last part of the Tweet text)*

I think you are misunderstanding them here. I am pretty sure "the end of the string" is *after* the last character, not the last character.
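A small Python sketch of that reading, with a made-up text and URL (both hypothetical, not taken from the Twitter documentation): when the URL is the last part of the text, the second index equals the length of the string, which is one past the last character.

```python
# Hypothetical text whose last part is a URL.
text = "Docs: https://example.org/x"
url = "https://example.org/x"

begin = text.index(url)   # index of the first URL character
end = begin + len(url)    # index of the first position after the URL

# The URL is the last part of the text, so 'end' is the end of the string,
# which is not the index of any character.
print(end == len(text), text[begin:end] == url)

# With this convention a range that starts and ends at the same point is empty.
print(repr(text[0:0]))
```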

The cited example of programming languages is valid only in part. Yes, it is so in Java or Python, but not so in Swift, Obj-C or Erlang. The last three use the index of the first character and length, which is actually my favourite approach.

I don't think it really makes sense to discuss which programming language is the one that matters most, but:

- Swift has two operators: "ABCDE"[2...4] = "CDE" and "ABCDE"[2..<4] = "CD"
- Objective-C substring functions require index and length
- Erlang uses 1-based indices, string:sub_string("ABCDE", 2, 4) = "BCD", thus is equivalent to Python [1:4]

Also, if you prefer index of first char and length, why not use <ref begin="2" length="2" /> then? For languages that take a string length, you currently have to calculate length = end+1-begin (because you chose to have end one less than everyone else does).
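The three conventions discussed here can be compared side by side in a short Python sketch. The helper names are hypothetical and exist only for this illustration; the sample values are the ones from the list above.

```python
text = "ABCDE"

def from_exclusive(begin, end):
    """Half-open range: 'end' is the first index after the range."""
    return text[begin:end]

def from_inclusive(begin, end):
    """Closed range: 'end' is the index of the last character in the range."""
    return text[begin:end + 1]

def from_length(begin, length):
    """Index of the first character plus length."""
    return text[begin:begin + length]

# The Swift examples: 2...4 is inclusive, 2..<4 is exclusive.
print(from_inclusive(2, 4), from_exclusive(2, 4))

# The Erlang example is 1-based and inclusive; shifted to 0-based indices it
# matches the Python slice [1:4].
print(from_inclusive(2 - 1, 4 - 1), from_exclusive(1, 4))

# Converting to index + length: end - begin for an exclusive end,
# end + 1 - begin for an inclusive end.
print(from_length(2, 4 - 2), from_length(2, 4 + 1 - 2))
```

Note that Python indexes code points here, which is also what the XEPs count; languages that index UTF-16 units or bytes need an extra conversion step that this sketch leaves out.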

Wed, 18 Dec 2019 at 21:59, Marvin W <x...@larma.de>:


    I don't think it really is a "change", in XEP-394 it is already defined
    this way ("the last affected codepoint is the one just before end" [1])
    and the example in XEP-372 [2] also counts that way (char 72 is the "J"
    of "Juliet" and char 78 is the space after "Juliet"). Only the text misleadingly
    says "An end attribute is similarly used for the index of the last
    character of the reference.", so this may need a clarification.


Well. I strongly object.

Either we need to change the text in XEP-372 slightly or we have to change the examples in XEP-372 and the text and examples in XEP-394 (because both should do the same). I see you have a strong opinion on the one side for some reason.

(Btw, did anyone but us implement this XEP at all?)

Converse has an implementation of XEP-372 for mentions (the only use case that is properly defined in that XEP IMO).

On 'already defined' 394. As we have learned from the 0071 debacle, even widely implemented XEPs can be deprecated with vague reasoning, so deprecating a contradictory XEP that, to my knowledge, wasn't even implemented anywhere, shouldn't be too much of an issue.

Sure, we could deprecate XEP-394, but I don't see a proper replacement for it yet. I consider the thing Xabber is doing more like a misuse of XEP-372, which according to its abstract defines a method for one XMPP stanza to provide references to another entity, such as mentioning users, HTTP resources, or other XMPP resources - not a way for putting markup everywhere. I'd rather get rid of XEP-372 (which has a lot of unclear things and pending TODOs in it) than XEP-394 (which of course can surely be improved).

_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
_______________________________________________
