On Sun, 20 Aug 2023 at 09:35, Guus der Kinderen
<guus.der.kinde...@gmail.com> wrote:
>

I can't believe this was in August...

> I've been toying with an Openfire-based implementation for the Reporting 
> Account Affiliations plugin. The plugin has not seen much real-world use yet 
> (it barely has been smoke tested). So far, I've only implemented the 
> 'reporting' functionality (the Info-request query mechanism, and the 
> embedding things). I have not yet implemented a consumer part (eg: using 
> account info to do something with regards to permissions). For the curious, 
> the code is here: https://github.com/igniterealtime/openfire-accountaff-plugin

I'm trying to get my own implementation finally finished and published soon.

> When implementing this, I wondered about the following.
>
> Is embedding worth it?
>
> A (considerable) benefit of embedding account info in stanzas is that it 
> saves round-trips: whenever an entity wants to know the data, it does not 
> necessarily need to perform a lookup. Are there other benefits that I've not 
> yet identified?

Lookups are a pain. Your comment focuses on the time saved, but
sending queries and waiting for a response generally consumes
resources. The other party might take a long time to response, during
which time we have to keep the stanza and associated state on hold.
Now consider this multiplied 1000x during someone flooding a MUC room
with automated bots, or whatever.

I think there are significant benefits in stanzas bearing this
information up-front if it is likely to be needed. In some cases I
think I would rather fall back to the current behaviour than implement
remote lookup. E.g. if I wanted to automatically grant voice to new
MUC users with sufficient RAA credentials, I am more likely to just
leave users who don't present RAA info without voice than to implement
a lookup and complex async join logic.

> The round-trip that's saved comes with a caveat: prior to usage, a recipient 
> MUST perform a disco/info query to the originating domain, to see if the 
> provided data can be trusted. This can be optimized (through caching), and 
> the disco/info result can be re-used for all accounts that originate from the 
> domain - but still, it's not quite round-trip free.

As you say, I anticipate such lookups only being done infrequently.

> Something about adding data to a stanza that is inherently 'unsafe' 
> (something that MUST be verified before use) does not sit well with me. It is 
> invitingly easy for an implementation to skip, or have an issue in that 
> second step, which would lead to misinterpretation of data that has potential 
> to be used for authorization purposes.

I think it happens all the time that entities send stuff that cannot
be trusted or needs additional verification. Data received from third
parties should be untrusted by default, and that's just a general
principle of most systems.

> I expect that the account information data is rarely used - although I'll 
> readily admit that I've not coded this part yet, and that there might be 
> future use-cases that I'm not thinking of. Assuming that I'm right, then the 
> embedding of data on many stanzas can add considerable overhead: not only in 
> stanza size (which possibly gets persisted in MAM archives - which leads to 
> interesting questions around the validity of this data over time), but also 
> to compute the value at the originating server. The XEP addresses this by 
> providing guides on when to embed data (more on that below), but a 
> significant amount of overhead remains. Is that worth the savings of a 
> round-trip (especially when a verification request might be needed anyway)?

I think there are only a very small number of stanza types where this
needs to be added, so I'm not too worried about overhead.

> The XEP defines three types of stanzas in which to embed account info. I've 
> found implementing this to be a lot harder than anticipated (which very well 
> might be a result of Openfire's API). One thing that I struggled with was to 
> choose when to act on a stanza that was being processed: when processing 
> inbound, client-originating stanzas, I worry about missing stanzas that are 
> generated by the server on the user's behalf. When applying this to outbound 
> stanzas, it's not straight-forward to identify stanzas that originated from 
> local users - and even if you do, something like MUC can have modified the 
> addressing, making identifying the correct local account even harder. The 
> complexity adds up, which leads to implementations that are more error-prone.

You're not alone, it's not "hard", but it's not trivial in Prosody
either. I'm still working on it.

> If I understand the XEP correctly, it specifies that a server must remove 
> account info from client-originating stanzas if the server has support for 
> embedding account info for that particular type of stanza (to avoid 
> spoofing). Given the complexity that I tried to describe above, I believe 
> that this is error-prone. Can we consider mandating that ALL account info 
> from client-originating stanzas is to be removed, if the server has support 
> for any kind of account reporting? Is there any reason to clean up this data 
> in some, but not all client-originating data?

Pretty much just to avoid the overhead of scanning all stanzas. As
mentioned above, I think RAA info only needs to be embedded in a very
small subset of stanzas. Paying the cost on *all* stanzas across a
server seems excessive. We have previously discussed this in the past
related to the various elements we have now that have a 'by' attribute
that ought to be filtered.

> Given all of the above, I'm cautiously arguing that all of the 'embedding' 
> should be removed from the XEP. I do not believe that its benefit (preventing 
> a query, sometimes) outweighs the drawbacks (in added complexity and 
> overhead). Removing it might lead to implementations that are easier (faster) 
> to build and are less error-prone.

I think it would be easier to remove embedding for the sending side,
but conversely it will make the receiving side much more complicated.
For that reason I would rather keep it. Of course I reserve the right
to change my mind as I continue implementing it....

> As a last remark: the explicit query for information is made against the bare 
> JID of the entity for which information is seeked. XMPP dictates that a 
> server MUST answer such IQ queries on behalf of the user, so this works. I do 
> not like how this implies that another entity is being asked to provide data 
> than the entity that we require to provide the data: the XEP explicitly does 
> not want a client/user to provide this data (that's even defined as 
> 'spoofing'). It wants a server to provide this data. In this light, it would 
> be clearer to address the request to the server JID (without a node-part) 
> instead of an account JID.

I see what you're saying. I don't feel strongly, but it doesn't feel
right to me. There are a bunch of things that we address to the
account if we want to learn about the account. I don't think this is
any different. Directing queries to the server JID and putting a
username field in the stanza seems worse than what the XEP currently
proposes. For example, blocklists will do the right thing with the
current approach.

I'll continue thinking about some of this stuff as I finish the implementation.

Regards,
Matthew
_______________________________________________
Standards mailing list -- standards@xmpp.org
To unsubscribe send an email to standards-le...@xmpp.org

Reply via email to