Dear Christian,
Thank you very much for your detailed review.
Please see our comments inline.
On 16/8/21 16:40, Christian Amsüss wrote:
Hello CoAP-EAP authors and involved groups,
(CC'ing core@ as this is a review on CoAP usage),
I've read the -03 draft and accumulated a few comments; largely in
sequence of occurrence.
Over all, the protocol has improved a lot since I've last had my eyes on
it. Several comments below are about how prescriptive the message types
are. I believe that this should be resolved towards generality, or else
the usability of this protocol with generic CoAP components will be
limited (or, worse, still implemented and then surprisingly
incompatible).
* Figure 1: For readers new to the topic of EAP, I think that it might
be useful to extend this to cover also the EAP server or AAA
infrastructure, if that can be covered without too much complication.
Suggestion (without illusions of correctness):
IoT device Controller
+------------+ +------------+
| EAP peer/ | | EAP auth./ |+-----+[ AAA infra. ]
| CoAP server|+------+| CoAP client|+-----+[ EAP server?]
+------------+ CoAP +------------+ EAP?
\_____ Scope of this document _____/
Figure 1: CoAP-EAP Architecture
[Authors] This is a good point. We did not include it at first, as
having a AAA infrastructure is not mandatory. But the optionality can
also be expressed in the figure. We will consider using this for the
next version. Please also note that the architecture including AAA
assumes an EAP authenticator in pass-through mode. An EAP authenticator
in standalone mode, where no AAA exists, is also possible.
* `/.well-known/a`: [note: May be irrelevant, see next two items]
If the designated experts don't go along with a
very-short option (I'd kind of doubt you'd get anything shorter than
`/.well-known/eap`) and if that puts you up against practical limits,
using a short-hand option might be viable.
So far there's no document for it and I've only pitched the idea
briefly at an interim[1] (slides at [2]), but if push comes to shove
and you need the compactness, let me know and that work can be
expedited.
[1]:
https://datatracker.ietf.org/meeting/interim-2021-core-05/session/core
[2]:
https://datatracker.ietf.org/meeting/interim-2021-core-05/materials/slides-interim-2021-core-05-sessa-core-option-for-well-known-resources-00
[Authors] You are correct. This was addressed by the well-known URI
experts, who have proposed using /.well-known/coap-eap.
* Discovering the Controller is described rather generically, but with
CoAP discovery as an example.
As long as CoAP discovery (as per RFC6690/7252) is used, that already
produces a URI, which can contain any path the server picked. It has
thus no need for a well-known path.
Are there other discovery options envisioned that'd only result in a
network address? Only for these, a well-known path would make sense.
(And then it's up to the envisioned client complexity if one is
warranted).
[Authors] This is related to the next point. If the IoT device sends
the resource to use for the authentication, we would remove the need
for the well-known URI in the IoT device.
For comparison, RD[3] explores some of the options. A path may be
discovered using CoAP discovery as `?rt=core.rd*` right away from
multicast. Or an address may be discovered using an IPv6 RA option,
with CoAP discovery acting on that address. Only for cases of very
simple endpoints, it also defines a `/.well-known/rd` name that can be
used without CoAP discovery (and thus link parsing) happening
beforehand. The same rationales may apply for EAP (the devices using
the resource are mostly servers, otherwise, and send a very simple
request to start things), but again that's only if the address was
discovered through something that's not CoAP discovery already.
[3]:
https://www.ietf.org/archive/id/draft-ietf-core-resource-directory-28.html#name-rd-discovery-and-other-inte
[Authors] This is a good point. We may need to add a discussion of how
the service can be discovered since, as you commented, there are
different ways to do so. Our initial idea was to contact the Border
Router directly, which is consistent with your comment about using the
IPv6 RA to learn the IPv6 address. The communication would then use the
/.well-known URI directly with the discovered IPv6 address.
* For message 1, why does this need to go to a fixed resource? There has
been previous communication in message 0 in which the resource could
have been transported.
Granted, it's not as easy as in messages 2-to-3 etc where the
Location-* options are around, but the original message 0 POST could
just as well contain the path in the payload.
There are options as to how to do that precisely (just the URI
reference in text form, or a RFC6690 link, or a CBOR list of path
segments, or a CRI reference[4] -- if the latter were in WGLC already
I'd recommend it wholeheartedly), but either of them would stay more
true to the style of the other messages in that the earlier message
informs the path choice of the next ones.
An upside of this would be that it allows better behavior in presence
of proxies (see later), even though it may be practical to not spec
that out in full here. (But the path would be open for further specs,
and they'd just need some setting down of paving stones).
[4]: https://datatracker.ietf.org/doc/draft-ietf-core-href/
[Authors] This is an interesting proposal. This is a good alternative to
having to use the well-known URI in both entities, leaving it only for
the first message from the IoT device to the Controller, which makes sense.
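As a rough illustration of the text-form option mentioned above, the Controller could turn a path carried in the message 0 payload into the Uri-Path values for message 1. This is only a minimal sketch, not something taken from the draft: the function name `uri_path_segments` and the choice of a plain absolute path as payload are assumptions.

```python
from urllib.parse import unquote


def uri_path_segments(payload):
    """Split an absolute path from the message 0 payload into Uri-Path values."""
    path = payload.decode("utf-8")
    if not path.startswith("/"):
        raise ValueError("expected an absolute path such as /a/x")
    if path == "/":
        return []  # root resource: no Uri-Path options at all
    # Each segment becomes one Uri-Path option; percent-escapes are decoded.
    return [unquote(segment) for segment in path[1:].split("/")]


print(uri_path_segments(b"/a/x"))
```

A CBOR list of segments or a CRI reference would avoid the text parsing entirely, at the cost of a CBOR decoder on the Controller.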
* (Bycatch of suggesting URIs): It may be worth mentioning that the
NON's source address can easily be spoofed. Thus, any device on the
network can trigger what the authenticator may think of as a
device-triggered reauthentication, and the device may think of as an
authenticator-triggered reauthentication (provided it works that way,
see below when reauthentication is mentioned again).
[Authors] This case would not be possible, since we state that
(re)authentication is initiated by the device. Thus, when the device sees
an authenticator-triggered re-authentication, it will discard it.
Even sending full URIs in message 0 would be no worse than the current
source spoofing.
Sending URI paths in message 0 would make this minimally better
because the attacker would need to guess (or observe from the network)
the CoAP server's path.
[Authors] Correct.
* In 3.1 General flow, the message types are described in high detail.
CoAP can generally be used with different transports (some of which
don't even do NON/CON). Also, while I think it's reasonable to expect
that a CoAP implementation can deal with requests coming in as either
CON or NON, I'd expect that some don't offer all possible choices to
applications. (A very constrained device may only send NON requests,
or an implementation may decide autonomously whether to send
piggy-backed or not).
[Authors] Regarding this, it is worth noting that, except for message 0,
the constrained device (CoAP server) is not sending requests but
responses. Therefore, it will receive CON requests sent by the
not-so-constrained CoAP client (EAP authenticator).
Regarding piggybacking, it would be a requirement for this
specification, with the goal of saving messages.
Can you clarify as to what of this is meant to be normative and what
exemplary?
My recommendation is to state that what is prescribed is the flow of
requests and responses (which is what CoAP provides to the next
layer), while notes on reliable transmission are recommendations for
CoAP-over-UDP/DTLS. A similar statement, which I like a lot, is
already in 3.2 on error handling.
[Authors] This may be tricky as per the operation of CoAP-EAP. See
comments below.
(I can serve examples of how subtle incompatibilities can develop but
go unnoticed, but I'd only go through that if this is all really
intended to be prescriptive).
[Authors] In principle, they are intended to be prescriptive. Therefore,
it would be really appreciated if you list the incompatibilities you
have in mind.
Having said this, we understand that relaxing the piggybacked-response
requirement in the specification should not be a problem, apart from
the “undesirable” increase in messages over a constrained link.
About the usage of NON or CON, the situation is the following. First of
all, as mentioned, the CON request is done by the Controller, which is
assumed to be a not-so-constrained device. So we do not foresee a
problem there. In any case, we have spent several cycles trying to think
how everything would work in the case of using NON, assuming the HATEOAS
approach.
As Mohit Sethi also commented, “EAP provides its own support for
duplicate elimination and retransmission, but is reliant on lower layer
ordering guarantees” (from RFC 3748).
Therefore, there was a chance to use NON requests and responses with the
help of EAP. To achieve “ordering guarantees” we have the HATEOAS
approach, but with NON requests and responses, as we see it, the CoAP
server would need to handle two resources at the same time, for the
following reason. If, for example, message 4 (below), which announces
the new resource, is lost, the CoAP client will not know that the
following NON request in the sequence should go to resource /a/y. The
only resource still known by the CoAP client is /a/x. That means that
resource /a/x MUST NOT be removed yet from the CoAP server. So the CoAP
server keeps two resources: the current step /a/x and the next expected
step /a/y. If a message from the CoAP client arrives at /a/x, it MUST be
considered a retransmission from the EAP authenticator state machine,
because the CoAP client did not receive the new resource /a/y. This EAP
retransmission is handled by the EAP peer state machine, though at the
application level we could silently discard the payload. However, if the
NON request arrives at /a/y in message 5, it means that the CoAP client
received message 4. In such a case, /a/x is removed, /a/y becomes the
current step and, for example, /a/z becomes the next expected resource.
   | NON [0x8694] POST /a/x                  |
   | Token (0xac)                            |
   | Payload EAP-X-Req 1                     |
3) |<----------------------------------------|
   | NON [0x4754]                            |
   | Token (0xac)                            |
   | 2.01 Created Location-Path [/a/y]       |
   | Payload EAP-X-Resp 1                    |
4) |---------------------------------------->|
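The two-resource handling described above could be sketched roughly as follows on the CoAP server (EAP peer) side. This is a minimal sketch under our own assumptions: the class name `EapPeerResources`, the iterable of fresh path names, and the 4.04 answer for unknown paths are all hypothetical and not taken from the draft.

```python
class EapPeerResources:
    """Tracks the current and the next expected resource for one exchange."""

    def __init__(self, first, fresh_names):
        self.current = first            # e.g. "/a/x"
        self.next = None                # announced, but not yet confirmed
        self._fresh = iter(fresh_names) # hypothetical supply of unused paths

    def handle_post(self, path):
        """Return (response code, Location-Path) for a NON POST to `path`."""
        if self.next is not None and path == self.next:
            # The client used the announced resource, so it received our
            # last response: the old resource can be dropped now.
            self.current, self.next = self.next, None
        elif path != self.current:
            return ("4.04", None)       # neither current nor next step
        # A request to the current resource while a next one is already
        # announced is treated as a retransmission: announce the same path.
        if self.next is None:
            self.next = next(self._fresh)
        return ("2.01", self.next)


peer = EapPeerResources("/a/x", ["/a/y", "/a/z"])
print(peer.handle_post("/a/x"))  # first request to /a/x
print(peer.handle_post("/a/x"))  # retransmission after a lost response
print(peer.handle_post("/a/y"))  # client moved on, /a/x is dropped
```

With CON requests the `next` slot would not be needed, since the server would know the response was delivered before moving on.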
Moreover, each retransmitted EAP request will go in a new NON request to
/a/x (with a different token value). It may happen that both arrive at
the CoAP server, which answers with two different NON responses saying
that the next resource is /a/y. If one of the NON responses indicating
/a/y arrives much later, when the interaction has moved forward and the
next resource is, let’s say, /a/z, the CoAP client will think that the
next resource is /a/y instead of /a/z. That is, the CoAP client will
process the “old” NON response that said /a/y, when that resource is no
longer available. Therefore the CoAP client would need to keep track, at
the application level, of the resources already seen. Otherwise, the
CoAP client might get confused. We would thus be pushing complexity
to the application when this is something that could be solved with CON
requests at the CoAP level.
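The application-level bookkeeping this would require on the CoAP client (EAP authenticator) might look like the sketch below. It is only an illustration under assumptions of ours: the class name `EapAuthenticatorClient` is hypothetical, and the sketch assumes the server never reuses a path within one exchange.

```python
class EapAuthenticatorClient:
    """Remembers every resource already seen in the current exchange."""

    def __init__(self, first):
        self.target = first   # where the next request will be sent
        self.seen = {first}

    def on_response(self, location_path):
        """Return True if the response advances the exchange, False if stale."""
        if location_path in self.seen:
            return False      # late or duplicate NON response: ignore it
        self.seen.add(location_path)
        self.target = location_path
        return True


client = EapAuthenticatorClient("/a/x")
client.on_response("/a/y")         # fresh: next request goes to /a/y
client.on_response("/a/y")         # duplicate answer to a retransmission
client.on_response("/a/z")         # fresh: next request goes to /a/z
print(client.on_response("/a/y"))  # very late copy of the old response
print(client.target)
```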
Finally, another problem we see is that EAP Success is not
retransmitted, so we believe that it, at least, would require a CON
request.
Therefore, when NON requests and responses are used, we need to specify
this kind of behaviour in the CoAP server. And that behaviour changes if
we are using CON requests, because keeping two resources is not
necessary. Is this reasonable?
* The reuse of the empty token only works if the peers actually respond
with piggy-backed responses, so that's where enforcing the above rules
would give some benefit -- but at the cost of losing existing CoAP
implementations that make no guarantees as to how the response will be
sent as long as it's reliable.
[Authors] The use of the empty Token in this case is just proposed as an
optimization to be used when possible. This is not intended to be
prescriptive. With NON requests and responses, we believe the Token
should not be empty.
* Proxying: As it is right now, this protocol just barely works across
proxies, and only if they support CoAP-EAP explicitly. (And while it
may sound odd to even consider that, bear in mind that they are used
in a very similar way in RFC9031).
While it's a bit open whether all CoAP-based protocols should
reasonably be expected to work across proxies or not, a remark (maybe
before 3.1?) that "If CoAP proxies are used between EAP peer and EAP
authenticator, they must be configured with knowledge of this
specification, or the operations will fail after step 0."
[Authors] Based on your comment, it seems there is no guarantee that any
CoAP-based protocol would work across proxies. Our question is whether
there is any adaptation or change that would favour working through
proxies. At the research level, we worked with proxies and you are right,
our assumption is that proxies support CoAP-EAP explicitly
(https://ieeexplore.ieee.org/document/8467302). Since we are now trying
to avoid anything tailored to CoAP-EAP and only using CoAP as a
means of transport for the exchange, why do you think this would not
work with proxies?
* 3.2.2: The use of RST is rather unusual here, for the same reasons as
the prescriptive message types.
A response of 5.03 (Service Unavailable) has roughly the same size,
is available independent of transport, and on most libraries *way*
easier to use, if they support sending an RST to a well-formed message
at all.
(Furthermore, the sender of the 5.03 can encode an estimate of the
remaining unavailable time in the Max-Age option; not sure if that is
of any help here).
* 3.3.1: "received with the ACK", "sends piggybacked response" are,
again, overly specific. "received in the last response" and "sends a
response" could work as replacements even if message types are
prescriptive.
[Authors] We used RST as the examples of the CoAP RFC seemed to convey
that meaning when the endpoint is not in a position to respond to the
request, but this seems to be an easier way to achieve the same goal.
And as you say, if this is easier on implementations we should strive
for that.
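To make the size comparison concrete, a 5.03 response with a Max-Age estimate can be encoded by hand for CoAP over UDP as sketched below. This is a minimal sketch based on the RFC 7252 message format; the function name `encode_503` and the example values are ours, and a real implementation would normally leave this to its CoAP library.

```python
import struct


def encode_503(message_id, token, max_age, msg_type=2):
    """Encode a CoAP-over-UDP 5.03 response carrying Max-Age (option 14)."""
    if len(token) > 8:
        raise ValueError("CoAP tokens are at most 8 bytes")
    if not 0 <= max_age < 2 ** 32:
        raise ValueError("Max-Age must fit in 4 bytes")
    first = (1 << 6) | (msg_type << 4) | len(token)  # Ver=1, Type, TKL
    code = (5 << 5) | 3                              # class 5, detail 03
    header = struct.pack("!BBH", first, code, message_id)
    # uint options use the shortest big-endian form (empty for zero).
    value = max_age.to_bytes((max_age.bit_length() + 7) // 8, "big")
    # Delta 14 needs the extended form: nibble 13 plus one byte (14 - 13).
    option = bytes([(13 << 4) | len(value), 14 - 13]) + value
    return header + token + option


# Piggybacked (ACK, type 2) 5.03 asking the client to wait about 30 seconds.
message = encode_503(0x4754, b"\xac", 30)
print(len(message), message.hex())
```

An RST would be 4 bytes; the response above is a few bytes longer because it echoes the token and carries the Max-Age option, which can be left out if the estimate is not useful.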
* 3.3.1: "after the maximum retransmission attempts, the CoAP client
will remove the state from its side".
So the device that's being kicked from the network can delay its own
eviction for about a minute as long as it doesn't answer?
[Authors] This is an interesting use case. To avoid this, we may have to
change the behaviour to send a NON message and just remove the state.
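For reference, the delay in question can be worked out from the default CoAP transmission parameters in RFC 7252. The sketch below assumes those defaults are used unchanged; a deployment that tunes them would get different numbers.

```python
# Default transmission parameters from RFC 7252, section 4.8.
ACK_TIMEOUT = 2.0        # seconds
ACK_RANDOM_FACTOR = 1.5
MAX_RETRANSMIT = 4

# Longest time from the first transmission to the last retransmission.
max_transmit_span = ACK_TIMEOUT * (2 ** MAX_RETRANSMIT - 1) * ACK_RANDOM_FACTOR

# Longest time from the first transmission until the sender gives up.
max_transmit_wait = ACK_TIMEOUT * (2 ** (MAX_RETRANSMIT + 1) - 1) * ACK_RANDOM_FACTOR

print(max_transmit_span, max_transmit_wait)
```

So with default parameters a silent device can hold on to its state for roughly 45 to 93 seconds after the first CON DELETE.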
* 3.3.2: Is reauthentication always triggered by the EAP peer, or can it
also be triggered by the authenticator? If the latter, will the
authenticator use /.well-known/a again, or POST something to the
resource from where it'd DELETE in 3.3.1?
[Authors] Yes, the EAP peer always triggers the
re-authentication, as it is the one interested in maintaining its
membership within the domain, and it could even be dormant at some
points. A use case for this is LoRaWAN nodes, which have the capability
of starting the communication regardless of their class, whereas the
Network Server may not send a message until it has received something
from the IoT device.
* cryptosuites: What's the upgrade model of that hardcoded list? As it
is now, it looks pretty static, so updates would be through updates of
this document. The obvious alternative is an IANA registry with
ranges, policies and the usual pros and cons.
Then again, this is not the first nor last time AEAD algorithms with
their parametrization and hash functions are assigned aggregate
numbers (I-D.ietf-lake-edhoc comes to mind which has asymmetric algs
in the mix too; probably others as well); can we deduplicate this with
anything? (Possibly by bringing this up with COSE or OSCORE people).
[Authors] In the next version we will propose a structure to add
different parameters to the CoAP-EAP exchange. Within these parameters,
we considered the extensibility of the crypto suite algorithms.
* OSCORE derivation: Is it cryptographically necessary to derive *both*
a master secret and a master salt through KDF? (Sounds like a
needless step to me, as both only go into KDF once more when the
actual OSCORE parameters are derived). I *guess* there's a good reason
why the MSK is not used as the OSCORE IKM right away and the CSO as
OSCORE master salt, but it'd help to have at least a comment here on
why that's needed.
(It may be useful to compare this step to the HKDF steps in OSCORE;
their info element is always a 5-element array with a 4th "type"
element of "Key" or "IV"; other extractions might just hook in there
with different type values, maybe, and save everyone an extra handling
step).
[Authors] You are right, there should be a clarification on why this is
done the way it is. The main purpose is to use the MSK as the root key
for key derivation, which is common practice with the MSK. If, say, a
derived key were compromised, another one could be derived from the MSK
without having to resort to a new bootstrapping to refresh the MSK.
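The root-key idea can be illustrated with the sketch below, which uses HKDF-Expand (RFC 5869) with SHA-256 as a stand-in KDF. Everything specific here is an assumption for illustration only: the label string, the placeholder CSO encoding, and especially the `generation` counter used to obtain a replacement key are hypothetical and are not defined by the draft.

```python
import hashlib
import hmac


def hkdf_expand(prk, info, length):
    """HKDF-Expand (RFC 5869) with SHA-256."""
    if length > 255 * hashlib.sha256().digest_size:
        raise ValueError("requested output is too long")
    output, block, counter = b"", b"", 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]


def derive_oscore_secret(msk, cso, generation=0):
    # Label and generation counter are illustrative assumptions.
    info = cso + b"OSCORE MASTER SECRET" + bytes([generation])
    return hkdf_expand(msk, info, 16)


msk = bytes(range(64))  # placeholder MSK, not real key material
cso = b"\x00"           # placeholder cryptosuite encoding
first = derive_oscore_secret(msk, cso)
# If `first` were compromised, a replacement can come from the same MSK.
second = derive_oscore_secret(msk, cso, generation=1)
print(first != second)
```

Using the MSK directly as the OSCORE master secret would remove this extra step, but then a compromised OSCORE secret would be the MSK itself.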
* OSCORE ID derivation:
* Randomly assigned full-length IDs look like an odd choice. They
are excessively long (nonce length - 6 is 7 for the MTI
AES-CCM-16-64-128 and shorter for other current ones, but I doubt
that keeping the IV *short* is necessarily a design criterion for
future algorithms).
What commonly happens here (eg. in the ACE-OSCORE profile, or in
EDHOC) is that each party picks a recipient ID out of its pool of
currently unused IDs. This makes for shorter keys, and allows the
client to be sure that no two peers use the same context.
Any chance something like that can still make it in?
[Authors] We did not see them as random, but as parametrised according
to the crypto suite. We will try to make this as straightforward as
possible following your comments.
* If the parties happen to be assigned the same sender ID, bad things
happen (identical key derivation, nonce reuse, nuclear meltdown).
If the current pattern of KDF'ing IDs stands, this needs to be
prevented explicitly.
[Authors] Since the sender and recipient IDs are derived from the MSK,
which is assumed to be fresh key material, I think this should not be a
problem.
* The derivations of "OSCORE RECIPIENT ID" and "OSCORE SENDER ID" are
confusing as they each need to happen on both sides, and the terms
will match on one and need to be opposite on the other.
(I couldn't even easily find which is intended to be which).
My suggestion is to derive "OSCORE EAP PEER SENDER ID" and "OSCORE
AUTHENTICATOR SENDER ID" instead. (Or preferably shorter strings).
[Authors] Good point, thank you. We will address this according to your
suggestion.
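A rough sketch of role-named ID derivation, combined with the explicit collision check raised in the previous item, could look as follows. The truncated HMAC-SHA256 is only a stand-in for the draft's KDF, and the function names and the 7-byte default length are assumptions of ours.

```python
import hashlib
import hmac

PEER_LABEL = b"OSCORE EAP PEER SENDER ID"
AUTH_LABEL = b"OSCORE AUTHENTICATOR SENDER ID"


def derive_id(msk, label, length):
    # Stand-in KDF: truncated HMAC-SHA256; the draft's actual KDF may differ.
    return hmac.new(msk, label, hashlib.sha256).digest()[:length]


def derive_sender_ids(msk, length=7):
    """Derive both sender IDs; each side picks its own by role."""
    peer_id = derive_id(msk, PEER_LABEL, length)
    auth_id = derive_id(msk, AUTH_LABEL, length)
    if peer_id == auth_id:
        # Identical sender IDs would lead to identical keys and nonce reuse.
        raise ValueError("derived sender IDs collide; do not use this context")
    return {"peer_sender_id": peer_id, "authenticator_sender_id": auth_id}


ids = derive_sender_ids(bytes(range(64)))  # placeholder MSK
print(ids["peer_sender_id"] != ids["authenticator_sender_id"])
```

The EAP peer uses `peer_sender_id` as its sender ID and `authenticator_sender_id` as its recipient ID, and the authenticator does the opposite, so both sides run the same two derivations. Note that for very short derived IDs (for example 1 byte) a collision is not negligible even with a fresh MSK, which is why the check is explicit here.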
* Examples: Do you envision particular EAP protocols to be used in the
given examples?
[Authors] We consider the examples to use lightweight EAP methods. This
could be EAP-PSK for instance.
Best regards
Christian