When I was working with devices at Netflix, we routinely saw remote
devices with very bad clock skew. (In essence, the blinking 12:00
problem.) We also knew that the client would be valid for a minimum of
5 years with no updates.

One approach we took relied on the fact that the server timestamp is
generally distributed with every response from the server (the "Date:"
header). The client simply recorded the difference between the server
clock and its local clock, and added that difference to the TS value
used in the OAuth headers.
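
A rough sketch of that offset calculation, in Python. The function
names are made up for illustration and this is not the code we shipped;
it only assumes the server sends a standard HTTP "Date:" header.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def skew_from_date_header(date_header: str, local_now: float) -> int:
    """Return (server time - local time) in whole seconds.

    date_header is the value of the HTTP "Date:" response header;
    local_now is the local clock as seconds since the epoch.
    """
    server_dt = parsedate_to_datetime(date_header)
    if server_dt.tzinfo is None:
        # HTTP dates are always GMT; treat a zone-less result as UTC.
        server_dt = server_dt.replace(tzinfo=timezone.utc)
    return int(round(server_dt.timestamp() - local_now))


def oauth_timestamp(local_now: float, skew: int) -> int:
    """Local clock corrected by the recorded skew, for the TS value."""
    return int(local_now) + skew
```

The client would refresh the skew from any response it gets and apply
it to the next request, without ever touching the system clock.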

Since we understood that the client was adjusting its clock, we also
debated simply accepting the TS value as an effective nonce, since
devices in the wild may have the clock error and we're relying on the
secret anyway. This involved keeping a circular buffer of previous
nonces to prevent immediate replay attacks.
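
To make the buffer idea concrete, here is a minimal sketch in Python.
The class name and the capacity are assumptions for illustration, not
what we actually ran; a real server would also need to key this per
client and share it across hosts.

```python
from collections import deque


class RecentNonces:
    """Fixed-size buffer of recently seen nonces (oldest evicted first)."""

    def __init__(self, capacity: int = 1024) -> None:
        self._capacity = capacity
        self._order = deque()  # insertion order, oldest on the left
        self._seen = set()     # same nonces, for O(1) membership tests

    def check_and_record(self, nonce: str) -> bool:
        """Return True if the nonce is new, False if it is a replay."""
        if nonce in self._seen:
            return False
        if len(self._order) >= self._capacity:
            # Buffer is full: forget the oldest nonce to make room.
            self._seen.discard(self._order.popleft())
        self._order.append(nonce)
        self._seen.add(nonce)
        return True
```

This only blocks replays for as long as a nonce stays in the buffer,
which is why it covers "immediate" replay and nothing more.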

For the most part, the servers did a reasonable job of staying within 5s
of UTC. And since the connection was over HTTPS, connections were
generally bound to servers.

This combination pretty much solved our OAuth timing errors for a wide
array of devices. (As I remember, we only had about a 5% error rate, and
those errors resulted in an immediate adjustment/retry that worked.)

Might it be useful to look at a similar approach?

On 2014/2/19 11:33 AM, Richard Newman wrote:
> At the risk of opening this email with a pun: we've invested a bunch of time 
> on both desktop[0] and Android[1] addressing clock skew problems.
> 
> (And in server-side tests, too: [2].)
> 
> Auth, token, and storage requests are all Hawk-authenticated. The Hawk 
> authentication process bakes in a timestamp. That timestamp necessarily comes 
> from the client clock. If the client clock is too far off the server clock 
> (and remember, there are three different servers in our architecture), the 
> request will be rejected because the header is wrong.[3]
> 
> The solution we've used for this is skew adjustment. We maintain a skew value 
> for each server, baking in this offset to future requests.
> 
> This is part and parcel of Hawk:
> 
> --- [4]
> Hawk uses an interesting mechanism to ensure the clock skews are within the 
> reasonable limits. When the server must fail a request on account of stale 
> timestamp (MAC computed matches with the one in the request but timestamp is 
> outside of the allowable skew), the server sends the timestamp (ts) as per 
> the server clock along with a MAC (tsm) computed using the client 
> credentials, in the WWW-Authenticate response header like so.
> 
>   HTTP/1.1 401 Unauthorized
>   WWW-Authenticate: Hawk ts="1353832234",
>                          tsm="6G8r5JiE+NLoym+WwjeHzjDNCUtLNIxmo1vpMofpLAE="
> ---
> 
> --- [5]
> Using a timestamp requires the client's clock to be in sync with the server's 
> clock. Hawk requires both the client clock and the server clock to use NTP to 
> ensure synchronization. However, given the limitations of some client types 
> (e.g. browsers) to deploy NTP, the server provides the client with its 
> current time (in seconds precision) in response to a bad timestamp.
> 
> There is no expectation that the client will adjust its system clock to match 
> the server (in fact, this would be a potential attack vector). Instead, the 
> client only uses the server's time to calculate an offset used only for 
> communications with that particular server. The protocol rewards clients with 
> synchronized clocks by reducing the number of round trips required to 
> authenticate the first request.
> ---
> 
> 
> Correct and efficient usage of Hawk is predicated on clients with correct 
> clocks, which seems like an insane assumption to make: at least 3.5% of 
> Android devices have clocks that are incorrect by more than *1 hour*, let 
> alone 1 minute.[6]
> 
> Network-set Android clocks are also routinely wrong by 15s, which is 25% of 
> the protocol's margin of error.
> 
> Failures due to clocks seem incredibly widespread amongst the small set of 
> Mozillians who've given FxA Sync a try. That is disheartening, but not 
> surprising.
> 
> 
> We're *requiring* clients to fail frequently in the course of normal 
> operation: on the first request (no known skew yet); on subsequent requests 
> if the clock is adjusted since the skew was computed; on subsequent syncs if 
> your network changes and your latency shifts (because our skew computation 
> doesn't try to model the network); when the server clock is automatically 
> corrected; etc.
> 
> 
> This whole process is fragile, provides a bad user experience (your first 
> sync is almost guaranteed to fail), and on an implementation level it is 
> apparently hard to get right (as the existence of [3], after we've landed our 
> skew handling, demonstrates).
> 
> 
> We know we still have low-level work to do: maybe persisting skew values 
> across restarts, doing better at modeling the environment to correct skews, 
> retrying in more places to allow for skew-driven failures.
> 
> But this seems like a bad choice of investment. Correcting for skew seems to 
> defeat some of the purpose of this timestamp validation: if you can intercept 
> a request from a client whose clock is wrong in the right direction, you can 
> save that token and use it later when the timestamp becomes valid, no? And 
> categorizing a large chunk of requests as routinely erroneous, forcing them 
> into error handling states, seems like a bad idea.
> 
> 
> What can we do to mitigate this problem? Ideas, many of which will no doubt 
> violate the promises that Hawk makes:
> 
> * Widen the validity window from 1 minute to 1 hour. Or six hours. Or three 
> days.
> * Do something non-conformant, like having clients pass their clock to the 
> server, eliminating the requirement for clients to manage skew.
> * Eliminate Hawk entirely, at least for the storage servers, switching the 
> output of the token server to be some kind of short-lived bearer token.
> * ???
> 
> More input, please!
> 
> -R
> 
> 
> 
> [0] https://bugzilla.mozilla.org/show_bug.cgi?id=957863
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=962668, 
> https://bugzilla.mozilla.org/show_bug.cgi?id=929066
> [2] https://bugzilla.mozilla.org/show_bug.cgi?id=971059#c16
> [3] https://bugzilla.mozilla.org/show_bug.cgi?id=971059
> [4] 
> http://lbadri.wordpress.com/2013/09/01/Hawk-authentication-for-asp-net-web-api-using-thinktecture-identitymodel-45-replay-protection/
> [5] https://www.npmjs.org/package/Hawk
> [6] http://opensignal.com/reports/timestamps/
> _______________________________________________
> Sync-dev mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/sync-dev
> 

