Ian Hickson wrote:

This is mentioned in the "Security and privacy" section; the third
bullet point here for example suggests blocking access to "public"
storage areas:

  http://whatwg.org/specs/web-apps/current-work/#user-tracking

I did read the suggestions and I know the authors have given these issues thought. However, my concern is that the solutions are all 'suggestions' rather than rules. I believe the standard should be more definitive to eliminate the potential for browser inconsistencies.

Yes, there's an entire section of the spec discussing this in detail,
with suggested solutions.

Again, the key word here is 'suggest'.

Indeed, the spec suggests blocking such access.

Suggest. See where I'm going with this? The spec is too loose.

There generally is; but for the two cases where there are not, see:

  http://whatwg.org/specs/web-apps/current-work/#storage

...and:

  http://whatwg.org/specs/web-apps/current-work/#storage0

Basically, for the few cases where an author doesn't control his
subdomain space, he should be careful. But this goes without saying.
The same requirement (that authors be responsible) applies to all Web
technologies, for example CGI script authors must be careful not to
allow SQL injection attacks, must check Referer headers, must ensure
POST/GET requests are handled appropriately, and so forth.

As I pointed out, this only gives control to the parent domain, not the child, without regard for the real-world political relationship between the two. The implication is that the 'parent' domain is more trustworthy and important than the child - that it should always be able to read a subdomain's private user data.

The spec doesn't give the developer a chance to be responsible when it hands out user data to anybody in the domain hierarchy, regardless of whether they are a single, trusted entity or not. Don't blame the programmer when the spec dictates who can read and write the data with no regard for the author's preferences. CGI scripts generally do not have this limitation, so your analogy is irrelevant.
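
To make this concrete, here is a minimal sketch of the sharing model as I understand it, using the draft's globalStorage API (the domain names and the exact getItem/setItem semantics are assumed for illustration):

  // On http://mysite.geocities.com/ -- data placed in the shared
  // ancestor area is visible up and across the domain hierarchy:
  globalStorage['geocities.com'].setItem('prefs', 'theme=dark');

  // On http://geocities.com/, or on any sibling *.geocities.com
  // site, the same area can simply be read back:
  var prefs = globalStorage['geocities.com'].getItem('prefs');

Nothing in that API lets mysite.geocities.com mark the data as its own.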

Indeed; users at geocities.com shouldn't be using this service, and
geocities themselves should put their data (if any) in a private
subdomain space.

Geocities and other free-hosting sites generally offer a low server-side storage allowance, which means their users have a _greater_ need for persistent storage than 'real' domains.

It doesn't. The solution for mysite.geocities.com is to get their own domain.

That's a bit presumptuous; in fact, it's downright offensive. The user may have valid reasons for not buying a domain. Is it the WHATWG's role to dictate hosting requirements in a web standard?

The spec was written in conjunction with UA vendors. It is realistic
for UA vendors to provide a hardcoded list of TLDs; in fact, there is
significant work underway to create such a list (and have it be
regularly updated). That work was originally started for use with HTTP
Cookie implementations, which have similar problems, but would be very
useful for Storage API implementations (although, again as noted in
the draft, not imperative for a secure implementation if the author is
responsible).
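
For illustration, here is a minimal sketch of how a UA might consult such a hardcoded list before granting access to an ancestor storage area (the list contents and the helper function are my own invention, not from any spec):

  // Illustrative public-suffix check; the real list would be far
  // larger and regularly updated, as described above.
  var PUBLIC_SUFFIXES = ['com', 'net', 'org', 'co.uk', 'com.au'];

  function isPublicSuffix(domain) {
    for (var i = 0; i < PUBLIC_SUFFIXES.length; i++) {
      if (domain === PUBLIC_SUFFIXES[i]) return true;
    }
    return false;
  }

  // A UA would refuse globalStorage['co.uk'] because 'co.uk' is on
  // the list, while still allowing globalStorage['example.co.uk'].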

I accept that such a list is probably the answer; however, I believe the list should itself be standardised before becoming part of a web standard - otherwise we get more UA inconsistency.

One could create much more complex APIs, naturally, but I do not see
that this would solve the problems. It wouldn't solve the issue of
authors who don't understand the security implications of their code,
for instance. It also wouldn't prevent the security issue you
mentioned -- why couldn't all *.geocities.com sites cooperate to
violate the user's privacy? Or *.co.uk sites, for that matter? (Note
that it is already possible today to do such tracking with cookies; in
fact it's already possible today even without cookies if you use
Referer tracking, and even without Referer tracking one can use IP and
User-Agent fingerprinting combined with log analysis to perform quite
thorough tracking.)
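
Your point can even be illustrated with the storage proposal itself: cooperating subdomains could share one visitor identifier through the common storage area (a sketch only; API details assumed):

  // Run by any *.geocities.com page to share a visitor identifier
  // through the common 'geocities.com' storage area:
  var area = globalStorage['geocities.com'];
  var id = area.getItem('visitor-id');
  if (!id) {
    id = String(Math.random()).slice(2);  // crude unique token
    area.setItem('visitor-id', id);
  }
  // Each cooperating site can then report id to its own server.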

None of those techniques are reliable. My own weblogs show most users have the Referer field turned off. Cookies can be safely deleted after every session without a major impact on site function (I may have to log in again). IP tracking is mitigated by proxies and NATs.

The trouble with this proposal is that it would allow important data to get lumped in with tracking data, while the spec suggests that UAs should only delete the storage when explicitly asked to do so. I don't have a solution to this other than to revoke the proposal or prevent the sharing of storage between sites. I accept tracking is inevitable, but we shouldn't be making it easier either.

Certainly one could add a .readonly field or some such to storage data
items, or even fully fledged ACL APIs, but I don't think that should
be available in a first version, and I'm not sure it's really useful
in later versions either.

Is it any more complex, or any less useful, than the .secure flag? Readonly is an essential attribute in any shared data system, from databases to filesystems. Would you advocate that all websites be world-writable just to simplify the API? Not that .readonly should be hard to implement, as we already have metadata attached to each key.
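
A rough sketch of what I have in mind (the third argument and the .readonly attribute are hypothetical, modelled on the per-item metadata the draft already keeps):

  // Hypothetical: mark an item read-only for other pages in the
  // shared area when it is first set (third argument invented):
  storage.setItem('prefs', 'theme=dark', true /* readOnly */);

  // Any later reader could inspect the flag alongside the .secure
  // metadata the draft already defines for each item:
  var item = storage.getItem('prefs');
  if (item && item.readOnly) {
    // other principals may read, but not overwrite, this entry
  }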

I don't really understand what this is referring to. Could you show an
example of the transaction/callback system you refer to? The API is
intended to be really simple, just specify the item name and there you
go.

I'm referring to the "storage" event described in section 5.9.6, which is fired in all active pages as data changes. This is an unusual procedure that needs a better justification than those given in the spec. If the event pulls me out of my current function, then how am I going to do anything useful with the application state (without really knowing where execution was interrupted)?
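
For reference, this is roughly the handler shape I understand the draft to imply (the event attributes and listener target are assumed from the draft, not confirmed):

  // A "storage" event listener per section 5.9.6; fires whenever
  // another page changes the shared storage area:
  window.addEventListener('storage', function (e) {
    if (e.key === 'prefs') {
      refreshPreferences(e.newValue);  // hypothetical app function
    }
  }, false);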

While I agree that there are valid concerns, I believe they are all
addressed explicitly in the spec, with suggested solutions.

Your points are also quite valid, but they ignore the root of my concern - which is that the spec leaves too much up to the UA to resolve. I don't see how you can explicitly define something with a suggestion! The whole spec kind of 'hopes' that many disparate companies/groups will cooperate to make persistent storage work consistently across browsers. They might, but given both Microsoft's and Netscape's track records, I think things need to be more concrete in such an important spec.

I would be interested in seeing a concrete proposal for a better
solution; I don't really see what a better solution would be.

I'm not sure myself, but I don't think it can stay the way it is. I would be happy to offer a better proposal, or to update the current one, given enough time to consider it.

As a quick thought, the simplest approach might be to require that the site send a secret hash or public key in order to prove it 'owns' the key. The secret could even be a timestamp of the exact time the key was set, or just a hash of the user's site login, e.g.:

  DOMAIN     KEY    SECRET                      DATA
  foo.bar    baz    kj43h545j34h6jk534dfytyf    A string.

Just one idea.
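
In API terms the idea might look something like this (the extra argument and the rejection behaviour are invented; this is only a sketch of the concept):

  // The secret from the table above, however the site derives it:
  var secret = 'kj43h545j34h6jk534dfytyf';

  // The first write establishes the secret along with the key:
  storage.setItem('baz', 'A string.', secret);

  // Later writes must present the same secret to prove ownership:
  storage.setItem('baz', 'New value.', secret);    // accepted
  storage.setItem('baz', 'Evil value.', 'guess');  // rejected by UA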

Shannon
Web Developer
