On Mon, Jul 13, 2009 at 12:49 AM, Jonas Sicking <jo...@sicking.cc> wrote:
> 2. How do we deal with identifying libraries.
> As Aaron Boodman pointed out, SHA hashes means that you can't make
> upgrades for security problems etc.

I think a hash would be fine. You don't want to load a different version of the library than the author intended, even if it supposedly has bug fixes; perhaps the author is relying on the bugs that were fixed. This should be transparent, so that the same script is always used whether or not the browser supports this feature.

> 3. Compat when the browser doesn't have a library cached.
> A solution that includes using a different uri for browsers that do
> support caching than browsers that don't is scary since there is a big
> risk that the author will forget to update one but not the other.

How about an extra HTTP header like "X-Content-Hash"? It could provide a SHA-256 hash of the content (or some other hash that looks safe for now, with room to upgrade the algorithm later). The browser can keep its cached copies of these files indexed by hash. If it starts downloading a file and sees that the hash matches a file it has already downloaded, it can terminate the HTTP connection and use the existing copy (even if that copy came from a different site). It then proceeds as though it had actually downloaded the file: e.g., it respects each site's Expires header separately (two sites might serve the same file but have different expectations about how likely it is to change).

Of course, when a file is actually downloaded, it would be indexed by its actual SHA-256, not the reported one. This way you can't poison the cache. If an incorrect hash is provided, the wrong script may be used, of course, but since an attacker can't (hopefully!) generate a preimage for that hash, they can't get a malicious file cached under it. Unless they can inject HTTP headers, but then you're screwed anyway.

So this would basically be like ETag, but with the tag unique across all sites. In fact, maybe piggybacking it on ETag would be a better idea, so you don't have to do things like terminate the TCP connection unexpectedly. (I don't *think* that would be harmful, but it's certainly suboptimal.)

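To make the header idea concrete, here is a minimal Python sketch of the browser side. Everything beyond the header name is an assumption for illustration: the `sha256=<hex>` value format, `parse_content_hash`, and `HashIndexedCache` are hypothetical, not part of any spec or browser. It shows the two properties argued for above. First, bodies are stored only under the hash the browser computes itself, so a lying header can at worst select a file already in the cache, never plant a new one. Second, freshness is tracked per URL, so two sites sharing one body keep their own expiry.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def parse_content_hash(value: str) -> Optional[str]:
    """Parse a hypothetical 'sha256=<64 hex digits>' header value.

    Unknown algorithms return None, so the scheme can be upgraded later
    without old clients misreading new values.
    """
    algo, sep, digest = value.strip().partition("=")
    if sep != "=" or algo.strip().lower() != "sha256":
        return None
    digest = digest.strip().lower()
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        return None
    return digest


@dataclass
class HashIndexedCache:
    # Bodies keyed by the SHA-256 the browser computed itself, never by a claim.
    bodies: Dict[str, bytes] = field(default_factory=dict)
    # Per-URL freshness: url -> (sha256 hex, expires_at). Each site has its own.
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def store(self, url: str, body: bytes, max_age: float,
              now: Optional[float] = None) -> str:
        """Record a fully downloaded body under its *actual* hash."""
        now = time.time() if now is None else now
        digest = hashlib.sha256(body).hexdigest()
        self.bodies[digest] = body
        self.entries[url] = (digest, now + max_age)
        return digest

    def lookup_by_claimed_hash(self, url: str, claimed: str, max_age: float,
                               now: Optional[float] = None) -> Optional[bytes]:
        """Called once headers arrive, before the body is read.

        On a hit the caller can abort the transfer and reuse the existing
        body, even if it came from another site. On a miss it must download
        the body and call store().
        """
        now = time.time() if now is None else now
        body = self.bodies.get(claimed)
        if body is None:
            return None
        self.entries[url] = (claimed, now + max_age)
        return body

    def fresh(self, url: str, now: Optional[float] = None) -> Optional[bytes]:
        """Return the cached body for url if this URL's entry hasn't expired."""
        now = time.time() if now is None else now
        entry = self.entries.get(url)
        if entry is None or entry[1] <= now:
            return None
        return self.bodies[entry[0]]
```

In use, site A's download populates the cache, and site B's response advertising the same hash is then served from that copy, with B's shorter lifetime tracked independently of A's.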
To piggyback on ETag, just define a magic ETag format that nobody's currently using.

I'm not sure how useful this would actually be, though. Are there really *so* many sites using the *exact* same version of jQuery or any other single library? It seems like a better idea to get all the valuable user-created APIs standardized and implemented cross-browser in some form, as a few (e.g., getElementsByClassName) have been. Then the common uses of jQuery or whatever would just be compatibility layers for legacy browsers, kind of like how a bunch of Boost is getting added to standard C++. Of course, this requires more effort, but it's more likely to actually work in the long term.

> I believe all these problems are solvable, but they do need to be
> solved before considering inclusion in any spec. I'm also not
> convinced this needs to be included in the HTML5 spec, but that might
> depend on what the ultimate solution looks like.

The most obvious place to solve this seems to be HTTP, not HTML, since HTTP is closer to the resource itself. If you do something in HTML, like an extra attribute on <link>, then you're going to get authors updating the HTML but not the thing it points to, or vice versa. An ETag-like solution would be implemented either in the web server or in whatever script is serving the content, and those should always know whether the file has changed. (Modulo pathological behavior like something changing the file and then forging the mtime/ctime.)

So yeah, this doesn't seem like something for HTML 5.
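For the ETag variant, here is a rough Python sketch assuming one possible "magic" format: a strong ETag whose opaque value is `sha256:` followed by 64 lowercase hex digits. That format and the three function names are hypothetical; nothing standardizes them. Deriving the tag from the bytes themselves, rather than from mtime, is one way for the server to always know when the file has changed, which sidesteps the forged-mtime case.

```python
import hashlib
import re
from typing import Optional

# Hypothetical magic format: a strong ETag of the form "sha256:<64 hex>".
# ':' is a legal etagc character, so this is still a syntactically valid ETag.
_MAGIC_ETAG = re.compile(r'^"sha256:([0-9a-f]{64})"$')


def magic_etag(body: bytes) -> str:
    """Server side: derive the ETag from the content itself, so it changes
    exactly when the bytes change (no reliance on mtime/ctime)."""
    return '"sha256:' + hashlib.sha256(body).hexdigest() + '"'


def hash_from_etag(etag: str) -> Optional[str]:
    """Client side: return the content hash if the ETag uses the magic format.

    Weak (W/) or ordinary ETags return None and get normal per-site caching.
    """
    m = _MAGIC_ETAG.match(etag.strip())
    return m.group(1) if m else None


def body_matches_etag(body: bytes, etag: str) -> bool:
    """After a real download, check the server's claim against the bytes."""
    claimed = hash_from_etag(etag)
    return claimed is not None and claimed == hashlib.sha256(body).hexdigest()
```

Requiring a strong tag is a deliberate choice here: a weak ETag only promises semantic equivalence, which is not enough to justify substituting another site's bytes.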