On Mon, Dec 3, 2012 at 3:35 PM, Uncle Zzzen <unclezz...@gmail.com> wrote:
> Hi.
> I'm busy with work lately, but there's a discussion Zooko and I were having
> on a closed ticket, and I agree with him it actually belongs here, so here
> goes:
>
> Sometimes there's a need to expose a [partial view of a] Tahoe-LAFS storage
> as a public web service. As far as I understand, there are 3 ways to do it.
>
> 1) Gateway to web api - the public server proxies requests to Tahoe's web
> api, blocking undesired requests (e.g. POST ones). This is what lafs-rpg
> does (using nginx). You can also tweak it in various ways (e.g. disable
> directory browsing under some subtree).
>
> 2) Static web server, file-system back-end - use a standard static html web
> server (apache, nginx, etc.) and let it serve files from a fuse-mounted
> Tahoe-LAFS cap. In the future, once we have "dropbox-like functionality", it
> would enable us to serve static files from a "magically synced" file-system
> folder, and we won't even need the fuse trickery.
>
> 3) Dedicated service - Tahoe-LAFS can have, in addition to the web api, a
> public web service (listening on a different port). We would need to define
> the various mountpoints this server has (e.g. map /blog/ to
> /uri/DIR-RO:.../Latest/), and additional configuration options (basic/other
> auth, mustache/jinja2/etc. template for directory browsing if allowed,
> etc.). We can either do all that explicitly at tahoe.cfg, or simply specify
> a capability where this config (probably json) is read from (handy if you
> want to remotely configure such a server, but might be vulnerable for
> exactly the same reason).
>
> Option 1 is what I use at the moment. It may not be a pretty sight, but it
> ain't broke (AFAIK) so I don't have an urge to fix it.
> Zooko prefers option 3. I agree this could be neat.
> What's your opinion?
>

I see two different aspects of a public gateway mentioned here. One
aspect is the architecture of the components, and the other is the
policy around mapping public web URLs to tahoe capabilities.

The architectures mentioned are:

1. "http proxy": web browser -> http server-side proxy (like nginx) ->
   tahoe gateway -> tahoe grid

2. "fuse proxy": web browser -> http static filesystem server ->
   filesystem -> fuse process -> tahoe gateway -> tahoe grid

3. "static file server": web browser -> http static filesystem server ->
   filesystem <- external sync process (dropbox-like) -> tahoe gateway ->
   tahoe grid

   Notice that "external sync process" points left towards the
   filesystem instead of right. The requests/responses are decoupled:
   the http server and the dropbox-like process asynchronously read or
   update the filesystem.

4. "built-in web server": web browser -> as-of-yet-unimplemented tahoe
   "public mode" gateway -> tahoe grid

Just by counting arrows, it's obvious that 4, the built-in web server,
would be the "leanest" approach in terms of fewest "hops", so it might
be the most efficient.

The trade-off of fewer "hops" is less separation of concerns. For
example, in approach 1, an nginx proxy might terminate SSL, and it *may*
be that because nginx is very popular, security-related bugs in its
server-side SSL handling would be found and fixed quickly. If a future
version of tahoe has a built-in web server that also terminates SSL, it
may have a smaller user base, so security bugs may be less likely to be
noticed.

On the other hand, with approach 4 a tahoe-specific public web server
could use more "local information" to possibly make better, more
accurate decisions. For example, it might be able to make smarter
caching decisions.

IMO, architecture 2, a fuse proxy, is less attractive than 1, the http
proxy. One reason is that in 1, the first two hops are both http
requests, and http is already proxy-friendly.

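To make the http proxy of architecture 1 a bit more concrete, here is a
minimal sketch of the per-request decision such a proxy makes: reject
anything that is not a read, then map a public path onto a gateway path.
This is only an illustration and not how lafs-rpg or nginx is actually
configured. The names MOUNTS, ALLOWED_METHODS and route are made up for
the example, and the capability in the mount table is left elided
exactly as in the quoted mail.

```python
from typing import Optional

# Hypothetical mount table: public URL prefix -> path on the tahoe gateway.
# The capability stays elided as in the quoted mail; a real deployment
# would put an actual read-only directory cap here.
MOUNTS = {
    "/blog/": "/uri/DIR-RO:.../Latest/",
}

# Only read-style methods are passed through; POST, PUT, DELETE are blocked.
ALLOWED_METHODS = {"GET", "HEAD"}


def route(method: str, public_path: str) -> Optional[str]:
    """Return the gateway path to proxy to, or None to reject the request."""
    if method.upper() not in ALLOWED_METHODS:
        return None
    # Refuse ".." segments so a request cannot climb out of its mountpoint.
    if ".." in public_path.split("/"):
        return None
    for prefix, target in MOUNTS.items():
        if public_path.startswith(prefix):
            return target + public_path[len(prefix):]
    # Paths outside every mountpoint are not exposed at all.
    return None
```

With this table, route("GET", "/blog/post.html") yields a path under the
mounted directory, while a POST to the same path, or a GET outside
/blog/, is rejected.
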
OTOH, in 2, an HTTP request is translated into a set of filesystem
requests, which are then translated back into requests to a tahoe
gateway, so there's some impedance mismatch. For example, the tahoe
gateway might have useful caching information expressed in a standard
http manner, which nginx could handle, but which would (probably) be
lost by a fuse layer. Also, the fuse interfaces I'm aware of speak to
the gateway over http anyway, so there are more hops.

Architecture 3 is appealing because the left-hand side, everything from
the browser up to the filesystem where the two halves are decoupled, is
simple and well understood: it's just a static web server. The only
difference is in how the content may be updated.

So those are the architectural considerations. The policy
considerations seem separable to me. In any architecture the site
operator may choose to carry capabilities all the way through, hide
capabilities behind well-known URL paths, or handle directory requests
differently than file requests.

I personally am interested in the idea that a public web interface
operator is not directly aware of the content or the publishers of the
data being served. This scenario is similar to tor2web. To support that
case, the simplest approach seems to be to just pass capabilities
through from the tahoe grid to the public web, and to rely on
publishers to share their capabilities out of band (similar to
tor2web).

Of course, another use case that seems popular is to have a centrally
controlled "site" that looks very much like any other website from the
outside, except that its storage is backed by a tahoe grid. Even when
the capabilities are all hidden at the public proxy layer, this
architecture is importantly different from a security standpoint
because of provider-independent security. If we contrast it with a
traditional architecture where a web server is connected to a database
layer, it's interesting that the "database" equivalent need not be
trusted by the web server beyond availability.

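A rough sketch of why the storage layer only has to be trusted for
availability: if the reference a reader holds commits to the content,
the reader can detect tampering by the storage without trusting it.
This is deliberately simplified and is NOT Tahoe's actual capability
format or verification scheme (which also involves encryption); the
functions make_ref and fetch_verified and the dict standing in for
storage are assumptions made purely for illustration.

```python
import hashlib
from typing import Dict, Optional


def make_ref(content: bytes) -> str:
    """Publisher side: derive a reference that commits to the content."""
    return hashlib.sha256(content).hexdigest()


def fetch_verified(ref: str, storage: Dict[str, bytes]) -> Optional[bytes]:
    """Reader side: accept stored bytes only if they match the reference."""
    data = storage.get(ref)
    if data is None:
        # The storage can still fail on availability.
        return None
    if hashlib.sha256(data).hexdigest() != ref:
        # The storage returned something other than what was published.
        raise ValueError("stored content does not match its reference")
    return data
```

Here an untrusted storage can withhold the data (fetch_verified returns
None), but if it substitutes different bytes the reader notices and
raises instead of serving them.
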
If malcontents break into a storage grid machine (or even all of them),
they can wreak much less havoc than if they break into a traditional
website database. Likewise, if they break into the public-facing web
proxy, they can intercept and modify contents on the way out, but anyone
with access to the grid can still see the legitimate content and
updates.

> Cheers,
> The Dod
>
> _______________________________________________
> tahoe-dev mailing list
> tahoe-dev@tahoe-lafs.org
> https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
>

Regards,
Nathan