For some reason I got on a research jag about this topic last month (maybe to avoid my FOSS4G prep?). Anyhow, I sketched out some ideas on the storage/provision side here:
http://etherpad.com/I3dgOoyQKV

Feel free to annotate / edit that if you like. The raw text is below for your interest.

----

The OpenTile Federation
=======================

How can we create globally available image data sets, without committing to running centralized data centres? How can organizations place their imagery online in a way that it will be globally available, integrated with imagery from other organizations, and still retain good performance?

We propose a decentralized approach, in which any organization can easily set up an OpenTile server, which will join the OpenTile federation, and automatically share in the burden of providing access to, and redundant storage of, global imagery and map tile sets.

An Attempt: OpenAerialMap
-------------------------

OpenAerialMap attempted a centralized solution to the problem of uniform access to a shared imagery cache. Servers were donated by a university, and some basic web infrastructure was written that ingested images in a few basic formats and wrote the result out to a tile cache.

It didn't work, though fortunately for reasons that are mostly technical. The interest and enthusiasm generated even in a short period of time were quite high, particularly given the limitations of the platform. This is a problem which is amenable to an engineering solution.

OpenAerialMap shut down a few months ago, and one of the reasons behind the shutdown was an issue of scale -- there is just too much image data in the world for one organization to take on the charitable work of hosting it for everybody. A secondary reason for the shutdown was the difficulty involved in getting new data in, given the knowledge of raster imagery required to parameterize a new upload. We'll treat this problem as a second-order issue, since without infrastructure it's a moot point.

We need a solution that allows people to easily donate bandwidth and storage space, without creating a coordination or institution problem.
We don't need an OpenAerialFoundation to take in donations and run a data center; we need a way for people to bind their servers together into a federation, with minimal input.

An Attempt: OpenStreetMap
-------------------------

OpenStreetMap (OSM) is a successful attempt at data sharing, but the handling of actual tile service has been treated as an afterthought -- the primary goal of the organization is capturing and improving vector data. OpenStreetMap tiles are served out centrally, so the system has a single point of failure, and requires direct cash inputs to maintain.

Thus far, demand for OpenStreetMap tiles has been mitigated by commercial organizations handling much of the distribution: CloudMade serves tiles based on OSM data, and DeCarta has also recently entered the business. This is fine for converting a free resource (OSM vectors) into a proprietary service (CloudMade or DeCarta tiles), but not good for a resource like imagery, where the image tile itself is public and free.

A New Approach
--------------

We want a way for organizations to easily collaborate, and for data to reside close to where it will be used, while still being part of a seamless global coverage of imagery.

BigPeer2BigPeer
~~~~~~~~~~~~~~~

A pure P2P solution is not practical, as the look-up costs and unpredictable latency would make tile delivery problematic. What we want is a way for BigPeers, with good local connectivity in their region, to easily set up a new node and begin servicing local clients, with very little technical or communication overhead. Similarly, if those BigPeers drop offline for various reasons, we want the network to re-balance, but not lose data in the process.

Each BigPeer will set up an OpenTile server, with as much storage as they can provide to it. The server will join the network, be registered, and pull over the tiles necessary to service the local clients in the area. How does this work? See below.
Global Addressing
~~~~~~~~~~~~~~~~~

We want every tile, and the metadata for every tile, to be globally addressable and accessible. We want the history of every tile to also be globally addressable and accessible. Fortunately, the solution already exists, in a technology called a "distributed hash table" (DHT). DHT technology received a lot of research attention between 2000 and 2005, after the implosion of Napster and the rise of a number of different attempts at federated, non-centralized systems.

Using a DHT approach, a user could ask any OpenTile server for any tile in the world, and have that tile returned. By routing users to nearby servers, and having servers store tiles that are in their local area, the bandwidth requirements would be shared, and the performance improved over a centralized system. By spreading out the storage among servers, the hardware requirements would be shared. By replicating the tiles to multiple servers, reliability would be guaranteed.

Using a DHT, the coordinating function of the central institution could be greatly reduced. No huge storage arrays to maintain, no thick network cords to wire up. The central OpenTile web site would just maintain a manual for operating the OpenTile software, a copy of the code, downloadable binaries, a list of running OpenTile servers so new servers can join the federation, and a DNS system to map generic tile requests to random or nearby servers.

Some tweaks to generic DHT technology can be used to make the OpenTile solution an even better fit. Unlike in a classic DHT, in a tile DHT the keys have meaning: they have a location. And the requests for tiles are likely to be well correlated with the tile locations themselves -- that is, people in Argentina are more likely to ask for Argentina tiles than people in NY, and vice versa. Using this user/tile affinity, tiles can be migrated to servers that are close to where the users are likely to request them.
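To make the "ask any server, get any tile" idea concrete, here is a minimal sketch of the DHT lookup rule, assuming the simple convention that each server "owns" the keys numerically nearest its own key. Chimera's actual routing is more sophisticated (prefix-based, multi-hop), and the server names and tiny key space here are purely illustrative.

```python
# Minimal nearest-key lookup: any node can compute where a tile lives
# without a central index, given the set of server keys.
# (Illustrative only; Chimera routes by key prefix over multiple hops.)

def nearest_server(servers, key):
    """Return the name of the server whose key is numerically nearest
    to the requested tile key."""
    return min(servers, key=lambda name: abs(servers[name] - key))

# Three hypothetical servers spread across a small key space.
servers = {"srv-a": 10, "srv-b": 120, "srv-c": 240}

print(nearest_server(servers, 15))    # -> srv-a
print(nearest_server(servers, 200))   # -> srv-c
```

Because tile keys can encode location, giving a server a key derived from its own coordinates would let this nearest-key rule double as geographic affinity: local tiles naturally land on local servers.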
A further benefit of this approach is that it migrates tiles past international internet bottlenecks, into local national backbones (assuming an OpenTile server is set up locally). This makes an OpenTile solution an excellent fit for international development.

Another tweak is to use DNS for server discovery, rather than relying on the DHT protocol itself. So if you want a tile, as an HTTP client (web browser, etc.) the OpenTile DNS system will give you the geographically nearest server to talk to. That server will either provide your tiles directly out of its cache, or fetch the tiles you need over the DHT network (and then locally cache them) if it doesn't already have them.

Technical Details
-----------------

Enabling technologies
~~~~~~~~~~~~~~~~~~~~~

- Chimera, an open source overlay network: binds servers together, maintains routing information, routes messages to keys, and allows the application level to handle the details of local storage, replication, etc.
- Berkeley DB, an open source disk hash: provides the local key/value store
- GDAL, an open source image format handler: provides format conversion where necessary
- FastCGI: implement OpenTile as an FCGI and have the HTTP layer handled elsewhere

Open issues
~~~~~~~~~~~

- Object size: for 256x256 tiles it is pretty small. Perhaps make each key map to a 1024x1024 area, but transfer around a collection of 256x256 chunks in a message bundle (or just a JPEG-compressed, tiled TIFF, if we're feeling spunky). That way the server can easily serve standard tiles, but can replicate on the basis of larger chunks.
- Object management policy: Chimera delegates these "minor" details, like how to ensure that objects get replicated, that we still have redundancy in the system for a given object, and that new servers get copies (which copies?) of objects
- Some policy is natural, in that tile requests by clients should lead to local caching
- Some policy is not natural: what is the rule for discarding objects when the cache approaches maximum size? A combination of redundancy (it's already well replicated) and locality (it's not close to me) might imply dropping an object

Key layout
~~~~~~~~~~

- Chimera supports a 160 bit key, so there is lots of room
- The first 64 bits can be the tile address, encoded in the "microsoft bing maps" manner: the first two bits indicate which quadrant of the top level (NW=0, NE=1, SW=2, SE=3), the next two bits indicate which quadrant of that quadrant, and so on. This allows embedding the tile number in a fixed-length key, and means that spatially nearby keys will be numerically nearby, even across zoom levels.
- The 64 bit tile key translates to a ground resolution of 1cm per tile at the equator in mercator
- The next 8 bits could be the "layer number", allowing for 256 global layers
- A key combining the first 64 bits with a layer number and then zeroes can be the "canonical tile", the current active view of the tile
- The next 8 bits can be the "metadata number", a space for storing information about the tiles in this tile location in this layer. Ask for "tile x/y/z, layer n, metadata" and get back an XML document
- The XML document lists all the versions of tiles that have been added to this tile slot, what their key number is, who added them, what their effective resolution is, color/bw, and which one is the current canonical tile
  o Some explanation of this idea: when a new tile is inserted into the system, it will have to be inserted with some standard metadata, indicating the effective image resolution, date of capture, user doing the insert, etc. The OpenTile server will apply the policy for the layer and decide: do I accept this tile at all? If so, give it a slot under this tile number. Is it "better" than the current canonical tile?
If so, also replace the canonical tile content with this tile content, and update the metadata to include the information about the new tile. Some layers, like imagery, will have pretty complex policies for canonical tile replacement, balancing currency and effective resolution. Others, like a rendered map view, might just take the latest view as canonical and discard all other copies (a reasonable policy for an OSM tile set, for example)
- Initially there will be only one metadata number, but having room for others hopefully gives room for future-proofing the problem of ever-growing metadata documents
- The next 16 bits can be spare!
- The next 32 bits can be a random hash, so that historical tiles get a random slot for storage
- The next 32 bits can be spare!
- I've probably forgotten something, but the important point is that spatially nearby tiles sort together, that the metadata and historical tiles sort together, and that there is, in fact, metadata and the ability to maintain a history (and a potential rollback path) for the layer

Key affinity
~~~~~~~~~~~~

Messages in Chimera are routed to the host with the key value "nearest" to the message key. Server configuration can include the provision of a lat/lon coordinate, which can be converted into a key. Add some random noise at the bottom of the key, to keep the servers distinct even when they are at the "same" location, and you have a server key to which messages can migrate that is spatially associated with (a) the location of the server and (b) the tiles that will end up there.

Global Metadata
~~~~~~~~~~~~~~~

- There will want to be some global metadata stored in the network, like how the layer numbers map to human-readable names, insert policies for layers, and so on. What key that stuff is stored under needs to be worked out.
- Possibly shove the whole tile key over two bits (we have resolution to burn) and use the top bit to flag global metadata. Give it a much wider replication policy.
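The two-bits-per-level key packing described under "Key layout" can be sketched as follows. The field widths follow the proposal (64-bit tile address, 8-bit layer number, 8-bit metadata number, 16 spare bits, 32-bit random hash, 32 spare bits); the function names are invented for illustration.

```python
# Sketch of packing a tile address into the proposed 160-bit Chimera key.

def tile_bits(x, y, z):
    """Encode tile column x, row y at zoom z as a quadkey integer,
    coarsest level first: NW=0, NE=1, SW=2, SE=3."""
    key = 0
    for level in range(z, 0, -1):   # mask at level z picks the coarsest bit first
        mask = 1 << (level - 1)
        quad = 0
        if x & mask:
            quad += 1               # eastern half
        if y & mask:
            quad += 2               # southern half
        key = (key << 2) | quad
    return key

def opentile_key(x, y, z, layer=0, metadata=0, rand=0):
    """Pack the fields into a 160-bit key: 64 tile bits, 8 layer bits,
    8 metadata bits, 16 spare, 32 random-hash bits, 32 spare."""
    tile = tile_bits(x, y, z) << (2 * (32 - z))   # left-align in 64 bits
    return (tile << 96) | (layer << 88) | (metadata << 80) | (rand << 32)

# Left-aligning the quadkey gives the property the proposal relies on:
# a tile's key shares a prefix with its parent's key, so spatially
# related tiles sort together across zoom levels.
assert tile_bits(3, 2, 2) >> 2 == tile_bits(1, 1, 1)
```

Setting layer, metadata, and rand to zero yields the "canonical tile" key for a location; a non-zero metadata number addresses the XML version document, and the random hash gives historical tiles their own slots.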
Interacting with the Server
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The server will be accessed using web services; specifically, each tile will be a writable URI.

- GET /layer/tilenumber
  o Where tilenumber is the bing maps key (other schemes can be supported, obviously)
- GET /layer/tilenumber/tileversion
  o Where tileversion is a number returned in the metadata that allows access to the underlying historical tiles
- GET /layer/tilenumber/metadata
  o Get a version of the XML metadata document, with the tile keys replaced with appropriate version numbers (the random hash portion extracted from the key)
- POST /layer/tilenumber/new
  o Put a metadata fragment into the POST payload; if the metadata is judged valid, you get back a /layer/tilenumber/tileversion URL to which you can
- PUT /layer/tilenumber/tileversion
  o Put the actual image data into place
- DELETE /layer/tilenumber/tileversion

What about overviews? Creating and uploading overviews will become the province of the data loading tools. The server will only operate tile-by-tile.

Uploading Data
~~~~~~~~~~~~~~

A desktop GDAL utility will take in image files, request mandatory metadata (capture date, capture source, user id, and what layer to upload to -- initially there will be only two layers, "global imagery latlon" and "global imagery mercator"), and then get to work. It will reproject the raw data into the layer projection, overlay the tile grid, and clip out just those tiles fully contained in the data frame (we'll be throwing away some edge imagery, but we're trying to be brutalist and effective here). It will then resample to build overviews and repeat. Finally, it will upload each tile to the server, and generate a report of which tiles were accepted and which ones were rejected (some overviews will probably get rejected on the basis of a policy maintaining some global overview like bluemarble or landsat).

Can we do this server-side? Yes, we could, as a separate project.
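A client for the URI scheme above might look like the sketch below. The host name, helper names, and the assumption that the POST to .../new returns the versioned URL in its response body are all illustrative choices, not a defined protocol; no OpenTile server exists to talk to.

```python
# Sketch of a client for the proposed tile web service.
# BASE and the POST-response convention are assumptions for illustration.
import urllib.request

BASE = "http://opentiles.org"  # hypothetical federation DNS alias

def tile_url(layer, tilenumber, version=None):
    """GET /layer/tilenumber fetches the canonical tile;
    GET /layer/tilenumber/tileversion fetches a historical one."""
    url = "%s/%s/%s" % (BASE, layer, tilenumber)
    if version is not None:
        url += "/%s" % version
    return url

def metadata_url(layer, tilenumber):
    """GET /layer/tilenumber/metadata returns the XML version list."""
    return "%s/%s/%s/metadata" % (BASE, layer, tilenumber)

def upload(layer, tilenumber, metadata_xml, image_bytes):
    """Two-step upload: POST the metadata fragment to .../new, then PUT
    the image bytes to the versioned URL the server hands back."""
    post = urllib.request.Request(
        "%s/%s/%s/new" % (BASE, layer, tilenumber),
        data=metadata_xml, method="POST")
    with urllib.request.urlopen(post) as resp:
        version_url = resp.read().decode().strip()
    put = urllib.request.Request(version_url, data=image_bytes, method="PUT")
    urllib.request.urlopen(put)
```

The point of the two-step upload is that the server, not the client, assigns the version slot, which is what lets it apply the layer's acceptance and canonical-replacement policy before any image bytes move.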
The key here is to describe the mechanics of uploading, and the best practices (attempt to upload overviews as well as the best resolution), and then any upload tool can populate the system. The infrastructure for storing and serving tiles is separate from the infrastructure for preparing and uploading.

Distributing Query Load
~~~~~~~~~~~~~~~~~~~~~~~

The overlay network and cache management application layers should nicely distribute the data to appropriate localities. The next step is to try to match the locality of users to the locality of data. At the most basic level, having new OpenTile servers automatically register with a central DNS service would allow us to round-robin requests to national-level DNS aliases (ca.opentiles.org) across the appropriate national servers (server1.ca.opentiles.org). Doing distribution at the DNS level seems preferable to any system of redirects, since the number of requests to be redirected would only grow, and each redirect would add a uniform HTTP connection setup/teardown overhead to every request.

_______________________________________________
talk mailing list
[email protected]
http://openaerialmap.org/mailman/listinfo/talk_openaerialmap.org
