On Sun, Nov 22, 2020 at 8:04 PM Brian M. Sperlongano <zelonew...@gmail.com> wrote:
> Therefore, a holistic solution is needed for large objects. Setting an
> api limit is good because it gives consumers a guarantee about the
> worst-case object they might have to handle. However, it must also be
> combined with a replacement mechanism for representing large objects. The
> 2,000 node limit for a way is fine because longer ways can be combined via
> relations. If the relation member limit were capped, you create a class of
> objects that cannot be represented in the data set.

We've already substantially solved that problem for routes. Super-relations
seem to work well, and only rarely do we even need a three-level hierarchy.
As Steve points out, we could go deeper, but there's no need.

> What I think is missing is a way to store huge multipolygons in such a way
> that they can be worked with in a piecemeal way. The answer that
> immediately comes to mind is a scheme where large objects are represented
> as relations of relations, where portions of a huge multipolygon are
> chopped up into fragments and stored in subordinate multipolygon
> relations. This hierarchy could perhaps nest several levels if needed.
> Now a 40,000 member relation could be composed of 200 relations of 200
> members each, with each subordinate relation member being a valid
> multipolygon with disjoint or adjacent portions of the overall geometry.
>
> Then, an editor could say "here is a large relation, I've drawn bounding
> boxes for the 200 sub-relations, if you select one, I'll load its data and
> you can edit just that sub-relation".
>
> This could *almost* work under the current relation scheme (provided new
> relation types are invented to cover these types of data structures, and
> consumers roger up to supporting such hierarchical relations). The thing
> that makes this fail for interactive data consumers (such as an editor or a
> display) is that *there's no way to know where relation members are,
> spatially, within the relation*. The api does not have a way to say
> "what is the bounding box of this object?" A consumer would need to
> traverse down through the hierarchy to compute the inner bounding boxes,
> which defeats the purpose of subdividing it in the first place.

You're right that it's a problem, but you misdiagnose the details. Rather
than identifying bounding boxes, which is easy, the problem comes down to
identifying topology: is a given point in space on the inside or outside of
the multipolygon?

The minimal information needed to answer that question is one of two things.
The first is the 'winding number' - essentially, if you draw a mathematical
ray from the point to infinity in a given direction, how many times do you
cross the boundary of the region? (Odd = inside, even = outside.) The second
is to add a requirement to the data model that the boundaries of regions
must follow a particular winding direction; most GIS systems use the "right
hand rule", which specifies that as you proceed along a boundary way, the
interior of the relation should be on your right.

The second rule is by far the easier to implement. Unfortunately, it's also
inconsistent with OSM's base data model. The problem is that we do not
require the members of a multipolygon to be sorted in any particular order
(depending on client software to order them if necessary), nor do we require
the boundary ways to proceed in any particular direction with respect to the
multipolygon.
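To make the two alternatives concrete, here is a rough Python sketch of
both tests. It is not code from any particular OSM tool; the function names
are mine, and it assumes the relation's member ways have already been
assembled into closed rings of (x, y) pairs - which, for a huge
multipolygon, is itself the expensive step.

def point_in_ring(x, y, ring):
    """Even-odd ray test: cast a ray from (x, y) toward +x and count how
    many edges of the ring it crosses.  Odd = inside, even = outside.
    'ring' is a list of (x, y) vertices; the closing edge back to the
    first vertex is implied."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):             # edge straddles the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:                   # crossing lies to the right
                inside = not inside
    return inside

def signed_area(ring):
    """Shoelace formula.  Under the 'interior on your right' convention
    described above, an outer ring would have to come out clockwise
    (negative here, with y increasing northward) and inner rings
    counterclockwise - which OSM does not currently guarantee."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0

The attraction of the second test is that it is local - orientation can be
checked one assembled ring at a time, given its inner/outer role - but the
requirement itself is the sticking point.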
In fact, we cannot require the boundary ways to proceed in a particular
direction, since shared ways between adjacent multipolygons are a fairly
common practice. The practice is somewhat controversial; nevertheless, it
seems like a good idea when the adjoining regions by their nature are both
known to touch and known to be mutually exclusive: the lines that separate
landuse from landuse, landcover from landcover, administrative region from
administrative region, land from water, or cadastral parcel from cadastral
parcel (where cadastre is accepted, as it is with objects like public
recreational land). Except for monsters such as the World Ocean (the
coastline is a perpetual headache), seas, and objects with extremely complex
topology, the problem is somewhat manageable.

A single 'ring' (the cycle of contiguous ways, inner or outer, that form one
region of a multipolygon) or a single 'complex polygon' (an outer way and
any inner ways subordinate to it) is generally quite manageable in terms of
data volume. I can edit shorelines of the Great Lakes, for instance, with
some confidence, by loading into JOSM all the data near the single stretch
of shoreline that I'm working on, plus the entire outer perimeter of the
lake (using the 'download incomplete members' function); having the
shoreline outside the immediate region of interest doesn't stress the memory
even of a somewhat obsolete laptop computer.

Not all editors are as competent at managing large relations - I've never,
for instance, grown comfortable attempting similar tasks in any of the
browser-based ones I've tried. I used Merkaartor briefly during a time when
the large relations were causing random JOSM crashes (something to do with
interactions with accessibility extensions when painting the data in the
UI), and it was also fairly workable, so this isn't a JOSM advertisement,
necessarily.

The objects that typically give me the worst headaches aren't necessarily
the largest ones - as I said, I deal with long routes such as the
Appalachian Trail, or large areas such as the Great Lakes - but rather the
diffuse ones. (Many National Forests are both!) Editing a messy multipolygon
like https://www.openstreetmap.org/relation/6360587 - particularly one where
the ways are shared with other objects (as where a recreation area shares
boundaries with an adjacent wilderness area, or is defined by a shoreline or
a stream centerline) - is, as an elderly relative of mine used to put it, "a
pain where a pill don't fix it!"

I do not agree at all with the contention that nothing is lost by breaking
the association among the individual fragments of such a diffuse area. They
share a name, an administrative authority, a management plan, a web site, a
set of regulations, and so on. They are the parts of a whole that happens to
be fragmented into a lot of spatially disjoint, although loosely grouped,
pieces. I do understand that "relations are not categories," but I'm not
trying to create a relation for "all Wild Forest areas" or "all New York
State lands" - rather, one for the particular facility known as the "Wilcox
Lake Wild Forest." The neighbours and visitors of that forest do
conceptualize it as a single thing, so we do lose a lot if you tell me "just
don't map that way."

Extracting a geographic region from a large multipolygon for rendering is
more or less a solved problem, although implementations in particular tools
vary. There are a number of named algorithms related to the issue.
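The core move in most of them is the same: clip each ring of the
multipolygon against the window you actually care about. Here is a rough
sketch of that step in the Sutherland-Hodgman style - clipping one ring
against one edge of an axis-aligned box at a time. It is illustrative only,
not lifted from any particular implementation; it assumes the ring is
already assembled, and it only handles convex clip windows, which a bounding
box is.

def clip_ring_to_bbox(ring, xmin, ymin, xmax, ymax):
    """Clip a ring (list of (x, y) vertices, closing edge implied)
    against an axis-aligned box, one box edge at a time."""
    edges = [
        (lambda p: p[0] >= xmin, lambda p, q: _cross_x(p, q, xmin)),
        (lambda p: p[0] <= xmax, lambda p, q: _cross_x(p, q, xmax)),
        (lambda p: p[1] >= ymin, lambda p, q: _cross_y(p, q, ymin)),
        (lambda p: p[1] <= ymax, lambda p, q: _cross_y(p, q, ymax)),
    ]
    output = list(ring)
    for inside, intersect in edges:
        if not output:
            break                       # ring lies entirely outside the box
        input_ring, output = output, []
        prev = input_ring[-1]
        for curr in input_ring:
            if inside(curr):
                if not inside(prev):
                    output.append(intersect(prev, curr))  # re-entering
                output.append(curr)
            elif inside(prev):
                output.append(intersect(prev, curr))      # leaving
            prev = curr
    return output

def _cross_x(p, q, x):
    """Point where segment p-q crosses the vertical line at x."""
    t = (x - p[0]) / (q[0] - p[0])
    return (x, p[1] + t * (q[1] - p[1]))

def _cross_y(p, q, y):
    """Point where segment p-q crosses the horizontal line at y."""
    t = (y - p[1]) / (q[1] - p[1])
    return (p[0] + t * (q[0] - p[0]), y)

For rendering a tile, something like this gets applied ring by ring, and
only the rings whose bounding boxes touch the window need to be considered
at all.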
Wikipedia offers some good jumping-off points:

Sutherland-Hodgman: https://en.wikipedia.org/wiki/Sutherland%E2%80%93Hodgman_algorithm
Weiler-Atherton: https://en.wikipedia.org/wiki/Weiler%E2%80%93Atherton_clipping_algorithm
Greiner-Hormann: https://en.wikipedia.org/wiki/Greiner%E2%80%93Hormann_clipping_algorithm
Vatti: https://en.wikipedia.org/wiki/Vatti_clipping_algorithm
(see also https://en.wikipedia.org/wiki/Bentley%E2%80%93Ottmann_algorithm)

They work quite well in practice for rendering and geocoding in limited
geographic areas. The spatial indexing of the relational databases we use
also performs well in practice, except for the case where the region is both
large and topologically complex.

The key issue for editing is that edits must preserve topological
consistency. Most proposals that I've seen for representing large
multipolygons by subdivision fail at this - they require examining the
entire multipolygon to verify that the portion being edited does not
introduce crossing ways or disconnect the boundary. This is the perennial
problem with the coastline - it's never complete and consistent, so the
generalization of the coastline never seems to happen.

Apologies to the 'tagging' mailing list in that I'm wandering off into data
storage, data retrieval, editing and rendering technology, none of which
really bears on how the objects are mapped and tagged. There's almost
certainly a better forum in which to hash out the design details of a data
model that addresses Brian's issue satisfactorily, and I'll happily follow
wherever the discussion of the technological problems moves.

--
73 de ke9tv/2, Kevin