On 10/13/2017 02:06 AM, Frederik Ramm wrote:
> Hi,
>
> there's a LOT of NHD:* (and nhd:*) tags on OSM objects, see https://taginfo.openstreetmap.org/search?q=NHD%3A - 1.9 million NHD:FCode, but also 188k "NHD:Permanent_" (note the underscore), 10k "NHD:WBAreaComI", or 1.5m "NHD:Resolution" just to grab a few.
>
> I haven't researched who added them and when, but they would certainly not clear the quality standards we have for imports today. Most of this information can be properly modelled in usual OSM tags, and where it cannot, it probably shouldn't be in OSM in the first place.
>
> Is there any systematic (or even sporadic) effort of cleaning up these old imports? Is there reason to believe that the neglect extends to more than just the tags - do geometry and topology usually work well on these, or are the funny tags a huge "this whole area hasn't had any love in a long time" sign?

ON IRRELEVANT TAGGING:

I, at least, ordinarily do not make a specific effort to ferret out irrelevant tags. For the most part, they're harmless to me. If some random object on the map happens to have 'zqx3:identifier=2718281828' among its tags, the only real damage is the diffuse cost of shipping the data around. That said, you're quite right that such tags might indeed be a symptom of a neglected import, or one that was originally done with processes that wouldn't clear today's bar. Even so, that has only limited bearing on the quality of the data that are meaningful.

ON IMPORTING NHD:

In the specific case of NHD, data quality varies by region. As Dave correctly notes, Alaska is uniformly atrocious. (There really are no good mapping data for Alaska. The technical challenge of acquiring high-quality data for much of the state simply is greater than the perceived value of the data.) Where I am, on the other hand, NHD is actually quite good - in the maps that I render, which are almost all in rural areas, I use it.
I most often use it in combination with OSM and with other data sources (USFWS national wetland inventory, Adirondack Park Authority wetland inventory, NYSDOT, ...) which give the rendered maps a somewhat 'cubist' appearance, but I find that appearance helpful - it's an indication of data variability, and gives me an idea how much uncertainty to expect in the field.

The fact that NHD is often quite 'stale' does not bother me at all locally. I live in a heavily glaciated area, and the cities have been settled for quite a long time by US standards. Out in the countryside, the streams run typically in deep ravines, disproportionate to the size of the streams. They aren't moving anywhere. They most likely haven't moved significantly since the Wisconsinan glaciation, 14000 years ago. In the valleys, the detailed course of the streams does shift a bit, but in the cities and towns, the streams are engineered, and elsewhere, the terrain tends to be beaver swamp, and the streams shift with every move of the rodents or every major storm. I never expect the track of a watercourse within a wetland to be accurate, on any map, ever.

NHD's topology is audited before it is released, so it's at least consistent (and likely correct).

It's certainly hypothetically possible to map the streams using 'hand-crafted' methods - and I have done so for a few, when I've happened to follow them in wilderness travel. (I occasionally go hiking off-trail.) But the OSM community is never going to be able to do that for the great many watercourses that flow over my extremely well-watered area. There simply is too much land inhabited by too few people, most of whom are not well enough connected nor technologically literate enough to become OSM mappers. (Seriously, in some of these communities, there is no cell service and only a quarter of the houses have any sort of network connectivity. It's effectively working with Third World infrastructure.)
It's virtually impossible to map most of these watercourses as an 'armchair mapper.' Our 'old second growth' timber gives rise to extraordinarily dense tree cover - denser than true 'old growth' forest. Even some fairly major watercourses - major enough that I wouldn't attempt to ford in springtime - are difficult or impossible to see in aerials.

There has been an OSM project to map lakes and ponds in New York State, starting from point features giving their names. I've preserved these tracings in OSM, because I don't replace mappers' work with imports, ever. Nevertheless, I find them to be uniformly worse than NHD. They're usually quite rough, and in a great many of them, the mappers treated mats of floating or emergent vegetation as the shoreline, making shallow ponds much smaller than they are.

For all these reasons, NHD is what I have in my area. I've never done a large-scale NHD import, and nobody else has done one around me. If I need a stream for a rendered map, and don't want the 'cubist' data, I sometimes import it as a single object from NHD. Where else will I get it?

(That's pretty much my guideline on when importing is likely to add value: I as a data consumer have an identified use for most or all of what I'm bringing in, I have no ready way to acquire the information by mapping on the ground, and the external data set appears to be of good enough quality in the places where I have boots-on-the-ground mapping, and clean enough topology, that I can import without too much trouble. Nobody's reverted yet.)

ON TAG RETENTION:

When I import, I retain tags that are likely to be useful. Synthetic tags like 'area' I remove. I do occasionally retain tags that have the appearance of 'foreign keys' - but they are quite specific and I do ask mappers please to leave them alone. As an example, with the public land polygons that I've imported, I retain single unique IDs.
I do repeat imports of those data sets, and I use the IDs in a semiautomated process for reconflation. (The reimported data are all checked manually, and I respect the work of mappers who've modified the import. The ID merely gives the script a starting point.)

When importing single objects from NHD, I remove most of the rubbish but I do keep 'permanent_identifier'. (The 'PERMANENT_' tag is an artifact, coming from the fact that some intermediate database somewhere in the pipeline is limited to ten-character column names.)

I also retain 'reachcode'. That string of digits is, according to USGS, guaranteed to be stable - they don't reuse codes - and encodes information about the topology of a stream. I know some of the local codes quite well - I recognize at a glance that codes beginning with 02020005 refer to waterways that drain to the Atlantic by way of Schoharie Creek (a locally significant river despite the name, with dams, reservoirs, power stations), the Mohawk River, and the Hudson River. It's effectively a machine-readable 'second name' for the object. The other stuff, that doesn't map to OSM tagging, I discard.

CONCLUSION

Should irrelevant tagging on NHD objects be stripped? Maybe, although the diffuse cost of the database space and network bandwidth to retain and exchange it doesn't keep me awake at night. Please keep 'reachcode', though, I use that one!

Should the presence of the irrelevant tagging cause the underlying objects to be removed? Please don't. The data aren't perfect, but we don't live in an ideal world. NHD's data quality is variable, but where it's good, it's very good, and even where it's bad, it's often better than anything else we're ever going to get our hands on.

Are the data obsolete? Sure - but obsolete data about stable features are almost as good as up-to-date data about the same stable features.

Would I import them wholesale today? Surely not. That's primarily because of the controversy that would ensue.
I'm no stranger to controversy, but I respect the community consensus that large-scale imports are a matter of last resort, and forgo them where there are significant technical counterarguments. (I ignore the arguments that are based solely on contentions that "imports are always bad for the community," or else I'd never import anything.)
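
For anyone inclined to attempt the cleanup, the retention rule described under ON TAG RETENTION can be sketched roughly as below. This is an illustration only, not a tested cleanup script. The function names, the keep list, and the sample values are all made up for the example, and the exact key spellings on real objects should be checked against taginfo first (the keep list assumes an 'NHD:' or 'nhd:' prefix and matches case-insensitively). `subbasin_prefix` assumes that the leading eight digits of a reach code are the part that identifies the drainage, as the 02020005 example suggests. `truncate_column` just shows how a ten-character column limit would produce keys like "Permanent_".

```python
# Hypothetical keep list: suffixes (after "NHD:") worth retaining.
# "permanent_" is included because the truncated key is what taginfo shows.
KEEP_SUFFIXES = {"reachcode", "permanent_identifier", "permanent_"}


def strip_nhd_tags(tags):
    """Return a copy of `tags` without NHD:* keys, except the keep list."""
    cleaned = {}
    for key, value in tags.items():
        prefix, sep, suffix = key.partition(":")
        is_nhd = bool(sep) and prefix.lower() == "nhd"
        if is_nhd and suffix.lower() not in KEEP_SUFFIXES:
            continue  # import residue such as NHD:FCode or NHD:Resolution
        cleaned[key] = value
    return cleaned


def subbasin_prefix(reachcode):
    """Leading eight digits of a reach code (assumed to be the drainage part)."""
    return reachcode[:8]


def truncate_column(name, limit=10):
    """Mimic a store that keeps only the first `limit` characters of a column name."""
    return name[:limit]


if __name__ == "__main__":
    # Made-up sample object; the reach code below is not a real one.
    sample = {
        "waterway": "stream",
        "name": "Example Creek",
        "NHD:FCode": "46006",
        "NHD:Resolution": "High",
        "NHD:ReachCode": "02020005000123",
    }
    print(strip_nhd_tags(sample))
    print(subbasin_prefix(sample["NHD:ReachCode"]))
    print(truncate_column("Permanent_Identifier"))
```

Matching on the suffix rather than on a fixed list of full keys is a deliberate choice here, since both 'NHD:' and 'nhd:' prefixes are in use. Anything outside the NHD namespace is left untouched.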
_______________________________________________
Talk-us mailing list
Talk-us@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-us