Re: [Talk-se] Place name import from Lantmäteriet's GSD-Terrängkartan

Micke via Talk-se Fri, 17 Jan 2020 10:53:18 -0800

Hi!

For manor houses, one could perhaps take the opportunity to add historic=manor
at the same time. There are probably quite a few such manors mapped already.
In those cases, though, the tag most likely sits on the building or on the
farmyard rather than on a node. I mention it just so that no duplicates are
created there.


We also have a good many places that have a name but consist only of abandoned
houses, summer cottages or fäbodar (summer pasture farms). These should be
classified as locality as well.

City districts should presumably be neighbourhood rather than hamlet?


Regards,

Anders Andersson

From: Grigory Rechistov <ggg_m...@inbox.ru>
Sent: 16 January 2020 18:19
To: talk-se <talk-se@openstreetmap.org>
Subject: [Talk-se] Ortnamnsimport från Lantmäteriets GSD-Terrängkartan

Hi!
I have extracted the place names that are currently missing from the OSM map
of Sweden from Lantmäteriet's open data, dated January 2020. There are roughly
95 thousand new nodes with names and "place=*" tags, which I hope to upload to
OSM in due course.

Such a large amount of new data requires following certain procedures and
preparing certain documents. I hope to get your feedback and, possibly, help
with validation, uploading and any other tasks that come up.

The import plan for the project is available on the OSM wiki [1]. It describes
the origin, licence and format of the data. It then describes how the original
files are processed, how new points are filtered against the existing OSM
database, how place names are cleaned and compared, which scripts and programs
are used at each step, and so on. Finally, it lists the problems that remain
to be solved during manual processing.

Excerpts from the most important sections of the import plan are attached at
the bottom. Here is also a small sample of the full dataset, if you want to
see what it will look like: [2] [3]. Further links to Lantmäteriet's
documentation, the scripts I developed, all OSM files, spreadsheets and so on
can be found on the import plan page.

Thanks!

[1] 
https://wiki.openstreetmap.org/wiki/Import/Catalogue/Lantm%C3%A4teriet_GSD-Terr%C3%A4ngkartans_ortnamnsimport
[2] https://drive.google.com/open?id=1np1TEDlEBWx1kt-u7A4Z_ZpkMOwOp80l
[3] https://drive.google.com/open?id=1pERx-U4rdOjhXmePoSxcbKRZsr-preh8

Excerpts from the import plan follow.

===Goal===
To improve the completeness of OSM toponymic data for the territory of Sweden
using an official map supplied by the Swedish mapping, cadastral and land
registration authority.
This import considers OSM data representable as nodes tagged with the usual
key/value pairs: "place=city", "place=town", "place=village", "place=hamlet",
"place=isolated_dwelling", and "place=locality". However, adding or modifying
nodes with the "city" and "town" values is not planned (although not fully
excluded either), as they are expected to be fully mapped already.

==== Data processing diagram ====
See the diagram below. The conflation stage is described later in more detail.
+-------------------+        +------------------+
|                   |        |                  |
|Lantmäteriet's SHP |        |Geofabrik country |
|files              |        |extract           |
|                   |        |                  |
+---------+---------+        +--------+---------+
          |                           |
          |ogr2osm                    |osmconvert
          |                           |osmfilter
          v                           v
 +--------+---------+         +-------+---------+
 |                  |         |                 |
 |OSM file with     |         |OSM file with    |
 |settlements       |         |settlements      |
 |                  |         |                 |
 +---------+--------+         +-------+---------+
           |                          |
           |                          |
           |     conflate-places.py   |
           +<--------------------------
           v
  +--------+--------+
  |                 |
  |OSM file with    |
  |only ready nodes |
  |                 |
  +--------+--------+
           |
           | Manual corrections
           |
           v
    Upload via JOSM
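
As an illustration of the right-hand branch of the diagram, here is a minimal
Python sketch of pulling "place=*" nodes out of an OSM XML file. The plan
itself uses osmconvert and osmfilter for this stage; the function name
extract_place_nodes, the PLACE_VALUES set and the SAMPLE document are
hypothetical and only show what the filtered result contains.

```python
import xml.etree.ElementTree as ET

# The place values considered by the import plan.
PLACE_VALUES = {
    "city", "town", "village", "hamlet", "isolated_dwelling", "locality",
}


def extract_place_nodes(osm_xml):
    """Return a list of dicts for nodes tagged with one of PLACE_VALUES.

    Hypothetical helper: it parses an OSM XML string held in memory, which
    is fine for a small sample but not for a whole country extract.
    """
    root = ET.fromstring(osm_xml)
    result = []
    for node in root.iter("node"):
        # Collect the node's tags into a plain key -> value dict.
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if tags.get("place") in PLACE_VALUES:
            result.append({
                "id": node.get("id"),
                "lat": float(node.get("lat")),
                "lon": float(node.get("lon")),
                "tags": tags,
            })
    return result


# Made-up sample data: one place node and one unrelated node.
SAMPLE = """<osm version="0.6">
  <node id="1" lat="59.3293" lon="18.0686">
    <tag k="place" v="city"/>
    <tag k="name" v="Stockholm"/>
  </node>
  <node id="2" lat="59.0" lon="18.0">
    <tag k="amenity" v="bench"/>
  </node>
</osm>"""

places = extract_place_nodes(SAMPLE)
```

For a full country extract a streaming parser or the dedicated tools named in
the diagram would be the more practical choice.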


The employed algorithm operates on a set of old nodes marked with "place=*"
(from the OSM extract, around 68 000 nodes for the country) and new nodes
(from the SHP extract). It produces ready nodes, a strict subset of new nodes.
No old nodes are modified in any way during the process. This means that
existing data has absolute priority, even in cases where it is likely of lower
quality than the new data.
The sequence of steps is as follows.
1. Create a spatial index structure with old nodes to have fast spatial lookup.
2. For all new nodes, validate and correct the "name" tag.
3. For each new node, find old nodes close enough to it to be candidates for
   duplicates.
4. For each candidate node, compare its name against the current new node name.
   Comparison is fuzzy to allow for some text variation typical for names.
   Alternative old names are also checked if present.
5. If a name match is found, the current new node is marked as "duplicate" and
   is excluded from further analysis and results.
6. An OSM file with ready data is generated.
7. The OSM file is optionally split into smaller tiles to ease and speed up
   visual validation.
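
Steps 1 and 3 to 5 above can be sketched in Python as follows. This is not the
actual conflate-places.py; it is a minimal illustration under stated
assumptions: a simple latitude/longitude grid stands in for the spatial index,
difflib.SequenceMatcher stands in for the fuzzy comparison, and the constants
CELL_DEG, MAX_DIST_M and NAME_THRESHOLD as well as the "alt_name" and
"old_name" keys checked for alternative names are illustrative choices, not
values taken from the import plan.

```python
import math
from collections import defaultdict
from difflib import SequenceMatcher

CELL_DEG = 0.05        # assumed grid cell size, degrees
MAX_DIST_M = 1000.0    # assumed duplicate search radius, metres
NAME_THRESHOLD = 0.85  # assumed similarity cut-off for names


def distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation, adequate for short distances."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371000.0 * math.hypot(x, y)


def cell(lat, lon):
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_index(old_nodes):
    """Step 1: bucket old nodes by grid cell for fast lookup."""
    index = defaultdict(list)
    for node in old_nodes:
        index[cell(node["lat"], node["lon"])].append(node)
    return index


def nearby(index, node):
    """Step 3: yield old nodes within MAX_DIST_M of the given node.

    Looking at the 3x3 block of cells is enough as long as one cell is
    wider than MAX_DIST_M, which holds for these values at Swedish latitudes.
    """
    ci, cj = cell(node["lat"], node["lon"])
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for old in index.get((ci + di, cj + dj), ()):
                d = distance_m(node["lat"], node["lon"], old["lat"], old["lon"])
                if d <= MAX_DIST_M:
                    yield old


def names_match(a, b):
    """Step 4: fuzzy, case-insensitive name comparison."""
    a, b = a.casefold().strip(), b.casefold().strip()
    return SequenceMatcher(None, a, b).ratio() >= NAME_THRESHOLD


def old_names(node):
    # Assumption: alternative names live under these OSM keys.
    tags = node["tags"]
    return [tags[k] for k in ("name", "alt_name", "old_name") if k in tags]


def conflate(old_nodes, new_nodes):
    """Step 5: return the new nodes that have no nearby name match."""
    index = build_index(old_nodes)
    ready = []
    for new in new_nodes:
        new_name = new["tags"]["name"]
        duplicate = any(
            names_match(new_name, name)
            for old in nearby(index, new)
            for name in old_names(old)
        )
        if not duplicate:
            ready.append(new)
    return ready
```

Only new nodes are ever dropped here and old nodes are never touched, which
mirrors the rule that existing data has absolute priority. Name validation
(step 2), file output (step 6) and tiling (step 7) are left out of the sketch.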

===Expected issues and their risk assessment===

So far, the most problematic issues seem to be "A duplicate of an existing
node is added" and "A new node is added with incorrect position".
Discovering and fixing such problems is expected to account for most of the
required manual editing.


With best regards,
Grigory Rechistov

_______________________________________________
Talk-se mailing list
Talk-se@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-se