Hello Yves,

Thank you for sharing your thoughts.

I agree with you on the problem of importing locations in bulk. Still, I think it is safe to use the API data to clean up the names and reference numbers of the stations we have already mapped.

As far as I know, the API "name" values match exactly the names shown on the stations' interactive displays, while the API "address" values match the information reported on the Villo website. I don't know about the app or other data sources.

So when you say that "what they present as the name" is unreliable, are you talking about the data shown on the website, i.e., the "address" value reported in the API? Then I agree that extracting address information from those values is difficult because of the inconsistent formatting.

However, as far as the name tags are concerned (name, name:fr, name:nl, official_name), I think those are supposed to reflect the data as it is visible on location. This means that the "name" API values should be our source for that information. Those values are formatted more consistently, too: with very few exceptions, the value is either "NNN - NAME" or "NNN - FRENCH_NAME/DUTCH_NAME" where NNN is the station number, and outliers are easy to spot. Is there any reason why we should not be using that (or the same with the number removed) as official_name?

I actually have a spreadsheet where I converted all the "name" values reported by the API to a properly-capitalized form and tried to fix all the typos I could find. I will share it later.

Cédric

On 10/15/2017 06:57 PM, Yves bxl-forever wrote:
Hello,

In the past weeks I have also wanted to do some cleanup on Villo! stations, and it’s a fact that there is still quite a lot of work to be done.

Just a few thoughts about the idea of bulk data imports, because this is what sometimes gave us really "ugly" nodes.

The name itself is a problem because what they present as the name is actually a string that concatenates the ID of the station, the name and its address. This is why tagging this as "official_name" does not seem to make any sense.

Their JSON dataset usually looks like this:

    "name":"076 - PLACE VAN MEENEN/VAN MEENENPLEIN",
    "address":"PLACE VAN MEENEN/VAN MEENENPLEIN - AV PAUL DEJAER (FACE 35 - 39) / PAUL DEJAERLAAN (TEGENOVER 35 - 39)"

And we must translate it as such in our OSM nodes:

    ref=76
    name="Place Van Meenen - Van Meenenplein"
    name:fr="Place Van Meenen"
    name:nl="Van Meenenplein"
    addr:street="Place Maurice Van Meenen - Maurice Van Meenenplein"
    addr:housenumber="35-39"

It’s probably feasible to parse the fields automatically and make something that looks clean. But I am not sure that the street name will always match (see the example here: the official street name has "Maurice" in it, and our parsing script will not guess that unless you feed it a list of all streets).

About missing names in one language, this is tricky: normally we should stick with the official name given by the operator. But another approach would be that if we know of an official translation (because it is the same name as the street, a nearby bus stop, or a building), it should be used. And I agree that we should fix typos without asking, like in your example.

Another problem is that the longitude and latitude fields must be checked to avoid putting stations in the middle of an intersection or inside a building.

In summary, I would recommend a safer approach, i.e. extracting a list of missing stations and adding them one by one manually, after checking whether the data looks fine.

But it would be nice to hear the thoughts of other members of the community.

Have a nice day.

Yves
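
For illustration only, here is a minimal Python sketch of how the "NNN - NAME" and "NNN - FRENCH_NAME/DUTCH_NAME" patterns discussed in this thread could be split into the ref, name, name:fr and name:nl tags from the example above. The function names `parse_station_name` and `tidy` are hypothetical and not part of any Villo! or OSM tooling. The capitalization step is deliberately naive (it would turn "SAINT-GILLES" into "Saint-gilles"), so a hand-checked list of names would still be needed, and anything that does not match the pattern is returned as None for manual review. Address parsing is left out entirely.

```python
import re

# Assumed pattern: station number, a dash, then the label.
NAME_RE = re.compile(r"^\s*(\d+)\s*-\s*(.+?)\s*$")


def tidy(text):
    """Naive capitalization: first letter of each space-separated word."""
    return " ".join(word.capitalize() for word in text.split())


def parse_station_name(raw):
    """Split an API "name" value into OSM-style tags.

    Returns a dict of tags, or None when the value does not match the
    expected pattern and should be reviewed by hand.
    """
    match = NAME_RE.match(raw)
    if match is None:
        return None
    number, label = match.groups()
    parts = [part.strip() for part in label.split("/")]
    tags = {"ref": str(int(number))}  # "076" becomes "76"
    if len(parts) == 1:
        # Single name, no separate French/Dutch forms.
        tags["name"] = tidy(parts[0])
    elif len(parts) == 2 and all(parts):
        # Assumed order: French first, Dutch second.
        fr, nl = tidy(parts[0]), tidy(parts[1])
        tags["name:fr"] = fr
        tags["name:nl"] = nl
        tags["name"] = f"{fr} - {nl}"
    else:
        return None  # more than one slash or an empty half: an outlier
    return tags


if __name__ == "__main__":
    print(parse_station_name("076 - PLACE VAN MEENEN/VAN MEENENPLEIN"))
```

Returning None instead of guessing keeps the outliers visible, which fits the one-by-one manual check recommended above.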
_______________________________________________
Talk-be mailing list
Talk-be@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-be