Hello Yves,

Thank you for sharing your thoughts.

I agree with you on the problem of importing locations in bulk. Still, I
think it is safe to use the API data to clean up the names and reference
numbers of the stations we have already mapped.

As far as I know, the API "name" values match exactly the names shown on
the stations' interactive displays, while the API "address" values match
the information reported on the Villo website. I don't know about the
app or other data sources.

So when you say that "what they present as the name" is unreliable, are
you talking about the data shown on the website, i.e., the "address"
value reported in the API? Then I agree that extracting address
information from those values is difficult because of the inconsistent
formatting.

However, as far as the name tags are concerned (name, name:fr, name:nl,
official_name), I think those are supposed to reflect the data as it is
visible on location. This means that the "name" API values should be our
source for that information. Those values are formatted more
consistently, too: with very few exceptions, the value is either
"NNN - NAME" or "NNN - FRENCH_NAME/DUTCH_NAME", where NNN is the station
number, and outliers are easy to spot. Is there any reason why we should
not be using that (or the same with the number removed) as official_name?
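
To illustrate how regular that format is, here is a minimal Python sketch
that splits a "name" value into its parts. It is only an illustration under
my own assumptions, not a finished import script: the function name
parse_station_name and the returned keys are hypothetical, and dropping the
leading zeros from the number simply follows the ref=76 example in your
message. Anything that does not fit the pattern is returned as None so that
it can be reviewed by hand.

```python
import re

# Assumed pattern: "NNN - NAME" or "NNN - FRENCH_NAME/DUTCH_NAME".
NAME_RE = re.compile(r"^\s*(\d+)\s*-\s*(.+?)\s*$")


def parse_station_name(raw):
    """Split an API "name" value into station number and name parts.

    Returns a dict, or None when the value does not match the expected
    pattern (an outlier to be checked manually).
    """
    match = NAME_RE.match(raw)
    if match is None:
        return None
    number, label = match.groups()
    parts = [part.strip() for part in label.split("/")]
    if any(part == "" for part in parts) or len(parts) > 2:
        return None  # unexpected shape: leave it for manual review
    # One part: a single name with no language split.
    # Two parts: assumed to be French first, Dutch second.
    name_fr, name_nl = parts if len(parts) == 2 else (None, None)
    return {
        "ref": str(int(number)),  # "076" -> "76" (assumption, see above)
        "name": label,
        "name:fr": name_fr,
        "name:nl": name_nl,
    }
```

The values are deliberately left in upper case here; capitalization and typo
fixes would still need a separate, manually checked step.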

I actually have a spreadsheet where I converted all the "name" values
reported by the API to a properly capitalized form and tried to fix all
the typos I could find. I will share it later.

Cédric

On 10/15/2017 06:57 PM, Yves bxl-forever wrote:
Hello,

In the past weeks I have also wanted to do some cleanup on Villo! stations, and
it’s a fact that there is still quite a lot of work to be done.

Just a few thoughts about the idea of bulk data imports, because this is what
sometimes gave us really "ugly" nodes.

The name itself is a problem because what they present as the name is actually a string 
that concatenates the ID of the station, the name and its address.  This is why tagging 
this as "official_name" does not seem to make any sense.

Their JSON dataset usually looks like this:

"name":"076 - PLACE VAN MEENEN/VAN MEENENPLEIN",
"address":"PLACE VAN MEENEN/VAN MEENENPLEIN - AV PAUL DEJAER (FACE 35 - 39) / PAUL DEJAERLAAN (TEGENOVER 35 - 39)"


And we must translate it like this in our OSM nodes:

ref=76
name="Place Van Meenen - Van Meenenplein"
name:fr="Place Van Meenen"
name:nl="Van Meenenplein"
addr:street="Place Maurice Van Meenen - Maurice Van Meenenplein"
addr:housenumber="35-39"


It’s probably feasible to parse the fields automatically and make something
that looks clean.
But I am not sure that the street name will always match: in the example
above, the official street name contains "Maurice", and a parsing script will
not guess that unless you feed it a list of all streets.

About missing names in one language, this is tricky: normally we should stick
with the official name given by the operator.  But another approach would be
to use an official translation when we know of one (because it is the same
name as the street, a nearby bus stop, or a building).  And I agree that we
should fix typos without asking, like in your example.

Another problem is that the longitude and latitude fields must be checked to 
avoid putting stations in the middle of an intersection or inside a building.


In summary, I would recommend a safer approach, i.e. extracting a list of
missing stations and adding them manually, one by one, after checking that the
data looks fine.
But it would be nice to hear the thoughts of other members of the community.

Have a nice day.

Yves

_______________________________________________
Talk-be mailing list
Talk-be@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-be
