Hi all,

After reading about the issues with Scout and problems with name
expansion, I decided to do a little thinking on this issue.

The short answer is that Scout (as well as other text-to-speech
engines) should not need to expand values, i.e. E -> East, because
that practice is error-prone. We can solve (and largely have solved)
this problem by expanding names in the raw values.

A couple of years ago, I ran a bot that expanded hundreds of
thousands of names in OSM. The bot was very conservative about what it
expanded, which was good, but it also meant that we had a lot of names
that were left unexpanded, and as we know, if there are enough
exceptions, we have to treat those exceptions as the norm. That leads
to exactly the kind of problem Scout is experiencing now. From their
perspective, if there's a large number of unexpanded names, then they
have to do the expansion themselves, so I decided to look at the scope
of the problem.

I took an OSM extract of North America, looked at the last word of
each road name, and checked which of those words looked like
contractions. I found the following words that look like abbreviations:

Dr, Rd, St, Ave, Ln, Blvd, Cir, Pl and Hwy.

The total number of instances of one of those at the end of a road
name was 71,100.

In addition, I found a bunch of words that are probably directional
abbreviations: "NE", "NW", "SW" and "SE". There were 29,130 instances
of those.

And then for "N", "S", "E" and "W", there were a total of 13,494.
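For anyone who wants to reproduce this kind of count, here's a minimal
Python sketch of a last-word tally. It's not necessarily how the
numbers above were produced: it assumes the road names have already
been pulled out of the extract into a plain list of strings, and the
example names are made up. The group labels (street_type,
intercardinal, cardinal) are hypothetical names for illustration.

```python
from collections import Counter

# Abbreviation groups taken from the counts above.
STREET_TYPES = {"Dr", "Rd", "St", "Ave", "Ln", "Blvd", "Cir", "Pl", "Hwy"}
INTERCARDINALS = {"NE", "NW", "SE", "SW"}
CARDINALS = {"N", "S", "E", "W"}


def classify_last_word(name):
    """Return the abbreviation group of a road name's last word, or None."""
    words = name.split()
    if not words:
        return None
    # Assumption: a trailing period ("Rd.") counts the same as "Rd".
    last = words[-1].rstrip(".")
    if last in STREET_TYPES:
        return "street_type"
    if last in INTERCARDINALS:
        return "intercardinal"
    if last in CARDINALS:
        return "cardinal"
    return None


def tally(names):
    """Count how many road names end in each abbreviation group."""
    counts = Counter()
    for name in names:
        group = classify_last_word(name)
        if group is not None:
            counts[group] += 1
    return counts


# Made-up sample names, not real OSM data.
names = ["Main St", "Oak Rd.", "Elm Street", "1st Ave NW",
         "Route 9 N", "Saint Paul Ave"]
print(tally(names))
```

Here tally(names) finds 3 street-type endings ("Main St", "Oak Rd.",
"Saint Paul Ave"), 1 intercardinal ("1st Ave NW") and 1 cardinal
("Route 9 N"); "Elm Street" isn't counted. Since only the last word is
checked, it has the same blind spot as the analysis above.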

When you look at the total of these values (71,100 + 29,130 + 13,494
= 113,724), the number becomes pretty scary: over 100,000 objects,
which is about 10% of all roads in the extract. That means that if
you're parsing OSM data, about 1 in 10 roads you find will have
contractions. The numbers get worse when you realize that this
analysis only covered the last word in a road name, not any other
word in it. I suspect the real number would be much higher.

I think we (the US OSM community) should try to make it easier for our
data consumers to work with our data by making it more consistent.

So here's what I'm thinking, and I'd really like a dialog about this:

1. For the first case, where "Rd" (or another of the street-type
abbreviations above) is the last word, I think we can be reasonably
sure that it's a contraction (e.g. "Rd" for "Road") and we can expand
it. This won't trigger the "Saint" problem, because "St" would only be
expanded if it was the last word in the name.

2. I think we can do the same thing for NW/NE, etc. Those seem safe to me.

3. We could probably do the same type of expansion of NE, NW, SE, SW
if it's the first word in a name.

4. We could work on better ways of detecting these contractions and
decide what to do with them in the future (maybe find a way to expand
them automatically, maybe use MapRoulette, maybe use notes, etc.).

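To make rules 1-3 concrete, here's a rough Python sketch of what a
conservative expansion could look like. It assumes rule 1 covers all
the street-type abbreviations counted above, not only "Rd" (shrink the
table to just "Rd" for the narrower reading), and the expansion
spellings are the obvious ones rather than any agreed list. The
single-letter directionals (N, S, E, W) are deliberately left alone,
since none of the rules above cover them. It's an illustration, not a
bot.

```python
# Hypothetical expansion tables; not an agreed community list.
STREET_TYPE_EXPANSIONS = {
    "Dr": "Drive", "Rd": "Road", "St": "Street", "Ave": "Avenue",
    "Ln": "Lane", "Blvd": "Boulevard", "Cir": "Circle", "Pl": "Place",
    "Hwy": "Highway",
}
INTERCARDINAL_EXPANSIONS = {
    "NE": "Northeast", "NW": "Northwest",
    "SE": "Southeast", "SW": "Southwest",
}


def expand_name(name):
    """Apply rules 1-3 to a road name; exact, case-sensitive matches only.

    Note: splitting and re-joining collapses repeated spaces.
    """
    words = name.split()
    if not words:
        return name
    # Rules 1 and 2: only the last word is touched, so the "St" in
    # "St Louis Ave" is never read as "Street".
    last = words[-1]
    if last in STREET_TYPE_EXPANSIONS:
        words[-1] = STREET_TYPE_EXPANSIONS[last]
    elif last in INTERCARDINAL_EXPANSIONS:
        words[-1] = INTERCARDINAL_EXPANSIONS[last]
    # Rule 3: a leading NE/NW/SE/SW in a multi-word name.
    if len(words) > 1 and words[0] in INTERCARDINAL_EXPANSIONS:
        words[0] = INTERCARDINAL_EXPANSIONS[words[0]]
    # N/S/E/W and abbreviations in the middle of a name are left as-is.
    return " ".join(words)


for n in ["Oak Rd", "St Louis Ave", "NW 23rd Ave", "1st St SE", "Main St N"]:
    print(n, "->", expand_name(n))
```

Note that "1st St SE" comes out as "1st St Southeast": because only
the last word is checked, a "St" in front of a directional is left
as-is, and "Main St N" isn't touched at all. Those are exactly the
leftovers that item 4 would need to catch.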
I'm not saying I'm going to do this. I'm not even officially proposing
it yet. I'm pointing out a problem and potential solution. My feeling
is that if we can drop the number of problems in our dataset by 90%
without much effort, then we should do that.

I really want to hear people's thoughts on this.

- Serge

_______________________________________________
Talk-us mailing list
Talk-us@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-us