My 2 cents.

I reviewed a script for expanding names a year or two ago (yours, I think),
and thought it worked very well.  I do not recall finding any false
positives.  If there are over 100,000 names that it would catch, then it
sounds like we need to run it on a regular basis.  Or are there so many
because the script only ran on TIGER entries by default?

I say retest the script's detection and use it regularly for the start/end
abbreviations.  Expand it to other detection if that can be shown to be
reliable.
Throw in a semi-regular MapRoulette task to look for more complicated or
vague abbreviations, and the problem can be kept under control.  This can
also be used to verify an advanced script's reliability by offering a
suggested expansion on the harder names.

If there is still any concern about incorrect expansion, output logs with
the old/new names, links to the ways, a link to revert, and a tag marking
the name as not to be expanded in the future.  MapRoulette should also have
a checkbox to block further expansion of a name in future checks.
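To make the idea concrete, here is a minimal sketch of what such a conservative expander with an audit log might look like. The suffix table, the blocklist mechanism, and the log format are all illustrative assumptions, not the actual bot's code; a real run would read names from an OSM extract and write changes through the editing API after review.

```python
# Sketch: expand only a trailing suffix abbreviation or a leading/trailing
# directional, and emit an audit line so every change can be reviewed or
# reverted. Tables and helpers here are hypothetical.

SUFFIXES = {
    "Rd": "Road", "St": "Street", "Ave": "Avenue", "Dr": "Drive",
    "Ln": "Lane", "Blvd": "Boulevard", "Cir": "Circle",
    "Pl": "Place", "Hwy": "Highway",
}
DIRECTIONS = {
    "N": "North", "S": "South", "E": "East", "W": "West",
    "NE": "Northeast", "NW": "Northwest",
    "SE": "Southeast", "SW": "Southwest",
}

def expand_name(name, blocklist=frozenset()):
    """Expand a trailing suffix and/or a directional affix.

    Returns (new_name, changed).  Names in `blocklist` (e.g. ways a
    mapper has marked "do not expand") are left alone.
    """
    if name in blocklist:
        return name, False
    words = name.split()
    if len(words) < 2:          # never touch one-word names ("St" alone)
        return name, False
    changed = False
    if words[-1] in SUFFIXES:
        words[-1] = SUFFIXES[words[-1]]
        changed = True
    elif words[-1] in DIRECTIONS:
        words[-1] = DIRECTIONS[words[-1]]
        changed = True
    if words[0] in DIRECTIONS:  # leading "St" is NOT expanded (Saint problem)
        words[0] = DIRECTIONS[words[0]]
        changed = True
    return " ".join(words), changed

def audit_line(way_id, old, new):
    # Old/new names plus a way link, so a reviewer can inspect or revert.
    return f"{old!r} -> {new!r}  https://www.openstreetmap.org/way/{way_id}"
```

Note that a name like "St Charles Pl" comes out as "St Charles Place": the trailing "Pl" is expanded, but the leading "St" is left alone because only directionals are expanded at the front of a name.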


Dale Puch


On Sat, Jul 19, 2014 at 12:27 PM, Serge Wroclawski <emac...@gmail.com>
wrote:

> Hi all,
>
> After reading about the issues with Scout and problems with name
> expansion, I decided to do a little thinking on this issue.
>
> The short answer is that Scout (as well as other text-to-speech
> engines) should not need to expand values, i.e. E -> East. The
> reason is that this practice is error-prone. We can (and largely
> have) solved this problem by expanding names in the raw values.
>
> A couple of years ago, I ran a bot that expanded hundreds of
> thousands of names in OSM. The bot was very conservative about what it
> expanded, which was good, but it also meant that a lot of names
> were left unexpanded, and as we know, if there are enough
> exceptions, we have to treat those exceptions as the norm. That leads
> to exactly the kind of problem Scout is experiencing now. From their
> perspective, if there is a large number of unexpanded names, they
> have to do the expansion themselves, so I decided to look at the scope
> of the problem.
>
> I took a North America extract of OSM, looked at the last word of
> each road name, and found the following words that look like
> abbreviations:
>
> Dr, Rd, St, Ave, Ln, Blvd, Cir, Pl and Hwy.
>
> The total number of instances of one of those at the end of a
> road name was 71,100.
>
> In addition, I found a bunch of directional prefixes that are probably
> directional abbreviations: "NW", "NE", "SW" and "SE". There were
> 29,130 instances of those.
>
> And then for "N", "S", "E", "W", there were a total of 13,494.
>
> When you look at the total of these values, the number becomes pretty
> scary: over 100,000 objects, which is just about 10% of all roads.
> That means that if you're parsing OSM data, 1 in 10 roads you find
> will have contractions. The numbers get worse when you realize that
> this analysis only covered the last word in a road name, not any other
> word in it. I suspect the real number would be much higher.
>
> I think we (the US OSM community) should try to make it easier for our
> data consumers to work with our data by making it more consistent.
>
> So here's what I'm thinking, and I'd really like a dialog about this:
>
> 1. I think for the first case, with "Rd" as the last word, we can be
> reasonably sure that this is a contraction of "Road" and we can
> expand it. This won't trigger the "Saint" problem because it would
> only trigger if "St" was the last word in the name.
>
>
> 2. I think we can do the same thing for NW/NE, etc. Those seem safe to me.
>
> 3. We could probably do the same type of expansion of NE, NW, SE, SW
> if it's the first word in a name.
>
> 4. We work on some better ways of detecting these contractions and
> decide what to do with them in the future (maybe find a way to expand
> them automatically, maybe use MapRoulette, maybe use notes, etc.)
>
> I'm not saying I'm going to do this. I'm not even officially proposing
> it yet. I'm pointing out a problem and potential solution. My feeling
> is that if we can drop the number of problems in our dataset by 90%
> without much effort, then we should do that.
>
> I really want to hear people's thoughts on this.
>
> - Serge
>
> _______________________________________________
> Talk-us mailing list
> Talk-us@openstreetmap.org
> https://lists.openstreetmap.org/listinfo/talk-us
>
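The last-word tally Serge describes can be sketched roughly as follows. The list of names here is illustrative; a real run would stream road names out of a North America extract with a tool such as osmium before counting.

```python
# Sketch: count how often each known abbreviation appears as the last
# word of a road name. The input list is a stand-in for names pulled
# from an OSM extract.
from collections import Counter

ABBREVS = {"Dr", "Rd", "St", "Ave", "Ln", "Blvd", "Cir", "Pl", "Hwy"}

def tally_last_word_abbrevs(names):
    counts = Counter()
    for name in names:
        words = name.split()
        # Require at least two words so a bare "St" is not counted.
        if len(words) >= 2 and words[-1] in ABBREVS:
            counts[words[-1]] += 1
    return counts

sample = ["Main St", "Oak Ave", "Elm St", "Broadway", "St"]
print(tally_last_word_abbrevs(sample))
```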