Hi all,

After reading about the issues with Scout and problems with name expansion, I decided to do a little thinking on this issue.

The short answer is that Scout (as well as other text-to-speech engines) should not need to expand values, i.e. E -> East. The reason is that this practice is error-prone. We can solve (and largely have solved) this problem by expanding names in the raw values.

A couple of years ago, I ran a bot that expanded hundreds of thousands of names in OSM. The bot was very conservative about what it expanded, which was good, but it also meant that a lot of names were left unexpanded, and as we know, if there are enough exceptions, we have to treat those exceptions as the norm. That leads to exactly the kind of problem Scout is experiencing now. From their perspective, if there's a large number of unexpanded names, then they have to do the expansion themselves.

So I decided to look at the scope of the problem. I took an extract of OSM for North America, looked at the last word in each road name, and looked for words that looked like contractions. I found the following words that look like abbreviations: Dr, Rd, St, Ave, Ln, Blvd, Cir, Pl and Hwy. The total number of instances of one of those at the end of a road name was 71,100.

In addition, I found a bunch of directional prefixes that are probably directional abbreviations: "NW", "NE", "SW" and "SE". There were 29,130 instances of those. And then for "N", "S", "E" and "W", there were a total of 13,494.

When you add these values up, the number becomes pretty scary: over 100,000 objects (113,724 by these counts), which is just about 10% of all roads. That means that if you're parsing OSM data, 1 in 10 roads you find will have contractions.

The numbers get worse when you realize that this analysis only covered the last word in a road name, not any other word in it. I suspect the real number would be much higher.

I think we (the US OSM community) should try to make it easier for our data consumers to work with our data by making it more consistent.
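
For anyone who wants to repeat this kind of count on their own extract, here is a minimal Python sketch of a last-word check. It is not the script used for the numbers above. The function names are hypothetical, the directional set is assumed to be NW, NE, SW and SE, and the case-sensitive matching and stripping of a trailing period are assumptions of the sketch. It expects a plain list of road name strings, however you pull them out of the extract.

```python
from collections import Counter

# Abbreviation lists as given in this message. Matching is case-sensitive
# and ignores a trailing period; both choices are assumptions of this sketch.
SUFFIXES = {"Dr", "Rd", "St", "Ave", "Ln", "Blvd", "Cir", "Pl", "Hwy"}
TWO_LETTER_DIRECTIONS = {"NW", "NE", "SW", "SE"}
ONE_LETTER_DIRECTIONS = {"N", "S", "E", "W"}


def classify_last_word(name):
    """Return the abbreviation category of a road name's last word, or None."""
    words = name.split()
    if not words:
        return None
    last = words[-1].rstrip(".")
    if last in SUFFIXES:
        return "suffix"
    if last in TWO_LETTER_DIRECTIONS:
        return "two_letter_direction"
    if last in ONE_LETTER_DIRECTIONS:
        return "one_letter_direction"
    return None


def count_abbreviations(names):
    """Count how many names end in each category of abbreviation."""
    counts = Counter()
    for name in names:
        category = classify_last_word(name)
        if category is not None:
            counts[category] += 1
    return counts


# Made-up sample names, only to show the call.
sample = ["Main St", "Oak Avenue", "5th Ave NE", "Elm Rd.", "Highway 9 W", ""]
print(count_abbreviations(sample))
```

Because it only looks at the last word, a name like "St Paul Avenue" is not counted, which matches the limitation described above.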

So here's what I'm thinking, and I'd really like a dialog about this:

1. I think for the first case, for "Rd" as the last word, we can be reasonably sure that this is a contraction for "Road" and we can expand it. This won't trigger the "Saint" problem because it would only trigger if St was the last word in the name.

2. I think we can do the same thing for NW/NE, etc. Those seem safe to me.

3. We could probably do the same type of expansion of NE, NW, SE, SW if it's the first word in a name.

4. We work on some better ways of detecting these contractions and decide what to do with them in the future (maybe find a way to expand them automatically, maybe use MapRoulette, maybe use notes, etc.)

I'm not saying I'm going to do this. I'm not even officially proposing it yet. I'm pointing out a problem and a potential solution.

My feeling is that if we can drop the number of problems in our dataset by 90% without much effort, then we should do that.

I really want to hear people's thoughts on this.

- Serge

_______________________________________________
Talk-us mailing list
Talk-us@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk-us
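
To make points 1 to 3 above concrete, here is a rough Python sketch of what such a conservative expansion could look like. It is an illustration only, not the earlier bot and not a finished proposal. The function name and the spelled-out forms ("Northwest" and so on) are assumptions, and the suffix table contains only "Rd" because that is the only suffix named in point 1.

```python
# Only the case named in point 1; other suffixes are deliberately left out.
SUFFIX_EXPANSIONS = {"Rd": "Road"}

# Points 2 and 3; the expanded spellings are an assumption of this sketch.
DIRECTION_EXPANSIONS = {
    "NW": "Northwest",
    "NE": "Northeast",
    "SW": "Southwest",
    "SE": "Southeast",
}


def expand_name(name):
    """Expand abbreviations only in the first or last word of a road name."""
    words = name.split()
    # Leave empty and single-word names alone: too little context to be safe.
    if len(words) < 2:
        return name

    changed = False

    # Point 3: two-letter direction as the first word.
    if words[0] in DIRECTION_EXPANSIONS:
        words[0] = DIRECTION_EXPANSIONS[words[0]]
        changed = True

    # Point 2: two-letter direction as the last word.
    if words[-1] in DIRECTION_EXPANSIONS:
        words[-1] = DIRECTION_EXPANSIONS[words[-1]]
        changed = True
    # Point 1: "Rd" as the last word.
    elif words[-1] in SUFFIX_EXPANSIONS:
        words[-1] = SUFFIX_EXPANSIONS[words[-1]]
        changed = True

    # Return the original string untouched when nothing matched.
    return " ".join(words) if changed else name


for example in ["Oak Rd", "NE 5th St", "5th Ave NE", "St Paul St", "Main Rd NE"]:
    print(example, "->", expand_name(example))
```

Note that "St" is never touched here, so the "Saint" question does not come up, and "Main Rd NE" keeps its "Rd" because only the first and last words are considered.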