I was curious how complete OpenStreetMap shop data was, so I decided to do an analysis for some Canadian chains.
The results were mixed.

Starting with a Canada extract, I loaded the data into PostGIS and ran queries against name, brand and franchise for objects where amenity, office or shop was not null. Brands which have recently changed names (e.g. Zellers/Target) were avoided.

OSM completeness varied from 7% to 81%, with no overwhelming trends. The four major fast food and restaurant chains considered ranged from 33% to 51%.

Shops opening and closing affect the accuracy of these results, and the accuracy of the external sources for the number of shops may vary.

Method
======

The Geofabrik extract for Canada was imported with osm2pgsql using a custom .style file containing name, operator, brand, franchise, amenity, building, office and shop columns. The last four columns caused an object to be placed in the polygon table.

After import, the tables were filtered to remove rows that had no amenity, office, or shop tag. Appropriate indexes were added. A view was created combining the two tables and giving lower-case versions of the name, operator, brand and franchise tags.

Queries of the following form were run:

    SELECT COUNT(*) FROM shops
      WHERE lname LIKE :'name'
         OR lbrand LIKE :'name'
         OR lfranchise LIKE :'name';

psql substituted the search pattern for :'name'. For example, 'mcdonald%' was used for McDonald's. The queries were intended to catch all possible shops, even at the cost of some false positives.

Brand selection was not done in any systematic manner. Public sources were used for the true number of shops of a particular chain, generally Wikipedia or public data aggregators.
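The counting query described above can be tried on a toy table. This is only a minimal sketch in Python, assuming SQLite's in-memory database as a stand-in for PostGIS. The table raw_shops, its sample rows and the helper count_chain are made up for illustration and are not the setup used to produce the numbers in this post.

```python
import sqlite3

# Toy stand-in for the imported tables; the rows are invented examples,
# not real OSM data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_shops "
    "(name TEXT, brand TEXT, franchise TEXT, amenity TEXT, shop TEXT)"
)
conn.executemany(
    "INSERT INTO raw_shops VALUES (?, ?, ?, ?, ?)",
    [
        ("McDonald's", None, None, "fast_food", None),
        ("Restaurant", "McDonalds", None, "fast_food", None),
        ("Tim Hortons", None, None, "cafe", None),
        ("Walmart", None, None, None, "supermarket"),
    ],
)

# View exposing lower-case versions of the tags, as in the method above.
conn.execute(
    """
    CREATE VIEW shops AS
    SELECT lower(name) AS lname,
           lower(brand) AS lbrand,
           lower(franchise) AS lfranchise
    FROM raw_shops
    """
)


def count_chain(conn, pattern):
    """Count rows whose name, brand or franchise matches a LIKE pattern."""
    row = conn.execute(
        "SELECT COUNT(*) FROM shops "
        "WHERE lname LIKE ? OR lbrand LIKE ? OR lfranchise LIKE ?",
        (pattern, pattern, pattern),
    ).fetchone()
    return row[0]


print(count_chain(conn, "mcdonald%"))
```

The lower-case columns matter more in PostgreSQL, where LIKE is case-sensitive, than in SQLite, where LIKE ignores case for ASCII letters by default.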
Results
=======

                     OSM   True  Completeness
Tim Horton's        1480   4304  34%
Subway               849   2563  33%
McDonald's           722   1417  51%
Starbucks            592   1363  43%
    Both OSM and Google use both Starbucks and Starbucks Coffee
A & W                292    800  37%
Domino's              67    383  17%
Wendy's              224    369  61%
Burger King          150    281  53%
East Side Mario's     46     85  54%
Milestones            32     44  73%
Chili's               10     16  63%
Sears                105   1570   7%
Rona                 122    500  24%
The Bay               50    421  12%
Walmart              265    382  69%
Home Depot            95    180  53%
Canadian Tire        400    491  81%
    May be double-counting automotive centers
Chapters              47    233  20%
Sleep Country         22    179  12%
London Drugs          54     78  69%
    May be double-counting some stores with a pharmacy inside

Remarks
=======

It took significantly longer to find the true number of stores than to get results from the OSM data. Part of this is my greater familiarity with OSM tools, but a large part is that with OSM it is not necessary to track down many different sources to get store counts.

Although no urban/rural analysis was performed, it is generally expected that OSM is more complete in populated urban areas than in low-density rural areas, and completeness in these urban areas is often more important for many uses.

No proprietary data sources were available for comparison, but it should not be assumed that they are any more complete, nor that their name or similar tagging is any more consistent. As an example, Google's data was observed to use both "Starbucks" and "Starbucks Coffee" for the coffee chain, sometimes having both for what was really the same location.

The tools used to generate the counts could easily be used to extract the shop data to work with.

Improving the data
==================

Inconsistent tagging was observed with some shops, such as variability between amenity=restaurant and amenity=fast_food. This should reflect differences between locations but may not.

Inconsistent names were also observed, such as "Walmart", "Wal-mart", and "Wal Mart".

These issues are not as significant as the large percentage of missing shops.
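One way to spot the inconsistent name variants mentioned above is to group names by a normalized key. This is a rough sketch in Python and was not part of the analysis. The helper normalize_name and its rule (lower-case, then drop spaces, hyphens and apostrophes) are assumptions chosen for illustration, and a rule this crude could also merge names that are genuinely different.

```python
import re
from collections import Counter


def normalize_name(name):
    """Lower-case a name and drop spaces, hyphens and apostrophes."""
    return re.sub(r"[\s\-']+", "", name.lower())


# Name variants taken from the examples in this post.
names = ["Walmart", "Wal-mart", "Wal Mart", "Canadian Tire"]

# Count how many spellings collapse to each normalized key.
groups = Counter(normalize_name(n) for n in names)

for key, count in groups.items():
    print(key, count)
```

Keys with a count above one point at spellings that probably refer to the same chain and could be reviewed by hand before any retagging.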
_______________________________________________
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk