To put it simply for addressing:
addr:housenumber
addr:street
addr:city
and don't delete any existing is_in tags (just leave 'em there).
Did I get it right?

> (there's actually a whole academic subject on the topic of database
> normalization just to remove every redundancy in the data)

You are correct, I also deal with these problems every day. BUT, I must say most of what OpenStreetMap is doing defies what we usually call normal or standard. For one, if OSM followed some ISO/OGC/GIS standard thingie, we might still be discussing specifications these days :)

Happy Monday! A lot of work this week!

> I think Eugene and I have reasonably covered both sides of the argument. I
> think what I am trying to say is that de-normalization (putting some
> redundancy back) is good for speeding things up both for getting data out
> AND for putting it in. I may not have a copyright-free boundary (or just be
> too lazy to get it) but I do know that such and such a POI is within such
> and such a place, perhaps down to barangay level perhaps not. If everyone
> does the same thing, we rapidly establish de-facto boundaries.
>
> So, where I think we come together is:
>
> - Normalization is a good goal for the long term as our data becomes more
> complete and our software tools more sophisticated.
>
> - FULLY redundant data is not necessary. I've experimented by simply
> putting the next tier up in an is_in:* tag. For example, if it is a
> barangay, just put the municipality. It is an extra step, but not too
> difficult a step for software to then look for the municipality and see what
> is_in:* tags it has and then repeat the process - effectively the
> normalization as effected in a relational database. I am slowly writing
> search/gazetteer software to do that and put the results in a separate
> database which can be regenerated on each new planet download.
>
> In our crowd sourcing environment there is always a danger that the
> municipality tag is missing, deleted, or spelt differently. So I think there
> is still value in what I call "seed points", i.e.
> randomly putting more than necessary information on some tags. So, for
> example, if the municipality tag is missing but a barangay mentions that it
> is in Sorsogon Province, then I've found it is possible to generate a tag
> for the municipality and even locate it by creating a simple rectangle
> around the barangay tags.
>
> Sorry if I am waffling on a bit but this is an interesting subject for me
> and I value the chance for discussion!
>
> Mike
>
> PS I am not sure exactly how this fits into this discussion but we also have
> to remember the psycho-geographic. Many cities like Sydney and, formerly,
> London don't actually officially exist. When people search for something "in
> Manila" they often will not mean Mayor Lim's kingdom but the built-up place
> that vaguely corresponds to Metro Manila or the CAR.
>
>
> At 04:26 PM 16/08/2009, Eugene Alvin Villar wrote:
>
> Well, after thinking about it, maybe using only addr:city (for both cities
> and municipalities) is a good compromise.
>
> Some Q&As on my point of view:
>
> Q. Why is duplication bad?
> A. Well, I come from a software engineering background and in designing
> database systems, redundancies are not good as Mike has stated (there's
> actually a whole academic subject on the topic of database normalization
> just to remove every redundancy in the data). The trade-off, however, is
> that look-up performance goes down as a result (e.g., finding all the POIs
> in Makati is not as fast to do unless you did pre-processing). So sometimes,
> if you know what you are doing, de-normalization (putting some redundancy
> back) can speed things up.
>
> Q. So can we add addr:city, etc.?
> A. While adding these makes me cringe due to redundancy, I see the merit in
> a compromise. My proposal is to only add addr:city and not addr:village,
> addr:state, addr:country.
>
> Q. Why not add also addr:state (for provinces) and addr:country?
> A.
> Because I think making the data FULLY redundant is not considering
> the trade-offs (see the pros and cons of my previous e-mail on this topic).
> If a POI is tagged as addr:city=Makati, then it already implies that
> addr:country=Philippines. It's possible that there is another Makati city
> elsewhere in the world such that addr:country is needed for disambiguation
> of a POI, but the POI's lat-long already does the disambiguation.
>
> Q. Why not add also addr:village (for barangays)?
> A. My thinking is that addr:city is enough to improve the look-up
> performance. It is certainly computationally intensive to determine the
> barangay, city/municipality, province of a POI by determining whether the
> POI lies within a barangay/city/municipality/province's boundary polygon
> (though there are plenty of ways to optimize this). But by specifying the
> addr:city, the search space is now reduced by two orders of magnitude.
> (Besides, at least for Metro Manila, barangays are really not used for
> addressing information.)
>
> Q. Why not tag POIs within municipalities using addr:town or
> addr:municipality? The Karlsruhe schema allows for arbitrary addr:* tags.
> A. I suggest using addr:city for both cities and municipalities only as a
> convention. That way, when a municipality later becomes a city, there is no
> need to change addr:municipality keys to addr:city.
>
>
> Now here's a question: the is_in:* tags and addr:* tags overlap in
> function. We should stick to one. The Karlsruhe schema (
> http://wiki.openstreetmap.org/wiki/Proposed_features/House_numbers/Karlsruhe_Schema
> ) is silent on this but the Key:addr page (
> http://wiki.openstreetmap.org/wiki/Key:addr) actually suggests using
> is_in:*.
> I favor using is_in.
>
>
> Eugene / seav
>
>
> On Sun, Aug 16, 2009 at 8:55 PM, Mike Collinson <[email protected]> wrote:
> At 03:55 PM 13/08/2009, Eugene Alvin Villar wrote:
>>Here's my two cents regarding this:
>>
>>I don't favor using addr:city, addr:village, is_in to specify where a POI
>> is. Here are the cons:
>>
>>1. Duplication of info with admin borders (and potential mismatch issues)
>>2. Increased data size with respect to tags (which makes planet dumps
>> larger)
>>
>>On the other hand, here are the pros:
>>
>>1. POIs are easier to filter by place than the alternative which is to do
>> bounding polygon calculation, which is more computationally intensive. This
>> calculation can be mitigated somewhat by doing pre-processing of the data
>> just before the data will be used (e.g., as an additional step to making
>> Garmin maps.)
>>2. Identifies where a POI is in the (hopefully temporary) lack of boundary
>> data.
>>
>>Regardless, addr:street is essential since this is very hard to infer from
>> the data without it.
>>
>>
>>Anybody else have other thoughts?
>
> In my own mapping and having an interest in preparing OSM data for first
> generation gazetteer and search software, I generally go for "the more the
> better" broadly for the reasons Eugene outlines. Redundancy is heresy in
> database programming courses but I think there is an assumption that data is
> put in under strict rules and in a controlled environment. For us, I think
> redundancy (partial duplication but from different sources and
> methodologies) is actually a good thing ... later pruning is not
> impossible. Perhaps in two or three years' time, boundary data and the
> software to easily process it will be highly available but for now, I say
> leave 'em in!
>
> Size of planet dumps. Yes, a concern, especially when you are trying to do a
> dial-up download, something the Europeans forget.
> But POIs may number thousands in an area but the ways in the same area may
> have hundreds of thousands of nodes, especially if over-digitised. Taking
> into account all the XML tagging wrapping a node, the size of a POI is not
> that much bigger than a raw lat,lon node. The size of planet dumps is going
> to get too big anyway; I kind of see value in forcing the issue sooner
> rather than later.
>
> I have, by the way, now switched to using explicitly identified is_in:* tags
> using the place= values where possible and user-defined values where they
> give some local benefit.
>
> is_in:country, is_in:state, is_in:city, is_in:town ...
> is_in:island, is_in:sea
> is_in:valley, is_in:barangay, ...
>
> I am interested to see whether we can collect enough points to generate
> reasonable boundaries from points rather than the other way around.
>
> Just my thoughts!
>
> Mike
>
> --
> http://vaes9.codedgraphic.com
>

--
cheers,
maning
------------------------------------------------------
"Freedom is still the most radical idea of all" -N.Branden
wiki: http://esambale.wikispaces.com/
blog: http://epsg4253.wordpress.com/
------------------------------------------------------
_______________________________________________
talk-ph mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/talk-ph
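
For anyone who wants to try Mike's "next tier up" idea from the thread above (tag each place with only its parent in an is_in:* tag, then let software follow the chain), here is a rough Python sketch. It is only an illustration under assumptions: the sample places, the is_in:region key and the function names are made up for the example and are not taken from Mike's software or from any agreed tagging scheme.

```python
from typing import Dict, List, Optional, Tuple

Tags = Dict[str, str]

# Hypothetical sample data: each place carries only its next tier up
# in a single is_in:* tag. Names and keys are for illustration only.
PLACES: Dict[str, Tags] = {
    "Poblacion": {"place": "barangay", "is_in:city": "Makati"},
    "Makati": {"place": "city", "is_in:region": "Metro Manila"},
    "Metro Manila": {"place": "region", "is_in:country": "Philippines"},
    "Philippines": {"place": "country"},
}


def next_tier(tags: Tags) -> Optional[Tuple[str, str]]:
    """Return (tier, name) from the first is_in:* tag, or None if absent."""
    for key, value in sorted(tags.items()):
        if key.startswith("is_in:"):
            return key[len("is_in:"):], value
    return None


def resolve_chain(name: str, places: Dict[str, Tags]) -> List[Tuple[str, str]]:
    """Follow is_in:* tags upward from `name`, one tier at a time."""
    chain: List[Tuple[str, str]] = []
    seen = {name}
    tags = places.get(name)
    while tags is not None:
        parent = next_tier(tags)
        if parent is None:
            break  # top of the hierarchy reached
        tier, parent_name = parent
        if parent_name in seen:
            break  # guard against circular is_in tags
        seen.add(parent_name)
        chain.append((tier, parent_name))
        # A missing or misspelt parent simply ends the walk here.
        tags = places.get(parent_name)
    return chain


if __name__ == "__main__":
    for tier, parent in resolve_chain("Poblacion", PLACES):
        print(f"{tier}: {parent}")
```

The walk stops quietly when a parent is missing or misspelt, which is the weak spot Mike points out for crowd-sourced data, and where his extra "seed point" tags would help fill the gap.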
