Hi developers,

You may remember that I exchanged emails with you two days ago about the wiki tag in OpenStreetMap. I now have a quick solution: a Wikipedia entry crawler that finds more Wikipedia entries automatically. I'd like to share it with you and hope it will be useful. The single Java class file can be downloaded here: <http://www.comp.nus.edu.sg/%7Ez-shen/WikiEntryCrawler.java>.
The crawler implements the *Sink* interface of Osmosis and relies on Osmosis to parse the OSM XML file. For each entity (e.g., *node*, *way*) it takes the value of the *name* tag (so entities without a name are skipped), uses it as the query to search for candidate Wikipedia entries through the Wikipedia API, and then judges which of the returned entries is the true one for that entity.

To do this, the crawler checks the string similarity between the entity name and the Wikipedia entry title, using the Levenshtein distance algorithm. Moreover, since many of the Wikipedia entries that entities may link to have geo-coordinates, the crawler also uses this information to select the true entry: it calls the Wikipedia API again to retrieve the entry content, extracts the geo-coordinates if they exist, and computes the distance between them and the coordinates of the entity. It then combines these two metrics into a score for each candidate entry and chooses the first entry whose score is above a pre-defined threshold (assuming that the search functionality of the Wikipedia API ranks the returned results appropriately).

During the crawl, the OSM XML file is parsed twice: first, to retrieve candidate entries for each entity that has a name; second, to record the coordinates of the entities to be checked, especially for *ways*, whose coordinates cannot be determined in the first pass.

I've also written a wiki page to introduce this: http://wiki.openstreetmap.org/wiki/User:Zhijie_Shen. Please have a look. I would appreciate any comments.

Regards,
Zhijie

--
Zhijie Shen
School of Computing
National University of Singapore
<http://www.comp.nus.edu.sg/%7Ez-shen/>
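P.S. To make the scoring step described above more concrete, here is a minimal sketch of how a name-similarity metric and a distance metric might be combined. It is not the code in WikiEntryCrawler.java: the class name, the equal weighting, the 50 km cutoff, the 0.7 threshold, and the sample candidates are all assumptions chosen for illustration, and the Wikipedia API calls are left out entirely.

```java
import java.util.Arrays;
import java.util.List;

class WikiScoreSketch {
    // All three constants are illustrative assumptions, not values from the crawler.
    static final double NAME_WEIGHT = 0.5;
    static final double MAX_KM = 50.0;
    static final double THRESHOLD = 0.7;

    // Hypothetical holder for one search result; lat/lon are null if the entry has no coordinates.
    static final class Candidate {
        final String title;
        final Double lat;
        final Double lon;
        Candidate(String title, Double lat, Double lon) {
            this.title = title;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Edit distance normalised to [0, 1], where 1 means identical (case-insensitive).
    static double nameSimilarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1.0;
        return 1.0 - (double) levenshtein(a.toLowerCase(), b.toLowerCase()) / maxLen;
    }

    // Great-circle distance in kilometres (haversine formula, mean Earth radius 6371 km).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * 6371.0 * Math.asin(Math.sqrt(Math.min(1.0, h)));
    }

    // Weighted mix of the two metrics; falls back to the name alone when there are no coordinates.
    static double score(String name, double lat, double lon, Candidate c) {
        double sim = nameSimilarity(name, c.title);
        if (c.lat == null || c.lon == null) return sim;
        double km = haversineKm(lat, lon, c.lat, c.lon);
        double proximity = Math.max(0.0, 1.0 - km / MAX_KM);
        return NAME_WEIGHT * sim + (1.0 - NAME_WEIGHT) * proximity;
    }

    // Returns the first candidate (in search-rank order) that reaches the threshold, or null.
    static Candidate pick(String name, double lat, double lon, List<Candidate> ranked) {
        for (Candidate c : ranked) {
            if (score(name, lat, lon, c) >= THRESHOLD) return c;
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up sample data: one same-named entry far away, one without coordinates, one nearby.
        List<Candidate> ranked = Arrays.asList(
                new Candidate("Example Tower", 40.0, -74.0),
                new Candidate("Example Tower (film)", null, null),
                new Candidate("Example Tower", 1.3001, 103.8002));
        Candidate best = pick("Example Tower", 1.3, 103.8, ranked);
        System.out.println(best == null ? "no match" : best.title + " @ " + best.lat + "," + best.lon);
    }
}
```

In this sketch an entry with a matching title but distant coordinates is rejected, which is the case where the coordinate check matters most. How the real weights and threshold should be set is an open question that I would be glad to hear opinions on.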
_______________________________________________ Tagging mailing list Tagging@openstreetmap.org http://lists.openstreetmap.org/listinfo/tagging