I've been poking about a bit and found the following things:

(1) The browser at http://mappings.dbpedia.org/server/ontology/classes doesn't work in Google Chrome. It works OK for me in Firefox.
(2) I'd like to see more definite comments for the items. For instance, I'd like to see something in the definition of 'City' that gives a specific answer to the Tokyo and London question. If this isn't stated somewhere, things are either going to be random or we'll be having edit wars. Personally I'd be willing to put my opinion in there, but I'd like to see some process for how this gets done.

(3) There is no one simple reason why the city assignments get lost, because the infobox mappings are pretty complicated. For instance, places like Manchester NH, NYC and Sao Paulo have an Infobox:Settlement, and the "City" designation should be triggered by settlement_type = City. I noticed, however, that Manchester has settlement_type = [[City]], which is "reasonable" (it certainly reflects linked data thinking), but I don't know if the extractor is going to get that. On the other hand, if you look at the entry for Dresden, Dresden has Infobox:German_Location, and the "Citiness" of Dresden is triggered by the line Art = City in the infobox. There's also an Infobox for Japanese_City, so I'm sure that there are a lot of details.

(4) If there's a root cause for the problem, it's that there isn't a closed feedback loop. If you're looking at this as a problem of "transforming something from form A to form B", it's clear that the system produces "B". It's only when you actually try to use "B" that you find that "B" is full of holes. Overall it's a system problem: I'm sure that we can get better results by changing the extractor rules (in fact, we'll get the fastest gains this way), but some changes to Wikipedia content will be necessary too. The complexity of the infoboxes means that an agent that does these corrections could be a bit complex, although its behavior could probably be controlled by the infobox mappings. For instance, it may end up doing something a bit different for a "German Location" than it would for a "Settlement".
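To make the examples in (3) concrete, here is a rough Python sketch of the kind of per-infobox rule I have in mind. The rule table and the function names are made up for illustration; this is not the extractor framework's API, and the two entries only mirror the Settlement and German_Location cases mentioned above.

```python
import re

# Hypothetical rule table: infobox template -> (field, values that mean "City").
# These entries mirror the examples in this message; they are NOT the real
# DBpedia mappings.
CITY_RULES = {
    "Settlement": ("settlement_type", {"city"}),
    "German_Location": ("Art", {"city"}),
}

# Matches [[Target]] and [[Target|Label]], capturing the visible text.
WIKILINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def normalize(value):
    """Strip wikilink markup so '[[City]]' and 'City' compare equal."""
    return WIKILINK.sub(r"\1", value).strip().lower()


def is_city(template, fields):
    """Return True if the (assumed) rule for this infobox template fires."""
    rule = CITY_RULES.get(template)
    if rule is None:
        # No rule for this infobox (e.g. Japanese_City in this toy table).
        return False
    field, accepted = rule
    return normalize(fields.get(field, "")) in accepted
```

With this, is_city("Settlement", {"settlement_type": "[[City]]"}) and the plain-text "City" variant both come out true, which is the behavior I'd hope for from the extractor; whether the real extractor does this is exactly what I don't know.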
Along the way it's also tempting to do some canonicalization. For instance, the word "City" in the infobox header for NYC is just plain text, but the word "City" for Manchester NH is a hyperlink. You can make a case for both, but from a quality standpoint, the same thing should be done in both cases. In the case of the German locations I see that the English words "Town" and "City" are often used in the "type" and "art" fields, but the word "Stadt" is treated by the framework as if it were synonymous with "Town", which, from what little German I know, isn't quite right (isn't Munich a /Großstadt/?). But perhaps the word "Stadt" has some special semantics in the context of Wikipedia, and it ought to be preserved. Ultimately it seems that Wikipedia ought to make up its mind. Overall, correcting Wikipedia is going to involve dealing with entropy, dealing with politics, and probably the careful re-injection of entropy to satisfy political constraints.

(5) I can think of a lot of toolage that would be useful here. For instance, it would be nice to be able to look at "City" and get a list of rules that would cause something to be identified as a "City". If I go down this path far enough, I'm probably going to buy a new (wicked fast) hard drive, install the extractor framework, and want to get justifications for why the system made the assignments that it did, so I can progressively identify the causes of misidentifications. It seems like the first thing I ought to do is make a list of cities that aren't identified as cities in DBpedia, and then the next stage is to work down that list and find the problems.

(6) I'm also interested in a mapping between DBpedia ontology concepts and Wikipedia pages. For instance, there's http://en.wikipedia.org/wiki/City for "City". Practically, if I'm building a site that uses the DBpedia ontology (or something similar), I'm going to want to have user-friendly pages that have something to say about the taxonomic classes that the site uses.
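For the toolage in (5), the two pieces I'd start with are small. Here's a Python sketch, assuming the mapping rules could be dumped as simple (template, field, value, class) records; that record layout and both function names are my own invention, not something the framework provides as far as I know.

```python
# Hypothetical rule records: (infobox template, field, value, ontology class).
# Illustrative only; the real mappings are more complicated than this.
RULES = [
    ("Settlement", "settlement_type", "City", "City"),
    ("German_Location", "Art", "City", "City"),
    ("German_Location", "Art", "Town", "Town"),
]


def rules_for(ontology_class):
    """List the (template, field, value) conditions that yield a class."""
    return [rule[:3] for rule in RULES if rule[3] == ontology_class]


def missing_cities(expected, typed_as_city):
    """Names we expect to be cities that the dataset does not type as City."""
    return sorted(set(expected) - set(typed_as_city))
```

The "expected" list would have to come from somewhere outside DBpedia (a hand-made list would do to start), and the result is the worklist to go down one by one.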
(7) Along those lines, dbpedia-owl:Building really drives me nuts. As Wikipedia puts it, a building is

  1. Any human-made structure used or intended for supporting or sheltering any use or continuous occupancy, or
  2. An act of construction (i.e. the activity of building, see also builder)

dbpedia-owl:Building has a number of subclasses under it which don't match the vernacular meaning of the word "Building", which is meaning #1. Practically, I'd say that a building provides an environmental shell, and would not include Airport, Bridge, and LaunchPad. I would include the Vehicle Assembly Building at Cape Kennedy as a Building, however, since that provides an environmental shell. I'd say that a Barn or even a 3-sided run-in shed is a "Building" because there's a full or partial environmental shell, and that Stations and Stadiums are generally buildings, because they are human-inhabited and provide at least a partial environmental shell. I wouldn't mind using the word "Structure" for what "Building" is now, and I'd probably want to move "Monument" under it as well.

Note that the "environmental shell" is an issue with monuments too. It's possible to go inside the Statue of Liberty, but it's a special tour and takes some effort to do. Does that make the Statue of Liberty a building?
The Gateway Arch, Eiffel Tower and Tokyo Tower probably all fall into the "Building" category because they all have a high level of accommodation for visitors.

Specifically, the issue I've got is that ordinary users are going to have a hard time with the statement that "a Bridge is a Building"; for projects like ny-pictures.com I really need some category that corresponds to the vernacular use of the word "Building" and that avoids things that look "crazy".