On Tue, Jul 1, 2014 at 10:38 PM, Markus Krötzsch <mar...@semantic-mediawiki.org> wrote:
> On 01/07/14 22:00, Lydia Pintscher wrote:
> ...
>
>>>
>>> Is there any documentation on how it chooses which entities to
>>> suggest?
>>
>>
>> It basically creates a table of correlations for properties over all
>> items in Wikidata. So if say date of birth and place of birth are used
>> together a lot they get a high correlation. When you then have an item
>> with no place of birth but a date of birth it will suggest that
>> because of the high correlation.
>
>
> Oh! I have a suggestion to make ...
>
> Looking at properties that co-occur is good, but for P31 and P279, you must
> use the values instead (assuming that you can cope with the size: there are
> about 20k different values for these properties right now; seems doable). It
> does not tell you much if an item has "instance of" (P31), but it is very
> informative to know that you have "instance of: historic house museum".
>
> If you look at Q4810979, you can see that it really has no property that
> suggests that we are looking at an historic building: instance of, Commons
> category, coordinate location, country, Freebase identifier, image. Based on
> properties alone, this could really be anything, including a person. Note
> that even the new suggestions seem to miss most of the "typical" properties
> that I listed in my other email ("English Heritage list number" being the
> most obvious one for Q4810979).
>
> My algorithm uses values of P31 as its main information. Maybe this is why
> it performs better at first sight. Should be fixable with some feature
> engineering using the infrastructure you have now (where I trust that your
> recommender system backend has no problem with a slightly bigger number of
> features).
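A minimal Python sketch of the co-occurrence idea described in the thread, extended with Markus's suggestion to use the values of P31 and P279 as extra features. This is not the PropertySuggester's actual code: the function names (features, build_table, suggest) are hypothetical, the scoring rule (the highest conditional frequency P(property | feature) over the item's features) is an assumption standing in for the "table of correlations", and the toy items, including the placeholder IDs Q_human, Q_museum and P_heritage (a stand-in for something like "English Heritage list number"), are made up.

```python
from collections import Counter, defaultdict

# Assumption: only these properties contribute value-based features,
# following the suggestion to use the values of P31 and P279.
VALUE_PROPERTIES = ("P31", "P279")


def features(item, use_values=True):
    """Map an item (property ID -> list of value IDs) to its feature set."""
    feats = set(item)  # one feature per property the item uses
    if use_values:
        for pid in VALUE_PROPERTIES:
            for value in item.get(pid, []):
                feats.add(f"{pid}={value}")  # e.g. "P31=Q_museum"
    return feats


def build_table(items, use_values=True):
    """Count each feature, and how often each property appears alongside it."""
    feature_count = Counter()
    co_count = defaultdict(Counter)
    for item in items:
        for f in features(item, use_values):
            feature_count[f] += 1
            for pid in item:
                if pid != f:  # skip a property's co-occurrence with itself
                    co_count[f][pid] += 1
    return feature_count, co_count


def suggest(item, feature_count, co_count, use_values=True, top=5):
    """Rank properties missing from `item` by max P(property | feature)."""
    scores = {}
    for f in features(item, use_values):
        n = feature_count.get(f, 0)
        if n == 0:
            continue  # feature never seen in the training items
        for pid, c in co_count.get(f, {}).items():
            if pid not in item:
                scores[pid] = max(scores.get(pid, 0.0), c / n)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Made-up toy data: two people and two museums, all with an image (P18).
items = [
    {"P31": ["Q_human"], "P569": ["1900"], "P19": ["Q_city"], "P18": ["a.jpg"]},
    {"P31": ["Q_human"], "P569": ["1950"], "P19": ["Q_city"], "P18": ["b.jpg"]},
    {"P31": ["Q_museum"], "P625": ["51.5,-0.1"], "P_heritage": ["123"], "P18": ["c.jpg"]},
    {"P31": ["Q_museum"], "P625": ["52.2,0.1"], "P_heritage": ["456"], "P18": ["d.jpg"]},
]
new_item = {"P31": ["Q_museum"], "P18": ["photo.jpg"]}

fc, cc = build_table(items, use_values=False)
print(suggest(new_item, fc, cc, use_values=False))
# [('P19', 0.5), ('P569', 0.5), ('P625', 0.5), ('P_heritage', 0.5)]

fc, cc = build_table(items)
print(suggest(new_item, fc, cc))
# [('P625', 1.0), ('P_heritage', 1.0), ('P19', 0.5), ('P569', 0.5)]
```

On this toy data, properties alone ("instance of" plus "image") rank every missing property equally, so the museum could just as well be a person; adding the P31=Q_museum feature pushes the museum-typical properties (P625 and the placeholder P_heritage) to the top, which is the effect described for Q4810979.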
Yep, work is already under way to also take values into account :)
Suggestions for qualifiers and sources will hopefully come very soon.

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations (Vereinsregister) of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l