On Tue, Jul 1, 2014 at 10:38 PM, Markus Krötzsch
<mar...@semantic-mediawiki.org> wrote:
> On 01/07/14 22:00, Lydia Pintscher wrote:
> ...
>
>>>
>>> Is there any documentation on how it chooses which entities to
>>> suggest?
>>
>>
>> It basically creates a table of correlations for properties over all
>> items in Wikidata. So if, say, date of birth and place of birth are used
>> together a lot, they get a high correlation. When you then have an item
>> with a date of birth but no place of birth, it will suggest place of
>> birth because of the high correlation.
>
>
> Oh! I have a suggestion to make ...
>
> Looking at properties that co-occur is good, but for P31 and P279, you must
> use the values instead (assuming that you can cope with the size: there are
> about 20k different values for these properties right now; seems doable). It
> does not tell you much if an item has "instance of" (P31), but it is very
> informative to know that you have "instance of: historic house museum".
>
> If you look at Q4810979, you can see that it really has no property that
> suggests that we are looking at an historic building: instance of, Commons
> category, coordinate location, country, Freebase identifier, image. Based on
> properties alone, this could really be anything, including a person. Note
> that even the new suggestions seem to miss most of the "typical" properties
> that I listed in my other email ("English Heritage list number" being the
> most obvious one for Q4810979).
>
> My algorithm uses values of P31 as its main information. Maybe this is why
> it performs better at first sight. Should be fixable with some feature
> engineering using the infrastructure you have now (where I trust that your
> recommender system backend has no problem with a slightly bigger number of
> features).

Yep, work is already under way to also take values into account :)
Suggestions for qualifiers and sources will hopefully come very soon.
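
For anyone following along, the difference between the two approaches can be
sketched roughly like this. This is only a toy illustration in Python, not the
actual suggester code: the function names, the scoring (a plain average of
conditional frequencies) and the five sample items are all made up for the
example, and properties are written as labels rather than P-ids.

```python
from collections import Counter
from itertools import permutations

# Properties whose *values* become extra features (Markus's suggestion).
TYPE_PROPERTIES = ("instance of", "subclass of")


def features(item, use_values):
    """Turn an item (dict: property label -> value) into a set of features."""
    feats = set(item)
    if use_values:
        for prop in TYPE_PROPERTIES:
            if prop in item:
                feats.add(f"{prop}={item[prop]}")
    return feats


def build_counts(items, use_values):
    """Count single features and ordered feature pairs over all items."""
    single, pair = Counter(), Counter()
    for item in items:
        feats = features(item, use_values)
        single.update(feats)
        pair.update(permutations(feats, 2))
    return single, pair


def suggest(item, items, use_values, top=3):
    """Rank missing properties by the average of P(candidate | present feature)."""
    single, pair = build_counts(items, use_values)
    present = features(item, use_values)
    scores = {}
    for cand in single:
        if "=" in cand or cand in present:
            continue  # only suggest properties, and only missing ones
        probs = [pair[(f, cand)] / single[f] for f in present if single[f]]
        if probs:
            scores[cand] = sum(probs) / len(probs)
    return sorted(scores, key=lambda c: (-scores[c], c))[:top]


# Toy data (hypothetical): three people and two historic house museums.
person = {"instance of": "human", "date of birth": 1, "image": 1}
museum = {"instance of": "historic house museum", "coordinate location": 1,
          "country": 1, "English Heritage list number": 1}
items = [
    {**person, "place of birth": 1},
    {**person, "place of birth": 1},
    person,
    {**museum, "image": 1},
    museum,
]

# An item that so far only has a type and an image.
query = {"instance of": "historic house museum", "image": 1}

plain = suggest(query, items, use_values=False)
typed = suggest(query, items, use_values=True)
print("properties only:      ", plain)
print("with instance-of value:", typed)
```

With properties alone, the toy item looks most like a person, so date of birth
ranks first. Once "instance of=historic house museum" counts as a feature, the
museum-specific properties move to the top.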


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
