On 29/05/14 13:53, Thomas Douillard wrote:
hehe, maybe some kind of inference could lead to a good heuristic to
suggest properties and values in the entity suggester. As inferred facts
naturally become "softer" and "softer" through the combination of
uncertainties, this could also provide a natural limit for inference: we
fix a probability threshold below which we don't add a fuzzy fact to the
set of facts.
Maybe we could fix a heuristic starting fuzziness or probability score
based on the claim's status: "1 sourced claim" -> high score; a disputed
claim -> lower score; and so on, taking ranks into account.
Sorry, I have to expand on this a bit ...
My main point was that there are many fuzzy logics (depending on the
t-norm you chose) and many probabilistic logics (depending on the
stochastic assumptions you make). The meaning of a score crucially
depends on which logic you are in. Moreover, at least in fuzzy logic,
the scores are only relevant in comparison to other scores (there is no
absolute meaning to "0.3") -- therefore you need to ensure that the
scores are assigned in a globally consistent way (0.3 in Wikidata would
have to mean exactly the same wherever it is used).
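To make the t-norm point concrete, here is a small sketch (purely
illustrative, nothing Wikidata actually does): the "conjunction" of the
same two fuzzy degrees comes out differently under the three standard
t-norms, so a raw score has no meaning until you have fixed the logic.

```python
# Combining two fuzzy truth degrees under three standard t-norms.
# The same inputs yield three different "combined" scores.

def goedel(a, b):
    """Goedel (minimum) t-norm."""
    return min(a, b)

def product(a, b):
    """Product t-norm."""
    return a * b

def lukasiewicz(a, b):
    """Lukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)

a, b = 0.6, 0.7
print(goedel(a, b))                  # 0.6
print(round(product(a, b), 2))       # 0.42
print(round(lukasiewicz(a, b), 2))   # 0.3
```

Three logics, three answers from the same inputs -- which is why scores
assigned under different (or unstated) t-norms cannot be compared.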
This makes it extremely hard to implement such an approach in practice
in a large, distributed knowledge base like ours. What's more, you
cannot find these scores in books or newspapers, so you somehow have to
make them up in another way. You suggested using this for statements
that are not generally accepted, but how do you measure "how disputed" a
statement is? If two thirds of the references are for it and the rest
are against it, do you assign 0.66 as a score? It's very tricky.
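The reference-counting idea above can be written down in two lines,
which is exactly what makes it look deceptively simple (a hypothetical
sketch; the function name and scheme are made up for illustration):

```python
# Naive "support" score from reference counts: 2 of 3 references
# support the statement -> 2/3. This silently assumes every reference
# is equally reliable and independent of the others -- the assumption
# that is hard to justify in an open knowledge base.

def support_score(supporting, opposing):
    total = supporting + opposing
    return supporting / total if total else None

print(round(support_score(2, 1), 2))  # 0.67
```

The arithmetic is trivial; the problem is that nothing grounds the
resulting number in any fixed logic or probability model.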
Fuzzy logic has its main use in fuzzy control (the famous "washing
machine" example), which is completely different and largely unrelated
to fuzzy knowledge representation. In knowledge representation, fuzzy
approaches are also studied, but their application is usually in a
closed system (e.g., if you have one system that extracts data from a
text and assigns "certainties" to all extracted facts in the same way).
It's still unclear how to choose the right logic, but at least it will
give you a uniform treatment of your data according to some fixed
principles (whether they make sense or not).
The situation is much clearer in probabilistic logics, where you define
your assumptions first (e.g., you assume that events are independent or
that dependencies are captured in some specific way). This makes it more
rigorous, but also harder to apply, since in practice these assumptions
rarely hold. This is somewhat tolerable if you have a rather uniform
data set (e.g., a lot of sensor measurements that give you some
probability for actual states of the underlying system). But if you have
a huge, open, cross-domain system like Wikidata, it would be almost
impossible to force it into a particular probability framework where
"0.3" really means "in 30% of all cases".
Also note that scientific probability is always a limit of observed
frequencies. It says: if you do something again and again, this is the
rate you will get. Often-heard statements like "We have an 80% chance to
succeed!" or "Chances are almost zero that the Earth will blow up
tomorrow!" are scientifically pointless, since you cannot repeat the
experiments that they claim to make statements about. Many things we
have in Wikidata are much more on the level of such general statements
than on the level that you normally use probability for (a good example
of a proper use of probability: "based on the tests that we did so far,
this patient has a 35% chance of having cancer" -- these are not the
things we normally have in Wikidata).
Markus
2014-05-29 13:43 GMT+02:00 Markus Krötzsch
<mar...@semantic-mediawiki.org>:
On 29/05/14 12:41, Thomas Douillard wrote:
@David:
I think you should have a look at fuzzy logic
<https://www.wikidata.org/wiki/Q224821> :)
Or at probabilistic logic, possibilistic logic, epistemic logic, ...
it's endless. Let's first complete the data we are sure of before we
start to discuss whether Pluto is a planet with fuzzy degree 0.6 or
0.7 ;-)
(The problem with quantitative logics is that there is usually no
reference for the numbers you need there, so they are not well
suited for a secondary data collection like Wikidata that relies on
other sources. The closest concept that still might work is
probabilistic logic, since you can really get some probabilities
from published data; but even there it is hard to use the
probability as a raw value without specifying very clearly what the
experiment looked like.)
Markus
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l