Of course, Wikidata (by design) doesn't have formal typing of items; and it can be pretty domain-specific (and fluid) as to what aspects from different classes can or can't be combined on the same item.

So I think general checks that operate at the class/type level would be hard to specify. On the other hand, identifying particular pairs which should not be merged should, I think, be comparatively easy to record and comparatively easy to act on.
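
A minimal sketch of how such a pairwise record could be consulted before a merge. The set `DO_NOT_MERGE` and the helper `merge_is_blocked` are hypothetical names for illustration, not part of any existing merge tool, and the example pair (the Reelin protein and gene items cited further down this thread) is only an assumed entry:

```python
# Hypothetical in-memory record of item pairs that should never be merged.
# Each pair is a frozenset so that the order of the two IDs does not matter.
DO_NOT_MERGE = {
    # Illustrative entry: Reelin protein item and Reelin gene item.
    frozenset({"Q13561329", "Q414043"}),
}


def merge_is_blocked(item_a, item_b, blocked_pairs=DO_NOT_MERGE):
    """Return True if the two item IDs are recorded as a do-not-merge pair."""
    return frozenset({item_a, item_b}) in blocked_pairs


if __name__ == "__main__":
    if merge_is_blocked("Q414043", "Q13561329"):
        print("Warning: these items are recorded as distinct; do not merge.")
```

Storing each pair as an unordered set means the check gives the same answer whichever item is the merge source and whichever is the target.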

  -- James.


On 28/10/2015 20:24, Tom Morris wrote:
On Wed, Oct 28, 2015 at 3:55 PM, Benjamin Good <ben.mcgee.g...@gmail.com>
wrote:

It sounds like Tom and James have basically the same idea for our
particular problem, which I would support: enable a warning in the merge
script when incompatible types are detected.  These would have to be
encoded somehow though - presumably in the property constraints.


I think they differ semantically in that one operates at the class/type
level, while the other operates on pairs of instances (if I understand the
property's semantics).  Another more general check might be to see if the
proposed merge will result in any property values, such as P688 encodes,
which point to themselves.  That's usually a sign of a structural problem.
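
A rough sketch of that self-reference check, under stated assumptions: each item's statements are given as a plain dict mapping a property ID to a list of target item IDs (not the actual Wikidata JSON format), and the function name `self_references_after_merge` is hypothetical. The sample data is illustrative only:

```python
def self_references_after_merge(claims_a, claims_b, id_a, id_b):
    """List (property, target) pairs that would point at the merged item itself.

    After a merge both IDs denote the same item, so any statement on either
    item whose target is one of the two IDs becomes a self-reference.
    """
    merged_ids = {id_a, id_b}
    hits = []
    for claims in (claims_a, claims_b):
        for prop, targets in claims.items():
            for target in targets:
                if target in merged_ids:
                    hits.append((prop, target))
    return sorted(hits)


if __name__ == "__main__":
    # Illustrative data: a gene item with a P688 ("encodes") statement
    # pointing at the protein item it is about to be merged with.
    gene_claims = {"P688": ["Q13561329"]}
    protein_claims = {}
    problems = self_references_after_merge(
        gene_claims, protein_claims, "Q414043", "Q13561329"
    )
    if problems:
        print("Warning: merge would create self-references:", problems)
```

A non-empty result would be the cue for the merge script to warn before proceeding.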


Tom>>>  For all languages except English, it's the protein Wikidata item
[1] that points to the corresponding Wikipedia page, while for English it's
the gene item [2] that points to the corresponding English article [3].

I don't think that this is ubiquitously true, though it is true in many
cases.  This happened because the original imports from Wikipedia tagged
the Wikidata items about genes/proteins as proteins.  We converted all the
EN Wikilinks that we knew about programmatically but shied away from doing
that for all the other languages.


Sorry, I didn't mean to imply that it was generally true, but rather true
for the example I was looking at (Reelin [3]).  Since the opening sentence
begins "Reelin is a large secreted extracellular matrix glycoprotein ...,"
I'd say that the article is about a protein [1], yet it's linked to a gene
[2].  For articles which are about multiple Wikidata items, I guess another
possible answer is that they shouldn't be linked to any item (or
perhaps all related items if that's technically possible).

[1] https://www.wikidata.org/wiki/Q13561329
[2] https://www.wikidata.org/wiki/Q414043
[3] https://en.wikipedia.org/wiki/Reelin

Tom



_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata



