I agree with all these criticisms of the information in Wikidata.  Quite a few important classes in Wikidata have missing, questionable, or incorrect structural data.  Look at colors (instances of Q1075), where some colors are both instances and subclasses of color; or ships (instances of Q11446), where some ships are subclasses of ship; or the superclasses of geographic region (Q82794), which include set; or the instances of woman (Q467), of which there are only 28.

I believe that these structural problems in Wikidata are a major, probably the major, reason that Wikidata does not have considerably more uptake than it currently does.  Certainly every time I think of using Wikidata I have to think hard about what I need to do to ensure that the structural problems in Wikidata will not pose too much of a problem for my use.  (In most cases I come to the reluctant conclusion that they will.)


It's not so much that there are examples of bad structural data; it is that examples are so easy to find.  And it's not so much that the problems arise from bad policies; it is that there are no enforced policies.  Nor is it that these problems are unknown, as most of them have been previously reported.

It is for the above reasons that I believe that lack of tool support is not the major driver of the problems, and certainly tools that can only point out problems are not going to be a significant help in solving them.  Instead I believe that what is driving the structural problems with Wikidata is that the Wikidata community puts insufficient effort into identifying and implementing fixes for them.  Tool support is important, I agree, but unless people in the Wikidata community put a higher priority on fixing data in Wikidata than on adding even more data to it, the structural problems will continue.

I also feel that it does very little good to ask people who are adding new data to Wikidata to only create data with good structure when there are so many existing problems.  Instead the existing problems first need to be fixed.  This will both show that the Wikidata community cares about good structure and show people who are adding new data how it should be added, in contrast to the current situation, which in too many cases provides examples of how not to structure data.  Consider a tool that retrieves items that are similar to an item being added.  If the comparison item has bad structuring nearby, it is very likely that the new item will either be given similar structuring or be linked to the existing bad structuring.



As far as labels, descriptions, and aliases go, I agree that the current situation is poor.  But what I believe is missing most is enough description that the intent of an item, particularly a class, can be correctly determined.  I often end up with only a poor idea of which items should be instances of a class, particularly when considering several classes at once.  The various geographic classes are a prime example here for me.  In my view much of the natural language information associated with Wikidata items should be tagged with the English Wikipedia multiple issues template.



Queries that show the above problems:

# Colors that are both instances and subclasses of color (Q1075)
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q1075.
  ?item wdt:P279* wd:Q1075.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

# Ships that are both instances and subclasses of ship (Q11446)
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q11446.
  ?item wdt:P279* wd:Q11446.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

# Superclasses of geographic region (Q82794), including Q82794 itself
SELECT ?item ?itemLabel WHERE {
  wd:Q82794 wdt:P279* ?item .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

# Instances of woman (Q467) or of any of its subclasses
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31/wdt:P279* wd:Q467.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
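
The color and ship queries above differ only in the class ID, so the same check can be generated for any class.  What follows is only a minimal sketch in Python, not part of any existing tool: the function name instance_and_subclass_query is hypothetical, and the code just builds the query text without sending it anywhere.

```python
import re


def instance_and_subclass_query(class_qid: str) -> str:
    """Return SPARQL text listing items that are both an instance of
    (P31) and a subclass of (P279*) the given Wikidata class.

    Hypothetical helper: it only builds the query string and mirrors
    the pattern used in the color and ship queries.
    """
    # Accept only plain item IDs such as "Q1075", so nothing else can
    # end up inside the query text.
    if not re.fullmatch(r"Q[1-9][0-9]*", class_qid):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid}.\n"
        f"  ?item wdt:P279* wd:{class_qid}.\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


if __name__ == "__main__":
    # Q1075 is color and Q11446 is ship, as in the queries above.
    for qid in ("Q1075", "Q11446"):
        print(instance_and_subclass_query(qid))
        print()
```

The generated text can be pasted into a SPARQL endpoint that predefines the wd:, wdt:, wikibase:, and bd: prefixes, as the queries above assume.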


Peter F. Patel-Schneider

_______________________________________________
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/GERAOWK3O56Z2YY4KHGZO4IGCXXXZK32/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
