From my viewpoint, biases are an issue of statistical sampling.
Wikipedia is an encyclopedia by humans for humans, so of course it has an
anthropocentric background, in which the mass of all the concepts swirling
around the Earth like an atmosphere curves the graph, keeping the Sun in
orbit around our world.
I find Wikipedia categories useful today, warts and all. They've got
two things going for them:
(1) Class and out-of-class dichotomies are the atom of ontology.
Well-designed categories have an operational definition that allows class
members to be determined with practically perfect precision.
(2) They are densely populated.
Look at the categories on this guy's web page:
http://en.wikipedia.org/wiki/Arnold_Schwarzenegger
Each one of those categories states a useful and correct fact, even if the
organization of those facts is entirely haphazard.
For instance, it would be better if he were coded as an "American", an
"Austrian", a "Californian" and a "Los Angeleno", and also as a
"Bodybuilder", an "Actor" and a zillion other things, so that we could then
infer that he was an "American Bodybuilder", an "Austrian Actor" and such.
But it's not that easy, because he was an "Austrian soldier" but not an
"American soldier", and I'd feel uncomfortable calling him an "Austrian
Politician". A lot of nuance is encoded in that sticky mess.
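To make the problem concrete, here is a rough Python sketch of why the naive
cross product overgenerates. The facet sets and the role-scoped table are
hand-written assumptions that mirror the examples above; they are not pulled
from Wikipedia or Wikidata, and the function names are made up for
illustration.

```python
from itertools import product

# Hypothetical atomic facets, typed in by hand for illustration only.
nationalities = {"American", "Austrian"}
occupations = {"Bodybuilder", "Actor", "Soldier", "Politician"}

def naive_compounds(nationalities, occupations):
    """Cross every nationality with every occupation."""
    return {f"{n} {o}" for n, o in product(nationalities, occupations)}

# Assumed role-scoped facts: each occupation lists only the nationalities
# under which it was actually held (following the examples in this post).
scoped = {
    "Bodybuilder": {"American", "Austrian"},
    "Actor": {"American", "Austrian"},
    "Soldier": {"Austrian"},
    "Politician": {"American"},
}

def scoped_compounds(scoped):
    """Build compounds only from nationality/occupation pairs that are attested."""
    return {f"{n} {o}" for o, ns in scoped.items() for n in ns}

naive = naive_compounds(nationalities, occupations)
careful = scoped_compounds(scoped)

# Compounds the cross product invents but the scoped facts do not support.
overgenerated = naive - careful
print(sorted(overgenerated))
```

The point is only that the nuance lives in the pairing of facets, which the
hand-made compound categories already record and a flat list of facets does
not.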
It's very easy to analyze those categories and produce desired concepts like
"Car" and "Bodybuilder" from junky categories like "Front-wheel drive
vehicle", "General Motors Concept Cars", "Bodybuilder Actor" and "Actor
Bodybuilder". In fact, that's exactly what the semantic web is for.
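As a toy illustration of that kind of analysis, the sketch below maps messy
category names onto a few target concepts by keyword. The keyword table and
the function name concepts_for are assumptions invented for this example; a
real pipeline would need a much richer vocabulary and would probably lean on
the graph rather than on string matching.

```python
import re

# Hypothetical keyword-to-concept table, written by hand for illustration.
KEYWORD_TO_CONCEPT = {
    "vehicle": "Car",
    "car": "Car",
    "cars": "Car",
    "bodybuilder": "Bodybuilder",
    "actor": "Actor",
}

def concepts_for(category_name):
    """Return the set of target concepts whose keywords occur in the name."""
    tokens = re.findall(r"[a-z]+", category_name.lower())
    return {KEYWORD_TO_CONCEPT[t] for t in tokens if t in KEYWORD_TO_CONCEPT}

categories = [
    "Front-wheel drive vehicle",
    "General Motors Concept Cars",
    "Bodybuilder Actor",
    "Actor Bodybuilder",
]

for name in categories:
    print(name, "->", sorted(concepts_for(name)))
```

Note that "Bodybuilder Actor" and "Actor Bodybuilder" collapse to the same
set of concepts, which is the kind of cleanup I have in mind.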
There is so much rich and precise information in the categories that you get
great results despite sampling error caused by low recall in the categories.
I'd love to see better structure, but not at the cost of fact density or
precision.
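For anyone who wants the precision/recall distinction spelled out, here is a
minimal Python sketch on made-up data. The member names and set sizes are
invented purely to show the arithmetic; they are not measurements of any
Wikipedia category.

```python
def precision_recall(assigned, truth):
    """Precision and recall of a category's members against the true class."""
    true_positives = len(assigned & truth)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Invented example: ten things truly belong to the class, but editors have
# only tagged three of them, and every tag they did apply is correct.
truth = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"}
assigned = {"a", "b", "c"}

precision, recall = precision_recall(assigned, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every stated fact is right (high precision) even though most
members are missing (low recall), which is the situation I am describing.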
If we can take advantage of the knowledge in the graph to exert gentle
pressure that improves categorization in Wikipedia, that would be great.
It's definitely time for the social industry to move beyond "tags".